Theory of Soccer Pt. 3: On Account of Bounce
What the rise of xG contributed to theory and what it did not
I. Play
In 2014, Carlin Wing, an Assistant Professor of Media Studies at Scripps College, wrote one of my favorite paragraphs about sports:
All cultures engage in some form of ball play. Ball games are a basic way for us to hone what computational neuroscientist Beau Cronin calls “the quotidian spatiotemporal genius of the human brain,” and over the past two hundred years, they have come to dominate the popular imagination…. All ball sports are aleatoric structures organized, to greater or lesser degrees, around bounce. Aleatoric structures—structures of planned chance—produce a reliable kind of uncertainty. We don’t know who will win and who will lose, but we know that at the end of the day, there will be a winner and a loser. A ball introduces a second, more uncertain, kind of uncertainty into the fray. Its bounce dances along the edge of our predictive capacity, always almost but never fully under control. At least in the Anglophone world, this second kind of chance—the chance of the ball—seems to be especially important to our contemporary understanding of play. While other kinds of contests are raced, run, rowed, and swum; wrestled, fenced, fought, and boxed; timed, weighed, measured, and judged; ball games are played. And only an athlete who contends with balls (or pucks, or shuttlecocks, or other third objects) earns the title “player.” We become players in and through bounce.
Bounce does not belong to any one object, surface, or body. It is a property distributed among these things: a name for those kinds of collisions from which all of the entities involved emerge with their respective shapes and speeds relatively intact. Which is to say, they survive.
THEY SURVIVE! I love it. You should read the whole thing; there is SO much there. In her piece, Wing reflects on and explores sport and play, what counts as “fair play,” who gets to decide it, and more, and in this way it is grander than this post and beyond its scope. I’m going to take a small piece of this opening segment and run with it, starting from the idea that the fundamental element separating “ball games” from “serious sport” is the consensus acceptance of chance built into them by design. And first, I just want to affirm that this is really good. On the whole, sports are fun and playful instead of violent and fascist, and I find it persuasive that ultimately one reason for this is simply that balls do funny things, and this brings joy where fully deterministic competition might otherwise bring pain. When you’re playing any game, the less predictable the outcome, the more enjoyable it is to undertake with others, and the more enjoyable it is to watch and engage with. In a way, games are not played because people want to know who is best. They are played because they’re fun.
Further, the degree to which chance is acceptable in soccer is one of the things that makes it soccer and not some other ball game. It’s one of the things that makes it the best. It’s the chanciest ball game, basically. In her essay, Wing lays out the argument that a defining layer of uncertainty in a game comes from the bounce of the ball. And while I’d admit soccer doesn’t have the bounciest ball of all the ball sports (it’s not covered with some sublime rubber surface, nor does it bounce the most erratically because of some irregular shape), the fact that you have to manipulate the ball with your feet makes it super chancy. It’s hard. Other sports give you pretty reliable ways to control the ball (normally your hands: catching and throwing), but soccer, by forcing you to use the same weirdly shaped body part that you must use to run and to stand(!), seems to accept or even embrace that we simply are not in control of everything. What a relief, honestly. And if we want to even pretend we are in control, it’s not to be found individually but through vulnerable cooperation.
One way to reflect on the degree to which soccer is chancy is to think about competitive disparities. In the biggest soccer leagues in the world, the mismatches between teams that face off in a match are astronomical compared to other major sports, and yet the outcomes are hardly ever known to any significant degree until the matches conclude. This just wouldn’t work in other sports. Imagine an NFL game between a team whose players make a combined $500M in wages and a team whose players make a combined $12M. One team would have stronger and quicker offensive and defensive lines, faster players, a quarterback with more experience and a better arm, and an army of depth at every position, to say nothing of their preparatory advantage in training facilities and technical staff, technology and nutrition, all of it. With very little chance sprinkled over the top of this to make it joyfully messy (say, a little mis-control from the behavior of a freely moving ball), this gridiron matchup would be a slaughter, nearly pointless to play out. And because viewers like things that are worth watching, and the NFL likes viewers, the league designs its product to prevent this: it caps every team’s salaries at around $200M to keep them more or less even. A cleverly constructed team can still achieve more success over some time horizon, but it’s not a foregone conclusion on any given Sunday. By contrast, precisely this sort of financial mismatch happens all the time in soccer, with financial giants hosting struggling minnows and vice versa, and yet while over time the strongest rosters generally win, it’s rarely pointless to play any single match out to a result. Perhaps it’s not “common” for David to beat Goliath, but it’s not exceedingly rare either. And it is quite common for them to draw. People find these games worth watching even though the teams aren’t even, and part of it is due to this greater uncertainty. Soccer is hard and chance is fun.
Why am I talking about this? Because in this series of posts, we seek the grail: a theoretical foundation for soccer. So in a sense, we’re foraging here and there for some sort of comprehensive underlying principles in the game that we can build a theory around, and in today’s post, we’re going to rummage around in “soccer analytics” broadly, looking for clues.
OK, so when I talk about “soccer analytics” in this one, I’m making an (unpopular) choice to really only talk about “expected goals” (xG) and its early cousins (xA, even xPass), and I’m going to call this “the first wave of soccer analytics,” recognizing of course that this is a bad term: soccer analytics, and stats writing more broadly (which is something different too), is way more than xG, and really cool stuff existed before xG, during xG’s rise, and obviously continues today. And while xG is the least interesting thing going on in soccer analytics (albeit the only mainstream bit to break through yet), I want to focus on it today for reasons that I hope to make clear. We’ll save some of the rich theoretical analytics work and the rise of possession value models for a later chapter. And maybe this is a risk, but I’m just going to skip an explanation of what “expected goals” is/are entirely and jump right in.
On my read, then, soccer analytics (of the xG variety) has contributed something vital to better understanding the sport. It has been able to articulate an important part of soccer’s core probabilistic essence, but it has by no means done so comprehensively, and there were some mistakes along the way. In this post, I try to identify where the first wave of soccer analytics genuinely contributed to theory (a contribution that is somehow both specifically limited and aesthetically pervasive), and, importantly, which of its byproducts we might borrow as we continue on our journey toward building a coherent theory. I also want to touch on where it fell discouragingly short, and where we should exercise caution in using it to explore how stuff works.
II. Non-analysis as a genuine contribution to soccer theory
What xG models and soccer analytics writ large were able to see and articulate first was simply the uncontrollable, unanswerable bounce of the ball that Wing talks about in the opening passage, one of the core parts of the game that makes it joyful. While the sport’s individual moments we love can be framed (rightly or wrongly) as players majestically “controlling” the ball to do wonderful things, it is this macro setting of chance and “miscontrol” (and therefore random variation) that sets the stage for playful (yet meaningful) competition, and feats of brilliance.
For whatever reason (be it natural or societal), we love to attribute causality to things (especially on a one-to-one basis): to attribute success to individual will, or to a player’s strength or skill or sharpness or ruthlessness, or to a successful team’s coordination and togetherness, and to attribute failure to the absence of these things. It’s cleaner. For certain parts of the brain this sort of logic gives us pleasure. But what xG and soccer analytics saw so clearly was simply that even if you accepted that all those other things were real, that they were present in the football matches themselves, that they did impact the results, that you could, let’s say, attribute them to players and coaches and teams even (if you put all that to the side and blessed it, and many did not), there was still the pure chance of it all, looming materially over the results, which made the results contain both truths and untruths. And to be fair, the results were never the only thing you could analyze in soccer, but you know how these things go. Results tend to dominate (hold that thought).
With xG and related analyses articulating that shots were rarely converted (roughly 1 in 10 attempts are scored), that shots were more likely to go in if they were taken closer to the goal and more centrally than from further away or wider angles, and that once you controlled for a pretty small set of factors such as these (and some other contextual stuff in the event data), players and teams tended not to shoot significantly better or worse than the averages over the medium and long term (that is to say, shot conversion regressed toward the mean), well... analytics suggested that on top of all that other real soccer stuff, it was also just super random. Over the medium and long term, teams outscored their opponents by creating more and better scoring chances than them, not by converting those chances at superb rates. And I think deep down people basically knew this(?), but oh, the cold comfort of the accounting done by the league table, the data comprising tallies of goals scored and conceded… the numbers! How cruelly persuasive data is in the absence of all the other data.
You needed to compartmentalize the bounces that went your way some weeks and betrayed you in others. You needed to quarantine this randomness from your analysis to an extent… assuming your analysis started with the league table. And of course it’s this randomness that is inextricable from soccer’s identity as a “ball game,” and it is the degree of this randomness relative to other ball games that makes soccer the most ball-gamiest of them all. When you take a crack at writing about, or just analyzing, a soccer match you’ve just watched (or several months’ worth of them), and you’re forced to reckon with the idea that a non-trivial portion of what you’re analyzing cannot be analyzed in relation to the game plan, or the decisions, or the strength and dexterity of the athletes, or the execution of tactical ideas by the players, but instead can only (partly) be analyzed in relation to an unanswerable static probability or dice roll (a bounce of a ball), I mean... that just doesn’t feel good at all.
And there is a sincere and perhaps cruel irony here. The purveyors of xG, what with their spreadsheets and data models and all, were (are) often accused of “over-analyzing” a “simple game” best left undissected, of somehow “reading too much into it” with all these numbers and decimal points, when in fact they are doing the opposite. You could say instead that what provokes parts of the football ecosystem into all-out war against analytics is actually xG’s insistence that people not analyze some portion of the game, because to do so is almost certainly to get it wrong. In this way, xG is the complete negation of football analysis for a certain portion of a team’s or player’s results (when you’re measuring them in goals or points). It renders some share of football analysis moot, and deep down it’s this that pisses people off, not the “over-analyzing” they accuse analytics of. It’s actually the xG models that are telling people who love the game that they are reading too much into it, that they are over-analyzing (how dare they). And for sure, telling someone they shouldn’t analyze something they’re passionate about is annoying; it’s even why the data people get pissed when this is levelled at them in rebuttal. This is understandable. When you love a sport like soccer, sometimes it feels like it’s everything and the only thing. You WANT to over-analyze it. But here’s xG just saying (to anyone who will listen) “LOL, bouncy ball go brrrr.”
In this way, when analytics is doing it right, the objectors to xG are truly projecting when they allege the analytics community is reading too much into something that’s actually just pure fun. It’s complicated, I guess, but data people (intentionally or not) are calling for an absence of analysis, and yes, sometimes fun is the absence of analysis (I type this as I check the word count for the first of many times). For better or worse, it’s this very truth, this core element of the sport and one of the things that makes it so fun, the “bounce,” that xG actually sees, reveals, affirms, and articulates as being beyond traditional sporting analysis. You might say (carefully, and accepting the risks that follow) that analytics approaches the divine not by articulating what it believes it to be, but by investigating what it is not, by exploring its unknowability.
And so it’s worth just grounding us real quick in, like... why this modern xG model came to be, because it wasn’t designed to uncover how soccer worked or the secrets of the game, the “how” and the “why.” A theory of soccer wasn’t the point. xG was mostly created to project (more robustly) an estimate of how a season was going to go (or how the remainder of a season might turn out), and, importantly, to do it slightly better than a similar projection based solely on the existing standings, or on last season’s standings, or on the teams’ goal differences. It was there to help us figure out who was really “good” in the sense that they were likely to be good in the future. And it succeeded in this (compared to what else was there). It beat up on those other projection techniques because it replaced meh data (the standings and goals) with more and richer data (shots and the limited context of said shots). And one of the fun, prickly things about this was that in medias res you might observe a team or a player on a hot streak, just bagging goals left and right, and sometimes the xG was gently suggesting that this run of form was not sustainable, and this made for initially fun (and now tiring) arguments! But importantly, the argument from the xG partisans was basically: “Yep, we see that big number, and we know it looks fun, and god knows we like a big fun number, and I know it seems like a big deal (this is the nature of recorded numbers), and goals absolutely rule, but you need to be ignoring a certain percent of that big number, because history suggests (when run through a data model) that it’s fake.”
III. The Fall
I mentioned above that the anti-xG crowd were projecting about the “over-analyzing” thing, but they weren’t the only ones projecting here. The xG revolution would end up going off the tracks in several places, as all things seem to. When and where it went off the tracks, it did so only in part because constantly telling someone they can’t analyze something is really annoying (especially if you’re doing it with such low R’s!), even if you’re doing so in order to cast a spotlight on the part of the results where analysis is more worthwhile. But it mostly went off the tracks because while the unanswerable probability in soccer is significant, it is by no means the lion’s share of what makes soccer soccer, or even the lion’s share of what makes the game fun to analyze, and definitely not the lion’s share of understanding how to play well, or what players might be good transfer targets, or what the optimal strategies for winning are.
Knowing the xG table (or some slice of it) might give you a better chance at identifying which teams are good than the other guy who’s just looking at the shiny league table, with its summations of the individual match results, but it doesn’t necessarily mean you understand soccer better than him. He’s just doing bad data analysis. Because you’re looking at slightly richer data than him, you can see more reliable evidence that a team is “playing well” and so they’ll probably play better in the future, but you don’t know how or why the team is playing well, what specifically it is doing from a process perspective that allows it to generate these favorable outputs, and as far as we know, neither does he.
But shit, the guy looking at the league table might be the ghost of Johan Cruyff, sentenced for eternity to a hell of never watching soccer matches, only allowed to glimpse the league table from time to time (like parents of small children). If you freed him from this prison of the damned, unshackled him, and allowed him to float around from stadium to stadium watching every single match, free of cognitive biases and equipped with perfect memory (you know, because his mind is ethereal now), well, that dude won’t need xG to know who is “playing well” (he’ll still be wrong about goalkeepers, though).
Anyway, beyond the (correct) assertion that a certain portion of the sport’s outcomes should not be analyzed except via mean regression, xG had virtually nothing to say about the rest of the sport, its beauty and complexity, how it works or why it works (and that’s fine: it didn’t need to know about these things in order to be a good model; the shot inputs were enough). But this other stuff is where most of the practically useful analysis lives. For example, you can note that Liverpool’s title-winning run was achieved at a rate the models suggested was unsustainable in the long run, and then you could chip away here and there at why that might have been the case, why the goals were soaring past the xG, but you’d mostly be doing that next part (the actual footballing analysis) without any help from xG itself. And when you try to pick apart why Liverpool’s chance creation was so good in the first place (because that’s where you’d have to start), why its underlying xG was so good, xG is obviously just silent. It gives you next to nothing; I mean, it does this by design. If you want to understand why Liverpool were good, why their xG was so good, you have to analyze something else: something tactical, or something about player skills, or something else in the data you have, or something else in the data you don’t have.
And I should note that this handoff between what xG can tell us and what xG can only inquire about is very well exemplified by the tightrope that Michael Caley and Mike Goodman must walk on the “Double Pivot” podcast, and have been walking for years. You can pick almost any random episode and any random timestamp, listen for about 10 minutes, and find this continual mode of (what by definition must be, and is) careful, deliberate conjecture, caught between the data and… something else. This in-between state is usually a question, a “cloud of unknowing” (it has taken me years to appreciate this about their show, and I still struggle with it).
But sometimes, the xG revolution was less, uh... upfront about this part of what it was saying and not saying. It’s one thing to point out where (and when) performance analysis and tactical analysis are not productive (chance conversion) so as to focus attention on the other stuff that should be analyzed (the underlying performance). It’s another thing to say “don’t analyze this stuff over here” and also “mission accomplished, we figured out soccer,” and then write a book that’s mostly just about how to beat the odds in sports betting while literally suggesting that without adopting the Expected Goals Method, pundits and managers alike cannot accurately or reasonably comment on the game…
There were more egregious suggestions than that too, some of which have since been apologetically retracted, but I think even in the most well-meaning, careful, and conscientious corners, what’s often emphasized above theoretical inquiry is this need for analysts to better communicate their insights to club personnel: perhaps packaging analytics insights into better, crisper, cleaner data visualization would be like folding medicine into a slice of cheese, or maybe analysts need to focus more on translating their findings into the common terms the staff use to discuss the game; these sorts of ideas. And this sentiment, on my read at least, while coming from a sensible place (wanting to communicate valuable insights more effectively), starts from the notion that something about how soccer works has been solved by the data models, and that the gospel need only be spread to non-believers, to soccer people, so they learn it and do soccer stuff better. It’s easy for me to say up here in my hobbyist ivory tower, outside the day-to-day pressures of the footballing world (and with no data viz skills to speak of), but that sort of prioritization feels like a misstep. It seems to me that for every 1 piece of insight an analyst needs to pretty up or translate, there are 20 questions about soccer begging to be asked of the footballing practitioners on the coaching staff (or the players), and 100 more questions that the staff themselves lose sleep over that you might inventory and begin to work through. And a lot of those questions need to be answered with other questions, like “do I have all of the data I need to actually go about exploring this question?” Perhaps clubs prefer to budget for a department that proposes answers rather than questions, but innovation comes from investment in R&D, and all research starts with a question. It seems intuitive to me that football clubs would find this money well spent, and for sure some do.
IV. Awkward transition
From a tactical perspective, while the “xG revolution” improved the transparency of a very intuitive finding (the idea that you should try to shoot from closer to goal), most everyone already understood this (or does today). The hard part is that the other team knows this too, and they’re trying to stop you from doing it! And they’re waiting for you to fuck up so they can go try to get a shot close to your goal. Most of soccer is not the struggle to choose the better shots from a menu of possible shots, but the struggle to find the menu or wrestle it out of someone else’s hands: the struggle in between the shots. And since, on your way to finding or creating shooting opportunities, you’re going to lose the ball most of the time, it’s the struggle to lose the ball in adequate spots instead of bad spots while you build the capacity to find opportunities for these better shots. Or it’s the struggle to do all this with the goal of threatening a number of dangerous ideas at once, so that the defense cannot cover all of them well and is ultimately forced to concede one of the possible openings (while not giving up too much going the other way). And it’s the struggle to do all this in reverse, and to do it while the ball is doing weird shit, like, every 10 seconds and making you start over.
In the NBA, because of the “shot clock,” most trips down the floor culminate in a shot attempt, so it makes sense to talk about “shot selection.” But 9 out of 10 possessions in soccer do not end with shots. Instead, they end in new possessions that are contingent on where and how the last possession ended. How, where, and when these new possessions start affects the likelihood that they’ll end in shots, and since these new possessions probably won’t end in shots either, it also affects the range of possible ways (how, why, when) that the possessions after them will start. So to the extent “shot selection” is even a thing that exists at all, it exists solely around the edges.
Anyhow, of course we care about these other things in soccer that xG cannot directly see (the beauty and complexity, the “how” and the “why”), because, again, in this cycle of posts we’re trying to excavate and articulate a theory of soccer: how the damn thing works. Soccer isn’t just about chance, and it’s not just about choosing smart shots; it’s about struggling to create, restrict, occupy, manage, destroy, repair, and beautify(?) space and time in order to create the capacity to move the ball into good positions to shoot and score, and this is mostly what coaches and players think about all the time (whether they say it convincingly or not).
Through a sort of apophatic method, by negating what in soccer results does not persist (stripping out the noise) and thereby indirectly affirming everything else (focusing on the signal), analytics did contribute something to the way we explore soccer beyond properly scoping the role of uncertainty. It constructively helped to drive analytical focus toward a certain part of soccer (the creation of good probabilistic scoring opportunities) and away from other parts (the rate at which teams convert these opportunities). This, at least, is how the quantitative method “thinks” about the contribution: there is that which regresses or expires into nothingness over the medium/long term, and there is that which persists.
But to be more holistic about it, and more optimistic (because we shouldn’t, not blindly at least, let the framing of the data that is collected and of the mathematical methods that use it define the framing of the sport), it’s not just that analytics properly identified what soccer wasn’t (that which regresses) as a means to properly illuminate what it was (that which persists). After all, the uncertainty/bounce it identifies is more than just noise (and this noise exists in more than just shots!). I mean, maybe it’s noise relative to a projection of future goal scoring, but qualitatively it’s of course a core element of the sport, something to open one’s heart to and something to hold in our periphery as we go forward thinking through theory. Instead, xG kind of just said: “If you’re going to do analysis about soccer, use probability and regression analysis over here on the bounciness, and use something else over here on the other stuff (say, tactical analysis or performance analysis or spatial/temporal analysis).” Discouragingly though, because of limitations in the data on hand, analytics itself was initially equipped to do only the first part: expertly explaining away a big chunk of the bounciness with math by referencing the xG.
This critique applies largely to xG, but also to its cousins. xA* told you, regardless of the official assist tallies, which players were providing the last pass before shots of a certain xG probability (or a version of xA did). xPass told you the likelihood of completing a pass based on available on-ball contextual information about the pass, and xG-chain, recognizing that shots are the result of successful possession chains, told you which players were involved in some shape or fashion in a possession that ended with a shot of a given xG. All of these innovations allowed analysts to get better at predicting which players and teams might achieve various results (with varying levels of success) by basically “zapping” the data that felt like noise (while still doing this quite poorly in absolute terms). But without more information about what was happening off the ball, these models were (are) severely limited in their ability to explain “how” or “why” those players/teams were more likely to achieve said results. And since team results are contingent on this how/why, and the way players collaborate is only one of these many how/why questions, using xG for recruitment has its limits as well.
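To make the difference between these cousins concrete, here is a minimal sketch of how xA and xG-chain credit could be tallied from possession data. The event format, the player labels, and the numbers are hypothetical, and real data providers’ exact crediting rules (for example, how several shots in one possession are handled) may differ. In this sketch, xA credits a shot’s xG to whoever made the final pass, while xG-chain credits it once to every player involved in the possession, shooter included.

```python
from collections import defaultdict

# Hypothetical possessions: ordered (player, action) events plus the xG of
# the shot that ended the possession (0.0 when there was no shot).
possessions = [
    {"events": [("A", "pass"), ("B", "pass"), ("C", "shot")], "shot_xg": 0.30},
    {"events": [("B", "pass"), ("D", "dribble"), ("C", "shot")], "shot_xg": 0.10},
    {"events": [("A", "pass"), ("B", "pass")], "shot_xg": 0.0},  # turnover, no shot
]


def credit_xa_and_xg_chain(possessions):
    """Return (xA, xG-chain) totals per player for shot-ending possessions."""
    xa = defaultdict(float)
    xg_chain = defaultdict(float)
    for poss in possessions:
        events, shot_xg = poss["events"], poss["shot_xg"]
        if not events or events[-1][1] != "shot":
            continue  # no shot, nothing to credit
        # xA: credit the shot's xG to the pass immediately before it, if any.
        if len(events) >= 2 and events[-2][1] == "pass":
            xa[events[-2][0]] += shot_xg
        # xG-chain: credit it once to every distinct player involved.
        for player in {name for name, _ in events}:
            xg_chain[player] += shot_xg
    return dict(xa), dict(xg_chain)


xa, xg_chain = credit_xa_and_xg_chain(possessions)
print("xA:", xa)
print("xG-chain:", {p: round(v, 2) for p, v in sorted(xg_chain.items())})
```

Note that player D, whose dribble set up the second shot, earns xG-chain credit but no xA, and that none of this says anything about what anyone was doing off the ball.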
And earlier, I limited my description of xG’s ability to explain the bounciness to merely a “big chunk” because shooting isn’t the only thing about soccer that’s hard or variable! On an absolute scale, most shots fail, but the bounce of the ball spares no one. It’s hard to pass, hard to get on the end of passes, hard to trap a ball, hard to anticipate the movements of complex systems, and hard to do all this while running really fast, for a long time, while someone is trying to tackle you. Over longer stretches, teams settle into a fairly stable level of performance when we measure it in shot terms or xG terms, but in the short term, these types of possession results are noisy too. So to the extent that xG models identified this very specific variability in shot conversion, it was but a glimpse of a more comprehensive appreciation for chance in soccer, one that persists throughout all of its sequences.
As a side note, on my read, a more comprehensive appreciation for chance in soccer is further obscured by the extreme disparities in competitive (economic) resources between teams that regularly play each other in the most watched and most analyzed leagues in the world. Things seem less variable (though still variable) when half of your data is a clearly wealthier team competing with a clearly poorer team. If you’re standing at the halfway point of a season trying to predict all the teams’ goal differences over the remainder of it, it’s true that using expected goal difference (xGD) will do better on average than past goal difference. That said, in absolute terms xGD is generally going to get you more mileage in the big 5 European leagues than in Major League Soccer in America, because of MLS’ salary cap. When relatively even-strength teams are playing each other all the time, there’s just way more variability in the chance creation totals than when uneven teams are regularly playing each other (before you even get to any variability in chance conversion). We generally think of the big 5 European leagues as exhibiting the qualities most representative of “real football” because the best players in the world ply their trade there, but if you reframe the question in terms of parity, asking whether a theoretical ideal of soccer is better exhibited in a contest between two evenly matched teams or in a mismatch, we’re less certain. We seem to like it when two evenly matched teams go up against each other. It makes things a bit less predictable.
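The parity point comes down to a ratio of variances. In a hypothetical toy model where goals for and against are independent Poisson counts, the variance in half-season goal difference splits into a signal part (how spread out true team strength is) and a noise part (which depends only on how many goals get scored). A salary cap shrinks the first while leaving the second alone. The strength spreads, the 17-match half-season, and the 1.4 goals per team per match below are made-up, illustrative numbers, not estimates for any actual league; the same arithmetic applies to xGD, just with a smaller noise term.

```python
def signal_share(strength_sd, matches=17, goals_per_team_per_match=1.4):
    """Share of the across-team variance in goal difference over `matches`
    games that comes from true strength rather than bounce.

    Toy assumptions: each team's true per-match goal difference is spread
    across the league with standard deviation `strength_sd`, and goals for
    and against are independent Poisson counts, so the noise variance of
    goal difference equals the sum of their means.
    """
    signal_var = (strength_sd * matches) ** 2
    noise_var = 2 * goals_per_team_per_match * matches
    return signal_var / (signal_var + noise_var)


# Hypothetical spreads: a giants-and-minnows league vs. a salary-capped one.
for label, sd in (("wide spread (mismatches)", 0.9), ("narrow spread (parity)", 0.3)):
    print(f"{label}: {signal_share(sd):.0%} of half-season GD variance is signal")
```

Nothing in the noise term changes when the league gets more even; only the signal shrinks, so the same half-season of results tells you less about who is actually good.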
V. Soccer analytics contributed the unit of account
Alas, if soccer analytics didn’t directly contribute much to our understanding of “how” or “why” soccer works, beyond a clear affirmation of probability and the idea that shooting closer to the goal is good, at least it produced a sort of auxiliary output that may ultimately have been worth more than everything else. Even though xG didn’t uncover the mysteries of everything that makes soccer soccer, its insights demanded a clear way to be communicated, to be expressed in writing, and so the shorthand nomenclature of xG in tenths or hundredths of a goal was born.
This team is averaging 1.3 xG per game, this player is averaging 0.12 xG per shot, this chance ‘had an xG of’ 0.25. This player is creating 0.5 xG+xA per 90 minutes. This goalkeeper is conceding 20% more than the post-shot xG of the shots he faces.
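The arithmetic behind those sentences is simple enough to sketch. In this minimal Python example (function names hypothetical, numbers made up to mirror the examples above), a team’s xG is the sum of its shots’ probabilities; the chance of scoring at least once additionally assumes the shots are independent, which real shots (rebounds, for one) are not; per-90 figures just rescale by minutes played; and the goalkeeper comparison divides goals conceded by the post-shot xG faced.

```python
def total_xg(shot_xgs):
    """Expected goals from a set of shots: the sum of their probabilities."""
    return sum(shot_xgs)


def prob_at_least_one_goal(shot_xgs):
    """P(at least one goal), assuming the shots are independent events."""
    p_none = 1.0
    for p in shot_xgs:
        p_none *= 1.0 - p
    return 1.0 - p_none


def per_90(total, minutes):
    """Rescale a season total to a per-90-minutes rate."""
    return total / minutes * 90


def keeper_excess_conceded(goals_conceded, post_shot_xg):
    """Fraction by which a keeper concedes more (positive) or less (negative)
    than the post-shot xG of the shots faced."""
    return goals_conceded / post_shot_xg - 1


match_shots = [0.25, 0.10, 0.05]  # hypothetical chances in one match
print(round(total_xg(match_shots), 2))                # 0.4 expected goals
print(f"{prob_at_least_one_goal(match_shots):.3f}")   # 0.359
print(round(per_90(4.5, 810), 2))                     # 0.5 xG+xA per 90
print(round(keeper_excess_conceded(12, 10.0), 2))     # 0.2, i.e. 20% more
```

The gap between 0.4 expected goals and a roughly 36% chance of scoring at all is one small, concrete way the unit keeps the uncertainty in view.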
By anchoring to and scaling against the only stat in soccer that everyone understands (goals), and by putting this model of shots and their conversion probabilities into goal units, this basic nomenclature of xG gave the world of soccer a quantitative language, or unit of account, that would (or will) prove vital for further work. If we borrow this xG language for a concept like “xG added” (as in possession value models), we can describe moments in a match in terms of the likelihood each team has of scoring over some horizon, or we can describe and evaluate the actions teams and players have taken based on how those actions changed the existing probabilities. And we can do all of this in relation to the way the scoreboard reads: in goal units. As Devin Pleuler said of xG, “it’s a shit metric, but it’s a really great framework.”
There is a real elegance here: the aesthetics of xG as a unit of account make us more aware both of the uncertainty in it all (we are literally expressing our ideas about the game in terms of probabilities) and of what Bill James calls “the scale of the elements” of soccer. If we care about (if we “value”) the score, then different actions, different accomplishments, and different scenarios have different quantitative impacts on the probability of the score changing. We can, for instance, explore how valuable winning a corner kick is (it’s not worth 0 or 1 goals; it’s somewhere in between). We can assign a value to a successful switch of play, or we can weigh the value of an “anywhere will do” clearance against the risks and rewards of starting a transition from one’s own box with a short pass. And it stands to reason that we could use these scaled, goal-denominated probability values, even abstractly (or with made-up numbers), to discuss elements of a “theory of soccer” going forward as we zoom out and in and all around.
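The “xG added” idea can be sketched the same way. One common way to formalize it values a moment as the probability that the team in possession scores within some horizon minus the probability that it concedes, then credits an action with the change in that value; specific possession value models differ in the details. The sketch below assumes a model has already supplied those probabilities. The class, function, and numbers are hypothetical, and it assumes both moments are seen from the same team’s perspective, which sidesteps how a real model handles turnovers.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    """Hypothetical model output for one moment in a match, from the
    perspective of the team in possession."""
    p_score: float    # probability of scoring within the horizon
    p_concede: float  # probability of conceding within the horizon

    def value(self) -> float:
        # Net goal probability over the horizon, in goal units.
        return self.p_score - self.p_concede


def xg_added(before: GameState, after: GameState) -> float:
    """Credit an action with the change in net goal probability it produced.
    Assumes both states share the same team perspective (no turnover)."""
    return after.value() - before.value()


# Made-up numbers: a switch of play moves the ball from a crowded flank
# into space on the far side.
before_switch = GameState(p_score=0.020, p_concede=0.010)
after_switch = GameState(p_score=0.035, p_concede=0.008)
print(round(xg_added(before_switch, after_switch), 3))
```

The design choice worth noticing is that the action is never scored as 0 or 1; like the corner kick, it lands somewhere in between, and it lands there in goal units.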
Further, I would add (and have written before) that the growing use of xG as a unit among soccer analysts gave soccer the missing ingredient that rich (yet imperfect) financial accounting data has given the finance/investing/business world for centuries. With a currency or unit of account, we can now account for things, always in reference to the common measure of a “goal scored.” And we can do this carefully so long as we remember what accounting is and what it isn’t: accounting isn’t some universal source of truth but a model full of its own rhetorical choices and political assumptions, and a means to an end. It poses questions, not answers.
VI. Accounting & mystery
In December, Scott Ferguson and Maximilian Seijo interviewed Paolo Quattrone, a professor of Accounting, Governance & Society at the University of Manchester, for their podcast Money on the Left. Since I’m claiming the first wave of soccer analytics brought us a language or system of accounting with which we might explore soccer further, perhaps it’s worth reading some excerpts from this interview and reflecting on the nature of accounting as rhetoric and on its origins:
Paolo: Accounting is interesting because it emerges and was designed as an instrument to seek for this wisdom and for this balance. It did that thanks to a lot of rhetorical techniques…. In the first accounting treatises–let’s say early, modern times, late medieval times–to explain to those who are reading these treatises what accounting was about, an example that was used is the metaphor of the mirror….
Mirroring in Latin is speculatus… The idea is you create a distance between you and yourself and you reflect on your behavior by looking at yourself in this mirror which, in accounting terms, is the financial reports that are produced at the end of the year, or when you close the books and you open them again. Interestingly, this “speculation” was a moment of reflection, a moment of reflecting on your morality.
So accounting is all about “speculating” about what you do not know. It’s about creating spaces in between opposites. In that sense, it’s rhetorical, to make sure that you interrogate the unknown. There is a link between Latin rhetoric and accounting as well. I mean, if you think of a couple of words like data, or fact, fact is possibly the most interesting, but data as well. Fact comes from factum, which means made. Data comes from datum, which means given, but also attributed. So the meaning of data is never given, it’s always attributed. The truth needs to be in the middle, in that middle space between the two opposites. Accounting is about creating two opposites in order to speculate on the mystery of value, in order to speculate on what is in between these dichotomies, expenses and revenues, assets and liabilities. You create figuratively in order to deal with the mystery of value, with uncertainty, with the unknown, so forth and so on.
…So it’s about making sure that you use what you can count–money–in order to reflect about what you cannot count, which is the purpose of the organization, your morality, what you need to do next year, and so forth, and so on. While we tend to, nowadays, reduce everything to numbers in the false belief that numbers will produce objectivity and generate rational choices, that was just the first movement that accounting did. Accounts indeed reduce the complexity of the world that is around you which cannot be reduced to numbers in order to augment your understanding of this complexity. However, numbers were excuses, they were not final objectives. They were means to explore the ambiguity of life. They were means to explore the mystery of value. They were means to explore how we always deal with uncertain situations. They were never instruments to eliminate the mystery, eliminate the uncertainty and eliminate the ambiguity. That would have been a stupid way of using numbers. Yet this is exactly what we’re doing in contemporary time. We are using accounting as if accounting can provide more certainty through answers, while instead, the only thing accounting can do is to point us towards the right questions.
I include this not so much as a warning about the importance of holding open multiple perspectives when we analyze data going forward (although let’s absolutely do that), but, as this post draws to a close, as one more chance to interrogate the problem that xG models addressed. The starting problem, again, is that when we look at the league table or the standings (at the results!), it is so easy to see this pre-existing “accounting” (the points and goal difference) as having already eliminated the uncertainty and ambiguity around the soccer games upon which we might otherwise reflect more deeply. xG did not introduce data or statistics into the soccer discourse. These things were there as soon as we started keeping score and tallying up results to determine a champion. And those things were good to do too (who’s to say). But xG reveals the extent to which the accounting of final scores does not solve the mystery of the game, and, along the same lines, neither does xG, even with its more data-rich set of accounts.
Forward: In search of a theory
In summary, any theory of soccer must wrangle with its inherent bounciness: its probabilistic nature, which begins most obviously at shot conversion but extends pervasively throughout the entire game, into every possession, every sequence, and every action. xG helped us reckon with that. Further, xG contributed to soccer theory a language that incorporates this probabilistic essence into all parts of the game: the practice of referring to (or naming) actions, sequences, or moments with values denominated in the probability of a goal being scored or conceded over some horizon.
To close, when we talk about “theories of soccer” and soccer analytics, it would be perilous not to reference Marek Kwiatkowski’s enduring think-piece “Towards a New Kind of Analytics.” While acknowledging xG’s contributions, we’ve also highlighted its limitations in helping us form a deeper theory of soccer. Without speaking for him, it sure seems Marek recognized this as well when he wrote for StatsBomb in 2016:
I think about football analytics as a bona fide scientific discipline: quantitative study of a particular class of complex systems. Put like this it is not fundamentally different from other sciences like biology or physics or linguistics. It is just much less mature. And in my view we have now reached a point where the entire discipline is held back by a key aspect of this immaturity: the lack of theoretical developments. Established scientific disciplines rely on abstract concepts to organise their discoveries and provide a language in which conjectures can be stated, arguments conducted and findings related to each other. We lack this kind of language for football analytics. We are doing biology without evolution; physics without calculus; linguistics without grammar. As a result, instead of building a coherent and ever-expanding body of knowledge, we collect isolated factoids.
While this “theory of soccer” exploration we’re attempting here might not be exactly what Marek had in mind, his piece is one I return to often as I frame up this series. At the time, he hinted at further analytics work on “possession chains” as a worthy focus, seeing these as a more appropriate fundamental building block of the game than the individual discrete actions comprising them. This was prescient: in the years that followed, possession value models would take off and reinvigorate the scene. We’ll return to the soccer analytics space soon, because I want to interrogate how recruitment, rather than something more tactically driven, became the primary soccer analytics use case, and as we pull together more strands of theory, I suspect we’ll come back to possession value models at least once.
But if we start from the idea that a theoretical foundation was necessary (and lacking) for building toward further discoveries in soccer analysis (Marek was talking specifically about data analytics, but surely with a gaze upon the sport itself), then we need to keep exploring areas beyond data analytics to find our footing in some sort of conceptual foundation that takes on questions like “why” and “how.” It pains me to say it, but I think we cannot avoid exploring one of the other areas of soccer analysis that was proliferating online at the same time as the early modern public soccer analytics movement: something that, like analytics, sought to better explore, understand, and articulate the nature of the sport in pursuit of on-field advantages. Reader, we might fail. I’m talking about …
*shudders* *ominous music* ~~Soccer Tactics Writing.
images from wombo.art
Where can I read Parts 4-9?