Soccer Data, Accounting, and the Origins of Money

Step 1: Find some data and attach a unit of value

In the last post, we walked through the five basic steps of projecting future cash flows for a business, with the aim of applying those steps to player assessment in the soccer operations department of a front office. The first step was to acquire reliable accounting records of the target’s historical business performance, so if our GM wants to identify and sign players who will make his team better, he needs to start by obtaining historical records of each player’s performance. One way or another, when we’re talking about reviewing a player’s historical performance, we’re talking about data, whether it’s a video reel of his past matches, a stack of scouting reports, a detailed report of his player statistics, or something else.


Let’s start with the most quantifiable category of data. At the 2019 MIT Sloan Sports Analytics Conference, Daryl Morey, GM of the Houston Rockets, said this about data in soccer:

The reality is, it’s a very complex sport, 11 on 11, lots of free moving, not a lot of set things, and every time something happens you get zero so how are you concluding anything, everything leads to zero. You can do everything right, you get zero, you can do everything wrong you get zero. So it’s very hard to differentiate. In the NBA, you go back and forth 100 times and each time down you get a pretty good distribution of zero, one, two, three, lots of scoring, that allows us to differentiate things. I only listen to data when it really tells me something, and right now the sport isn’t there ….

Your data is shit, it doesn’t tell me anything

OK, so let’s just talk about soccer data for a minute.

Event Data

The most commonly available detailed match data in soccer is what is known as “event data,” and it is captured and coded by data providers like Opta and StatsBomb. Event data basically means that every time the ball moves or is engaged with by a player, a record is created with several fields of information captured. Perhaps the most important fields include the time stamp, the type of action (e.g. a pass, dribble, tackle), the location of the ball (including the start and end location for certain actions like passes), the type of pass if the action is a pass, the player acting on the ball, that player’s team, the opponent, the score of the game at that moment, the pattern of play, and the result of the action (e.g. failed or successful pass or tackle, shot on target, goal, shot off the woodwork, etc.).
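To make the shape of one of these records concrete, here is a minimal sketch in Python. The field names, the coordinate convention, and the example values are my own assumptions for illustration; they are not the schema of Opta, StatsBomb, or any other provider.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One on-ball event record. Field names are hypothetical, not a provider schema."""
    timestamp: float                        # seconds since kickoff
    action: str                             # e.g. "pass", "dribble", "tackle", "shot"
    start_xy: Tuple[float, float]           # ball location when the action starts
    end_xy: Optional[Tuple[float, float]]   # end location, for actions like passes
    pass_type: Optional[str]                # only filled in when the action is a pass
    player: str                             # the player acting on the ball
    team: str
    opponent: str
    score: Tuple[int, int]                  # (team goals, opponent goals) at that moment
    pattern_of_play: str                    # e.g. "open play", "from goal kick"
    outcome: str                            # e.g. "successful", "failed", "goal"


# A made-up record: a completed pass in the 39th minute with the score at 1-1.
example = Event(
    timestamp=2310.0,
    action="pass",
    start_xy=(35.0, 10.0),
    end_xy=(48.0, 30.0),
    pass_type="ground",
    player="Fullback A",
    team="Home",
    opponent="Away",
    score=(1, 1),
    pattern_of_play="from goal kick",
    outcome="successful",
)
print(example)
```

Notice that every field describes what happened, and none of them says what the action was worth.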

Because each action is time-stamped and tagged to players on either team, with a modest amount of data organization we can understand the relationship between any two or more events. We can order them chronologically, organize them into sequences and possessions, place them in their proper context, plot them on the pitch, and even animate them to visualize everything that happens on the ball in a given match. For every action that succeeds or fails, we can quickly know who is on the ball, who touched it last and where and when, who touched it before that and where and when, and so on. There are over a thousand recorded events in a match, and data has been recorded for every match in all major leagues in the world for some time. This is an incredibly rich and voluminous data set. The insights to be gained from mining it are immense (and have been proven so).

It’s not the entirety of the game of soccer that’s recorded in the event data, but it’s a whole lot. As a frame of reference, it’s almost everything that the (admittedly inadequate) broadcast camera view shows the audience. All of those zeroes that Daryl refers to, the zeroes at the end of nearly every action: he’s not wrong, those are real. The scarcity of events attached to goal scoring is one of the true and beautiful essences of the sport. Soccer is hard. But ones and zeroes are also not the only ways to account for these data records.
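The chronological ordering and grouping into possessions mentioned above can be sketched in a few lines of Python. This is a toy version under a loud assumption: a “possession” here is just a run of consecutive events by the same team, which is much cruder than the rules a data provider would actually apply. The events themselves are hand-written placeholders.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, hand-written events: (timestamp in seconds, team, player, action).
# They are deliberately out of order, as raw records sometimes are.
events = [
    (12.0, "Home", "H4", "pass"),
    (3.0, "Home", "H1", "goal kick"),
    (15.5, "Away", "A6", "tackle"),
    (7.5, "Home", "H2", "pass"),
    (18.0, "Away", "A8", "pass"),
]


def split_into_possessions(events):
    """Order events by time, then group consecutive events by the same team.

    Assumption: a change of team on the ball starts a new possession.
    """
    ordered = sorted(events, key=itemgetter(0))
    return [list(run) for _team, run in groupby(ordered, key=itemgetter(1))]


possessions = split_into_possessions(events)
for number, run in enumerate(possessions, start=1):
    team = run[0][1]
    chain = " -> ".join(f"{player} ({action})" for _t, _team, player, action in run)
    print(f"Possession {number} [{team}]: {chain}")
```

With the time stamp and the team tag alone, we can already answer “who touched it last, and who touched it before that” for any record in the match.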

Soccer is hard

Writing for Slate in 2011 (towards the beginning of the soccer data revolution), Brian Phillips described the emergence of big data providers in the sport:

So the notion of soccer as a kind of quaint, starry-eyed endeavor that can’t be explained by the numbers is a little outdated. There’s just one problem with the sport’s newfound sophistication, which is that soccer happens to be a quaint, starry-eyed endeavor that can’t be explained by the numbers. That kind of statement immediately marks one as a paleo-romantic philistine in Bill James’ America, but the paleo-romantics have a point when it comes to soccer. Because the complexity of the game is so enormous, reasonably thorough stat-tracking requires independent companies and expensive systems, systems whose own complexity introduces a further degree of uncertainty ….

Soccer isn’t baseball, in other words, and even leading researchers in soccer numerology doubt they’ll discover a universally applicable set of metrics—there are simply too many differences between teams and cultures, and too much chaotic complexity within the game itself. Sure, soccer has passes, shots, crosses, free kicks, and so on. But there are also long sequences of play when, say, a defender boots the ball forward, and two players jump to contest it in the air, and it sort of slouches off to one side, and the opposing right back (who’s stuck covering the midfield because his teammates are scattered out of position) gets to it first, and angles what looks like a long pass to the left winger, only the ball swerves in the air and is picked off by the other team’s goalkeeper, who rolls it back to the defender, who boots it forward, and so on. How do you account for that?

How do you account for that? I love his choice of words here. As you know, a sporting director running the soccer operations at a football club isn’t the only executive in the world forced to take rich yet incomplete data, use it to make predictions about the future, and be held accountable for what happens. As we discussed in the last post, when a financial analyst obtains accounting records for a target company for the purpose of putting a price on acquiring the business, he’s dealing with data-rich yet incomplete historical records. The financial statements he uses as the foundation of his projections were compiled using all of the available events or “transactions” that a business is allowed to account for based on rules set up by regulatory bodies. Importantly, this is not inclusive of all activities that the business undertakes. The real world and the business world are complex. Certain things aren’t accounted for because doing so would be either overly cumbersome or so judgmental in nature as to make the records unreliable for widespread and credible use. When Phillips speaks of the complexity of the game requiring any attempt at comprehensive stat-tracking to be expensive and complex, he is partly talking about the need for regulation and the questions that arise from it. The business world faces the same problem, and its answer (while not airtight) has been regulation and the establishment of clear rules and standards around the accounting, something that we also mostly have at this point in the soccer event data, some nine years after Phillips’ reflections.

What financial accounting data has that soccer accounting data does not have

Now, there’s something very different between modern financial accounting data and modern soccer accounting data. First, it is true that they both include volumes and volumes of very specific and descriptive transactions or events. For example, one data record may show a pass from a fullback to a defensive midfielder in the central third of the pitch in the 39th minute in Sheffield, following 6 other passes on a possession that started from a goal kick with the score drawn 1-1, while another data record might be a journal entry in Sony’s General Ledger to record a customer invoice, which has been numbered serially in the order of its consummation in a batch of other invoices, and approved for 5 pallets of computer hardware to be delivered at the Best Buy Midwest distribution facility A on November 2, 2020. But there’s something special and incremental in the financial accounting data (in that Sony invoice) that’s not initially there in the soccer data, something we take for granted that has been added to it, and to invoices like it, over thousands of years of recorded civilization. The addition is simply money as the unit of account. But it wasn’t always this way.

The origins of the “money unit of account”

Value denominated in units of account is not a natural phenomenon, either in soccer or in the “real world.” This is perhaps overly indulgent for the purposes of this post, but I think it will illustrate the critical point here, and there is no better time to dip our toes into the history and origins of money as a unit of account than shortly after the unfortunate passing of David Graeber just last week. Graeber was an American professor at the London School of Economics who wrote the seminal anthropological work on the history of money and debt: Debt: The First 5,000 Years.

In Graeber’s book, which I adore, he traces the origins of money through history not to barter (the more neoclassical line of logic), but to interpersonal credit and gift arrangements. The barter theory of money, which he roundly rejects, goes something like this: barter occurs in the spot market when a coincidence of wants emerges between two individuals, each having created or cultivated goods or services the other just happens to want. As individuals inefficiently match-make in these ways, they exchange products and services as an overall economy, and slowly over time one product which is itself inherently desirable, uniform, and easily divided becomes the dominant barter good or “currency.” Henceforth all barter transactions are replaced by the exchange of products and services for that one dominant good, the currency or the money thing.

Instead, Graeber describes in “Debt” the idea that Keynes, Knapp, Innes, Lerner, Minsky, and many others explored, which is that the barter theory of money is completely incorrect, and that in fact money’s origins can be traced first to early pre-market, pre-money credit arrangements between individuals. One individual needs a certain product now but has nothing to offer in exchange, is given the product anyway, and the gift is written down in the “seller’s” ledger as a debt. Importantly, there is no monetary unit of account for this debt in the ledger. It simply says “Joe took 10 bags of potatoes” or whatever the exchanged good was. Later on, the individual who produced the potatoes may need a new pair of shoes, and they might go to Joe, who is a cobbler and ate their potatoes last season, and the cobbler will oblige by delivering a new pair of shoes. In this example money still does not exist. We have transactions, with all of the information we need to understand what is transpiring (e.g. the nature of the good being ordered, the when and where and how many), but we do not yet have a universal monetary value ascribed to them. What Graeber and the others find, an evidence-based theory I find compelling, is that this monetary unit doesn’t come about in society until a powerful governing entity decrees it by force, by a monopoly on violence or by other means of authority. Essentially, the king or the warlord or the democratically elected representative body, or whoever, needs to provision certain basic services for their “kingdom,” most immediately perhaps an army or law enforcement with which to maintain their power, and so they levy a tax (or fine) upon their subjects which can only be paid in the unit of account that they also simultaneously create and control: let’s call it the royal currency. The penalty for not paying the tax is something bad (e.g. imprisonment).

Society suddenly scrambles to acquire this royal currency lest they be jailed or worse, but the only person who has any of it to give out is the king (or the state). So his subjects line up to provision their goods and services (e.g. their potatoes, their shoe-making, their soldiering) to the king’s service in exchange for the royal currency (which they will need to turn around and give back to him when the taxes are due), and suddenly the king has provisioned a “government” and, along with it, his kingdom has a functioning “market” organized chiefly around this royal currency. Because everyone must obtain the royal currency to pay the tax or fines or risk something bad, everyone desires the currency and even desires to save some amount of it, and pretty soon the royal currency becomes the monetary unit of account, the “Absolute Unit” if you will, with which all goods or services are priced in the market. Now when Joe needs a potato he pays with some of the royal currency he earned by making boots for the government’s soldiers, who themselves earned some royal currency by serving in the army, and so on.

Good lord, what have I gotten us into? OK, so back to Brian Phillips and the soccer ball being passed around and deflecting and so on and “How should we account for this?” The key difference between financial accounting data and soccer accounting data is that financial accounting data has embedded within it a monetary unit of account, the currency that changes hands when goods and services are sold and invoiced, when employees are paid for their time, etc. The point of the money story above, aside from honoring Graeber this week, was to remind us that this monetary unit of account isn’t any more naturally occurring in economics than it might be in the sport of soccer. The “money unit of account” is necessarily an intrusion by the state (a helpful one at that, depending on your politics), an infusion of authority and accounting into transactions that are already data-rich but otherwise devoid of a common unit of value. In this way they are very similar to the data recorded thousands of times per match in modern football: the soccer event data. And hence there is a solution.

Soccer data could use an equivalent non-monetary unit of account

If financial accounting data serves as such a solid foundation for valuing a business, and the point of this newsletter is that we can use this foundation by analogy for evaluating players, then if we’re going to use the soccer accounting records (i.e. all of a player’s actions on the pitch), we need to imbue a unit of account into these soccer event records.

Imagine a financial analyst, tasked with valuing a business and handed a mountain of accounting documents: every invoice, purchase order, payslip, and bill of lading that the business transacted with over the last several years, but with all references to prices crossed out or redacted. If this analyst has all the time in the world to read through these documents, he’s going to learn an awful lot about the business he is valuing. He might end up the foremost expert in the world on this business, more knowledgeable about its inner workings than the management team itself. But if he is asked to put a value on the business as a whole (a dollar figure), he is completely fucked. Even if he had a computer and some fancy statistical programming skills at his disposal, say R or Python, with which to speed up the inventorying and synthesis of all of these accounting documents, and even if he were able to produce compelling trend analyses and predictive models of the types of transactions that preceded other transactions, the seasonality of certain large orders from large customers, and the keys to growing these sales over time, he would still feel existential dread if anyone asked him the question “so how much is this all worth?”

When asked to put a dollar value on this enterprise, without a unit of account attached to all of these accounting records, he remains hopeless. As Daryl Morey says, his data is shit.

This is exactly the sort of puzzle I’ve highlighted before that soccer analytics finds itself in. With data, you can recreate detailed histories of the facts that comprise past soccer matches. You can generate all of the analysis and insights in the world (passing networks, expected goals, keeper saving models, high pressure tendencies), but without a unit of account breathed into the records, you’re like the analyst rifling through mountains of dollar-less invoices and bills of lading while his bosses scream at him to come up with an offer so they can try to buy the target business. And that’s just the beginning: so far this is only the issue for the data analyst, for the quantifiable side of data. Remember, we would include broadly in the definition of “data” not only the Opta data files, but all of the information: the scouting reports, video reels, etc.

Qualitative data also needs a unit of account

Scouts and performance analysts, instead of staring at a mountain of non-denominated accounting records, have a richer but smaller data set to work with. By analogy, they’re able to observe and experience the inner workings of the target business itself: how it sells to its customers, how it buys from vendors, the types of trade secrets it develops in its R&D department. But because there is only so much time in the day, the scout and performance analyst equivalents can only observe these in short bursts: site visits to factories and distribution centers, dial-ins to board meetings so to speak (one-pagers and video clips in Wyscout). And they still have the same problem the data analyst has. They can observe all of the processes they want and interview key stakeholders, but if someone asks them how much the target company is worth (or how many goals a midfielder is worth), without a unit of account attached to what they’re observing, they too are up shit creek.

I am not here to pull the soul out of the sport. Soccer has an aesthetic and cultural value beyond its competitive attributes. In the same way that potatoes have nutritional value, kicking or trapping a soccer ball has a certain inherent joy to it. But in a modern football club, the sporting director is tasked with achieving and optimizing results on the pitch within the constraints of a fixed economic budget. Because winning is important, the soccer unit of account, much like money as a creature of the state, has already been injected into the game, like it or not, by competition and by the capitalistic rewards that come from consistently achieving competitive goals. You know by now that this unit of account is marginal expected goal difference.

Whether it’s the data analyst or the scout or the performance analyst, the unit of account is there. It must be declared, bequeathed, infused into the soccer-specific data the soccer operations department uses to make high-level predictions about the future and, accordingly, to make the critical roster budget allocation decisions in the present.
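To see what a unit of account buys us, here is a small sketch of what becomes possible once every record carries a value denominated in expected goal difference: the records can finally be added up. To be loud about the assumptions: the per-action values below are invented placeholders and do not come from any model, and the tuple layout and function name are hypothetical. How to actually arrive at sensible values is a separate question that this sketch does not attempt to answer.

```python
from collections import defaultdict

# Hypothetical events: (player, action, value). Each value stands for the
# change in the team's expected goal difference credited to that action.
# The numbers are invented placeholders, not the output of any model.
valued_events = [
    ("Midfielder A", "pass", 0.010),
    ("Midfielder A", "dribble", 0.030),
    ("Midfielder A", "failed pass", -0.020),
    ("Striker B", "shot", 0.120),
    ("Striker B", "failed dribble", -0.015),
]


def total_by_player(valued_events):
    """Sum the per-action values for each player, like totalling priced invoices."""
    totals = defaultdict(float)
    for player, _action, value in valued_events:
        totals[player] += value
    return dict(totals)


totals = total_by_player(valued_events)
for player, total in sorted(totals.items()):
    print(f"{player}: {total:+.3f} expected goal difference")
```

Strip the third column out of those records and we are back to the analyst with the redacted invoices: plenty to count, nothing to sum.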

Next Up: Adding Goals

OK, let’s breathe some units into these soccer transactions. Let’s account for them properly. We’re going to need some help from some smart people. In the next post, we’re going to dive into something that was hinted at when the GM, the scout, and the data analyst were discussing a potential player signing in the earlier episodes of this saga. We need a sensible way to assign a unit of value to each and every one of the thousand-plus records that are logged in a soccer match, and as you already know from the other posts, in an organization tasked with winning games, the unit of value must be “marginal expected goal difference” if our GM or Sporting Director is going to be able to seamlessly integrate it into his decision making process. For every action a player takes on the pitch (and for every data record captured), what is the contribution to the team’s expected goal difference? In the next post, we will add the goal difference unit of account into the already rich soccer accounting data, putting us on solid ground to make projections about future player contribution to team goal difference by starting with historical records of player contribution to team goal difference. Onward then. Thank you for subscribing and sharing.

Appendix: Tracking Data

With event data, we have nearly everything we want to know about what happens on the ball, and of course what we’re missing is everything that happens off the ball: if one or two players are involved in a given action at a time (and tagged as such in the data), where are the other 20 or 21 players on the field, and where were they located just a split second ago, and the split second before that, how fast are they going at this very moment and in what direction? What is going on with the most ubiquitous and yet most sought after of elements in a soccer match, the space between the players and between the goals?

The answer to these questions is known as “tracking data,” which uses high-tech cameras to capture the locations of the ball and all players on the pitch several times per second, producing a more complete accounting record of every moment in a soccer match. At present, this data is not universally available, especially publicly, and so this newsletter will continue to use “event data” as its proxy for the quantifiable accounting data needed to form the foundation of good player contribution projections. But you can imagine how much more insightful this type of information is (possibly even information overload at present), and the concepts that this blog walks through should apply to a future state where tracking data, not event data, is the most granular level of accounting data available to all soccer clubs.
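As a rough illustration of how position samples answer the “how fast and in what direction” question, here is a minimal sketch. The frame rate, the coordinates in metres, and the function name are all assumptions of mine for the example; real tracking feeds have their own formats and sampling rates.

```python
import math


def speed_and_heading(p0, p1, frames_per_second):
    """Return (speed in m/s, heading in degrees) between two consecutive samples.

    p0 and p1 are (x, y) positions in metres. Heading is measured
    counter-clockwise from the positive x-axis.
    """
    dx = p1[0] - p0[0]
    dy = p1[1] - p0[1]
    dt = 1.0 / frames_per_second            # time elapsed between the two frames
    speed = math.hypot(dx, dy) / dt          # distance covered divided by time
    heading = math.degrees(math.atan2(dy, dx))
    return speed, heading


# Hypothetical positions of one player in two consecutive frames,
# at an assumed rate of 10 frames per second.
speed, heading = speed_and_heading((50.0, 30.0), (50.3, 30.4), frames_per_second=10)
print(f"{speed:.1f} m/s, heading {heading:.0f} degrees")
```

Repeat that for 22 players and the ball, several times per second for 90 minutes, and the “information overload” caveat starts to make sense.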