Projections Deep Dive: Team Effects & Aging Curves
Working through the first two layers of systematically projecting player contribution ability
Previously On “Absolute Unit”
At a high level, the financial world estimates the “value” of an asset as (i) the projected future cash flows it generates, which are then discounted, to price it relative to the amount, risk, and timing of similar assets' cash flows using (ii) the concept of a “required rate of return.” To (i) project future cash flows the basic steps we agreed on were:
1) Obtain relevant and reliable accounting records of historical results, which are (1a) denominated in the unit of value that you care about (in our case “marginal goal difference contribution”).
2) Adjust the historical results for noisy, non-recurring items to create “pro-forma historical results.”
3) Take those pro-forma historical results and project them forward into the future in some kind of documented, evidence-based manner based on the information publicly available.
4) Layer in any proprietary information you may have related to the thing being valued.
5) Apply individual judgments not already included above.
In last week’s post, we walked through a high level outline of steps 3-5 above and how a structured process might look. We decided it might look something like this:
Two Overall Phases: First, project out a player’s contribution ability (measured in marginal expected goal difference per 90 minutes, or per 100 passes/possessions etc), and second, project out a player’s opportunities to contribute (e.g. minutes played or number of possessions).
To expand the Contribution Ability phase, we decided it includes several layers: 1) un-levering old team effects, 2) systematically projecting player growth, and 3) re-levering new team effects. Together these layers get us through the high level Steps 3 & 4 of making projections in the financial world (“project systematically” and “add proprietary info”) mapped over to projecting player contribution to our team.
There are several problems that arise in building a workable and repeatable process for these “contribution ability” layers, and that’s what this post is about. We’ll cover the first couple of layers and save “re-levering your own team effects” for next week.
Un-levering Team Effects
The first layer to peel back from the already pro-forma adjusted historical player contribution records is the impact that the player’s specific team had on their past results, that is, the variance it created away from the hypothetical contribution the player might have had on an average team. It is this unlevered, team-agnostic contribution that we want to take and project forward before considering our own proprietary knowledge of our own team, its play style, its game model, and so on.
I just honestly don’t exactly know how to do this, and the public literature on it is sparse as best I can tell. The analytics community seems very wary of team effects. They are oft-mentioned in various capacities and connected to various topics, but I haven’t seen a thorough explanation of the mechanics as they might relate to a complicated “all-in” bottom-up model like an EPV model (g+, VAEP, PV+ etc). And really all I would steal as a placeholder here is some rule of thumb by position (e.g. a fullback’s g+ contribution is roughly 65% individual contribution and 35% team effects, a forward’s is 75%-25%), which I also don’t see any sort of consensus on. From an analytics perspective, the closest thing is this very nice whitepaper from Jan Van Haaren and Lotte Bransen that measures “chemistry” between pairs of players building off of their VAEP framework and hints at future explorations to come. I anxiously await more work in this area while suspecting that behind closed doors, “smart” clubs have done plenty of work here to the point of having a working hypothesis at the very least.
All I can do is sort of stress the benefits of this overall process, the right order of operations, and clear documentation of the steps taken here as a way to help a decision maker peek into where there might be additional risk in the numbers. Further, back to some of the basic tenets of the blog, analytics is not the only way here. My personal politics are such that in a physical universe with fallible mortal beings, it helps to have all the information you can (and this very much includes all of the data stuff). But let’s say a club had no analytics department but the smartest possible infallible football minds and the cleanest process for translating these insights into efficient decisions: theoretically this could work too. Perhaps you are a club with such god-like scouts and analysts, and you have a great “rule of thumb” solution in place for this very problem. Perhaps it has to do with some more structured, rigorous way that scouting reports are populated, completed, and reviewed, some sort of mental model impressed upon the staff.
Again whether with or without this data-focused side of things, the risk of not stepping through the overall “Absolute Unit” logic in this way (or in the analytics space, automating it as a first cut before applying other judgments), is simply that the natural value of the insights your staff is generating (whether they are good or bad insights) will be leaky as they are passed up the chain to decision makers and integrated into a holistic roster build concept. The “scale of the elements of the game” may be lost.
For the purposes of continuing through our layers conceptually, perhaps picture taking a central midfielder’s past three year historical marginal goal contribution per 90 minutes of 0.05, 0.06, and 0.06 respectively, and now picture assessing whether his supporting cast and team strength, tactics, and structure were such that they limited his ability to progress the ball cleanly and accrue marginal goal difference contribution over and above what he might otherwise achieve on an average team. Perhaps he is a young player with decent ball skills but is trapped on a perennially relegation threatened side who sits deep and only looks to play direct on the counter. Perhaps your director of analytics research (or someone else) has put in place a rule of thumb that for a central midfielder, 75% of a player’s historical contribution can be thought of as independent of the team, and 25% team-affected. You might, to complete the first layer of this projection process, adjust 25% of the player’s contribution up by some amount to reflect what his contributions might look like on an average team. Do some algebra or something! You might do this in an automated fashion and/or with individual judgments, but showing both in a “walk” chart, similar to what was previewed last week, is a useful output in whatever package a decision maker sees.
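To make the “do some algebra” step a little more concrete, here is a minimal sketch of one way the adjustment could work. It uses the 0.05, 0.06, 0.06 history and the 75%/25% rule of thumb from the example above. The function name and the team_factor value of 1.4 are my own hypothetical placeholders, not numbers anyone has estimated: team_factor is simply a guess at how much bigger the team-affected slice would have been on an average team.

```python
def unlever_team_effect(observed_rate, team_share, team_factor):
    """Restate an observed per-90 contribution as if earned on an average team.

    observed_rate: historical marginal goal difference contribution per 90
    team_share:    fraction of that rate treated as team-affected (rule of thumb)
    team_factor:   HYPOTHETICAL multiplier on the team-affected slice;
                   above 1 means the old team suppressed the player's output
    """
    individual_part = observed_rate * (1.0 - team_share)  # left untouched
    team_part = observed_rate * team_share                # rescaled to an average team
    return individual_part + team_part * team_factor


# Three-year history from the central midfielder example
history = [0.05, 0.06, 0.06]

# 25% team-affected; the 1.4 is an illustrative assumption only
unlevered = [unlever_team_effect(r, team_share=0.25, team_factor=1.4) for r in history]

for before, after in zip(history, unlevered):
    print(f"observed {before:.3f} -> unlevered {after:.3f}")
```

Keeping the individual and team-affected pieces as separate terms is deliberate: each piece can become its own bar in the “walk” chart, so a decision maker can see how much of the restated number comes from the rule of thumb rather than from the record itself.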
Converting between leagues
This bit is controversial. From what I gather various analysts have various opinions on whether you can with any reasonable precision estimate conversion factors between leagues. But suffice it to say, different leagues have different levels of talent, which means that a player’s historical contribution to his team’s expected goal difference is subject to impacts both from his own teammates (see above) and from his opponents. To the extent that both his teammates and his opponents have abilities (or even playing styles) that differ dramatically from his soon-to-be new teammates and his soon-to-be new opponents, that is to say, the extent that the level of competition in his old league differs from that of his new league, there may be a disparity between his historical contributions to team goal difference and his future contributions to team goal difference at the rate level (remember, we’ll adjust for opportunities later). The devil may well be in the details here.
If the sport in general is prone to inadequate sample sizes of things, surely cutting down that sample size to evaluate the before and after contributions of players transferring between leagues without picking up confounding other variables is difficult. Again, I’m not someone who works in the sport, neither on the data side of things, nor the scouting, so I’ll leave the implementation of this up to the experts. For my money, it may well be safer to ignore league conversions for transfers between most leagues, but it may well be critical to adjust something when evaluating leaps between leagues that are indisputably of different levels of strength. Pains me to say, but I’m looking at you, Major League Soccer.
From what I can observe publicly, Dan Altman at SmarterScout has a league strength conversion model that he trusts - and it impacts all of his player ratings. Others - at Statsbomb for instance - appear more skeptical of the idea of converting goals between leagues, but perhaps they are more confident about conversion at a more granular level, hard to say. And behind closed doors at the “smart” clubs, who really knows? Nonetheless, like the “team effects” topic above, I think it’s helpful to layer this step into your production process, even if the adjustment value is immaterial between most leagues due to the level of uncertainty involved in the modelling.
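Here is a rough sketch of what “layering this step in” might look like even when you don’t trust most conversions. The idea is a lookup of league-pair factors that defaults to 1.0 (no adjustment) unless someone has deliberately entered a value. Everything here is assumed for illustration: the table, the function name, and especially the 0.70 factor, which is a made-up number and not an estimate of anything.

```python
# HYPOTHETICAL conversion factors keyed by (old_league, new_league).
# Only league pairs that are indisputably different in strength get an entry.
LEAGUE_FACTORS = {
    ("MLS", "Premier League"): 0.70,  # made-up value, for illustration only
}


def convert_league(rate, old_league, new_league, factors=LEAGUE_FACTORS):
    """Scale a per-90 contribution rate by a league-pair factor.

    Pairs missing from the table fall back to 1.0, i.e. no adjustment,
    which keeps the step in the process without pretending to precision.
    """
    return rate * factors.get((old_league, new_league), 1.0)


print(convert_league(0.066, "Eredivisie", "Ligue 1"))   # no entry, so unchanged
print(convert_league(0.066, "MLS", "Premier League"))   # scaled by the assumed factor
```

A single multiplicative factor is a simplification (it shrinks negative contributions toward zero as well, for example), so treat it as a slot in the production process rather than a model.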
Age Curves & Projecting Growth
So if we’ve taken the above step(s) to team adjust and league adjust our target player’s historical contributions, such that we’re staring at a team agnostic version of his past performances, the next step is to recognize that he is a developing player, with an age and an experience level that may differ from other targets we’re considering, or from other players on our current roster. We care ultimately about his future potential performance and contribution to our team’s goal difference, not his past performance, so if we’re to rely on his historical performances (adjusted) as a base, a good unbiased first step at projecting the future would be to plot him on some sort of aging/development curve and then ride the curve up or down or around from his adjusted historical performance through the offered contract period. Remember, we’re still focused for now on projecting how his “rate of contribution” will improve or decline, not his opportunities to contribute themselves, something I reiterate here only because a lot of aging curve literature involves charting an average player’s contribution over his career as measured in minutes. We really want to know how an average player’s marginal goal difference contribution per 90 (or per 100 passes/minutes in possession etc) improves or declines over his career. We’re going to tackle the minutes piece in a later phase. And we also want to stay in the confines of whatever pro-forma adjustments we made in Step 2 (e.g. if we quarantined all actions that weren’t from open play, we should focus our aging curve away from dead ball contributions as well).
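As a sketch of what “riding the curve” could mean mechanically: start from the adjusted per-90 baseline and add the curve’s expected year-over-year change for each season of the contract. The curve values, the 0.066 baseline, and the function name below are all hypothetical. A real curve would come from data and would be expressed in the same rate unit as the baseline.

```python
def project_rates(base_rate, current_age, contract_years, curve_delta):
    """Project a per-90 contribution rate across a contract.

    curve_delta maps an age to the expected change in rate when a player
    moves from that age to the next one. Ages missing from the curve are
    treated as no change (an assumption of this sketch).
    Returns a list of (age_in_season, projected_rate) tuples.
    """
    projections = []
    rate = base_rate
    for year in range(contract_years):
        age = current_age + year
        rate += curve_delta.get(age, 0.0)   # step along the curve by one year
        projections.append((age + 1, rate))
    return projections


# HYPOTHETICAL aging curve: small gains in the early 20s, flat by the mid 20s
CURVE = {22: 0.004, 23: 0.003, 24: 0.002, 25: 0.000}

# A 22-year-old with an assumed adjusted baseline of 0.066, on a four-year deal
proj = project_rates(0.066, current_age=22, contract_years=4, curve_delta=CURVE)
for age, rate in proj:
    print(f"age {age}: {rate:.3f} per 90")
```

Note that this only moves the rate. Minutes, possessions, and other opportunity measures stay out of it until the later phase.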
This one I also leave up to the experts to tackle, but I’m much more optimistic that there are readily available solutions out there that are up and running already in clubs or otherwise. A brief tour through some of the existing literature finds tons of stuff on aging curves from some of the more recognizable public analysts over the last decade. You have early Michael Caley stuff at SBN, Statsbomb weighing in a handful of times, Dan alluding to aging profiles in the SmarterScout platform, notes on the implementation of an aging curve into the beloved top-down plus/minus(ish) Goal Impact Metric, several entries from 21st Club, from John Muller and ASA when discussing g+ with Ryan O’Hanlon, and many more that I just didn’t really do a good job of curating here…
Nota Bene: I’m so bad at this element of things, the inventorying and citation of past work in these various areas. I think of this blog as adjacent to football analytics not within it, so to the extent I leave someone out whenever I’m referencing this stuff, no harm intended. And feel free to shout stuff out that I’m missing.
Since our historical accounting records are powered by EPV calculations or at least concepts, what we’re really after, and what I suspect is the answer, is some sort of aging curve that specifically works within an EPV framework. There is one such article I’m aware of that explores this in detail with the g+ model for MLS, which is Zach Beery’s Goals Added MLS: Age Impact article. I really like it, and it seems like there’s something there, though I don’t think this is the complete solution to the aging curve a team would want to implement in the Absolute Unit framework. Zach runs into and articulates well (and he’s not the first) several issues with building an aging curve in MLS data, namely that “Designated Players” (whereby each team is only allowed to break the otherwise stringent salary cap rules for 3 players, having “designated” them on their roster) are often the most impactful players in the league, and often at the tail end of their careers, having played out their primes in Europe (hello league conversion factors!). Most aging curves have to overcome a survivor bias problem, where, as players’ abilities decline, they fall out of the data set altogether and therefore their true declines are not captured in the curve, but MLS has this additional buffing from the aging DPs further exacerbating the survivor bias problem. At any rate, give that one a read, because I suspect throwing all of the world football data at an EPV model can show us something neat about player aging curves and how they might differ by playing style or by action type.
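For anyone wanting to see where the survivor bias creeps in, here is a minimal sketch of one common way public analysts build aging curves, often called the “delta method”: average the year-over-year change in rate across players observed at consecutive ages. I am not claiming this is what Zach’s article does. The data layout, the 900 minute threshold, and the choice to weight each pair by the smaller of the two minute totals are all assumptions of this sketch.

```python
from collections import defaultdict


def delta_aging_curve(player_seasons, min_minutes=900):
    """Build an aging curve from consecutive-season changes in rate.

    player_seasons: iterable of (player_id, age, minutes, rate_per_90)
    Returns {age: weighted mean change in rate from age to age + 1},
    using only players who qualify at BOTH ages.
    """
    by_player = defaultdict(dict)
    for player_id, age, minutes, rate in player_seasons:
        if minutes >= min_minutes:          # drop tiny samples (assumed threshold)
            by_player[player_id][age] = (minutes, rate)

    sums = defaultdict(float)
    weights = defaultdict(float)
    for seasons in by_player.values():
        for age, (minutes, rate) in seasons.items():
            following = seasons.get(age + 1)
            if following is None:
                # Player left the data set or fell below the threshold.
                # This is exactly where survivor bias enters the curve.
                continue
            weight = min(minutes, following[0])
            sums[age] += weight * (following[1] - rate)
            weights[age] += weight

    return {age: sums[age] / weights[age] for age in sorted(sums)}


# Tiny made-up data set: player "c" disappears after age 22, "d" never qualifies
sample = [
    ("a", 22, 1800, 0.05), ("a", 23, 2000, 0.06),
    ("b", 22, 1000, 0.02), ("b", 23, 900, 0.04),
    ("c", 22, 1500, 0.03),
    ("d", 23, 500, 0.10),
]
curve = delta_aging_curve(sample)
print(curve)
```

Player "c" contributes nothing to the curve because there is no following season to compare against, which is the survivor bias problem in miniature: the players most likely to have declined are the ones who vanish.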
Further, Zach finds that the aging/development curve of players varies by the type of action they’re performing. Using the “Goals Added” framework’s default 6 action types of “passing, dribbling, receiving, shooting, fouling, and interrupting,” he’s attempted to chart out different curves for each action, and I think this is an important aspect of any solution in this area. If you have an EPV model like g+ I would suggest the possibilities are nearly limitless as to how you might slice up the data into different action types or better yet groups of actions that create play-styles and construct aging curves of your own. Building more disaggregated aging curves allows you to make more disaggregated rate-level growth projections by category of things that might be more or less important for your team’s game model (coming in the next post). Further you might find that based on history, certain action types converge to league average faster or slower than others, that some are noisy (remember Step 2) and revert to the mean more than others. Projecting in different categories allows you to apply evidence-based floors and ceilings to both the growth of a given ability in a given year and also the absolute levels of contributions based on past observations (i.e. even though a player is young and excelling at a certain action type or play style, we might have historical evidence to suggest that it is obscenely rare to see growth beyond their current level, or on the other end of the spectrum, we have an aging player who is contributing nothing or negatively to his team in a certain area, but while he declines we wouldn’t expect him to decline much in this area given he’s already at observable historical lows).
Importantly, despite the projected actions being disaggregated in this way, they still maintain the universal “marginal goal difference contribution” unit of account and so conceptually we can sum all of the projections together as the analysis summarizes towards the end of the process into a decision maker’s hands.
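A small sketch of the disaggregated version, assuming you already have a current rate, a projected growth figure, and an evidence-based floor and ceiling for each of the six g+ action types. Every number below is invented for illustration, as are the function names. The point is only the mechanics: project each category, clamp it to its historical bounds, then sum, since every category shares the same unit.

```python
ACTION_TYPES = ["passing", "dribbling", "receiving", "shooting", "fouling", "interrupting"]


def project_by_action(current, growth, floor, ceiling):
    """Project each action type one year ahead, clamped to its floor and ceiling."""
    projected = {}
    for action in ACTION_TYPES:
        raw = current[action] + growth[action]
        projected[action] = min(max(raw, floor[action]), ceiling[action])
    return projected


def total_contribution(projected):
    """Sum across action types; valid because all share one unit of account."""
    return sum(projected.values())


# All values are HYPOTHETICAL per-90 marginal goal difference contributions
current = {"passing": 0.03, "dribbling": 0.01, "receiving": 0.02,
           "shooting": 0.00, "fouling": -0.02, "interrupting": 0.01}
growth = {"passing": 0.02, "dribbling": 0.00, "receiving": 0.00,
          "shooting": 0.00, "fouling": -0.01, "interrupting": 0.00}
floor = {action: -0.025 for action in ACTION_TYPES}
ceiling = {action: 0.04 for action in ACTION_TYPES}

projected = project_by_action(current, growth, floor, ceiling)
print(projected)
print(f"total: {total_contribution(projected):.3f}")
```

In this made-up case the passing projection hits its ceiling and the fouling projection hits its floor, which mirrors the two situations described above: a young player already near the top of what has been observed, and an aging player already near historical lows.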
This type of capping, with floors and ceilings that converge to industry standards, is something that you see valuation professionals in the financial world apply all the time. A soundbite that I always remember is that in the short term your projections might mirror the current 5 year plan provided by management, but as your projection horizon approaches infinity (and it does when you are valuing a corporation since they theoretically live forever), you cannot project long term growth at rates that exceed GDP unless your projection is that the company literally takes over the world and becomes a globally governing corporation, the type you see in the cyberpunk novels… what was I talking about? Mostly, just remember that because we’re taking the structurally important step of not just looking at a player’s historical metrics, but trying to project their future contributions to a team, we need caps and other various safeguards in place to make sure that when we take those historical results and project forward in a systematic or even automated fashion, whatever math we’re applying isn’t creating weird artifacts simply from applying this step to such an extent that we’d be better off just not.
A Note on Startups and Youth Prospects
This idea of starting with historical results and projecting them forward with a clear and consistent method works pretty well when there are relevant and reliable accounting records for the taking. The problem is when you’re looking at a company whose historical accounting records don’t look anything like their planned future results, and the clearest example of this is a startup company. A brand new company burns through cash and incurs expenses with little regard for short term profits. They’re not trying to make money, they’re trying to make something no one else has made so that they can later make money, or they’re ramping up their resources before they even go to market to start selling something. So, if you’re trying to value a startup and you take their accounting records and then project them into the future, you’re going to be catastrophically wrong on average because you’ll always project them losing more and more money (though sometimes you’ll be right! Hello, Elizabeth Holmes).
Similarly, if you’re trying to evaluate the recruitment of a 16 year old academy product who played 45 minutes twice last year in the League Cup, projecting forward from his historical contributions per 90 will be equally futile. That is to say our go-to step by step process is defeated here, but the analogy between financial valuation and player evaluation is not. When valuing a startup company with no revenue, you’re looking for a viable, scalable, business idea. You’re trying to identify indicators that the revenues will in time come to fruition. You want to understand the basic concepts in place, whether they are unique enough to differentiate the business from its competitors at a sustainable margin, and some underlying metrics like maybe subscriber counts and subscriber growth that suggest that when the monetization is turned on, the dollars will arrive - the spice will flow. It’s a good analogy to how in a player recruitment process when data is truly scarce, the data analyst probably takes a bit of a backseat, except to otherwise warn about the lack of data. Just like the consultants that swarm the startup’s board rooms hoping to understand the “value chain” and the long term vision, and the internal metrics, the scouts who may have a more intimate knowledge of the intrinsic traits and abilities of a target player, should perhaps drive things more. That’s not to say there isn’t a role for data analysis here. They should help focus the scout’s efforts in certain skills that have been observed to show up and precede past excellent careers, but if there’s really no accounting records to go off of, the typical process has to be short-cutted in this way, and then importantly, shown separately and properly risk-adjusted to reflect the increased projection risk - something we’ll have to loop back around to in a later post.
Related, I have seen it stated that with such a paucity of data for youth prospects, often the most influential data point a recruitment team should consider is the presence of first team minutes (period) at a young age (almost irrespective of the contribution figures themselves, although there is a natural connection between contribution and opportunity). The thing is, when you see a teenager getting significant first team minutes in a top league, they are almost always prohibitively expensive to acquire and I think the reason for this very much aligns with the concepts here in Absolute Unit. A rule of thumb I’m thinking of would be that if you have enough historical data such that you are able to confidently plot a teenager on your aging curve(s), you can also expect the ultimate output of this is that he will contribute to the team in a significant way (and thus he will often be too expensive to acquire if he’s currently on another team). Because the basic value proposition for investing in youth is largely one of cost saving (that they demand lower wages and that by accepting risk you can also pay less in transfer fees), there is a danger in preaching “younger is better” without regard for the original reason why such investment became sexy to begin with. A Sporting Director is concerned first with building a team to meet the competitive objectives of the club given the economic constraints that have been placed on him. If he wants to make his team younger, he should do it for the right reasons of cost effectiveness and some concepts around portfolio theory (to come later), not on the merits of youth investment as an end unto itself. Mark Thompson discussed this at one point in his wonderful newsletter Get Goalside.
But I’m getting ahead of myself. It is in a later post, where we will ultimately take these marginal goal difference valuations and convert them into the club’s allocated wage/transfer budget currency, and there will be plenty of time to litigate some of these ideas.
If I could re-emphasize one last takeaway here it would be that all too often you see analysis of the age of a team’s roster as an end unto itself, or you see the gnashing of teeth when a big club signs a player in his late 20’s or early 30’s just because he’s old. Aging curves are cited as damning evidence against the signing of older players, and team age maps are cited to show where a team might be “in real trouble” or “set up for the future.” I think the true value of understanding football’s aging curve is actually plugging it into the projections process, to value a player contract in terms of marginal goal difference contribution, which will ultimately connect it into the budgeting process. There is no need to decry the signing of older players just because their best years are behind them if they will still contribute above average returns (for the money) to your team’s expected goal difference. The reason we care about age is because we care about projections and budget allocation, not because of transfer fees, a point I will return to (ready to fight someone) at a later date.
Thanks for making it all the way through if you did. You can tell that at this point, part of what I’m doing is walking through a structured process and punting to you smart people to actually generate the insights needed to implement the thing. Hopefully this is still helpful to think through.
Substitution Effects:
Late edit (Jan 2014): I’ve just read Michael Caley’s Study on Sub Effects, and it strikes me that this is one of the layers sitting in a player’s historical records (and thus a contribution rate) that you need to unwind before you can move forward with projections. That is to say, a player’s historical contributions may be a function not only of his underlying performance but specifically of the amount of time he’s spent in substitution scenarios compared to starter scenarios. See Caley’s post here:
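One very rough way to picture the unwinding, under assumptions that are mine and not taken from Caley’s study: suppose minutes played as a substitute generate contribution at some multiple of the player’s starter rate. Then the starter-equivalent rate falls out of simple algebra. The sub_multiplier value of 1.2 below is a hypothetical placeholder, not a measured effect, and it could just as well sit below 1.

```python
def starter_equivalent_rate(total_contribution, starter_minutes, sub_minutes, sub_multiplier):
    """Back out a per-90 rate as if all minutes had been starter minutes.

    Assumes (HYPOTHETICALLY) that each sub minute produces sub_multiplier
    times the contribution of a starter minute, so that:
        total = rate * (starter_minutes + sub_multiplier * sub_minutes) / 90
    """
    effective_90s = (starter_minutes + sub_multiplier * sub_minutes) / 90.0
    if effective_90s <= 0:
        raise ValueError("no effective minutes to compute a rate from")
    return total_contribution / effective_90s


# Made-up season: 1.5 goals of marginal contribution over 1800 starter
# minutes and 450 substitute minutes
naive_rate = 1.5 / ((1800 + 450) / 90.0)
adjusted_rate = starter_equivalent_rate(1.5, 1800, 450, sub_multiplier=1.2)
print(f"naive {naive_rate:.4f} vs starter-equivalent {adjusted_rate:.4f}")
```

With a multiplier above 1 the starter-equivalent rate comes out lower than the naive per-90 figure, which is the kind of restatement you would want documented as its own step before any projection is run.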
Next Up
So we’ve walked through thoughts on 1) un-levering both the team and league effects baked into a player’s historical accounting records, and 2) setting up a kind of automated or systematic first cut at projecting these team-agnostic player results into the future using aging curve ideas and other sensible projection techniques. Next up the proprietary knowledge that we have as the player recruitment department and our collaboration with the coaching staff and players at our own club needs to be taken into account as it relates to how the target player’s projected team-agnostic contributions might interact with our team’s overall game model, player roles, and team performance model. This is another really difficult one, but I think it will be rewarding enough for us to explore it in further detail.