A couple of football analytics' little obsessions are correlations and extrapolations.
Many player metrics have been deemed flawed because they fail to correlate from one season to the next, but there are probably good reasons why the diminished sample sizes available for individuals lead to poor season-on-season correlation.
Simple random variation, injuries, a change of team mates or role within a club, and atypically small sample sizes often lead to see-sawing rate measurements. Inevitably, players also age, and so can be on a very different career trajectory to others within the sample.
The problems associated with neglecting the age profile of a group of players when attempting to identify trends for use in future projections are easily demonstrated. Take the playing time (as a proxy for ability) enjoyed by players who were predominantly aged 20 or 30 when members of a Premier League squad, and look at how that time altered the following season, when they were 21 or 31.
The 30 year oldies played Premier League minutes equivalent to 15 full matches, falling to 12 matches as 31 year olds. So they were still valued enough to play fairly regularly but, perhaps due to the onset of decline in their abilities, they featured less on average than they had done.
The reverse, as you might expect, was true for the younger players. They played the equivalent of seven full games as 20 year olds and nine the following season.
It seems clear that if you want to project a player's abilities from one season to the next and playing time provides a decent talent proxy, you should expect improvement from the youngster and decline from the older pro.
However, as with many such problems, we might be guilty of attempting to impose a linear relationship onto a population that is much better defined by a distribution of possible outcomes.
The table above shows the range of minutes played by 21 and 31 year olds who had played 450 minutes or fewer in the previous season as 20 or 30 year old players.
As before, we may describe the change in playing time as an average. In this subset, the older players play very slightly more than they did as 30 year olds, improving from the equivalent of two games to 2.2.
The younger players jump from 1.8 games to 3.6.
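To put the averages quoted so far on a common footing, here is a minimal sketch that turns each before and after figure into a relative change. The match equivalents are the ones given in the post; the group labels and the `relative_change` helper are illustrative names of my own, not part of any original analysis.

```python
# Average playing time in full-match equivalents, as quoted in the post:
# (season at age 20 or 30, following season at age 21 or 31).
# The dictionary keys are illustrative labels only.
AVERAGES = {
    "30 to 31, all players": (15.0, 12.0),
    "20 to 21, all players": (7.0, 9.0),
    "30 to 31, 450 minutes or fewer": (2.0, 2.2),
    "20 to 21, 450 minutes or fewer": (1.8, 3.6),
}


def relative_change(before, after):
    """Change in playing time as a fraction of the earlier season's figure."""
    return (after - before) / before


for group, (before, after) in AVERAGES.items():
    change = relative_change(before, after)
    print(f"{group}: {before:.1f} -> {after:.1f} matches ({change:+.0%})")
```

Read as flat rates, these suggest a modest bump for the low-minutes veterans and a doubling for the low-minutes youngsters, which is exactly the kind of single-number summary that deserves a second look.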
However, just as cumulative xG figures can hide very different distributions, particularly of big chances, which subtly alter our expectation for different teams, the distribution of playing minutes behind the average change in playing time can be heavily skewed and can also differ between the two groups.
Over three quarters of the 30 year olds didn't get on the field at all during the next Premier League season, and likewise two thirds of the younger ones.
21% of young players played a similar amount of time to the previous season, between one and 450 minutes, compared to just 14% of the older ones. And 17% of youngsters exceeded the total from the previous season, as did just 10% of the veterans.
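A rough sketch of the same split, laid out as a distribution rather than an average, might look like the following. The percentages are the rounded ones quoted above. Two things are assumptions on my part: the "no minutes" shares are set so that each group sums to one (0.76 for "over three quarters", and 0.62 for the roughly two thirds of youngsters), and "exceeded the total" is read as playing more than 450 minutes.

```python
# Share of players by next-season outcome, for those who played 450 minutes
# or fewer the season before. Rounded figures as quoted in the post.
# ASSUMPTION: "no minutes" is chosen so each group's shares sum to 1.
# ASSUMPTION: "exceeded the total" is read as more than 450 minutes.
OUTCOME_SHARES = {
    "age 21": {"no minutes": 0.62, "1 to 450 minutes": 0.21, "over 450 minutes": 0.17},
    "age 31": {"no minutes": 0.76, "1 to 450 minutes": 0.14, "over 450 minutes": 0.10},
}


def expected_counts(shares, group_size=100):
    """Expected number of players per outcome in a group of the given size."""
    return {outcome: round(share * group_size) for outcome, share in shares.items()}


def most_likely_outcome(shares):
    """The single outcome with the largest share of players."""
    return max(shares, key=shares.get)


for group, shares in OUTCOME_SHARES.items():
    print(group, expected_counts(shares), "most likely:", most_likely_outcome(shares))
```

Under those assumptions the most likely single outcome for either group is no minutes at all, even though the average playing time for both groups goes up.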
So if you use the baseline rate of increased playing time as a flat rate across all players that fall into these two categories in the future, you might be slightly disappointed, because overwhelmingly the experience of such players is one where they fail to play even a minute in the following season.
Knowing that there is an upside, on average, for these two groups of players, based on historical precedent, is a start. But knowing that 3 out of 4 of the oldies and 2 out of 3 of the youngsters you are considering didn't merit one minute's worth of play in an historical sample is also a fairly important, if not overriding, input.