Wednesday, 13 March 2013

Age Weighting In The EPL, Sorted By Position and Other Musings.

A couple of days ago I wrote a post on the age of Premiership players, most notably Ryan Giggs. During the number crunching I worked out each Premiership team's average age for the current season, weighted by actual minutes on the field, and I also broke the numbers down for defenders, midfielders and strikers. The exercise is reasonably interesting because it paints a rough picture of where individual teams are playing with youth or experience, whether by accident or design. It's unlikely that Villa's defence and QPR's attack had much in common to talk about when the teams met back in December.
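A minimal sketch of this kind of weighting, assuming each player is recorded as an (age, minutes played) pair: the weighted age is the sum of age times minutes, divided by total minutes. The player figures below are made up purely for illustration, and the function name `weighted_age` is hypothetical.

```python
def weighted_age(players):
    """Return the minutes-weighted average age of (age, minutes) pairs."""
    total_minutes = sum(minutes for _, minutes in players)
    if total_minutes == 0:
        raise ValueError("no minutes played")
    return sum(age * minutes for age, minutes in players) / total_minutes


# Hypothetical defence: (age in years, minutes played). Not real data.
defence = [(31.0, 2400), (29.5, 2100), (22.0, 300), (34.0, 1800)]

simple_mean = sum(age for age, _ in defence) / len(defence)
print(f"simple mean age:   {simple_mean:.1f}")
print(f"weighted mean age: {weighted_age(defence):.1f}")
```

A youngster with a handful of substitute appearances pulls the simple mean down much more than the weighted one, which is the reason for weighting by minutes.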

Out of idle curiosity I then looked to see if there was a correlation between the weighted age of one positional group and another. In short, do some team units age gracefully together, or get torn down and rebuilt in step?

Average Age of Premiership Positions, Weighted By Actual Playing Time, 2012-13.

Weighted age in years.

Team                 Defence   Midfield   Attack
Arsenal              26.5      25.4       25.3
Aston Villa          23.1      25.3       23.6
Chelsea              27.5      24.5       26.0
Everton              30.5      28.8       25.8
Fulham               30.6      29.8       29.4
Liverpool            27.2      24.6       25.0
Manchester City      27.4      26.8       25.7
Manchester United    27.8      28.5       26.5
Newcastle            26.6      25.7       26.8
Norwich              27.0      25.8       27.5
QPR                  28.6      26.8       30.7
Reading              27.6      28.2       28.4
Southampton          24.3      25.4       24.8
Stoke                26.5      27.8       29.6
Sunderland           25.4      25.4       26.6
Swansea              26.3      26.0       27.2
Tottenham            25.4      25.7       28.0
WBA                  29.1      27.0       26.2
WHU                  27.4      27.6       25.9
Wigan                30.0      25.8       26.7

I didn't expect to see a correlation, so I was surprised to get r^2 values of around 0.2. When I randomly jumbled up one set of averages, the correlation was almost always close to zero. So there did seem to be a link, even though it made little intuitive sense.
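The jumbling check can be sketched roughly as follows, using the midfield and attack columns from the table above. This is one way to run such a shuffle test, not necessarily how the original numbers were produced; the helper `r_squared`, the seed and the 2000 repetitions are all arbitrary choices made for this sketch.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Midfield and attack columns from the table above, in team order.
midfield = [25.4, 25.3, 24.5, 28.8, 29.8, 24.6, 26.8, 28.5, 25.7, 25.8,
            26.8, 28.2, 25.4, 27.8, 25.4, 26.0, 25.7, 27.0, 27.6, 25.8]
attack = [25.3, 23.6, 26.0, 25.8, 29.4, 25.0, 25.7, 26.5, 26.8, 27.5,
          30.7, 28.4, 24.8, 29.6, 26.6, 27.2, 28.0, 26.2, 25.9, 26.7]

observed = r_squared(midfield, attack)

rng = random.Random(2013)  # fixed seed so the run is repeatable
shuffled_r2 = []
for _ in range(2000):
    jumbled = attack[:]      # copy, so the real pairing is left intact
    rng.shuffle(jumbled)     # break the team-by-team pairing
    shuffled_r2.append(r_squared(midfield, jumbled))

mean_shuffled = sum(shuffled_r2) / len(shuffled_r2)
share_as_big = sum(r2 >= observed for r2 in shuffled_r2) / len(shuffled_r2)

print(f"observed r^2:               {observed:.3f}")
print(f"mean r^2 after shuffling:   {mean_shuffled:.3f}")
print(f"share of shuffles as large: {share_as_big:.3f}")
```

The last figure is the fraction of jumbled pairings that match or beat the real r^2, which gives a rough sense of how often a link of that size turns up by chance alone.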

If you look at the overall weighted average age of all teams, two stand out. Fulham are particularly old, with a weighted average playing age of nearly 31, and Villa are unusually young. Being an outlier almost guarantees that the average ages of the positional groups within the team will be very close to one another. For example, if Fulham's midfielders had a weighted average age of 28, their strikers and defenders would need to average about 33 to maintain the overall figure. It is much more likely that at the extremes the weighted ages of the three separate groups will be similar, and this proves to be the case for both Fulham and Villa.
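The Fulham arithmetic can be checked with a short sketch. It assumes an overall weighted age of 31 (the text says "nearly 31") and that midfielders account for 40% of the minutes; that split is a hypothetical figure chosen for illustration, as is the function name.

```python
def required_age_of_rest(overall_age, group_age, group_share):
    """Average age the remaining players need for the team to hit overall_age.

    group_share is the fraction of total minutes played by the known group.
    """
    return (overall_age - group_share * group_age) / (1.0 - group_share)


# Assumed inputs: overall age 31, midfield age 28, midfield share 40%.
needed = required_age_of_rest(overall_age=31.0, group_age=28.0, group_share=0.4)
print(f"strikers and defenders would need to average {needed:.1f}")
```

The smaller the share of minutes taken by the known group, the less the remaining players have to compensate; with an even one-third split the same call gives 32.5.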

The close agreement between the weighted ages of positional groups seen at Fulham, and to a lesser degree at Villa, may therefore be almost totally responsible for the mild correlation seen when we plot the EPL teams as a whole. And that appears to be what has happened. If we remove Fulham and Villa from the regression, the r^2 value drops to nearly zero, indicating that there is really no general connection between the ages of midfielders and strikers, or strikers and defenders, in the bulk of EPL clubs. A spurious post narrowly averted.
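A rough sketch of that comparison, using the table above: compute r^2 for a pair of positional groups across all 20 clubs, then again with Fulham and Aston Villa left out. The helper names are hypothetical, and because the table values are rounded to one decimal place the printed figures may differ a little from those quoted in the post.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


DEF, MID, ATT = 0, 1, 2  # column positions within each tuple

# (defence, midfield, attack) weighted ages from the table above.
ages = {
    "Arsenal": (26.5, 25.4, 25.3), "Aston Villa": (23.1, 25.3, 23.6),
    "Chelsea": (27.5, 24.5, 26.0), "Everton": (30.5, 28.8, 25.8),
    "Fulham": (30.6, 29.8, 29.4), "Liverpool": (27.2, 24.6, 25.0),
    "Manchester City": (27.4, 26.8, 25.7),
    "Manchester United": (27.8, 28.5, 26.5),
    "Newcastle": (26.6, 25.7, 26.8), "Norwich": (27.0, 25.8, 27.5),
    "QPR": (28.6, 26.8, 30.7), "Reading": (27.6, 28.2, 28.4),
    "Southampton": (24.3, 25.4, 24.8), "Stoke": (26.5, 27.8, 29.6),
    "Sunderland": (25.4, 25.4, 26.6), "Swansea": (26.3, 26.0, 27.2),
    "Tottenham": (25.4, 25.7, 28.0), "WBA": (29.1, 27.0, 26.2),
    "WHU": (27.4, 27.6, 25.9), "Wigan": (30.0, 25.8, 26.7),
}


def pair_r2(teams, i, j):
    """r^2 between columns i and j, restricted to the given teams."""
    return r_squared([ages[t][i] for t in teams], [ages[t][j] for t in teams])


all_teams = list(ages)
trimmed = [t for t in all_teams if t not in ("Fulham", "Aston Villa")]

for label, i, j in [("midfield vs attack", MID, ATT),
                    ("defence vs attack", DEF, ATT)]:
    print(f"{label}: all clubs {pair_r2(all_teams, i, j):.3f}, "
          f"without Fulham and Villa {pair_r2(trimmed, i, j):.3f}")
```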

A more mainstream example of this (badly) explained phenomenon occurs with the apparently strong connection between possession and success. If you plot seasonal possession for the EPL against a success-based metric such as points or shooting accuracy, you get a reasonably straight line with a healthy enough r^2, with good teams at one end and bad ones at the other. So the idea that possession is a good thing, essential for success, appears to be confirmed.

I've made a few posts proposing that possession is a meaningless stat, so I should try to explain this apparent contradiction. If you remove the big four teams, who invariably make possession count, and repeat the regression, the correlation virtually disappears. Just as Fulham and Villa currently drive an apparent but bogus league-wide, age-based correlation, Manchester United and company do the same for possession and success over a season in the EPL. The correlation is strong for the big four, but almost non-existent for the rest of the league.

The same is seen in Spain: remove the big two and possession doesn't correlate with success; put them back into the sample and you have r^2 evidence that it does.
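The mechanism can be shown in its purest form with entirely synthetic numbers. The figures below are invented for illustration and are NOT real possession or points data: sixteen "ordinary" clubs are laid out on a grid so that possession and points are exactly uncorrelated among them, and four hypothetical dominant clubs are added with both high possession and high points.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def r2_of(clubs):
    """r^2 between possession and points for (possession %, points) pairs."""
    possession, points = zip(*clubs)
    return r_squared(possession, points)


# Synthetic bulk of the league: every possession value is paired with every
# points value, so there is no linear link at all within this group.
bulk = [(poss, pts) for poss in (44, 46, 48, 50) for pts in (35, 40, 45, 50)]

# Synthetic dominant clubs: high possession AND high points.
big_four = [(56, 70), (57, 75), (58, 80), (59, 85)]

print(f"bulk only:         r^2 = {r2_of(bulk):.3f}")
print(f"bulk plus big four: r^2 = {r2_of(bulk + big_four):.3f}")
```

A handful of points sitting far from the rest in both dimensions is enough to produce a healthy league-wide r^2, even when the remaining clubs show no relationship whatsoever.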

1 comment:

  1. what you are saying is that you have a covariance problem? i'm no statistician, but surely removing data points is not an accepted methodological approach. would it not be better to include the other variables that explain the covariance?