The baseball movement grew through better access to, and better quality of, raw data, through the development of descriptive and predictive metrics that replaced the older ones that had long been relied upon, and through a proper understanding of the limitations and pitfalls that come with the use of such data. At the moment football is attempting to tackle the first of these and is in danger of being flooded by a wave of new statistics, but it is largely ignoring the limitations of its newfound knowledge.

Fortunately, football's current weakness is very much baseball's previous strength. Issues of sample size, and of how reliable the conclusions we draw from samples of different sizes can be, are the bedrock of much advanced baseball analysis. Regression towards the mean is a powerful and necessary component of any advanced analysis, and much of the foot slogging has already been done.

Baseball is a very different game from the fluid sport of football, but football does have set-piece, success-or-failure events that can be fed into baseball's approach to evaluating the roles of skill and luck. The interplay of luck and talent is easy to visualize; the maths is less so, and rather tedious. I'll therefore concentrate on the former and skip over the latter, much of which can be found on any Sabermetric blog.

A player's or team's recorded performance over a period of time is a combination of skill and luck. In smaller samples the random element predominates, while in larger ones talent begins to shine through. A conversion rate of one goal from three shots is very impressive in the context of one game, but if that is the only piece of information we have about a player or team, what does it tell us about how he will perform in the future?
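To get a feel for how much room luck has in a small sample, here is a minimal Python sketch. It assumes every shot is an independent trial with the same chance of success (a simplification), it borrows the 11.1% league-wide conversion rate quoted in this post, and the function name is purely illustrative.

```python
import math

LEAGUE_RATE = 0.111  # league-wide conversion rate quoted in this post


def conversion_std_error(shots, true_rate=LEAGUE_RATE):
    """Standard deviation of an observed conversion rate under a binomial model."""
    return math.sqrt(true_rate * (1 - true_rate) / shots)


# Three shots (one game), Burnley's 406 shots and Arsenal's 3238 shots.
for shots in (3, 406, 3238):
    spread = conversion_std_error(shots)
    print(f"{shots:>5} shots: observed rate typically varies by +/- {spread:.1%}")
```

Under these assumptions the typical wobble in an observed rate is far larger than the league average itself after three shots, and shrinks to a fraction of a percentage point after a few thousand.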

The answer depends on a variety of factors. How a player performs on a particular statistic is valuable information that becomes more reliable as the sample size grows, but we must also take into account how the population as a whole performs. There are many more average performers than truly outstanding or poor ones, so if we have limited information about a player, our projections will be more accurate if we assume he is more likely to be average than extremely good or bad.

How deeply we regress an individual observation towards the population mean also depends upon the reliability of the observed parameter. If the spread of talent within the overall population is small, then an extreme observed value for an individual is more likely to be down to random chance, so more weight should be given to the population's average than to the individual's score.
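The effect of the talent spread can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive method: noise is modelled as binomial, the weight is the usual ratio of noise variance to total variance, and both talent spreads below are hypothetical values chosen only to show the direction of the effect.

```python
def weight_on_population_mean(trials, mean_rate, talent_sd):
    """Share of the estimate taken from the population mean.

    Noise variance comes from a binomial model; talent variance is the
    assumed spread of true ability across the population.
    """
    noise_var = mean_rate * (1 - mean_rate) / trials
    talent_var = talent_sd ** 2
    return noise_var / (noise_var + talent_var)


# Hypothetical spreads of true talent, for illustration only.
for talent_sd in (0.005, 0.02):
    w = weight_on_population_mean(trials=400, mean_rate=0.111, talent_sd=talent_sd)
    print(f"talent sd {talent_sd:.3f}: {w:.0%} weight on the population mean")
```

With the same number of trials, the narrower talent spread puts much more of the weight on the population mean, which is the point made above.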

Below I've shown the typical amounts by which a team's shooting efficiency should be regressed towards the population mean, using data from six seasons of the EPL up to and including 2010/11.

__Amount of Regression to the Mean Needed to be Applied to EPL Data, 2005-2011.__

| Team | Number of Shots | Number of Goals | Raw Efficiency (%) | Adjusted Efficiency (%) | Regression Rate |
| --- | --- | --- | --- | --- | --- |
| Arsenal | 3238 | 428 | 13.2 | 13.0 | 8.5% |
| Man City | 2618 | 308 | 11.8 | 11.7 | 10.4% |
| Stoke City | 1051 | 118 | 11.2 | 11.2 | 22.3% |
| Burnley | 406 | 42 | 10.3 | 10.7 | 42.7% |

Twenty-nine teams played at least one season in the EPL over the period, and they combined to produce over 53,000 shots and almost 6,000 goals. Arsenal, of course, were present for all six seasons and accounted for 3238 shots and 428 goals, a strike rate of 13.2%. An average team attempting that many shots would have expected to score only 360 goals, so the Gunners outscored the average by almost 70 goals. Their observed rate was more than 3 standard deviations above the average for the group of teams, so they were extreme outliers. However, they recorded that figure over a substantial number of trials, so their observed rate is likely to be a credible record of their true ability at converting shots into goals. Once all the maths has shaken out, only 8.5% of the group's average conversion rate of 11.1% is combined with 91.5% of Arsenal's actual rate recorded over 3238 trials. This drags Arsenal's observed rate over the six years slightly towards the league average.
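The Arsenal arithmetic can be checked with a short Python sketch. The expected-goals step follows directly from the figures above. The standard deviation step assumes a binomial model for an average team's goal count, which is one plausible route to a figure of "more than 3 standard deviations" and not necessarily the calculation behind the number quoted here.

```python
import math

shots, goals = 3238, 428
league_rate = 0.111  # group average conversion rate

# Goals an average team would expect from the same number of shots.
expected_goals = shots * league_rate

# Binomial standard deviation of the goal count for an average team
# (an assumption made for this sketch).
goal_sd = math.sqrt(shots * league_rate * (1 - league_rate))
z_score = (goals - expected_goals) / goal_sd

print(f"expected goals: {expected_goals:.0f}")
print(f"goals above average: {goals - expected_goals:.0f}")
print(f"standard deviations above average: {z_score:.1f}")
```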

By contrast, Burnley spent just one season in the top flight and managed only 406 shots, so their record is more likely to contain random noise than Arsenal's much larger body of work. In predicting a representative conversion rate for Burnley during their brief stay in the EPL, it is better to add a larger portion of the league's average conversion rate. In this case the split is 42.7/57.3 of league average to observed rate, which pushes Burnley's observed rate up towards the 11.1% average.

If we regress the observed rates for all teams in this way, the new rates will, on average, predict better than the raw observed rates. It is a small but worthwhile improvement.
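The adjustment applied to each team can be sketched in Python. This is a sketch under assumptions rather than the exact calculation used for the table: it takes the weight on the league average to be k / (shots + k), a common shrinkage form, with k set to 300 shots. That constant is the goals-per-shot parity figure given further down this post, and it comes close to, without exactly matching, the regression rates in the table. The function and constant names are illustrative.

```python
LEAGUE_RATE = 11.1   # league average conversion rate, in percent
K_SHOTS = 300        # assumed regression constant (see the note above)

# (shots, goals) taken from the table in this post.
TEAMS = {
    "Arsenal": (3238, 428),
    "Man City": (2618, 308),
    "Stoke City": (1051, 118),
    "Burnley": (406, 42),
}


def regress_to_mean(shots, goals, league_rate=LEAGUE_RATE, k=K_SHOTS):
    """Return (raw rate, adjusted rate, weight on league mean); rates in percent."""
    raw = 100 * goals / shots
    weight = k / (shots + k)  # share taken from the league average
    adjusted = weight * league_rate + (1 - weight) * raw
    return raw, adjusted, weight


for team, (shots, goals) in TEAMS.items():
    raw, adjusted, weight = regress_to_mean(shots, goals)
    print(f"{team:<11} raw {raw:4.1f}%  adjusted {adjusted:4.1f}%  regression {weight:.1%}")
```

The more shots a team has taken, the smaller the share of the league average in its adjusted rate, which is the pattern running from Arsenal down to Burnley.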

An extremely useful by-product of regressing observations towards the population mean is that it requires us to calculate the contributions luck and talent make to the variable as sample sizes increase. After a small number of trials luck predominates, and as the number grows talent gains the upper hand in determining the size of the observations. It's possible, if algebraically tedious, to calculate fairly accurately the point at which the two contributions are equal, and below I've listed the number of attempts a team requires to reach this position of luck/skill parity for various statistics.

__How Many Observations Are Needed Before Skill Starts To Shine Through.__

| Team Skill | Number of Attempts | Average Number of Games |
| --- | --- | --- |
| Goals per Shot | 300 | 24 |
| Goals per Shot on Target | 190 | 29 |
| Shots on Target per Shot | 390 | 30 |

It appears that an EPL team needs around 300 shots before its goal haul is an equal product of talent and luck. In more understandable terms, that's about 24 games. (Most goals-based models for predicting future outcomes use at least this number of matches.) Shooting accuracy reaches parity after 390 attempts, roughly 30 games, and you'll need to watch a similar number of games before you can conclude that a team's conversion rate from on-target shots is down more to skill than to chance.
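To show how the balance shifts either side of parity, here is a small Python sketch. It assumes the share attributed to skill follows the simple form n / (n + k), where k is the parity point from the table above. That form is an assumption made for illustration, and the function name is hypothetical.

```python
def skill_share(attempts, parity_point):
    """Fraction of the observed variation attributed to skill.

    Assumes the simple form n / (n + k), where k is the parity point.
    """
    return attempts / (attempts + parity_point)


# Parity points (in attempts) from the table above.
PARITY = {
    "Goals per Shot": 300,
    "Goals per Shot on Target": 190,
    "Shots on Target per Shot": 390,
}

for stat, k in PARITY.items():
    shares = ", ".join(f"{n} attempts: {skill_share(n, k):.0%}" for n in (50, k, 1000))
    print(f"{stat}: {shares}")
```

By construction the share is exactly one half at the parity point, below one half before it and above one half after it.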

Of much more interest from a scouting perspective is knowing when an individual player's performance begins to be driven by his talent rather than by good or bad fortune. Branding a player as poor on the basis of too small a sample (think The Beatles) is just as bad as purchasing a "superstar" riding short-term luck who later turns out to be mediocre. From data limited to scorers from the last three EPL seasons, I get a figure of around 45 shots before skill begins to overtake luck in the quest for goals, and that's likely to take more than 22 games for the average goalscorer. Perhaps pertinently, Wenger is reputed to watch a player over at least 30 matches before he commits to buying or not, so we certainly seem to be in the right ballpark in terms of teasing skill from randomness.
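As a rough illustration of what the 45-shot figure means for a scout, the Python sketch below shows how much of a player's projection would still come from the population average after a given number of shots. It assumes the same k / (n + k) weighting form, with k set to 45, which is an assumption for illustration and not a confirmed formula. The names are hypothetical.

```python
PLAYER_PARITY_SHOTS = 45  # figure quoted above for EPL goalscorers


def weight_on_average(shots, k=PLAYER_PARITY_SHOTS):
    """Share of a player's projection taken from the population average."""
    return k / (shots + k)


# A single game's worth of shots, the parity point, and a long run.
for shots in (3, 45, 200):
    print(f"{shots:>3} shots: {weight_on_average(shots):.0%} average, "
          f"{1 - weight_on_average(shots):.0%} own record")
```

On these assumptions, a player seen taking only three shots would have well over 90% of his projection drawn from the average, whatever he did with those three shots.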

Regression towards the mean, the single most important aspect of football analysis...
