Back in the late 90s, on Usenet and a floppy-disk-enabled IBM Thinkpad, beyond the reach of even the Wayback Machine, football stats were taking a giant lurch forward.
Where there had once been only goals, shots and saves had arrived.
Now, not only could we build multiple regression models where many of the predictor variables were highly correlated, we could also look at conversion rates for strikers and save rates for keepers.
In an era when the short term ruled, a keeper such as David Seaman was world class in terms of save percentage until he conceded more goals in a game at Derby than he had conceded in the previous five Premier League games. And then, as now, MotD's experts went to town on a singular performance.
Sample size and random variation weren't high on the list of football-related topics in 1997, but it was apparent to some that what you saw in terms of save percentage might not be what you'd get in the future.
You needed a bigger sample size of shots faced by a keeper and you also needed to regress that rate towards the league average.
This didn't turn save percentage into a killer stat, but it did make the curse of the streaky 90% save percentage more understandable when it inevitably tanked to more mundane levels.
[Image caption: Spot the interloper from the future.]
Fast forward and now model based keeper analysis can extend to shot type, location and even include those devilish deflections that confound the best.
However, for some, save percentage remains the most accessible way to convey information about a particular keeper.
This week was a good example.
It may not be a cutting edge approach to evaluating a keeper, but for many, if not most, it is as deep as they wish to delve.
So what 1990s rules of thumb can be applied to the basic save currency of the current crop of keepers?
We know that the save percentages of this season's keepers are not the product of equally talented keepers facing equally difficult shots, because the spread of those percentages is wider than it would be if that were the case.
Equally, random variation is one component of the observed save percentages, and small sample sizes are prone to producing extremes simply by chance.
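To get a feel for how much spread chance alone would produce, here's a minimal sketch, my own illustration rather than anything from the original analysis. It simulates a league of identical keepers, each with the league-average save rate, and measures the scatter in their observed save percentages after a modest number of shots; the keeper and shot counts are assumed for the example.

```python
import random

random.seed(42)

n_keepers = 20          # assumed number of first-choice keepers
shots_each = 30         # assumed shots on target faced so far this season
true_save_rate = 0.66   # league-average save rate quoted below

def observed_save_pct(shots, p):
    """Save percentage for one keeper whose true rate is p, each shot decided by chance."""
    saves = sum(1 for _ in range(shots) if random.random() < p)
    return saves / shots

rates = [observed_save_pct(shots_each, true_save_rate) for _ in range(n_keepers)]

mean = sum(rates) / n_keepers
sd = (sum((r - mean) ** 2 for r in rates) / (n_keepers - 1)) ** 0.5

print("simulated save percentages:", sorted(round(r, 3) for r in rates))
print(f"spread (sd) from chance alone after {shots_each} shots: {sd:.3f}")
# If the real keepers' spread is clearly wider than this, the differences are
# not purely luck: some keepers are facing easier shots or saving more of them.
```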
If you want a keeper's raw save percentage to better reflect what may occur in the future, regress his actual percentage in line with the following table.
Stoke's (or rather Derby's) Lee Grant has faced 31 shots on goal and saved 26, so his raw save rate is 0.839. The league average is running at 0.66.
Regress his actual rate by 70% towards the league average, as he's faced around 30 goal attempts: (0.839 * (1 - 0.7)) + (0.7 * 0.66) ≈ 0.713.
Your better guess at Grant's immediate future, based on his single season to date, is that despite his 0.839 save percentage from 31 shots he may save around 71% of the shots he faces, without any age-related decline factored in.
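For anyone who wants to plug in other keepers, here's a minimal sketch of that blend. The function name is mine; the 0.7 weight is simply the rule of thumb quoted above for a keeper who has faced around 30 shots, and a different weight would apply at other sample sizes.

```python
def regressed_save_pct(saves, shots, league_avg, regress_amount):
    """Blend a keeper's raw save rate with the league average.

    regress_amount is the weight given to the league average
    (0.7 here, per the rule of thumb for roughly 30 shots faced).
    """
    raw = saves / shots
    return (1 - regress_amount) * raw + regress_amount * league_avg

# Lee Grant's figures from above: 26 saves from 31 shots, league average 0.66.
estimate = regressed_save_pct(saves=26, shots=31, league_avg=0.66, regress_amount=0.7)
print(f"raw: {26 / 31:.3f}, regressed estimate: {estimate:.3f}")  # ~0.84 raw, ~0.71 regressed
```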
He's still ranked first this season, but he's really close to a scrum of other keepers with similarly regressed rates. Ranking players instead of discussing the actual numbers is often strawman territory, anyway.
There's nothing wrong with using simple data, but you owe it to your article and audience to do the best with that data.
Raw save rates from one season are better predictors of actual save rates in the following season in just 30% of examples. 70% of the time you get closer to future events, and a better validation of your conclusion, if you go the extra yard and regress the raw data.
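That kind of head-to-head check needs very little machinery: take each keeper's rate from one season, produce a raw and a regressed forecast, and see which lands closer to what he actually did the following season. The sketch below assumes you have season-pair records to hand; the three entries listed are made-up placeholders, not real keepers.

```python
def regressed(saves, shots, league_avg=0.66, regress_amount=0.7):
    """Same blend as before: weight the raw rate towards the league average."""
    # In practice regress_amount should come from the table above and shrink as shots increase.
    return (1 - regress_amount) * (saves / shots) + regress_amount * league_avg

# Hypothetical records: (season-one saves, season-one shots, season-two save pct).
# Swap in real keeper-by-keeper data to run the comparison properly.
season_pairs = [
    (26, 31, 0.70),
    (80, 130, 0.68),
    (55, 95, 0.61),
]

regressed_wins = 0
for saves, shots, next_season_pct in season_pairs:
    raw_error = abs(saves / shots - next_season_pct)
    reg_error = abs(regressed(saves, shots) - next_season_pct)
    if reg_error < raw_error:
        regressed_wins += 1

print(f"regressed forecast closer in {regressed_wins} of {len(season_pairs)} cases")
```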
At least Party Like it's 2016, not 1999.