Monday, 26 December 2016

Palace's Pre-Christmas Expected Goals Breakdown.

This time last year, Palace were 6th in the Premier League and the Europa League was being touted as a legitimate aim. 

This time around they're 17th and have embarked upon Sam Allardyce's return to domestic football after his unbeaten reign as England manager.

Palace's ExpG Breakdown for their First 17 Games in 2015/16 and 2016/17.

Some small-sample bulges have appeared in the way they've dealt with corners and set pieces, and the post-kick quality of the shots or headers (in grey) has been less kind in 2016/17 than it was in 2015/16, but overall the cumulative expected goals are broadly similar for both periods.

Randomness partly made Pardew a hero in 2015/16 and unemployed a year later. 

Tuesday, 13 December 2016

Blocks Away.

I've written before about a side's ability to block shots (the latest post was here), and Burnley's large number of blocks in the Premier League to date has attracted the attention of Twitter.

Blocked shots can be modelled from historical data in the same way that expected goals are.

I have used the raw Opta data that powers Timeform's InfogoApp to model the number of blocks a side might expect to make, based on a variety of variables, most notably how central the position is from which a shot or header is taken.

The model was built using data from previous seasons and used to predict blocks in the 2016/17 season to date. It adequately passed a variety of goodness of fit tests on the out of sample data.

I have looked at both the number of goal attempts that are blocked by a side, as well as the number of their own attempts that are blocked. So each side has been examined from an attacking and defensive viewpoint.

Expected and actual Blocks in the 2016/17 Premier League After Matchday 13.

As you'd expect, teams either over- or under-perform compared to the most likely number of blocks based on an average-team model.

After 13 games, Liverpool had seen 78 of their own shots blocked compared to an expected baseline of 73. That is an under-performance, but not really suggestive of anything other than simple variance.

It's slightly less easy to dismiss Sunderland's 48 blocked shots compared to an expected value of just 33. The chance that an average team takes Sunderland's attempts and sees at least 48 of them blocked is less than 1 in 200.

A simulation of all of Sunderland's goal attempts to week 13 produces the above distribution and likelihood of those attempts being blocked. Sunderland's actual block count or above can just be seen at the extreme right of the plot.
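A minimal sketch of how a simulation of this kind might be run, assuming the model supplies a block probability for every individual attempt. The function name and the per-attempt probabilities below are invented placeholders (chosen only so that they sum to roughly 33 expected blocks); they are not Opta data or output from the model described above.

```python
import random

def prob_at_least(block_probs, threshold, n_sims=10000, seed=1):
    """Estimate the chance that at least `threshold` attempts are blocked.

    block_probs: one modelled block probability per attempt (hypothetical here).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Treat each attempt as an independent trial with its own probability.
        blocked = sum(rng.random() < p for p in block_probs)
        if blocked >= threshold:
            hits += 1
    return hits / n_sims

# Placeholder inputs: 130 attempts whose block probabilities sum to about 33.
example_probs = [0.15, 0.25, 0.35, 0.27] * 32 + [0.18, 0.18]

tail = prob_at_least(example_probs, 48)
print(f"Simulated chance of 48 or more blocks: {tail:.4f}")
```

Counting how often the simulated total reaches the observed figure gives the tail likelihood directly, and the same simulated totals can be binned to draw the distribution.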

On the defensive side of the ball, the Puliser Prize for blocking in the face of adversity surprisingly goes not to Pulis's current side, WBA, but to Everton.

Their 58 blocks, compared to a tactically neutral expectation of just 45 and a 1 in 100 likelihood, hint at some degree of intent.

And Burnley, as they home in on a ton? 89 is a lot, but compared to an expectation of 80, it is their hospitality in allowing teams to shoot that has raised the bar as much as a tactically adept blocking scheme.

Given the shots they have allowed, an average side would equal or better 89 blocks after 13 games around 14% of the time.

Friday, 9 December 2016

Does Save Percentage Tell You Anything About A Keeper?

Back in the late 90s, on Usenet and a floppy-disk-enabled IBM ThinkPad, beyond the reach of even the Wayback Machine, football stats were taking a giant lurch forward.

Where there had once been only goals, shots and saves had arrived.

Now, not only could we build multiple regression models where many of the predictor variables were highly correlated, we could also look at conversion rates for strikers and save rates for keepers.

In an era when the short term ruled, a keeper such as David Seaman was world class in terms of save percentage until he conceded more goals in a game at Derby than he had conceded in the previous five Premier League games. And then, as now, MotD's expecterts went to town on a singular performance.

Sample size and random variation weren't high on the list of football-related topics in 1997, but it was apparent to some that what you saw in terms of save percentage might not be what you'd get in the future.

You needed a bigger sample size of shots faced by a keeper and you also needed to regress that rate towards the league average.

This didn't turn save percentage into a killer stat, but it did make the curse of the streaky 90% save percentage more understandable when it inevitably tanked to more mundane levels.

 Spot the interloper from the future.

Fast forward, and model-based keeper analysis can now extend to shot type and location, and can even include those devilish deflections that confound the best.

However, for some, save percentage remains the most accessible way to convey information about a particular keeper.

This week was a good example.

It may not be a cutting edge approach to evaluating a keeper, but for many, if not most, it is as deep as they wish to delve.

So what 1990s rules of thumb can be applied to the basic save currency of the current crop of keepers?

We know that the save percentages of this season's keepers are not the product of equally talented keepers facing equally difficult shots, because the spread of those save percentages is wider than it would be if that were the case.

Equally, random variation is one component of the observed save percentages, and small sample sizes are prone to producing extremes simply by chance.

If you want a keeper's raw save percentage to better reflect what may occur in the future, regress his actual percentage in line with the following table.
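One common way to make this kind of comparison, sketched here as an illustration rather than as the method used for this post, is to set the observed variance of the keepers' save percentages against the variance that chance alone would produce if every keeper shared the league-wide save rate. The function name and the (saves, shots faced) pairs are hypothetical placeholders, not real keeper records.

```python
def spread_check(keepers):
    """Compare observed spread of save rates with the spread expected from luck.

    keepers: list of (saves, shots_faced) tuples (made-up numbers in this sketch).
    Returns (observed_variance, expected_random_variance).
    """
    total_saves = sum(saves for saves, _ in keepers)
    total_shots = sum(shots for _, shots in keepers)
    pooled_rate = total_saves / total_shots  # stand-in for a shared "true" rate

    rates = [saves / shots for saves, shots in keepers]
    mean_rate = sum(rates) / len(rates)
    observed = sum((r - mean_rate) ** 2 for r in rates) / len(rates)

    # Binomial variance of a rate is p(1-p)/n; average it over the keepers.
    expected = sum(
        pooled_rate * (1 - pooled_rate) / shots for _, shots in keepers
    ) / len(keepers)
    return observed, expected

# Hypothetical keepers for illustration only.
sample = [(26, 31), (40, 70), (55, 75), (30, 52), (48, 66)]
observed, expected = spread_check(sample)
print(f"observed variance {observed:.4f}, expected from chance {expected:.4f}")
```

If the observed figure is clearly larger than the chance-only figure, something beyond luck (talent, shot difficulty, or both) is contributing to the spread.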

Stoke's (or rather Derby's) Lee Grant has faced 31 shots on goal and saved 26 and his raw efficiency is 0.839. League average is running at 0.66.

Regress his actual rate by 70%, as he's faced around 30 goal attempts: (0.839 * (1 - 0.7)) + (0.7 * 0.66) = 0.714

A better guess at Grant's immediate future, based on his single season to date, is that he will save around 71% of the shots he faces rather than the 83.9% he has managed from 31 shots, and that is without any age-related decline factored in.
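The arithmetic for Grant can be wrapped in a small helper. This is only a sketch: the function and argument names are hypothetical, and the 0.7 weight is the figure quoted above for a keeper who has faced around 30 attempts. Weights for other sample sizes would have to come from the table and are not reproduced here.

```python
def regress_save_pct(saves, shots_faced, league_avg, regression_weight):
    """Pull a raw save percentage towards the league average.

    regression_weight is the share of the estimate given to the league
    average (0.7 for roughly 30 shots faced, per the example in the post).
    """
    raw = saves / shots_faced
    return raw * (1 - regression_weight) + league_avg * regression_weight

# Lee Grant: 26 saves from 31 shots, league average 0.66, regressed by 70%.
grant = regress_save_pct(26, 31, 0.66, 0.7)
print(f"Regressed save percentage: {grant:.3f}")
```

For Grant this comes out at roughly 0.71. A weight of 0 leaves the raw rate untouched, while a weight of 1 returns the league average.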

He's still ranked first this season, but he's really close to a scrum of other keepers with similarly regressed rates. Ranking players instead of discussing the actual numbers is often strawman territory, anyway.

There's nothing wrong with using simple data, but you owe it to your article and audience to do the best with that data.

Raw save rates from one season are the better predictor of actual save rates in the following season in just 30% of cases. 70% of the time you get closer to future events, and so a more accurate validation of your conclusion, if you go the extra yard and regress the raw data.

At least party like it's 2016, not 1999.