Tuesday 18 February 2014

Zidane, The Guardian, Pulisball and the Value of Statistics.

In this post I suggested that knowledge of the cumulative goal expectation for a side is insufficient to fully understand the connection between shots, goal expectation and the final outcome of a match. A side that spreads its goal expectation out over many shots extends the possible range of the number of goals it may score. But such a team is at a long term disadvantage compared to a side that creates the same goal expectancy, but does so over fewer, more clear cut opportunities. The example I used was deliberately extreme to highlight the effect, although it did persist when goal expectation was diced between just slightly more chances.
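As a rough illustration of that effect, here is a minimal Python sketch that works out the exact win, draw and loss probabilities when two sides with the same total goal expectation meet. The shot values (two chances at 0.5 against ten at 0.1) are made up for the example rather than taken from any real match, and every shot is assumed to be independent of the others.

```python
def goal_distribution(shot_probs):
    """Exact probability of scoring 0, 1, 2, ... goals from independent shots."""
    dist = [1.0]  # before any shot is taken, zero goals is certain
    for p in shot_probs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot is missed
            new[goals + 1] += prob * p     # this shot is scored
        dist = new
    return dist


def match_outcome(probs_a, probs_b):
    """Return (P(A wins), P(draw), P(B wins)) for two lists of shot probabilities."""
    dist_a = goal_distribution(probs_a)
    dist_b = goal_distribution(probs_b)
    win = draw = loss = 0.0
    for goals_a, pa in enumerate(dist_a):
        for goals_b, pb in enumerate(dist_b):
            if goals_a > goals_b:
                win += pa * pb
            elif goals_a == goals_b:
                draw += pa * pb
            else:
                loss += pa * pb
    return win, draw, loss


# Hypothetical sides, each with a total goal expectation of 1.0
few_big = [0.5] * 2       # two clear cut chances
many_small = [0.1] * 10   # ten speculative efforts

win, draw, loss = match_outcome(few_big, many_small)
print(f"few big chances win: {win:.3f}, draw: {draw:.3f}, many small chances win: {loss:.3f}")
```

With these invented numbers the side relying on fewer, bigger chances wins slightly more often than it loses, even though both sides "expect" exactly one goal.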

This blog is primarily based around statistical analysis of football. Therefore, apparently, I should be excluded from experiencing the artistic delights of, say, a Zidane (see the Guardian/WSC's recent enlightened blog post on stats in football).

Delving into the Frenchman's back catalogue to investigate the effects of a headbutt on World Cup final results may be allowed, but marrying an interest in stats to an enjoyment of the artistry of football is a step too far for the self-appointed guardians of the beautiful game.

Fortunately, the proposed embargo on watching and enjoying football while also writing about the statistical side hasn't extended to the Potteries, where Stoke under Pulis invariably put theory into practice in a compelling celebration of the diversity of the sport, although the genius of his design was often obscured by a blinkered appraisal tied simply to artistic merit.

Overall, a Tony Pulis side was outshot every season, but inched its goal expectation upwards by trying to maximize the quantity of high value goal attempts, skewing the distribution to include as many such efforts as Delap's arms could willingly provide. Goals scored with every available body part from the area of the six yard box, as in defeating Arsenal at home in their first year of Premiership football, were the norm rather than the exception.

Rather than depart in May with a patronizing pat on the head for "playing football the right way", Pulis, whether knowingly or not, eked out every possible advantage, both tactical and, it now appears, statistical, to creep above the 40 point line. In doing so his side provides an ideal test case on the real life implications of creating fewer big chances (Pulis) or more frequent, lesser ones (Hughes).

The tactical delivery systems used by Stoke worked most effectively at the Britannia Stadium, where they were able to make the six yard box a realistic target from virtually everywhere in their opponents' half, by legally narrowing and shortening the playing dimensions. So I've used every shot faced and taken by Stoke at home during the 2010/11 season, when Delap was still an almost ever-present. He played in every league home match bar one.

The goal expectation values are derived from my usual model that primarily incorporates x,y co-ordinates for each attempt and also accounts for the mode of the strike, either by the boot or a header.

Spreadsheets for nerds, Dukla Prague away kits for purists. Stoke roll the statistical dice in Bolton's six yard box.

Some league wide generalities were strong enough to topple even Pulis' stylistic tendencies and Stoke, as the home side, did narrowly outshoot their opponents at the Brit in 2010/11. 270 attempts from the visitors were met with just over 300 from Stoke.

The distribution of opportunities was more extreme. Over 20% of the chances created by Stoke carried an expected goal probability of 20% or greater, compared to below 10% of the efforts from visiting sides carrying such a high likelihood of success. This combination would, quite naturally, result in a formidable home record for Stoke, and in fact only the Big Four plus Liverpool gained more home wins than Stoke in 2010/11.

To avoid the tedium of an algebraic meltdown, I have simulated the 2010/11 home season for Stoke based on my approximation of the goal expectancy of each actual shot from either side in those 19 games. I've then repeated the process, but spread Stoke's goal expectation over twice as many chances. They have the same goal expectation, both for and against in each game, but the skewed distribution of big chances created under the banner of Pulisball has been reduced.
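A minimal sketch of that kind of simulation might look like the Python below. The shot values are invented placeholders, not the actual 2010/11 data, and the way chances are "spread" (each home shot replaced by two shots at half the probability) is only one assumed way of doubling the number of chances while keeping the goal expectation the same. Shots are also treated as independent.

```python
import random


def simulate_goals(shot_probs, rng):
    """Count goals from independent shots, each scored with its own probability."""
    return sum(rng.random() < p for p in shot_probs)


def split_chances(shot_probs, factor=2):
    """Spread the same goal expectation over `factor` times as many shots (assumed scheme)."""
    return [p / factor for p in shot_probs for _ in range(factor)]


def home_points(home_shots, away_shots, rng):
    """Simulate one match and return the home side's league points."""
    home_goals = simulate_goals(home_shots, rng)
    away_goals = simulate_goals(away_shots, rng)
    if home_goals > away_goals:
        return 3
    return 1 if home_goals == away_goals else 0


def simulate_season(matches, rng):
    """Total home points over a list of (home_shots, away_shots) matches."""
    return sum(home_points(home, away, rng) for home, away in matches)


# Placeholder data: the same hypothetical match repeated for 19 home games.
one_match = ([0.35, 0.25, 0.08, 0.05], [0.10, 0.07, 0.05, 0.04, 0.03])
pulis_matches = [one_match] * 19
spread_matches = [(split_chances(home), away) for home, away in pulis_matches]

rng = random.Random(2011)  # fixed seed so the run is repeatable
n_sims = 2000
pulis_points = [simulate_season(pulis_matches, rng) for _ in range(n_sims)]
spread_points = [simulate_season(spread_matches, rng) for _ in range(n_sims)]

print("mean points, big chances:", sum(pulis_points) / n_sims)
print("mean points, spread chances:", sum(spread_points) / n_sims)
```

Swapping the placeholder lists for per-shot goal expectations from a real shot model would give season totals that can be compared in the same way.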

Above, I've plotted the spread of home league points Stoke might have expected to get creating their chances, either evenly spread across more opportunities or in the more "big chance", shot shy method preferred by Pulis. And once again the distribution of chances matters.

An average of 37 points was gained at home using Pulisball, compared to 35.5 points if goal expectation per attempt were more uniform and chances more frequent. Pulisball gained at least 30 points (three quarters of the way to the guaranteed safety of 40 points) over 97% of the time, compared to 87% in the alternative. And most telling, in paired comparisons, the former approach gained more points 55% of the time compared to 38% for the less extreme distribution, with 7% tied.
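For anyone wanting to produce the same three summaries from their own simulations, a small sketch follows. The two lists of season totals are tiny invented examples, and pairing simulation i of one approach with simulation i of the other is an assumption about how the paired comparison could be done.

```python
def summarise(points_a, points_b, threshold=30):
    """Compare two equally long lists of simulated season points totals."""
    n = len(points_a)
    pairs = list(zip(points_a, points_b))
    return {
        "mean_a": sum(points_a) / n,
        "mean_b": sum(points_b) / n,
        # share of simulated seasons reaching the points threshold
        "a_at_least": sum(p >= threshold for p in points_a) / n,
        "b_at_least": sum(p >= threshold for p in points_b) / n,
        # paired comparison: season i of A against season i of B
        "a_more": sum(a > b for a, b in pairs) / n,
        "b_more": sum(b > a for a, b in pairs) / n,
        "tied": sum(a == b for a, b in pairs) / n,
    }


# Invented totals from four simulated seasons under each approach
big_chances = [37, 41, 29, 35]
spread_chances = [35, 36, 31, 35]

summary = summarise(big_chances, spread_chances)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```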

So a combination of visual evidence and statistical analysis might have brought us to a better understanding of one of the more intriguing and ongoing spectacles of the recent Premiership. Statistics can enhance an appreciation of all sports.

If we rely entirely on visual evidence, gut feeling and the power of prose, commentators would still be peddling the old myth that headbutting an opponent and getting red carded makes your numerically disadvantaged side "more difficult to play against".

1 comment:

  1. As measures, 'shots' and 'shots on target' are aggregated at too high a level to be useful in every instance. A team's defense may restrict the opposition to a high number of shots from outside the area, which one would expect a keeper to save. Alternatively it could be breached repeatedly, allowing 1) shots from dangerous areas; 2) shots in breakaway situations, and 3) shots or headers from dead ball situations, all of which are more likely to result in a goal.

    A better representation of actual goal chances would measure not just headers as against shots on target but 1) shots from inside/outside the box, and 2) shots taken before/after 20 secs. of possession in the other team's half. There will always be value in designing some proxy for how likely a defense is to lose the ball and allow a chance and how deep a defense is playing. A deep defense is more likely to concede goals to shots from outside the box on deflections, especially after corners.