Tuesday 3 March 2015

Brendan Rodgers' Post January Sorcery?

It's been a "good" week for stats-driven narratives. Chelsea's lack of spot kicks, quickly followed by Depay's poor shooting and now Liverpool's post-January spurt, have all been inferred from mere accumulated data.

The latest story revolves around Liverpool's impressive points per game rate after the New Year, compared to their form in the opening five months of each campaign during Brendan Rodgers' tenure.

The figures are neatly summed up below, courtesy of that excellent source of Liverpool data, Andrew Beasley. Although SkySports, to their credit, added a ? to the headline describing the trend, Twitter was immediately rife with theories behind Rodgers' inspired form in the second half of the campaign.

So are we again most probably looking at definitive conclusions being drawn from raw data, with little or no regard for the effects of random variation?

Only the first two seasons are complete, and although the games are split fairly evenly in those two campaigns, that does not mean that the early and late games are of similar difficulty. This is most evident in the 2013/14 season.

Liverpool, of course, finished runners-up last season, and in the first half of that season they traveled to play the teams that would prove to be their five closest rivals: Chelsea, Manchester City, Arsenal, Spurs and Everton.

A group of five difficult away fixtures, followed post January by the relatively easier reverse fixtures, is likely to have made 2013/14's Aug to Dec schedule more difficult than the following Jan to May contests.

If we take the implied probabilities from the bookmakers' odds, Liverpool were expected to average around 1.87 ppg before January and 2.11 from January 2014 onwards. Even allowing for an inflation of the Reds' rating as the season progressed, it is still likely that the second half of 2013/14 was easier than the first.
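For readers who want to try the odds-to-points step themselves, here is a minimal sketch in Python. It assumes decimal odds and strips out the bookmaker's margin by simple proportional scaling, which is only one of several possible methods. The function names and the odds in the example are made up for illustration and are not the figures behind the 1.87 and 2.11 above.

```python
def implied_probabilities(win_odds, draw_odds, loss_odds):
    """Turn decimal odds into probabilities that sum to 1.

    Assumption: the bookmaker's margin is removed by proportional scaling.
    """
    raw = [1.0 / win_odds, 1.0 / draw_odds, 1.0 / loss_odds]
    total = sum(raw)
    return [r / total for r in raw]


def expected_points(p_win, p_draw):
    """Expected league points from one match: 3 for a win, 1 for a draw."""
    return 3.0 * p_win + 1.0 * p_draw


def expected_ppg(fixtures):
    """Average expected points per game over a list of odds triples.

    Each triple is (win, draw, loss) decimal odds from the team's own view.
    """
    total = 0.0
    for win_odds, draw_odds, loss_odds in fixtures:
        p_win, p_draw, _ = implied_probabilities(win_odds, draw_odds, loss_odds)
        total += expected_points(p_win, p_draw)
    return total / len(fixtures)


# Hypothetical odds for two matches, purely for illustration.
example_fixtures = [(1.50, 4.20, 7.00), (2.60, 3.30, 2.80)]
print(round(expected_ppg(example_fixtures), 2))
```

Run over each half of a season's fixtures, the same function gives the kind of pre and post January comparison described above.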

But before we attempt to provide reasons for this apparent split performance (and the various absences of Suarez and Sturridge in early 2013/14 and again in 2014/15 add to the complex mix of potential interactions), we should first see how likely it is that the split appeared just by chance.

If we simulate the two seasons' worth of split data and the currently unequal portions from 2014/15, around 1 in 5 trials results in three consecutive seasons where January onwards has been more prolific for Liverpool in terms of points per game compared to the previous five months.

Two seasons out of three where January marks an upturn in points is the most likely scenario, but none of the four possible outcomes (an upturn in zero, one, two or all three seasons) should be considered highly unlikely.

This does not rule out that Rodgers has perfected the art of crafting his team into a more effective unit as winter turns to spring, or that he was unlucky with injuries. But the possibility that we are seeing a split of results that was highly likely to occur for someone, if not Liverpool, over the last two and three quarter seasons simply by chance should be regarded as a likely contributing cause.
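A simulation of this kind can be sketched in a few lines of Python. Everything fed into it here is an assumption: the flat win and draw probabilities, the game counts in each portion and the function names are placeholders, not the match by match odds behind the figures quoted above. It simply counts how often the later portion of a season beats the earlier one on points per game.

```python
import random


def simulate_points(match_probs, rng):
    """Simulate one run of matches; each entry is (p_win, p_draw)."""
    points = 0
    for p_win, p_draw in match_probs:
        u = rng.random()
        if u < p_win:
            points += 3
        elif u < p_win + p_draw:
            points += 1
    return points


def upturn_distribution(seasons, n_trials=20000, seed=1):
    """Share of trials in which 0, 1, 2, ... seasons show a late upturn.

    seasons is a list of (early_matches, late_matches) pairs. An upturn
    means a strictly higher points per game rate in the late portion.
    """
    rng = random.Random(seed)
    counts = [0] * (len(seasons) + 1)
    for _ in range(n_trials):
        upturns = 0
        for early, late in seasons:
            early_ppg = simulate_points(early, rng) / len(early)
            late_ppg = simulate_points(late, rng) / len(late)
            if late_ppg > early_ppg:
                upturns += 1
        counts[upturns] += 1
    return [c / n_trials for c in counts]


# Hypothetical inputs: the same flat (p_win, p_draw) in every match, two
# evenly split seasons and one season with a shorter late portion.
flat = (0.55, 0.23)
seasons = [
    ([flat] * 19, [flat] * 19),
    ([flat] * 19, [flat] * 19),
    ([flat] * 19, [flat] * 8),
]
for n_upturns, share in enumerate(upturn_distribution(seasons)):
    print(n_upturns, round(share, 3))
```

Replacing the flat probabilities with match specific ones, such as those implied by the odds, lets fixture difficulty feed through, so the shares will move with whatever inputs are chosen.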

Monday 2 March 2015

Memphis Depay's 0-40 is No Cause for Concern.

It's incredibly easy to use raw data to arrive at simplistic conclusions which portray a player or team in a particular light. "Player A has scored X goals from Y shots" can be used to give substance to a view that the player is either world class or rubbish, depending upon the values of X and Y.

However, football analysis should try to acknowledge the impact of random variation on the recorded outcomes of trials which may be both limited in size and derived from models that are missing many minor variables which may tweak probabilities in one direction or another.

PSV's Memphis Depay may soon be heading for the Dutch revolution that is currently underwhelming the "Theatre of Dreams", although his stock may have fallen if his current goalscoring prowess from outside the box is taken at face value.

Zero returns from 40 attempts appears a poor selling point, but as Simon Gleave points out in this tweet, context is everything in interpretation, and a 0-40, poor as it may intuitively appear, lacks any context.

Goals from outside the box are relatively rare events. The best available round up can be read at the StatsBomb site in this post from Dan Kennett. Dan's conversion figure of one in 37 for the Premier League immediately adds context to Depay's 0-40, even though it comes from another major European league. A single goal and Depay appears average, two and he doubles the headline rate.

Benchmark information therefore helps when trying to make sense of relatively limited samples. But it is also easy to estimate the likelihood and range of possible outcomes from Depay's 40 shots from distance, using a shot location model and a simple simulation.

Depay's 40 shots varied in location. The most optimistic effort had a likely success rate of less than 1 in 250 or so, while those closer to goal ranged up to 1 in 14. An average shooter, taking open play shots from the positions chosen by Depay in 2014/15, would average 1 goal, in keeping with Dan's findings.

But this average would be distributed such that zero goals in a single run of 40 such trials wouldn't be a major surprise, occurring over 33% of the time.
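The spread of outcomes can be sketched with a short Monte Carlo run in Python. The shot location model's individual values are not listed in this post, so the sketch assumes a flat 1 in 37 chance for every one of the 40 attempts as a stand-in. The function name and that input list are illustrative only.

```python
import random
from collections import Counter


def simulate_goal_counts(shot_probs, n_trials=50000, seed=7):
    """Estimate how often a set of shots yields 0, 1, 2, ... goals.

    shot_probs holds one scoring probability per shot. Each trial plays
    every shot once and records the total number of goals.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_trials):
        goals = sum(1 for p in shot_probs if rng.random() < p)
        tally[goals] += 1
    return {goals: tally[goals] / n_trials for goals in sorted(tally)}


# Assumption: every shot has the flat benchmark chance of 1 in 37.
# Per-shot values from a location model would replace this list.
shot_probs = [1 / 37] * 40
for goals, share in simulate_goal_counts(shot_probs).items():
    print(goals, round(share, 3))
```

The first row of the output is the share of 40 shot runs that end goalless, and the remaining rows show how the rest of the trials are spread across one, two or more goals.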

Even if we assume that Depay is an above average striker of the ball from distance in open play and inflate the likelihood of success for each of his 40 attempts by a generous 10%, there still remains a significant 27% chance that he would fail to score in 40 such efforts.
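The chance of a blank run does not strictly need a simulation, because it is just the product of each shot's chance of missing. The Python sketch below shows that calculation with an optional skill uplift. It again assumes a flat 1 in 37 chance per shot in place of the location model's values, which are not given here, and the function name and uplift parameter are my own.

```python
import math


def zero_goal_probability(shot_probs, uplift=0.0):
    """Chance of scoring from none of the shots.

    Each shot's scoring probability is multiplied by (1 + uplift) and
    capped at 1 before the miss probabilities are multiplied together.
    """
    return math.prod(1.0 - min(1.0, p * (1.0 + uplift)) for p in shot_probs)


# Assumption: a flat 1 in 37 chance for each of the 40 shots.
flat_shots = [1 / 37] * 40
baseline = zero_goal_probability(flat_shots)
boosted = zero_goal_probability(flat_shots, uplift=0.10)
print(round(baseline, 3), round(boosted, 3))
```

With the flat benchmark the goalless chance is about 33%, falling to about 30% after a 10% uplift. The 27% quoted above comes from the location based probabilities, so a flat rate will not reproduce it exactly.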

Data collection (0-40) is the start of the process, benchmark figures (1-37) add initial context, but distributions begin to explain how unusually good or bad a set of data might be compared to the expectation of an average performer. 0-40 from even an above average shot taker from outside the box isn't unusual at all.

Footnote. These types of simple simulations can be done in Excel, but for an excellent primer on using R check out @SteMc74's tutorial here.

Double footnote. (Some progressive European clubs chose Sloan to announce the elimination of random variation or luck from their processes, especially those taking place outside the box, but for now the rest of us will continue to work under the limitations imposed by such forces).

Triple footnote. The risk/reward for shooting from distance was broken down by Colin Trainor in this presentation from the inaugural Opta Pro forum in 2014. Read it here.