Monday, 20 April 2020

Scatter Plots

There's been a huge increase in football-related scatter plots recently. So, as the guy who produced the first such plots, I thought I'd quickly run through why I found this simple plot useful and then try to extend the idea to make it more informative.

The initial plots were designed to both inform and characterise playing style.

I still think the most successful plots use related metrics, for example expected assists and expected goals per 90 for individual players.

These "makers and takers" plots easily split players into those whose predominant talent is to create chances, those who get onto the end of opportunities and those rare players who excel at both disciplines.

Here's one for Arsenal 2019/20.

It's got sample size issues, but it's fairly evident that the creative players are towards the top left and the goal poachers are to be found in the bottom right.

Another neat aspect of this type of plot is that you can draw a line from the origin through a player, and anyone with a similar ratio of xG to xA will lie close to that line.

In league-wide samples, you can therefore find emerging players with similar qualities to the established stars.
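If you want to automate the line-through-the-origin trick, the angle of that line is all you need. Here's a minimal Python sketch, assuming you already have per-90 xG and xA for each player; the player names and numbers are invented for illustration, and the 5 degree tolerance is an arbitrary choice.

```python
import math

# Hypothetical per-90 figures as (xG, xA); invented for illustration.
players = {
    "Player A": (0.55, 0.10),
    "Player B": (0.12, 0.40),
    "Player C": (0.30, 0.28),
    "Player D": (0.22, 0.04),
}


def angle_deg(xg, xa):
    """Angle (degrees) of the line from the origin through (xG, xA).

    Players on the same line share the same xA:xG ratio, whatever their volume.
    """
    return math.degrees(math.atan2(xa, xg))


def similar_profiles(target, pool, tolerance=5.0):
    """Players whose line lies within `tolerance` degrees of the target's line."""
    target_angle = angle_deg(*pool[target])
    return sorted(
        name
        for name, (xg, xa) in pool.items()
        if name != target and abs(angle_deg(xg, xa) - target_angle) <= tolerance
    )


print(similar_profiles("Player A", players))  # ['Player D']
```

Player D sits on exactly the same line as Player A but at lower volume, which is the "emerging player with similar qualities" case.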

There's a lot of data swilling around today. These plots are simple to make (three minutes, tops) and, with some thought about what you're trying to illustrate, they inform pretty well.

Over the weekend I came back to the idea to see if I could add information that tells you a little more than just the raw connection between two metrics.

Here's what I came up with. It's again just a simple scatter plot, but I've used bubble size to introduce a third variable (metric volume per 90).

In addition, I've used a single performance metric, NS (non-shot) xG added from ball carries, along the x axis. Instead of plotting a complementary metric on the vertical axis, I've used a number that denotes how diverse each player's contributions to that x-axis metric are.

This plots the top 20 players by NS xG added through their ability to successfully carry the ball forward and move their team into a more dangerous pitch position.

It's a good one to choose because you know that Adama Traore will top the list (and he does).

Rather than a sterile scatter, you've now got a chart that not only tells you about a performance metric but also instantly adds another layer (success volume) from which you can draw additional information about a player's characteristics.

In short, those towards the right of the plot add more NS xG per 90 than others.
Larger bubbles indicate more successful progressive carries per 90.
Higher up the chart indicates more disorder and unpredictability in what a player will positively achieve for his team when on the ball.
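The diversity number isn't defined in detail here, so treat the following as one plausible way to build such a measure rather than the one used for this chart: the Shannon entropy of a player's carry value split across some categories (four pitch zones are assumed below, with invented values). A player whose output all comes from one zone scores zero; one whose output is spread evenly across four zones scores the maximum of two bits.

```python
import math


def shannon_entropy(values):
    """Shannon entropy in bits of a non-negative breakdown.

    0 means all of the output comes from one category; log2(n) means it is
    spread evenly across n categories.
    """
    total = sum(values)
    if total <= 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical NS xG added from carries, split across four pitch zones.
specialist = [0.5, 0.0, 0.0, 0.0]
all_rounder = [0.125, 0.125, 0.125, 0.125]

print(shannon_entropy(specialist))   # 0.0
print(shannon_entropy(all_rounder))  # 2.0
```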

I've annotated players with the additional information you can draw from these plots.

Thursday, 26 December 2019

State of Play 2020

Liverpool’s bilingual mastermind behind the team’s meteoric rise to dominate domestic, European and now world club football is gradually gaining a higher media profile.

Not Jurgen Klopp, although he has played a part in the Reds’ success, but Dr Ian Graham, their current director of research.

Ian’s recent appearances in both broadcast and print media have not only highlighted the importance of an integrated approach to squad building, one that combines data with more traditional methods, but have also given a small glimpse into the analytical methods employed.

The latest profile landed courtesy of and described some fundamentals of Liverpool’s analytical philosophy.

One of these particularly resonated with Infogol’s approach of quantifying every footballing action in the same currency of goals or, more specifically, expected goals.

The idea that every action, be it a pass, tackle or long throw changes the likelihood that a side will ultimately score isn’t a new concept.
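To make the "every action changes the likelihood of scoring" idea concrete, here’s a toy Python sketch. It assumes a hypothetical lookup of how likely a possession is to end in a goal from each pitch zone; an action’s non-shot xG added is then the value after the action minus the value before it. The zones and probabilities are invented, and real models (Infogol’s included) are far more granular, so this only illustrates the arithmetic.

```python
# Hypothetical probability that a possession ends in a goal, by pitch zone.
# Invented numbers for illustration; a real model is fitted to event data.
ZONE_VALUE = {
    "own_half": 0.01,
    "middle_third": 0.02,
    "final_third_wide": 0.04,
    "final_third_central": 0.08,
    "penalty_area": 0.15,
}


def xg_added(start_zone, end_zone, retained=True):
    """Non-shot xG added by one action: scoring likelihood after minus before.

    Losing the ball is crudely valued at zero here; fuller models also
    account for the opponent's resulting chance of scoring.
    """
    before = ZONE_VALUE[start_zone]
    after = ZONE_VALUE[end_zone] if retained else 0.0
    return after - before


# A successful carry from midfield into the box, and a backwards pass.
print(round(xg_added("middle_third", "penalty_area"), 3))     # 0.13
print(round(xg_added("final_third_central", "own_half"), 3))  # -0.07
```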

It was probably first introduced into the public analytical domain by Dan Altman in his whistle-stop OptaPro presentation in 2015, and hints of such models have recently been emerging from Opta itself and from Twelve Football.

Such a non-shot xG model also powers Infogol’s “Team of the Week”.

The gradual migration, at least inside the industry, from a purely chance-based evaluation to a more holistic one mirrors the earlier transition from merely counting shots, as exemplified by total shot ratio (TSR) from 2008, to the more informative, location-based xG models that followed.

However, creating non-shot models that quantify every on-field action is not a simple task. The granular data required to build them dwarfs that needed to create TSR, which was itself rudimentary compared with the data required to build a proficient xG model.

These leaps in data-driven evaluation present a dilemma for the aspirations of public and hobbyist analysts, a group that provided much of the driving force behind the early explosion in football analytics.

Latterly, the monetization of ideas and a larger appetite, in the media and at clubs, for quantitative metrics to supplement opinion-driven insight have swept many of those same hobbyists behind a non-disclosure paywall.

Less co-operation, dwindling numbers, the scarcity of adequate data and the need for diverse technical skills to process that raw data appear to have stifled the growth of football metrics in the purely public arena.
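For anyone who missed the TSR era, the metric is just a team’s share of the shots in its matches. A minimal sketch, with an xG share alongside to show what the location-based upgrade changes; the season totals are hypothetical.

```python
def total_shot_ratio(shots_for, shots_against):
    """TSR: a team's share of all shots in its matches (0.5 = break-even)."""
    total = shots_for + shots_against
    return shots_for / total if total else 0.5


def xg_share(xg_for, xg_against):
    """Same idea, but each shot is weighted by its location-based xG."""
    total = xg_for + xg_against
    return xg_for / total if total else 0.5


# Hypothetical season: out-shoots opponents, but with lower-quality chances.
print(total_shot_ratio(600, 400))  # 0.6
print(xg_share(48.0, 52.0))        # 0.48
```

A side can look dominant on shot counts yet be second best once chance quality is accounted for, which is exactly the gap xG closed.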

At the risk of falling victim to one of Twitter’s sloganized insults, “back in the day, metrics didn’t last long before they were improved upon or supplanted altogether”. suggested that Ian’s weapons grade model might be broadly replicated by current, readily available and much quoted metrics, such as xG Chain (I’ll let you google the definition).

Succinctly, the metric rewards every participant in a move that ends in a goal attempt with that chance’s entire xG.

The distribution of goodies can seem churlish. For example, it gives far less individual credit to each of the three Middlesbrough players who swept nearly the length of the pitch during a Stoke defensive transition to score a low-probability winner on Friday night than it would to a marginally involved square ball en route to a multi-pass move that ends with a tap-in from six yards.

More crucially it completely omits actions that aren’t concluded by a created chance.
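Here’s a minimal Python sketch of that xG Chain definition, assuming each possession is recorded as the list of players involved plus the xG of the shot it ended in (or None if there was no shot). The possessions are hypothetical, loosely echoing the two moves described above plus one that never produced a shot.

```python
from collections import defaultdict


def xg_chain(possessions):
    """Credit every player involved in a possession with the full xG of the
    shot that ends it. Possessions without a shot (xg is None) earn nothing."""
    totals = defaultdict(float)
    for players_involved, shot_xg in possessions:
        if shot_xg is None:
            continue
        for player in set(players_involved):  # count each player once per move
            totals[player] += shot_xg
    return dict(totals)


# Hypothetical possessions.
possessions = [
    (["A", "B", "C"], 0.05),            # breakaway ending in a low-probability winner
    (["D", "E", "F", "G", "H"], 0.60),  # long passing move ending in a tap-in
    (["A", "I"], None),                 # progressive move that produced no shot
]

chain = xg_chain(possessions)
print(chain["A"], chain["E"])  # 0.05 0.6
print("I" in chain)            # False: player I gets no credit at all
```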

To test’s optimism, I compared Infogol’s non-shot ball progression via passes and carries to the much-touted gold standard of xG Chain.

To avoid confusion over units, I’ve simply ranked each player in the recent Merseyside derby by xG Chain and by non-shot ball progression, and then compared a player’s rank in one metric with his rank in the other.

It starts off quite well. Sadio Mane ranks top in both; he was outstanding on the night. But then, much like Stoke’s trip to Middlesbrough, things take a turn for the worse.

Shaqiri ranked an impressive 2nd overall in ball progression but a lowly 16th in xG Chain, whereas Origi rates highly by the latter but much less so by the former.

Overall, a third of the players have double-digit differences between their rankings in the two metrics. There is some agreement, but the relationship between the two metrics is generally weak.
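The rank comparison is easy to reproduce, and Spearman’s rank correlation summarises it in a single number. A sketch with five hypothetical players and invented values; ties are broken alphabetically for simplicity, whereas scipy.stats.spearmanr assigns average ranks to ties and is the better choice on real data.

```python
def ranks(scores):
    """Rank players 1..n by descending score; ties broken by name for simplicity."""
    ordered = sorted(scores, key=lambda p: (-scores[p], p))
    return {player: i + 1 for i, player in enumerate(ordered)}


def spearman_rho(a, b):
    """Spearman's rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Assumes both dicts cover the same players; the formula is valid here
    because the tie-break guarantees there are no tied ranks.
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[p] - rb[p]) ** 2 for p in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical single-match values for five players.
progression = {"P1": 0.9, "P2": 0.7, "P3": 0.5, "P4": 0.3, "P5": 0.1}
xg_chain_values = {"P1": 0.8, "P2": 0.1, "P3": 0.6, "P4": 0.7, "P5": 0.2}

rank_prog, rank_chain = ranks(progression), ranks(xg_chain_values)
for p in sorted(progression):
    # player, progression rank, xG Chain rank, rank gap
    print(p, rank_prog[p], rank_chain[p], abs(rank_prog[p] - rank_chain[p]))
print(round(spearman_rho(progression, xg_chain_values), 2))  # 0.3
```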

Extend the study to every game played last season and this tenuous correlation between the two metrics remains.

One of the strengths of the early analytics movement was the ability to sift mere statistical trivia (team Y has recorded X when player Z plays, immediately springs to mind) from useful, if imperfect evaluations that convey insight and can be used to both evaluate and project future performance.

A great example of the latter is Dan Kennett’s recent Alisson tweet, which used big chances to highlight the keeper’s importance to Liverpool, both in the past and possibly in the future.

Save rates against Opta’s Big Chances can be framed as a very good proxy for more exhaustive and granular post-shot (xG2) modelling of a keeper’s saves and goals allowed.
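The idea behind the proxy can be sketched as two "goals prevented" style numbers: saves on Big Chances above what a league-average keeper would manage on the same volume, and the post-shot model version (post-shot xG faced minus goals allowed). The league-average save rate and all figures below are assumptions for illustration, not Opta numbers.

```python
def big_chance_saves_above_average(saved, faced, league_save_rate):
    """Saves on Big Chances on target beyond what a keeper saving at the
    (assumed) league-average rate would make on the same number of shots."""
    return saved - faced * league_save_rate


def psxg_goals_prevented(post_shot_xg_faced, goals_allowed):
    """Post-shot (xG2-style) version: goals the model expected minus goals conceded."""
    return post_shot_xg_faced - goals_allowed


# Hypothetical season figures.
print(round(big_chance_saves_above_average(saved=9, faced=20, league_save_rate=0.3), 2))  # 3.0
print(round(psxg_goals_prevented(post_shot_xg_faced=30.5, goals_allowed=27), 2))         # 3.5
```

If the two numbers rank keepers in broadly the same order across a league, the simpler Big Chance figure is doing its job as a proxy.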

Dan’s tweet was selective, but carefully constructed enough to capture the keeper’s core attributes. Retweets are currently approaching 10 billion!

That should be the benchmark for widely used metrics, and player contribution figures such as xG Chain fail that test on numerous counts.

It fails to differentiate individual contribution, omits large swathes of creditable actions and thus correlates poorly with more exhaustive modelling of a similar player process.

The challenge for the public arena as we enter the roaring ’20s is to come up with constant improvements to substandard and potentially misleading measures... and be more like Dan.

Tuesday, 29 October 2019

Liverpool by One.

Old-style goals-based analysis hardly gets a run-out nowadays, with everyone arguing over xG strawmen. So, let’s go the goals route to see whether Liverpool’s record in single-goal wins is “knowing how to win”, “unsustainable” or “about what you’d expect”.

Liverpool won 10 games by a single-goal margin last season. That’s a lot, but well below the single-season record of 16, set by Manchester United in 2008/09 and matched in 2012/13.

In the following seasons, United’s single-goal wins fell to five in 2013/14 and eight in 2009/10 (although something more impactful may have also occurred in 2013/14). Their points tally fell as well, by 25 points in 2013/14 and by 5 in 2009/10.

To dilute the Fergie/Moyes effect, let’s look at how teams who won 10 or more games by a single goal fared, on average, the following season.

There are over 90 of them in the 20-team history of the Premier League. 80% of those had fewer wins by the narrowest possible margin in their next Premier League season, and 74% also saw their points total fall.

These teams, having edged lots of close matches in one season, shed around 10% of their points the next.

Initially, it’s not looking too rosy for Liverpool’s ability to sustain these narrow wins.

However, there’s another factor to consider.

Single-goal wins, on average, account for 41% of a side’s Premier League points total, but in our sample of 90+ teams who won 10 or more, 80% accrued more than 41% of their points from such victories.

Everton won 76% of their 59 points in 2002/03 from single goal wins and then tried their very best to get relegated in 2003/04 as their “luck” in narrow games returned to earth and they won just 39 points.

In Liverpool’s case in 2018/19, one-goal wins accounted for only 31% of their 97 points. So their ten such wins place them in a group of sides who typically regress, but the share of points they won this way is entirely atypical of that group.

To see whether Liverpool are genuinely adept at winning single-goal games, we need to look at their underlying goals record.

In 2018/19 they scored 89 and conceded 22. Taking the Poisson route, that’s consistent with winning nine games by a single goal over 38 games. They won, as we’ve seen, ten: hardly a worryingly large over-performance.

You can lump Liverpool in with a group of teams who have achieved good things partly as a result of “knowing how to win” (Leicester 2015/16 springs to mind: 14 single-goal wins where nine would have been a more equitable return), but unlike most of these sides, the Reds have the underlying numbers to deserve their record.
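The percentages are simple arithmetic, since each one-goal win is worth three points. A quick check in Python; Everton’s 15 wins are implied by 76% of their 59 points (45 points), not taken from a results list.

```python
def single_goal_win_share(one_goal_wins, season_points):
    """Share of a season's points that came from one-goal wins (3 points each)."""
    return 3 * one_goal_wins / season_points


print(f"{single_goal_win_share(10, 97):.0%}")  # Liverpool 2018/19: 31%
print(f"{single_goal_win_share(15, 59):.0%}")  # Everton 2002/03: 76%
```

Both sit a long way from the 41% league average, in opposite directions.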
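The Poisson route can be sketched like this, assuming goals scored and conceded in each game are independent Poisson draws at Liverpool’s 2018/19 season averages (89/38 and 22/38 per game) and ignoring home advantage. The chance of winning by exactly one goal is summed over every k+1 to k scoreline.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_one_goal_wins(goals_for, goals_against, games=38, max_goals=15):
    """Expected number of wins by exactly one goal over a season, assuming
    independent Poisson goals for and against at season-average rates."""
    lam_for = goals_for / games
    lam_against = goals_against / games
    p_one_goal_win = sum(
        poisson_pmf(k + 1, lam_for) * poisson_pmf(k, lam_against)
        for k in range(max_goals)
    )
    return games * p_one_goal_win


print(round(expected_one_goal_wins(89, 22), 1))  # 8.9, i.e. roughly nine
```

A fuller version would use separate home and away rates, but the season-average shortcut is enough to show that ten one-goal wins is close to what the goals record predicts.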

Expect a few more 2-1’s between now and May.