Thursday 26 December 2019

State of Play 2020

Liverpool’s bilingual mastermind behind the team’s meteoric rise to dominate club, domestic, European and now world football is gradually gaining a higher media profile.

Not Jurgen Klopp, although he has played a part in the Reds’ success, but Dr Ian Graham, their current director of research.

Ian’s recent appearances in both the spoken and written media have not only highlighted the importance of an integrated, data driven approach to squad building alongside more traditional methods, they have also given a small glimpse into the analytical methods employed.

The latest profile landed courtesy of and described some fundamentals of Liverpool’s analytical philosophy.

One particularly resonated with Infogol’s approach of quantifying every footballing action in the same currency of goals or, more specifically, expected goals (xG).

The idea that every action, be it a pass, tackle or long throw changes the likelihood that a side will ultimately score isn’t a new concept.

It was probably first introduced into the public analytical domain by Dan Altman in his whistle-stop OptaPro presentation in 2015, and hints of such models have recently been emerging from Opta itself and Twelve football.

Such a non-shot xG model also powers Infogol’s “Team of the Week”.

The gradual migration, at least inside the industry, from a purely chance-based evaluation to a more holistic one somewhat mirrors the earlier transition from merely counting shots, as exemplified by total shots ratio (TSR) from 2008, to a more informative, location-based xG model.

However, creating such non-shot models that quantify every on-field action is not a simple task. The granular data required to build them dwarfs what was needed to create TSR, which itself was rudimentary compared with what a proficient xG model requires.

These leaps in data driven evaluation present a dilemma for the aspirations of public and hobbyist analysts, an area that provided much of the driving force behind the early explosion in football analytics.

Latterly, monetization of ideas and a larger appetite for quantitative metrics to supplement opinion driven insight in the media and clubs have swept many of those same hobbyists behind a non-disclosure paywall.

Less co-operation, dwindling numbers, the limited availability of adequate data and the need for diverse technical skills to process that raw data appear to have stifled the growth of football metrics in the purely public arena.

At the risk of falling victim to one of Twitter’s sloganized insults, “back in the day, metrics didn’t last long before they were improved upon or supplanted altogether”. suggested that Ian’s weapons grade model might be broadly replicated by current, readily available and much quoted metrics, such as xG Chain (I’ll let you google the definition).

Succinctly, the metric rewards every participant in a move that ends in a goal attempt with that chance’s entire xG.

The distribution of goodies can seem churlish. For example, it gives far less individual credit to the three Middlesbrough players who swept nearly the length of Stoke’s defensive transition to score a low probability winner on Friday night than it would to a marginally involved square ball en route to a multiple passing move that ends with a tap-in from six yards.

More crucially it completely omits actions that aren’t concluded by a created chance.
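To make both complaints concrete, here is a minimal sketch of how an xG Chain style tally might be computed. The data layout (a list of possessions, each with a `players` list and a `shot_xg` value) and every number in it are invented for illustration; they are assumptions, not any provider's actual feed or definition.

```python
from collections import defaultdict


def xg_chain(possessions):
    """Credit each distinct player in a possession with the full xG of the
    shot that ends it. Possessions without a shot earn nobody anything."""
    totals = defaultdict(float)
    for possession in possessions:
        shot_xg = possession["shot_xg"]
        if shot_xg is None:
            continue  # no goal attempt, so no credit at all
        for player in set(possession["players"]):  # count each player once
            totals[player] += shot_xg
    return dict(totals)


# Hypothetical possessions (names and xG values are made up).
possessions = [
    # A long, three-player break ending in a low probability shot.
    {"players": ["A", "B", "C"], "shot_xg": 0.05},
    # A passing move with an incidental square ball by F, ending in a tap-in.
    {"players": ["D", "E", "F", "E"], "shot_xg": 0.60},
    # Useful ball progression that never produces a shot.
    {"players": ["A", "D"], "shot_xg": None},
]

chain = xg_chain(possessions)
for player, value in sorted(chain.items()):
    print(player, round(value, 2))
```

In this toy example the incidental passer F ends up with twelve times the credit of each player in the length-of-the-pitch move, and the third possession contributes nothing to anyone.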

To test’s optimism, I compared Infogol’s non-shot ball progression via passes and carries to the much-touted gold standard of xG Chain.

To avoid confusion over units, I’ve simply ranked the xG Chain and the non-shot ball progression for each player in the recent Merseyside derby and then compared a player’s rank in one metric with his rank in the other.

It starts off quite well. Sadio Mane ranks top in both; he was outstanding on the night. But then, much like Stoke’s trip to Middlesbrough, things take a turn for the worse.

Shaqiri ranked an impressive 2nd overall in ball progression, but a lowly 16th in xG Chain, whereas Origi rates highly by the latter, but much less so in the former.

Overall, a third of the players differ by double digits between their rank in one metric and their rank in the other. There are some agreements, but the relationship between the two metrics is generally weak.

Extend the study to every game played last season and this tenuous correlation between the two metrics remains.
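For anyone wanting to repeat the rank comparison on their own numbers, a rough sketch follows. The player labels and metric values are placeholders rather than the derby figures, and Spearman's rank correlation is my choice here for summarising the agreement between two rankings; the simple formula used assumes there are no tied values.

```python
def ranks(values):
    """Map each player to a rank, with 1 for the highest value (no ties)."""
    order = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(order, start=1)}


def compare_ranks(metric_a, metric_b):
    """Return per-player rank differences and Spearman's rho."""
    rank_a, rank_b = ranks(metric_a), ranks(metric_b)
    diffs = {name: rank_a[name] - rank_b[name] for name in rank_a}
    n = len(diffs)
    # Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    rho = 1 - 6 * sum(d * d for d in diffs.values()) / (n * (n * n - 1))
    return diffs, rho


# Placeholder values for five hypothetical players.
chain_values = {"P1": 1.2, "P2": 0.9, "P3": 0.4, "P4": 0.1, "P5": 0.7}
progression = {"P1": 0.50, "P2": 0.10, "P3": 0.45, "P4": 0.30, "P5": 0.20}

diffs, rho = compare_ranks(chain_values, progression)
print(diffs)
print(round(rho, 2))
```

A rho near 1 would mean the two metrics order players almost identically; a value near 0, as in this made-up example, means one ranking tells you little about the other.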

One of the strengths of the early analytics movement was the ability to sift mere statistical trivia (“team Y has recorded X when player Z plays” immediately springs to mind) from useful, if imperfect, evaluations that convey insight and can be used both to evaluate past performance and to project future performance.

A great example of the latter is Dan Kennett’s recent Alisson tweet, which used big chances to highlight the keeper’s importance to Liverpool, both in the past and possibly in the future.

Save rates when faced with Opta’s Big Chances can be framed as a very good proxy for a more exhaustive and granular post-shot xG2 modelling of a keeper’s saves and goals allowed.

Dan’s tweet was selective, but also carefully constructed enough to capture the keeper’s core attributes. Current retweets are approaching around 10 billion!

That should be the benchmark for widely used metrics, and player contribution figures such as xG Chain fail that test on numerous counts.

It fails to differentiate individual contribution, omits large swathes of creditable actions and thus fails to correlate well with more exhaustive modelling of a similar player process.

The challenge for the public arena as we enter the roaring ’20s is to come up with constant improvements to substandard and potentially misleading measures... and be more like Dan.