Thursday, 26 December 2019

State of Play 2020


Liverpool’s bilingual mastermind behind the team’s meteoric rise to dominate club, domestic, European and now world football is gradually gaining a higher media profile.

Not Jurgen Klopp, although he has played a part in the Reds’ success, but Dr Ian Graham, their current director of research.

Ian’s recent appearances in both the spoken and written media have not only highlighted the importance of an integrated approach to squad building, one that uses data alongside more traditional methods, but have also given a small glimpse into the analytical methods employed.

The latest profile landed courtesy of Liverpool.com and described some fundamentals of Liverpool’s analytical philosophy.

One particularly resonated with Infogol’s approach of quantifying every footballing action in the same currency of goals, or more specifically expected goals (xG).

The idea that every action, be it a pass, tackle or long throw, changes the likelihood that a side will ultimately score isn’t a new concept.

It was probably first introduced into the public analytical domain by Dan Altman in his whistle-stop OptaPro presentation in 2015, and hints of such models have recently been emerging from Opta itself and Twelve football.

Such a non-shot xG model also powers Infogol’s “Team of the Week”.

The gradual migration, at least inside the industry, from a purely chance-based evaluation to a more holistic one somewhat mirrors the earlier transition from merely counting shots, as exemplified by total shot ratio (TSR) from 2008, to the more informative, location-based xG models that followed.

However, creating such non-shot models that quantify every on-field action is not a simple task. The granular data required to build them dwarfs what was needed to create TSR, data that was itself rudimentary compared to that required to create a proficient xG model.

These leaps in data-driven evaluation present a dilemma for the aspirations of public and hobbyist analysts, an area that provided much of the driving force behind the early explosion in football analytics.

Latterly, the monetization of ideas and a larger appetite for quantitative metrics to supplement opinion-driven insight in the media and at clubs have swept many of those same hobbyists behind a non-disclosure paywall.

Less co-operation, dwindling numbers, the limited availability of adequate data and the need for diverse technical skills to process that raw data appear to have stifled the growth of football metrics in the purely public arena.

At the risk of falling victim to one of Twitter’s sloganized insults, “back in the day, metrics didn’t last long before they were improved upon or supplanted altogether”.

Liverpool.com suggested that Ian’s weapons grade model might be broadly replicated by current, readily available and much quoted metrics, such as xG Chain (I’ll let you google the definition).

Succinctly, the metric rewards every participant in a move that ends in a goal attempt with that chance’s entire xG.

The distribution of goodies can seem churlish. For example, it gives far less individual credit to the three Middlesbrough players who swept nearly the length of Stoke’s defensive transition to score a low probability winner on Friday night than it would to a marginally involved square ball en route to a multiple passing move that ends with a tap-in from six yards.

More crucially it completely omits actions that aren’t concluded by a created chance.
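As a rough illustration of that definition, here is a minimal Python sketch. The possessions, player labels and xG values are invented placeholders, and the function is only my reading of the one-line description above, not any provider's actual implementation.

```python
from collections import defaultdict

def xg_chain(possessions):
    """Give every distinct player in a shot-ending move that shot's full xG."""
    totals = defaultdict(float)
    for players, shot_xg in possessions:
        if shot_xg is None:
            # The move ended without a goal attempt, so nobody is credited.
            continue
        for player in set(players):
            totals[player] += shot_xg
    return dict(totals)

# Hypothetical possessions: (players involved, xG of the final shot or None).
possessions = [
    (["A", "B", "C"], 0.05),  # long counter ending in a low-probability shot
    (["D", "E"], 0.70),       # square ball followed by a tap-in
    (["A", "B"], None),       # useful progression, but no shot at the end
]
chain = xg_chain(possessions)
```

Players D and E each bank 0.70 for the tap-in, the three counter-attackers get 0.05 apiece, and the third move earns nothing at all.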

To test Liverpool.com’s optimism, I compared Infogol’s non-shot ball progression via passes and carries to the much-touted gold standard of xG Chain.

To avoid confusion over units, I’ve simply ranked the xG Chain and the non-shot ball progression for each player in the recent Merseyside derby and then compared a player’s rank in one metric with his rank in the other.


It starts off quite well. Sadio Mane ranks top in both; he was outstanding on the night. But then, much like Stoke’s trip to Middlesbrough, things take a turn for the worse.

Shaqiri ranked an impressive 2nd overall in ball progression, but a lowly 16th in xG Chain, whereas Origi rates highly by the latter, but much less so in the former.

Overall, a third of the players have double digit differences between their rankings in the two metrics. There are some agreements, but the relationship between the two is generally weak.
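For anyone wanting to repeat the rank comparison on their own numbers, a minimal sketch follows. The player labels and metric values are made-up placeholders, not the derby data, and Spearman's rank correlation is my choice of summary rather than anything used above.

```python
def ranks(values):
    """Rank players from 1 (highest value) downwards; assumes no tied values."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {player: position for position, player in enumerate(ordered, start=1)}

def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same players."""
    n = len(rank_a)
    d_squared = sum((rank_a[p] - rank_b[p]) ** 2 for p in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Placeholder values only, chosen to illustrate a weak relationship.
xg_chain_values = {"P1": 1.2, "P2": 0.1, "P3": 0.9, "P4": 0.4, "P5": 0.6}
progression_values = {"P1": 0.30, "P2": 0.25, "P3": 0.05, "P4": 0.10, "P5": 0.15}

chain_rank = ranks(xg_chain_values)
progression_rank = ranks(progression_values)

# Positive gap: the player ranks lower by xG Chain than by ball progression.
rank_gap = {p: chain_rank[p] - progression_rank[p] for p in chain_rank}
rho = spearman(chain_rank, progression_rank)
```

Working in ranks sidesteps the units problem, at the cost of throwing away how far apart two players actually are.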

Extend the study to every game played last season and this tenuous correlation between the two metrics remains.

One of the strengths of the early analytics movement was the ability to sift mere statistical trivia (“team Y has recorded X when player Z plays” immediately springs to mind) from useful, if imperfect, evaluations that convey insight and can be used to both evaluate and project future performance.

A great example of the latter is Dan Kennett’s recent Alisson tweet, which used big chances to highlight the keeper’s importance to Liverpool, both in the past and possibly in the future.

Save rates when faced with Opta’s Big Chances can be a very good proxy for a more exhaustive and granular post-shot xG2 modelling of a keeper’s saves and goals allowed.

Dan’s tweet was selective, but also carefully constructed enough to capture the keeper’s core attributes. Current retweets are approaching around 10 billion!

That should be the benchmark for widely used metrics, and player contribution figures such as xG Chain fail that test on numerous counts.

It fails to differentiate individual contribution, omits large swathes of creditable actions and thus fails to correlate well with more exhaustive modelling of a similar player process.

The challenge for the public arena as we enter the roaring 20s is to come up with constant improvements to substandard and potentially misleading measures... and be more like Dan.

Tuesday, 29 October 2019

Liverpool by One.



Old style goals based analysis hardly gets a run out nowadays with everyone arguing xG strawmen. So, let’s go the goals route to see if Liverpool’s record in single goal margin wins is “knowing how to win”, “unsustainable” or “about what you’d expect”.

Liverpool won 10 games by a single goal margin last season. That’s a lot, but well below the single season record of 16, set by Manchester United in both 2012/13 and 2008/09.

United’s number of single goal wins in the seasons that followed fell to five and eight respectively (although something more impactful may have also occurred in 2013/14). Their points tally fell as well, by 25 points in 2013/14 and by 5 in 2009/10.

To dilute the Fergie/Moyes effect, let’s look at the average record in the next season of teams who won 10 or more games by a single goal margin.

There are over 90 of them in the 20 team era of the Premier League. Of those, 80% had fewer wins by the narrowest of margins during their next Premier League season and 74% also saw their points total fall.

These teams who edged lots of close matches one season shed around 10% of their points in the next season.

Initially, it’s not looking too rosy for Liverpool’s ability to sustain these narrow wins.

However, there’s another factor to consider.

Single goal wins account, on average, for 41% of a side’s Premier League points total, but in our sample of 90+ teams who won 10 or more, 80% of them accrued more than 41% of their points from such victories.

Everton won 76% of their 59 points in 2002/03 from single goal wins and then tried their very best to get relegated in 2003/04 as their “luck” in narrow games returned to earth and they won just 39 points.

In Liverpool’s case in 2018/19, one goal margin wins only accounted for 31% of their 97 points. Therefore, their ten such wins place them in a group of sides who typically regress, but the percentage of total points they win in this manner is entirely atypical of that group.
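The share is simple arithmetic: three points per win, divided by the season total. A small sketch using the Liverpool figures quoted above (the function name is just illustrative):

```python
def single_goal_share(single_goal_wins, total_points, points_per_win=3):
    """Fraction of a side's points that came from wins by exactly one goal."""
    return points_per_win * single_goal_wins / total_points

# Liverpool 2018/19: ten single goal wins out of 97 points.
liverpool_share = single_goal_share(10, 97)
```

Ten wins are worth 30 points, and 30 out of 97 rounds to the 31% quoted.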

To see where Liverpool stand as being adept at winning single goal margin games, we need to look at their underlying goals record.

In 2018/19 they scored 89 and conceded 22. Taking the Poisson route, that’s consistent with winning nine games by a single goal over 38 games. They won, as we’ve seen, ten; hardly a worryingly large over-performance.
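For the curious, one simple way to take the Poisson route is sketched below. It assumes goals for and against in each match are independent Poisson variables with the season average as the mean in every game, which is a simplification of my own for illustration; a match-by-match model would give a slightly different answer.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the mean is lam."""
    return exp(-lam) * lam ** k / factorial(k)

def prob_win_by_one(lam_for, lam_against, max_goals=15):
    """P(goals for = goals against + 1) under independent Poisson scoring."""
    return sum(
        poisson_pmf(k + 1, lam_for) * poisson_pmf(k, lam_against)
        for k in range(max_goals)
    )

games = 38
# Season averages from the goals record: 89 scored, 22 conceded.
p_single_goal_win = prob_win_by_one(89 / games, 22 / games)
expected_single_goal_wins = games * p_single_goal_win
```

Under those assumptions the expected number of single goal wins comes out at a little under nine, in line with the figure above.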

You can lump Liverpool in with a group of teams who have achieved good things, partly as a result of “knowing how to win” (Leicester 2015/16 spring to mind, 14 single goal wins where nine would have been a more equitable return), but unlike most of these sides, the Reds have the underlying numbers to deserve their record.

Expect a few more 2-1s between now and May.

Monday, 21 October 2019

Closing the Door.

One of the most fun aspects of football data analysis is when the team you're part of derives some exciting new metrics from the raw data that allow you to look at old problems in a new light.

Some real heavy data lifting has been put into deriving our Non Shot expected goals model. So first a quick recap on what it does.

Whenever the ball is moved around the pitch there is a likelihood of scoring from each location it finds itself in. We express this value as non-shot xG (NSxG), and the difference between these values before and after an action is completed is the change in NSxG via that action.

There's also a "risk/reward" aspect for when you concede possession.

Finally, each team has (nearly always) a different NSxG for the same pitch location, because one major input is the distance to your opponent's goal.
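To make the idea concrete, here is a toy Python sketch. The value function is entirely made up (a simple exponential decay with distance to goal) and the pitch dimensions are assumed; it is not our model, which uses far more inputs, and it leaves out the risk/reward side of losing the ball. It only shows the mechanics of valuing an action as end value minus start value.

```python
from math import exp, hypot

PITCH_LENGTH, PITCH_WIDTH = 110.0, 70.0  # yards, assumed dimensions

def toy_nsxg(x, y, attacking_right=True):
    """Hypothetical scoring likelihood that decays with distance to the goal attacked."""
    goal_x = PITCH_LENGTH if attacking_right else 0.0
    distance = hypot(goal_x - x, PITCH_WIDTH / 2 - y)
    return 0.3 * exp(-distance / 20.0)

def action_value(start, end, attacking_right=True):
    """Change in toy NSxG from moving the ball from start to end."""
    return toy_nsxg(*end, attacking_right) - toy_nsxg(*start, attacking_right)

# Five yards out of your own penalty area versus five yards into your opponent's.
out_of_own_box = action_value((18, 35), (23, 35))
into_their_box = action_value((87, 35), (92, 35))

# The same spot is worth different amounts to the two teams.
value_for_team_attacking_right = toy_nsxg(30, 35, attacking_right=True)
value_for_team_attacking_left = toy_nsxg(30, 35, attacking_right=False)
```

Both five yard moves gain value, but the one into the opponent's box gains far more, and the same location is worth more to the side attacking the nearer goal.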

We've mainly looked at passing and ball carrying so far, quantifying the differing importance to your side of moving the ball five yards out of your own penalty area or five yards into your opponent's. But there's an obvious extension of this that flips the focus and examines how well a team prevents an opponent progressing the ball.

This isn't just about making passing difficult; it's also about how hard or easy you make it for opponents to carry the ball forward.

It used to be called closing a player down; it goes by any number of terms nowadays.

Here's how sides are faring in preventing ball progression in 2019/20.

The first thing you need is a benchmark figure to measure how well a side is closing down the opposition.

There have only been nine matches played by each Premier League team to date and they may have played a bunch of sides who aren't that good at, or willing to, play out from the back, so we need to find a set of figures that reflect this possible imbalance of intent and talent.

Let's take Manchester United. They've played nine teams, Chelsea, CP, Leicester, Newcastle, Southampton, WHU, Arsenal, Wolves & Liverpool.

Those teams, in turn, have also played nine teams each (except Arsenal, who play tonight). That's 80 teams, of which nine are Manchester United.

That's almost guaranteed to include every Premier League team at least once and makes up a decent sample of around 70-80 games depending upon how you slice it.

We therefore took those 71 non Manchester United matches played by Manchester United's opponents and looked at the "risk/reward" ball progression via both passes and ball carries for 100 pitch segments.
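The sample size arithmetic is easy to check. A small sketch using the games played figures described above (team names expanded from the abbreviations):

```python
# Matches played so far by each of Manchester United's nine opponents.
# Arsenal have played one fewer because their next game is tonight.
opponent_games_played = {
    "Chelsea": 9, "Crystal Palace": 9, "Leicester": 9,
    "Newcastle": 9, "Southampton": 9, "West Ham": 9,
    "Arsenal": 8, "Wolves": 9, "Liverpool": 9,
}

total_opponent_matches = sum(opponent_games_played.values())
# Each opponent has played United exactly once; drop those matches.
baseline_matches = total_opponent_matches - len(opponent_games_played)
```

That gives 80 opponent matches in total and 71 once the nine against United are removed.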

For each segment we calculated the average NSxG gained (or lost) per 100 pass & carry attempts. That was our baseline for United's opponents' progression against a broad selection of opponents this season.

Then we repeated the exercise, but for these sides in their matches against Manchester United and ran a heat map to see where on the field these teams were finding it difficult to progress the ball against United and where they were having an easier time compared to their benchmark numbers against the rest of their opponents.
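Stripped of the data wrangling, the comparison boils down to something like the sketch below. The segment ids and NSxG changes are invented placeholders, and the function names are mine; it assumes each pass or carry has already been assigned to one of the pitch segments and given an NSxG change.

```python
from collections import defaultdict

def nsxg_per_100(actions):
    """Average NSxG change per 100 attempts for each pitch segment.

    actions: iterable of (segment_id, nsxg_change) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, change in actions:
        totals[segment] += change
        counts[segment] += 1
    return {seg: 100 * totals[seg] / counts[seg] for seg in totals}

def closing_down_map(baseline_actions, versus_actions):
    """Per-segment difference: progression against one team minus the baseline."""
    baseline = nsxg_per_100(baseline_actions)
    versus = nsxg_per_100(versus_actions)
    return {seg: versus[seg] - baseline[seg] for seg in baseline.keys() & versus.keys()}

# Placeholder data for two segments only.
baseline_actions = [(0, 0.02), (0, 0.00), (1, 0.01), (1, 0.01)]
versus_united_actions = [(0, 0.00), (0, 0.00), (1, 0.03), (1, 0.01)]

heat = closing_down_map(baseline_actions, versus_united_actions)
```

A negative number means opponents progress the ball less well in that segment than their benchmark, a positive number means they do better, and plotting the differences segment by segment gives the heat map.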

This is what it looks like (ignore the numbers for now).


The red areas are where United's opponents are progressing the ball at lower levels against United than they've managed as a group against a basket of 71 other Premier League sides. In the blue areas, they're doing better.

It's a pretty stark and clear picture of where on the field United have been making it difficult for their opponents to get the ball into more dangerous areas. They do so firstly in front of their opponent's own box and then aggressively in front of United's own. They aren't too fussed about targeting wide positions on halfway and not too good(?) at stopping runs or passes from the bye-line & in the box.

Here's Everton and they do harry the opposition, but it's a much more chaotic process, with very little structure, especially compared to United's disciplined approach.


And finally, here's Aston Villa.


There's no overt closing down of the opposition until they reach the box, at which point it seems to become all hands to the pump.