Thursday, 24 December 2020

Stoke and the Art of Crossing

Stoke Highlight the Art of Crossing.

Two Stoke City games, two headers, two goals and a duo of 1-0 wins demonstrate not only the fine lines that can separate six points from two in a low-scoring sport such as football, but also the important role still played by crosses in the modern game.

Lavishly assembled squads may partly spurn crossing as a primary route to goal in favour of more intricate, possession-based passing sequences to create space before the final delivery, but even the likes of Arsenal, when faced with the need for a goal, do fall back on the traditional cross.

33 crosses yielded a single goal in a recent 2-1 home defeat for Arteta’s side against Wolves and infamously, Manchester United attempted over 80 crosses in a drawn game with Fulham in the last days of David Moyes’ reign.

Crossing as a primary strategy reached a low point with Liverpool’s 2011/12 team, which consisted of a big target man, Andy Carroll, and a host of players ready to deliver a cross, led by Stewart Downing.

Unfortunately, such a predictable game plan, and a tendency to cross the ball early from less advanced field positions, resulted in a failed experiment. An average of 21 Liverpool crosses per game was rewarded with just four Premier League goals.

Present day Liverpool lead the analytics revolution, but their failed, decade old legacy helped to kick start that revolution, as data was used to explain why their cross heavy approach failed and where the lesson lay for teams to maximize the returns from a wide player’s staple delivery.

Crosses in general are inefficient.

Leagues vary, but as a baseline number, it takes upwards of 90 crosses to score a goal directly from the delivery. Secondary chances created after the initial header or shot, but during the same phase of play, improve the strike rate to around one goal every 50 crossed balls.

However, not all crosses are equal. The danger is more apparent if a side works a delivery from the byline compared to a last-minute desperation hoof from deep into the mixer.

Fortunately, data can differentiate between types of crosses. Whether the ball was chipped or driven on the ground, for example. But where crosses originate and where they are aimed provides the biggest insight into how to turn a cross into a winning formula.

You can divide the origin and intended destination of a cross into two broad categories depending on how effective they are at producing goals.

In the graphic below, prime areas are shown in red and the least effective in blue.

Blue wasteful target areas are intuitive.

If the ball is aimed too close to the goal line, it becomes prey to a dominant keeper. But place the cross too close to the edge of the box and any shot or header will be taken from distance, and for every yard a striker moves away from the goal, the likelihood of a goal falls by ten percent.

The red sweet spot is between these two areas.

The touchline hugging, wasteful blue delivery areas give both the keeper and defenders time to defend the box, whereas moving infield to deliver the cross reduces defensive reaction time and greatly improves conversion rates.

Hitting a ball from a wide and deep wing position to the wasteful area of the six-yard box, going from one blue zone to another, only produces a goal every 500 attempts. Whereas a delivery from a red, prime infield area to a red, prime area of the box increases conversion rates to around one goal every 20 crosses.
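
To put those rates side by side, here is a minimal sketch that turns the quoted "one goal every N crosses" figures into expected goals for a given number of deliveries. The dictionary labels and the function name are illustrative inventions rather than anything from a data provider, and the rates are simply the round numbers quoted in this post.

```python
# Crosses needed per goal, using the round figures quoted in the post.
# The labels are illustrative names, not official zone definitions.
CROSSES_PER_GOAL = {
    "baseline_direct": 90,          # goal scored directly from the delivery
    "baseline_with_secondary": 50,  # includes chances in the same phase of play
    "blue_to_blue": 500,            # wide and deep origin, six-yard-box target
    "red_to_red": 20,               # prime infield origin, prime target area
}


def expected_goals(cross_type: str, crosses: int) -> float:
    """Expected goals from `crosses` deliveries of the given type."""
    return crosses / CROSSES_PER_GOAL[cross_type]


if __name__ == "__main__":
    for cross_type in CROSSES_PER_GOAL:
        goals = expected_goals(cross_type, 100)
        print(f"{cross_type}: {goals:.2f} goals per 100 crosses")
```

On these figures a prime-to-prime cross is 25 times as productive as a blue-to-blue one (500 divided by 20).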

Stoke City’s two winning goals against Wycombe and Middlesbrough have been added to the graphic and hit the sweet spot for both Fox & McClean’s delivery and Collins & Powell’s headed goals. They were assists that were drawn from the most productive area of the crossing playbook.

Of course, there’s much more than “crossing by the numbers” to a successful outcome.

Powell is an accomplished header of the ball. During his Championship career over 20% of his goal attempts have been headers and he is adept at getting on the end of higher quality attempts than the league average, whilst Collins’ physical attributes are obvious.

Campbell then crossed from one prime area to another for Cardiff to obligingly smack the ball into their own net, before he departed on a season-long, injury-induced hiatus. Fox hit the prime red zone with a pacy cross to defeat Blackburn, and Brown repeated the prime-to-prime connection to set up Thompson to briefly draw level with Spurs in the Carabao Cup quarter-final.

Clever off-the-ball running also contributes, as seen by Vokes drawing away Wycombe defenders with his near post run and Stoke creating an overload of far post attackers for the goal against Middlesbrough.

Over recent games, Stoke City had the crossing basics in place and good things followed.

On the weekend when Stoke climbed into the playoff spots on the back of two smartly executed crosses, Arsenal in the North London derby were again trusting more to luck by throwing in another 44 crosses in the vain pursuit of a goal.

Monday, 20 April 2020

Scatter Plots

There's been a huge increase in football related scatter plots recently. So as the guy who produced the first such plots, I thought I'd quickly run through why I thought this simple plot was useful and then try to expand the idea to provide additional usefulness.

The initial plots were designed to both inform and characterise playing style.

I still think the most successful plots use related metrics, for example expected assists and expected goals per 90 for individual players.

These "makers and takers" plots easily split players into those whose predominant talent is to create chances, those who get onto the end of opportunities and those rare players who excel at both disciplines.

Here's one for Arsenal 2019/20.

It's got sample size issues, but it's fairly evident that the creative players are towards the top left and the goal poachers are to be found in the bottom right.

Another quite neat aspect of this type of plot is that you can run a line through a player to the origin and anyone with a similar ratio of xG and xA will lie close to that line.

In league-wide samples, therefore, you can find emerging players with similar qualities to the established stars.
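
The line-through-the-origin idea can be sketched in a few lines. This is only an illustration: the player names and per-90 figures are invented, the function names are mine, and the five degree tolerance is an arbitrary assumption. Two players share a line to the origin when the angle of their (xG, xA) point is about the same, so comparing angles is one simple way to find similar profiles.

```python
import math

# Hypothetical per-90 figures, invented purely for illustration.
players = {
    "Creator A": {"xg": 0.10, "xa": 0.30},
    "Creator B": {"xg": 0.05, "xa": 0.16},
    "Poacher C": {"xg": 0.45, "xa": 0.08},
    "All-rounder D": {"xg": 0.30, "xa": 0.28},
}


def angle_deg(xg: float, xa: float) -> float:
    """Angle of the line from the origin to (xg, xa), with xG on the x axis."""
    return math.degrees(math.atan2(xa, xg))


def similar_players(target: str, tolerance_deg: float = 5.0) -> list[str]:
    """Players whose origin line lies within `tolerance_deg` of the target's."""
    target_angle = angle_deg(**players[target])
    return [
        name
        for name, figures in players.items()
        if name != target
        and abs(angle_deg(**figures) - target_angle) <= tolerance_deg
    ]


if __name__ == "__main__":
    print(similar_players("Creator A"))
```

Using the angle rather than the raw xA/xG ratio avoids a division by zero for a player with no xG at all.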

There's a lot of data swilling around today and these plots are simple to make, three minutes tops. With some thought about what you're trying to illustrate, they inform pretty well.

Over the weekend I came back to the idea, to see if I could add information that tells you a little bit more than just the raw connection between two metrics.

Here's what I came up with. It's again just a simple scatter plot, but I've used bubble size to introduce a third variable (metric volume per 90).

In addition I've used a single performance metric (NS xG added from ball carries) along the x axis and instead of plotting a complementary metric on the vertical axis, I've used a number to denote how diverse the x axis metrics are for each player.

This just plots the top 20 NS xG added by players through their ability to successfully carry the ball forward and move their team into a more dangerous pitch position.

It's a good one to choose because you know that Adama Traore will top the list (and he does).

Rather than a sterile scatter, you've now got a chart that not only tells you about a performance metric, it also instantly adds another layer (success volume) from which you can draw additional information about the characteristics of a player.

In short, those towards the right of the plot add more NS xG per 90 than others.
Larger bubble size indicates more successful progressive carries per 90.
And higher up the chart indicates more disorder and unpredictability in what a player will positively achieve for his team when on the ball.
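
I haven't spelled out how the vertical-axis number is built, so treat the following as one possible sketch rather than the actual calculation. It assumes, purely for illustration, that a player's NS xG added can be split across a handful of carry types and uses Shannon entropy as the measure of disorder. The function name, the four-way split and the numbers are all hypothetical.

```python
import math


def shannon_entropy(values: list[float]) -> float:
    """Entropy (in bits) of the share each value contributes to the total."""
    total = sum(values)
    if total <= 0:
        return 0.0
    shares = [v / total for v in values if v > 0]
    return sum(-s * math.log2(s) for s in shares)


# Hypothetical NS xG added per 90, split across four invented carry types.
one_trick = [0.12, 0.0, 0.0, 0.0]   # all of the value comes from one type
varied = [0.03, 0.03, 0.03, 0.03]   # value spread evenly across all four

if __name__ == "__main__":
    print(f"one-trick player: {shannon_entropy(one_trick):.2f} bits")
    print(f"varied player:    {shannon_entropy(varied):.2f} bits")
```

Both invented players add the same total, so they would share an x-axis position, but the varied player would sit higher up the chart.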

I've annotated players with the additional information you can draw from these plots.

Thursday, 26 December 2019

State of Play 2020

Liverpool’s bilingual mastermind behind the team’s meteoric rise to dominate club, domestic, European and now world football is gradually gaining a higher media profile.

Not Jurgen Klopp, although he has played a part in the Reds’ success, but Dr Ian Graham, their current director of research.

Ian’s recent appearances in both the spoken and written media have not only highlighted the importance of an integrated approach to squad building that uses data alongside more traditional methods, they have also given a small glimpse into the analytical methods employed.

The latest profile landed courtesy of and described some fundamentals of Liverpool’s analytical philosophy.

One particularly resonated with Infogol’s approach of quantifying every footballing action in the same currency of goals or more specifically x goals.

The idea that every action, be it a pass, tackle or long throw changes the likelihood that a side will ultimately score isn’t a new concept.

It was probably first introduced into the public analytical domain by Dan Altman in his whistle stop OptaPro presentation in 2015 and hints of such models have been recently emerging from Opta itself and Twelve football.

Such a non-shot xG model also powers Infogol’s “Team of the Week”.

The gradual migration, at least inside the industry, from a purely chance-based evaluation to a more holistic one somewhat mirrors the earlier transition from merely counting shots, as exemplified by total shot ratios from 2008, to a more informative, location-based xG model.

However, creating such non-shot models that quantify every on-field action is not a simple task. The granular data required to build non-shot models dwarfs what was needed to create TSR, which itself was rudimentary compared to that required to create a proficient xG model.

These leaps in data-driven evaluation present a dilemma for the aspirations of public and hobbyist analysts, an area that provided much of the driving force behind the early explosion in football analytics.

Latterly, monetization of ideas and a larger appetite for quantitative metrics to supplement opinion-driven insight in the media and clubs have swept many of those same hobbyists behind a non-disclosure paywall.

Less co-operation, dwindling numbers, limited availability of adequate data and the need for diverse technical skills to process that raw data appear to have stifled the growth of football metrics in the purely public arena.

At the risk of falling victim to one of Twitter’s sloganized insults, “back in the day, metrics didn’t last long before they were improved upon or supplanted altogether”. suggested that Ian’s weapons grade model might be broadly replicated by current, readily available and much quoted metrics, such as xG Chain (I’ll let you google the definition).

Succinctly, the metric rewards every participant in a move that ends in a goal attempt with that chance’s entire xG.

The distribution of goodies can seem churlish, for example, by giving far less individual credit to the three Middlesbrough players who swept nearly the length of Stoke’s defensive transition to score a low-probability winner on Friday night than it would to a marginally involved square ball en route to a multiple-pass move that ends with a tap-in from six yards.

More crucially it completely omits actions that aren’t concluded by a created chance.
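
Both quirks are easy to see in a minimal sketch of the metric as described above. The possessions, player labels and xG values here are invented for illustration, and crediting a player only once per move is my assumption rather than a confirmed part of any provider's definition.

```python
from collections import defaultdict

# Hypothetical possessions: the players involved and the xG of the shot that
# ended the move, or None when the move produced no goal attempt.
possessions = [
    {"players": ["A", "B", "C"], "shot_xg": 0.05},  # long move, low-quality shot
    {"players": ["B", "D"], "shot_xg": 0.60},       # short move, tap-in
    {"players": ["A", "D", "C"], "shot_xg": None},  # good progression, no shot
]


def xg_chain(moves: list[dict]) -> dict[str, float]:
    """Give every participant in a shot-ending move that shot's full xG."""
    totals: dict[str, float] = defaultdict(float)
    for move in moves:
        if move["shot_xg"] is None:
            continue  # moves without a shot earn nobody any credit
        for player in set(move["players"]):  # assumed: one credit per move
            totals[player] += move["shot_xg"]
    return dict(totals)


if __name__ == "__main__":
    for player, total in sorted(xg_chain(possessions).items()):
        print(f"{player}: {total:.2f}")
```

In this toy data the third possession contributes nothing to anyone, and B's involvement in the tap-in outweighs everything A and C did in the longer move.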

To test’s optimism, I compared Infogol’s non-shot ball progression via passes and carries to the much-touted gold standard of xG Chain.

To avoid confusion over units, I’ve simply ranked the xG Chain and the non-shot ball progression for each player in the recent Merseyside derby and then compared a player’s rank in one metric with his rank in the other.

It starts off quite well. Sadio Mane ranks top in both, he was outstanding on the night. But then, much like Stoke’s trip to Middlesbrough, things take a turn for the worse.

Shaqiri ranked an impressive 2nd overall in ball progression, but a lowly 16th in xG Chain, whereas Origi rates highly by the latter, but much less so in the former.

Overall, a third of the players have double digit ranking differences between their pecking order in both metrics. There are some agreements, but the relationship between the two metrics is generally weak.
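
For anyone wanting to repeat the rank comparison on their own numbers, a minimal sketch follows. The five players and their scores are made up and are not the derby figures. The function names are mine, and using Spearman's rank correlation to summarise the agreement is an assumption on my part, valid in this simple form only when there are no tied ranks.

```python
def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank players from 1 (highest score) downwards."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_gaps(a: dict[str, float], b: dict[str, float]) -> dict[str, int]:
    """Absolute difference between each player's rank in the two metrics."""
    rank_a, rank_b = ranks(a), ranks(b)
    return {name: abs(rank_a[name] - rank_b[name]) for name in rank_a}


def spearman(a: dict[str, float], b: dict[str, float]) -> float:
    """Spearman rank correlation, assuming no tied scores."""
    n = len(a)
    d_squared = sum(gap ** 2 for gap in rank_gaps(a, b).values())
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical match totals, invented for illustration only.
xg_chain_scores = {"P1": 0.9, "P2": 0.1, "P3": 0.5, "P4": 0.7, "P5": 0.3}
progression_scores = {"P1": 0.8, "P2": 0.6, "P3": 0.2, "P4": 0.3, "P5": 0.4}

if __name__ == "__main__":
    print(rank_gaps(xg_chain_scores, progression_scores))
    print(f"Spearman: {spearman(xg_chain_scores, progression_scores):.2f}")
```

A value near 1 would mean the two metrics order players almost identically, while a value near 0 means one ranking tells you little about the other.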

Extend the study to every game played last season and this tenuous correlation between the two metrics remains.

One of the strengths of the early analytics movement was the ability to sift mere statistical trivia (team Y has recorded X when player Z plays, immediately springs to mind) from useful, if imperfect evaluations that convey insight and can be used to both evaluate and project future performance.

A great example of the latter is Dan Kennett’s recent Alisson tweet, which used big chances to highlight the keeper’s importance to Liverpool, both in the past and possibly in the future.

Save rates when faced with Opta’s Big Chances can be framed to be a very good proxy for a more exhaustive and granular, post-shot xG2 modelling of a keeper’s saves and goals allowed.

Dan’s tweet was selective, but also carefully constructed enough to capture the keeper’s core attributes. Current retweets are approaching around 10 billion!

That should be the benchmark for widely used metrics, and player contribution figures such as xG Chain fail that test on numerous counts.

It fails to differentiate individual contribution, omits large swathes of creditable actions and thus fails to correlate well with more exhaustive modelling of a similar player process.

The challenge for the public arena as we enter the roaring ’20s is to come up with constant improvements to substandard and potentially misleading measures… and be more like Dan.