
Saturday, 29 July 2023

The Missing Ingredient

Ages ago, Opta used to have an OptaPro blog (I wrote the first article).

I also wrote a blog about trying to utilise how close (or not) off target goal attempts came to requiring a save.

It was called something like "Don't be afraid to miss" and centred around data relating to Robin van Persie. (It was that long ago).

The site has long gone, but with access to more extensive data, such as shot placement (including off target attempts), I've revisited the idea to see if the intuition that "good finishers", when they miss, don't miss by much, is valid or not.

The idea is fairly basic. A shot that hits the post is inches away from being a high quality post shot xG attempt, whereas one that flies high and wide is going to need a fair bit of resighting to trouble the keeper.

The metric will also be related to the situation from which the attempt originated (open play, free kicks, etc.), where on the field it came from (six yard box, outside the box) and whether the head or the boot was used.

So I took every off target non penalty attempt from the big five leagues for the last three completed seasons (over 53,000) and modelled by how far a typical big five player missed the target with their wayward efforts, based on these pre-shot variables.

I then compared the "expected waywardness" to the actual waywardness of individual players for every play type scenario.

I expected Messi to come top. He didn't. He came second out of over 3200 players, although he did have three times as many errant efforts as the player who beat him (Matteo Politano). So Messi's numbers are more robust.

Here's the top ten players whose off target attempts are close enough to elicit an "Ohh" from the crowd, along with the ten players whose misses gave the goalframe the widest berth.

The lists seem to pass the eye test. Messi, de Bruyne and Son in one list and Maupay and Havertz in the other.

I took the 30 top post shavers and looked at their NPxG compared to their actual goals and it was a cumulative 489 NPxG compared to 556 actual goals, an over-performance of 13.7%.

The worst 30 hopelessly wayward had a cumulative NPxG of 434.9, but just 394 actual goals scored. An under-performance of 9.4%.
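The two percentages above are just goals scored relative to NPxG. A minimal sketch of the arithmetic, using only the totals quoted in this post (the function name is my own label, not anything from the model itself):

```python
def performance_vs_expectation(goals, npxg):
    """Goals relative to NPxG, as a signed percentage."""
    return (goals / npxg - 1.0) * 100.0


# Cumulative totals quoted in the post.
post_shavers = performance_vs_expectation(goals=556, npxg=489)
wayward = performance_vs_expectation(goals=394, npxg=434.9)

print(f"Top 30 post shavers: {post_shavers:+.1f}%")   # +13.7%
print(f"Worst 30 wayward:    {wayward:+.1f}%")        # -9.4%
```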

Roughly 40% of goal attempts are retrieved by the ball boy/girl. But rather than discarding that sizeable chunk of data, there might be good reason to at least try to gather some insight from these wayward efforts.

Tuesday, 27 June 2023

The Ageing of the Ageing Curve

A long, long, long, long time ago (2013) there wasn't that much granular data around and certainly hardly any based around the metrics that eventually led to xG entering the mainstream of football/soccer analytics. Therefore, proxies abounded and share of playing time quickly became a go-to method for evaluating a player's age-related rise and fall. Pull enough player data for minutes played, trawl through wiki for dates of birth and you came up with a pleasing curve that rose from the teenage years to the early twenties, peaked in the mid to late 20's and fell away as the ex pro went away to impart gnarled cliches on Sky. Plot the season by season change in playing time against age (the delta method) and the linear trendline cut the axis where it was assumed that effortless push came to more laboured shove.
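For anyone who never built one of these, here is a rough sketch of the delta method as described: average the season to season change in minutes by age, fit a straight line and read off the age where the line crosses zero. The numbers are invented purely so the example runs, and the helper names are mine.

```python
def fit_trendline(ages, deltas):
    """Ordinary least squares fit of delta = slope * age + intercept."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_delta = sum(deltas) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (d - mean_delta) for a, d in zip(ages, deltas))
    slope = sxy / sxx
    intercept = mean_delta - slope * mean_age
    return slope, intercept


def implied_peak_age(ages, deltas):
    """Age at which the fitted trendline crosses zero change in minutes."""
    slope, intercept = fit_trendline(ages, deltas)
    return -intercept / slope


# Invented averages: change in league minutes from one season to the next,
# grouped by the player's age in the earlier season.
ages = [19, 21, 23, 25, 27, 29, 31, 33]
avg_minutes_change = [320, 240, 160, 80, 0, -80, -160, -240]

peak = implied_peak_age(ages, avg_minutes_change)
print(f"Trendline crosses zero at about age {peak:.1f}")
```

Before the crossing point minutes are still growing on average, after it they are shrinking, which is why the crossing was read as the peak.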
When a player's minutes finally fell instead of maintaining an ever shallower upward trend, the assumption was that he had become less effective on the field, performance levels had fallen, the manager had taken note and action had ensued. Physical decline, it was assumed, had begun to outstrip experience and smarts.

We're now over a decade further down the road to analytical enlightenment, where on and off ball stuff gets routinely measured, even if there's still no consistency in naming metrics. Creativity, shooting execution & positional sense, ball progression via passes or carries & risk/reward have become less blurry & more transparent. We've also seen advances in sports science to prolong the peak age of performance & witnessed anecdotal evidence for increased player longevity, even in demanding roles.

So it's well overdue to usher "share of minutes played" into the lobby, treat it as just a fraction of what happens along the age curve & try to understand what might be going on in a player's career arc.

Non shot expected goals added was one powerful metric that sought to measure by how much individual ball progression improved a team's likelihood of scoring. The delta approach to non shot xG added per 90 shows a gradual increase in performance in the early years of a player's career, but thereafter there's virtually no change, on average, in the performance levels achieved per 90 minutes. It's not quite an expected trendline, more early improvement, but then flatlining.

The ball progression illustrated combines passes & carries and it's generally accepted that the latter is more physically demanding than the former. Perhaps players are replacing any shortfall of xG added via carries by upping their output from passes. To see what may be happening I looked at how NS xG added from just carries has changed over the last three completed seasons for all players as they age and again there's virtually no change on average in the rate of xG added from carries as a player ticks off their birthdays.

Persuasive speculation that as a player ages, their performance levels dip & they get left out of the side more frequently, kept "share of minutes played" a respected metric for nearly a decade. But the fact that the newer quantifiable metrics don't change much well into a player's 30's may suggest that there is a slightly different dynamic at play for individuals, overall. Namely, with good management of playing time, players past what was considered their prime can maintain their own high standards, even in the more physically demanding metrics which involve ball carrying. In short, you might get very similar levels of performance in a player's early 30's as you got in their mid 20's... just not quite as often as you did previously.

Sunday, 14 November 2021

Football Analytics' Big Own Goal

Just a small vent regarding what a poor job the early analytics community did, and continues to do, when naming metrics. I know it's been widely pointed out, but "expected goals" is an awful name for the premier metric and nothing screams elitism and jargon more than almost always going to acronyms. xG, PSxG, xA, NSxG may be easily understood by anyone who has immersed themselves in the topic, but as someone who has tried to get these ideas to a wider, more general football obsessed audience, they are an immediate barrier.

It's almost certainly too late to begin using titles that use everyday language *and* are self explanatory: chance quality, for example, rather than expected goals, or on target chance quality rather than post shot expected goals (which isn't even accurate!). If you need a glossary to write an article about a team or player, you've failed. If you need to write "expected goals, which is"........... The same.

Numerical values and decimal places are usually enough to disengage otherwise passionate fans of a sport. Chuck in jargon and you're almost inviting a negative reaction, regardless of the points you are trying to highlight.

Three rules of naming metrics. 1) Don't use acronyms. 2) Use familiar language, ideally associated with the sport. 3) DUA

Friday, 28 May 2021

What is Goal Expectation?

Let's say you want to make an informed estimation about the upcoming England vs Scotland game at Wembley Stadium in Euro 2020 (2021).

One route would involve estimating the average number of goals England are likely to score against Scotland at Wembley and the average number of goals Scotland would score against England at the same venue.

You could then take a mathematical route to calculate the probability that two sides with these average goal expectation estimates would produce a home win, an away win or a draw.

Typically a Poisson approach.

The average number of goals expected to be scored or allowed by a side in a future game has for over 30 years been referred to as their goal expectation.

Unfortunately, a more recent and widely discussed metric based on the chance quality of a scoring opportunity, has arrived on the scene and taken the very similar name of expected goals.

They are not the same.

The former, GOAL EXPECTATION, is a measure of the likelihood of success for a side prior to kick off, based on historical data that is used to quantify the difference in quality between the sides. (It may even use historical expected goals data).

The latter, EXPECTED GOALS, is a value ascribed to the quality of attempts on goal, after the fact, based on the characteristics, shot type, location etc of each attempt.

The goal expectation of England and Scotland in the upcoming game is around 2.12 goals and 0.48 goals, respectively.
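To show what the Poisson route does with a pair of goal expectations, here is a minimal sketch. It assumes each side's goals follow independent Poisson distributions with the means quoted above and it truncates the score grid at 15 goals per side. Both are simplifications of my choosing rather than a description of any particular bookmaker's or modeller's method.

```python
from math import exp, factorial


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` when the average is `mean`."""
    return exp(-mean) * mean ** goals / factorial(goals)


def match_outcome_probabilities(home_mean, away_mean, max_goals=15):
    """Sum the score grid into home win, draw and away win probabilities."""
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            p = poisson_pmf(home_goals, home_mean) * poisson_pmf(away_goals, away_mean)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Goal expectations quoted in the post for England and Scotland.
england, stalemate, scotland = match_outcome_probabilities(2.12, 0.48)
print(f"England {england:.1%}, draw {stalemate:.1%}, Scotland {scotland:.1%}")
```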

The expected goals for the game haven't yet materialised.




Friday, 12 March 2021

XG as Easy as 1,2,3

One of the more interesting variants in the expected goals evolutionary backwater broke the scoring process down into stages. Most models go directly from shot location to goal/no goal output, but it is possible to include each of the possible outcomes.

A goal needs to jump through a variety of hoops to register (VAR excluded).

Shots can be blocked, they can miss the target, they can hit the woodwork or they can be saved before they enter the record books and each of these possibilities can be modelled separately.

This route isn’t inherently better than a single stage model, but it does help to throw a more descriptive, if not necessarily predictive light onto why and how a player is excelling or failing to convert location based chance quality into outcome based success.

It has been useful in trying to unpick the Brighton conundrum.

Part of their underperformance shows up as blocks: shots taken by Brighton players have been blocked more often than an “expected blocks” model predicts. This is reinforced by the distance between blocker and Brighton shooter being the lowest in the league. They are getting closed down more tightly than any other team.

Which may suggest a slow and labored build up is degrading Brighton’s xG chances beyond what may be picked up by a one stop, rather than multi-layered xG model. Attacking tweaks, rather than patiently waiting for regression to kick in may be needed.

The next stage in the progression from shot to potential goal involves getting the ball on target.

One of the first xG think pieces I wrote for the now defunct OptaPro blog suggested that getting the ball on target wasn’t quite as straightforward a metric as it first appeared. In short, getting lots of shots on target wasn’t always the sign of an above average striker.

Robin van Persie, then of Manchester United, was the guinea pig and his rather less than impressive rate of working the keeper with on target attempts didn’t seem to hurt his scoring performance.

The solution I suggested was that some players who aimed for more difficult to save areas of the goal, top corner, for example, might miss more frequently than players who prioritized target hitting at the expense of save difficulty.

In short, strikers shouldn’t be afraid to miss the goal.

So, we’ve run through two of the three xG stages.

Don’t get your shot blocked (that seems a universal aim, there seems a limited benefit in taking the ball so close to a blocking defender that the chances of having the shot blocked increases greatly).

Hit the target. A more ambiguous ambition. Most strikers could hit the target most of the time, but might compromise the difficulty to save their goal bound attempt.

The final stage is more akin to the traditional, one step model, but instead attempts that successfully negotiate the initial two stages are modelled against out of sample goal/no goal outcomes.

We’ve now got a multi-step xG model (that didn’t catch on from 2014), that adds tons of missing context that can be used to explain the “how” of why a player is returning the outcome from a location based process, even if it still falls to good old random variation to explain away much of the future performance levels.

Some factors affecting xG output may be systematic to teams or players (randomness is still the major player?) and by breaking the process down stage by stage, you can perhaps shine a light onto these additional factors.
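As a rough sketch of how the three stages chain together: the chance of a goal is the chance of avoiding the block, times the chance of hitting the target once unblocked, times the chance of beating the keeper once on target. The probabilities below are invented for illustration and the function names are mine; a real version would take each stage's probability from its own fitted model.

```python
def staged_goal_probability(p_unblocked, p_on_target, p_scores):
    """Chain the stages: avoid the block, hit the target, beat the keeper."""
    return p_unblocked * p_on_target * p_scores


def stage_ratios(actual, expected):
    """Actual rate divided by expected rate for each stage (1.0 = par)."""
    return {stage: actual[stage] / expected[stage] for stage in expected}


# Invented stage probabilities for one chance, as three stage models might give.
expected = {"unblocked": 0.75, "on_target": 0.5, "scores": 0.3}
model_xg = staged_goal_probability(
    expected["unblocked"], expected["on_target"], expected["scores"]
)

# Invented actual rates for a player who is par at two stages but wayward.
actual = {"unblocked": 0.75, "on_target": 0.4, "scores": 0.3}
ratios = stage_ratios(actual, expected)

print(f"Staged xG for the chance: {model_xg:.4f}")
print(ratios)
```

The ratios are the descriptive bit: they say at which hoop a player is gaining or losing ground, which a single stage model can't.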

Finally, here’s how over and under performers, with at least 10 regular play goals from shots only, have maneuvered their way through the three stages of xG since 2016/17.




The table above includes diverse shooting profiles, which may be useful as a descriptor or potentially as a coaching aid if the multi-stage xG model can pick up systematic flaws or talents that persist.

Jimenez avoids blocks at a league average, but then misses the target wantonly and his overall scoring from regular play with his boot falls way below the average expectation.

Grealish has more shots blocked than expected, misses the target more frequently, but runs a large over performance for goals scored. Placement is the likely culprit, here.

Whereas, Wood avoids blocks, hits the target, but tamely refuses to accumulate above average goal tallies.

It’s time to take data to the video booth.


Thursday, 24 December 2020

Stoke and the Art of Crossing

Stoke Highlight the Art of Crossing.

Two Stoke City games, two headers, two goals and a duo of 1-0 wins not only demonstrate the fine lines that can separate six points from two in a low scoring sport, such as football, but also the important role still played by crosses in the modern game.

Lavishly assembled squads may partly spurn crossing as a primary route to goal in favour of more intricate, possession based passing sequences to create space before the final delivery, but even the likes of Arsenal when faced with the need for a goal do fall back on the traditional cross.

33 crosses yielded a single goal in a recent 2-1 home defeat for Arteta’s side against Wolves and infamously, Manchester United attempted over 80 crosses in a drawn game with Fulham in the last days of David Moyes’ reign.

Crossing, as a primary strategy reached a low point with Liverpool’s 2011/12 team consisting of a big target man, Andy Carroll and a host of players ready to deliver a cross, led by Stewart Downing.

Unfortunately, such a predictable game plan and a tendency to cross the ball early from less advanced field positions resulted in a failed experiment. An average of 21 Liverpool crosses per game was rewarded with just four Premier League goals.

Present day Liverpool lead the analytics revolution, but their failed, decade old legacy helped to kick start that revolution, as data was used to explain why their cross heavy approach failed and where the lesson lay for teams to maximize the returns from a wide player’s staple delivery.

Crosses in general are inefficient.

Leagues vary, but as a baseline number, it takes upwards of 90 crosses to score a goal directly from the delivery. Secondary chances created after the initial header or shot, but during the same phase of play, improve the strike rate to around one goal every 50 crossed balls.

However, not all crosses are equal. The danger is more apparent if a side works a delivery from the byline compared to a last-minute desperation hoof from deep into the mixer.

Fortunately, data can differentiate between types of crosses. Whether the ball was chipped or driven on the ground, for example. But where crosses originate and where they are aimed provides the biggest insight into how to turn a cross into a winning formula.

You can divide the origin and intended destination of a cross into two broad categories depending on how effective they are at producing goals.

In the graphic below, prime areas are shown in red and the least effective in blue.



Blue wasteful target areas are intuitive.

If the ball is aimed too close to the goal line, it becomes prey to a dominant keeper. But place the cross too close to the edge of the box and any shot or header will be taken from distance and for every yard a striker moves away from the goal, the likelihood of a goal falls by ten percent.

The red sweet spot is between these two areas.

The touchline hugging, wasteful blue delivery areas give both the keeper and defenders time to defend the box, whereas moving infield to deliver the cross reduces defensive reaction time and greatly improves conversion rates.

Hitting a ball from a wide and deep wing position to the wasteful area of the six-yard box, going from one blue zone to another, only produces a goal every 500 attempts. Whereas a delivery from a red, prime infield area to a red, prime area of the box increases conversion rates to around one goal every 20 crosses.
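To put those "one goal every N crosses" rates side by side, here's a small sketch that turns each into the goals you'd expect from a given number of deliveries. The rates are the ones quoted in this post; the labels and function name are mine and the 100 cross example is arbitrary.

```python
# Crosses needed per goal, as quoted in the post.
CROSSES_PER_GOAL = {
    "baseline, direct from the delivery": 90,
    "baseline, including secondary chances": 50,
    "blue zone to blue zone": 500,
    "red zone to red zone": 20,
}


def expected_goals_from_crosses(crosses, crosses_per_goal):
    """Goals expected from `crosses` deliveries at a given strike rate."""
    return crosses / crosses_per_goal


for label, rate in CROSSES_PER_GOAL.items():
    goals = expected_goals_from_crosses(100, rate)
    print(f"{label}: {goals:.2f} goals per 100 crosses")

# How many times more productive the prime route is than the wasteful one.
prime_vs_wasteful = (
    CROSSES_PER_GOAL["blue zone to blue zone"] / CROSSES_PER_GOAL["red zone to red zone"]
)
```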

Stoke City’s two winning goals against Wycombe and Middlesbrough have been added to the graphic and hit the sweet spot for both Fox & McClean’s delivery and Collins & Powell’s headed goals. They were assists that were drawn from the most productive area of the crossing playbook.

Of course, there’s much more than “crossing by the numbers” to a successful outcome.

Powell is an accomplished header of the ball. During his Championship career over 20% of his goal attempts have been from headers and he is adept at getting on the end of higher quality attempts than the league average. Whilst Collins’ physical attributes are obvious.

Campbell then crossed from one prime area to another for Cardiff to obligingly smack the ball into their own net, before he departed on a season long, injury induced hiatus. Fox hit the prime red zone with a pacy cross to defeat Blackburn, and Brown repeated the prime to prime connection to set up Thompson to briefly draw level with Spurs in the Carabao Cup 1/4 final.

Clever off the ball running also contributes, as seen by Vokes drawing away Wycombe defenders with his near post run & Stoke creating an overload of far post attackers for the goal against Middlesbrough.

Over recent games, Stoke City had the crossing basics in place and good things followed.

On the weekend when Stoke climbed into the playoff spots on the back of two smartly executed crosses, Arsenal in the North London derby were again trusting more to luck by throwing in another 44 crosses in the vain pursuit of a goal.

Monday, 20 April 2020

Scatter Plots

There's been a huge increase in football related scatter plots recently. So as the guy who produced the first such plots, I thought I'd quickly run through why I thought this simple plot was useful and then try to expand the idea to provide additional usefulness.

The initial plots were designed to both inform and characterise playing style.

I still think the most successful plots use related metrics, for example expected assists and expected goals per 90 for individual players.

These "makers and takers" plots easily split players into those whose predominant talent is to create chances, those who get onto the end of opportunities and those rare players who excel at both disciplines.

Here's one for Arsenal 2019/20.

It's got sample size issues, but it's fairly evident that the creative players are towards the top left and the goal poachers are to be found in the bottom right.

Another quite neat aspect of this type of plot is that you can run a line through a player to the origin and anyone with a similar ratio of xG and xA will lie close to that line.

In league wide samples, therefore you can find emerging players with similar qualities to the established stars.
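The line through the origin trick is easy to automate. A minimal sketch: treat each player's xG and xA per 90 as a point, measure the angle of the line from the origin to that point and pick out anyone whose angle is within a few degrees of the target's. Player names and numbers are made up and the five degree tolerance is an arbitrary choice of mine.

```python
from math import atan2, degrees


def profile_angle(xg_per_90, xa_per_90):
    """Angle of the line from the origin to a player (0 = pure taker, 90 = pure maker)."""
    return degrees(atan2(xa_per_90, xg_per_90))


def similar_profiles(players, target, tolerance_degrees=5.0):
    """Players whose xG to xA balance lies close to the target's line."""
    target_angle = profile_angle(*players[target])
    return sorted(
        name
        for name, (xg, xa) in players.items()
        if name != target
        and abs(profile_angle(xg, xa) - target_angle) <= tolerance_degrees
    )


# Invented (xG per 90, xA per 90) values.
players = {
    "Established Star": (0.60, 0.30),
    "Emerging Prospect": (0.30, 0.15),
    "Creator": (0.10, 0.40),
    "Poacher": (0.50, 0.05),
}

matches = similar_profiles(players, "Established Star")
print(matches)
```

Using the angle rather than the raw ratio avoids dividing by zero for a player with no xG at all. Distance from the origin still matters of course: the prospect here has the same balance as the star at half the volume.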

There's a lot of data swilling around today, these plots are simple to make, three minutes tops, and with some thought about what you're trying to illustrate, they inform pretty well.

Over the weekend I came back to the idea, to see if I could add information that tells you a little bit more than just the raw connection between two metrics.

Here's what I came up with. It's again just a simple scatter plot, but I've used bubble size to introduce a third variable (metric volume per 90).

In addition I've used a single performance metric (NS xG added from ball carries) along the x axis and instead of plotting a complementary metric on the vertical axis, I've used a number to denote how diverse the x axis metrics are for each player.



This just plots the top 20 NS xG added by players through their ability to successfully carry the ball forward and move their team into a more dangerous pitch position.

It's a good one to choose because you know that Adama Traore will top the list (and he does).

Rather than a sterile scatter, you've now got a chart that not only tells you about a performance metric, it also instantly adds another layer (success volume) from which you can draw additional information about the characteristics of a player.

In short, those towards the right of the plot add more NS xG per 90 than others.
Larger bubble size indicates more successful progressive carries per 90.
And higher up the chart indicates more disorder and unpredictability in what a player will positively achieve for his team when on the ball.
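This post doesn't spell out how the vertical axis number is calculated. One candidate for a measure of "disorder and unpredictability" is Shannon entropy, so the sketch below is offered purely as an assumption: bin a player's successful carries by how much NS xG each one added, then compute the entropy of the bin counts. The bins and counts are invented.

```python
from math import log2


def shannon_entropy(counts):
    """Entropy in bits of a set of bin counts (empty bins are skipped)."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


# Invented counts of successful carries in four "NS xG added" bins,
# from smallest gain to largest gain.
steady_carrier = [20, 0, 0, 0]     # every carry adds a similar, small amount
unpredictable_carrier = [5, 5, 5, 5]  # gains spread evenly across the bins

print(shannon_entropy(steady_carrier))
print(shannon_entropy(unpredictable_carrier))
```

A player whose carries all land in one bin scores zero, one whose carries are spread evenly scores the maximum for that number of bins, which is at least in the spirit of "higher up the chart means less predictable".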

I've annotated players with the additional information you can draw from these plots.

Thursday, 26 December 2019

State of Play 2020


Liverpool’s bilingual mastermind behind the team’s meteoric rise to dominate club, domestic, European and now world football is gradually gaining a higher media profile.

Not Jurgen Klopp, although he has played a part in the Reds’ success, but Dr Ian Graham, their current director of research.

Ian’s recent appearances in both the spoken and written media have not only highlighted the importance of an integrated approach to squad building that utilizes a data driven approach, alongside more traditional methods, they have also given a small glimpse into the analytical methods employed.

The latest profile landed courtesy of Liverpool.com and described some fundamentals of Liverpool’s analytical philosophy.

One particularly resonated with Infogol’s approach of quantifying every footballing action in the same currency of goals or more specifically x goals.

The idea that every action, be it a pass, tackle or long throw changes the likelihood that a side will ultimately score isn’t a new concept.

It was probably first introduced into the public analytical domain by Dan Altman in his whistle stop OptaPro presentation in 2015 and hints of such models have been recently emerging from Opta itself and Twelve football.

Such a non-shot xG model also powers Infogol’s “Team of the Week”.

The gradual migration, at least inside the industry, from a purely chance based evaluation to a more holistic one somewhat mirrors the earlier transition from merely counting shots, as exemplified by total shot ratios from 2008 to a more informative, location based xG model, subsequently.

However, creating such non-shot models that quantify every on-field action is not a simple task. The granular data required to build non-shot models dwarfs what was needed to create TSR, which itself was rudimentary and basic compared to that required to create a proficient xG model.

These leaps in data driven evaluation present a dilemma for the aspirations of public and hobbyist analysts, an area that provided much of the driving force behind the early explosion in football analytics.

Latterly, monetization of ideas and a larger appetite for quantitative metrics to supplement opinion driven insight in the media and clubs, has swept many of those same hobbyists behind a non-disclosure paywall.

Less co-operation, dwindling numbers, availability of adequate data and the need for diverse technical skills to process that raw data, appear to have stifled the growth of football metrics in the purely public arena.

At the risk of falling victim to one of Twitter’s sloganized insults, “back in the day, metrics didn’t last long before they were improved upon or supplanted altogether”.

Liverpool.com suggested that Ian’s weapons grade model might be broadly replicated by current, readily available and much quoted metrics, such as xG Chain (I’ll let you google the definition).

Succinctly, the metric rewards every participant in a move that ends in a goal attempt with that chance’s entire xG.

The distribution of goodies can seem churlish, for example, by giving far less individual credit to the three Middlesbrough players who swept nearly the length of Stoke’s defensive transition to score a low probability winner on Friday night than it would to a marginally involved square ball en route to a multiple passing move that ends with a tap in from six yards.

More crucially it completely omits actions that aren’t concluded by a created chance.
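Going only by the one line description above, xG Chain can be sketched in a few lines. Every player involved in a move that ends in a goal attempt is credited with the full xG of that attempt and moves that don't end in an attempt contribute nothing. The moves and names below are invented and crediting each player once per move is my reading of the definition.

```python
from collections import defaultdict


def xg_chain(possessions):
    """Credit every participant in a move with the xG of the attempt it ends in."""
    totals = defaultdict(float)
    for players, shot_xg in possessions:
        if shot_xg is None:
            continue  # no goal attempt, so nobody gets any credit
        for player in set(players):  # one credit per player per move
            totals[player] += shot_xg
    return dict(totals)


# Invented moves: (players involved, xG of the attempt or None if no attempt).
possessions = [
    (["A", "B", "C"], 0.05),  # length of the pitch break, low probability shot
    (["D", "A", "E"], 0.60),  # D plays one square ball before a tap in
    (["F", "G"], None),       # good ball progression, but no attempt follows
]

chain = xg_chain(possessions)
print(chain)
```

Both complaints show up straight away. D's square ball earns twelve times what B and C get for the break, and F and G get nothing at all.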

To test Liverpool.com’s optimism, I compared Infogol’s non-shot ball progression via passes and carries to the much-touted gold standard of xG Chain.

To avoid confusion over units, I’ve simply ranked the xG Chain and the non-shot ball progression for each player in the recent Merseyside derby and then compared a player’s rank in one metric with his rank in the other.


It starts off quite well. Sadio Mane ranks top in both, he was outstanding on the night. But then, much like Stoke’s trip to Middlesbrough, things take a turn for the worse.

Shaqiri ranked an impressive 2nd overall in ball progression, but a lowly 16th in xG Chain, whereas Origi rates highly by the latter, but much less so in the former.

Overall, a third of the players have double digit ranking differences between their pecking order in both metrics. There are some agreements, but the relationship between the two metrics is generally weak.
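For anyone wanting to repeat the rank comparison on their own data, here is a minimal sketch. It computes Spearman's rank correlation from two sets of ranks (the shortcut formula, which assumes no tied ranks) and the share of players whose two ranks differ by ten or more places. The ranks are invented, not the derby figures.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); assumes no ties."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def share_with_big_gap(ranks_a, ranks_b, gap=10):
    """Fraction of players whose two ranks differ by at least `gap` places."""
    big = sum(1 for a, b in zip(ranks_a, ranks_b) if abs(a - b) >= gap)
    return big / len(ranks_a)


# Invented ranks for twelve players in the two metrics.
progression_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
chain_rank = [1, 12, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10]

rho = spearman_from_ranks(progression_rank, chain_rank)
big_gap_share = share_with_big_gap(progression_rank, chain_rank)
print(f"rho = {rho:.2f}, double digit gaps = {big_gap_share:.0%}")
```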

Extend the study to every game played last season and this tenuous correlation between the two metrics remains.

One of the strengths of the early analytics movement was the ability to sift mere statistical trivia (team Y has recorded X when player Z plays, immediately springs to mind) from useful, if imperfect evaluations that convey insight and can be used to both evaluate and project future performance.

A great example of the latter is Dan Kennett’s recent Alisson tweet, which used big chances to highlight the keeper’s importance to Liverpool, both in the past and possibly in the future.

Save rates when faced with Opta’s Big Chances can be framed to be a very good proxy for a more exhaustive and granular, post shot xG2 modelling of a keeper’s saves and goals allowed.

Dan’s tweet was selective, but also carefully constructed enough to capture the keeper’s core attributes. Current retweets are approaching around 10 billion!

That should be the benchmark for widely used metrics, and player contribution figures such as xG Chain fail that test on numerous counts.

It fails to differentiate individual contribution, omits large swathes of creditable actions and thus fails to correlate well with more exhaustive modelling of a similar player process.

The challenge for the public arena as we enter the roaring 20’s is to come up with constant improvements to substandard and potentially misleading measures….. and be more like Dan.

Tuesday, 29 October 2019

Liverpool by One.



Old style goals based analysis hardly gets a run out nowadays with everyone arguing xG strawmen. So, let’s go the goals route to see if Liverpool’s record in single goal margin wins is “knowing how to win”, “unsustainable” or “about what you’d expect”.

Liverpool won 10 games by a single goal margin last season. That’s a lot, but well below the single season record held by Manchester United of 16 in 2012/13 and 2008/09.

United’s number of single goal wins in the subsequent seasons fell to five and eight respectively (although something more impactful may have also occurred in 2013/14). Their points tally fell as well, by 25 points in 2013/14 and by 5 in 2009/10.

To dilute the Fergie/Moyes effect, let’s look at the average record in the next season of teams who won 10 or more games by a single goal margin.

There’s over 90 of them during the 20 team history of the Premier League and 80% of those had fewer wins by the narrowest possible of margins during their next Premier League season, 74% also saw their points total fall.

These teams who edged lots of close matches one season shed around 10% of their points in the next season.

Initially, it’s not looking too rosy for Liverpool’s ability to sustain these narrow wins.

However, there’s another factor to consider.

Single goal wins, on average account for 41% of a side’s Premier League points total, but in our sample of 90+ teams who won 10 or more, 80% of them accrued more than 41% of their points from such victories.

Everton won 76% of their 59 points in 2002/03 from single goal wins and then tried their very best to get relegated in 2003/04 as their “luck” in narrow games returned to earth and they won just 39 points.

In Liverpool’s case in 2018/19, one goal margin wins only accounted for 31% of their 97 points. Therefore, their ten such wins places them in a group of sides who typically regress, but the percentage of total points they win in this manner is entirely atypical of that group.
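The share of points figure is simple arithmetic: three points per win, times the number of single goal wins, divided by the season's points. A quick sketch using the numbers quoted above (the Everton win count is backed out from the 76% figure and isn't stated in the post):

```python
def single_goal_share(single_goal_wins, total_points, points_per_win=3):
    """Fraction of a side's points that came from single goal wins."""
    return points_per_win * single_goal_wins / total_points


# Liverpool 2018/19: ten single goal wins, 97 points.
liverpool_share = single_goal_share(10, 97)
print(f"Liverpool: {liverpool_share:.0%}")  # 31%

# Everton 2002/03: 76% of 59 points implies roughly this many such wins.
everton_wins = round(0.76 * 59 / 3)
print(f"Everton: about {everton_wins} single goal wins")
```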

To see where Liverpool stand as being adept at winning single goal margin games, we need to look at their underlying goals record.

In 2018/19 they scored 89 and conceded 22. Taking the Poisson route, that’s consistent with winning nine games by a single goal over 38 games. They won, as we’ve seen, ten, hardly a worryingly large over-performance.
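A minimal sketch of one way to take that Poisson route. It assumes goals for and against in every match are independent Poisson draws, with the season averages (89/38 and 22/38) used as the means for all 38 games. That's a blunt simplification of my choosing, since it ignores opponent strength and venue, but it lands close to the nine wins quoted.

```python
from math import exp, factorial


def poisson_pmf(goals, mean):
    """Probability of exactly `goals` when the average is `mean`."""
    return exp(-mean) * mean ** goals / factorial(goals)


def prob_win_by_one(mean_for, mean_against, max_goals=15):
    """Probability of winning a match by exactly one goal."""
    return sum(
        poisson_pmf(conceded + 1, mean_for) * poisson_pmf(conceded, mean_against)
        for conceded in range(max_goals)
    )


games = 38
scored, conceded = 89, 22  # Liverpool 2018/19, as quoted in the post

p_single_goal_win = prob_win_by_one(scored / games, conceded / games)
expected_single_goal_wins = games * p_single_goal_win
print(f"Expected single goal wins: {expected_single_goal_wins:.1f}")
```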

You can lump Liverpool in with a group of teams who have achieved good things, partly as a result of “knowing how to win” (Leicester 2015/16 spring to mind, 14 single goal wins where nine would have been a more equitable return), but unlike most of these sides, the Reds have the underlying numbers to deserve their record.

Expect a few more 2-1’s between now and May.

Monday, 21 October 2019

Closing the Door.

One of the most fun aspects of football data analysis is when the team you're part of derives some exciting new metrics from the raw data that allow you to look at old problems in a new light.

Some real heavy data lifting has been put into deriving our Non Shot expected goals model. So first a quick recap on what it does.

Whenever the ball is moved around the pitch there is a likelihood of scoring from each location it finds itself in. We express this value as non shot xG and the difference between these values when an action is completed is the change in NSxG via that action.

There's also a "risk/reward" aspect for when you concede possession.

Finally, each team has (nearly always) a different NSxG for the same pitch location, because one major input is the distance to your opponent's goal.

We've mainly looked at passing and ball carrying, so far, quantifying the differing importance to your side of moving the ball five yards out of your own penalty area or five yards into your opponent's. But there's an obvious extension of this that flips the focus and examines how well a team prevents an opponent progressing the ball.
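To make the recap concrete, here is a toy sketch. The real model's value surface isn't described here, so the value function below is an invented stand-in that just decays with distance to the opponent's goal, and the 115 yard pitch length is an assumption as well. The point is only the bookkeeping: NSxG added is the value at the end of the action minus the value at the start, and the same spot is worth different amounts to the two teams.

```python
from math import exp

PITCH_LENGTH_YARDS = 115.0  # assumed pitch length


def toy_ns_xg(distance_to_opponent_goal):
    """Invented stand-in: scoring likelihood decays with distance to goal."""
    return 0.30 * exp(-distance_to_opponent_goal / 20.0)


def ns_xg_added(start_distance, end_distance):
    """Change in NSxG from moving the ball between two distances to goal."""
    return toy_ns_xg(end_distance) - toy_ns_xg(start_distance)


# Five yards forward near your own box versus five yards into the opponent's.
out_of_own_box = ns_xg_added(100.0, 95.0)
into_opponent_box = ns_xg_added(23.0, 18.0)

# The same spot, 30 yards from the home goal line, valued by each team.
spot = 30.0
value_for_home = toy_ns_xg(PITCH_LENGTH_YARDS - spot)
value_for_away = toy_ns_xg(spot)

print(f"{out_of_own_box:.5f} vs {into_opponent_box:.5f}")
print(f"home {value_for_home:.5f}, away {value_for_away:.5f}")
```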

This isn't just by making passing difficult, it's also by making it harder or easier for opponents to carry the ball forward as well.

It used to be called closing a player down, it's called any manner of terms nowadays.

Here's how sides are faring in preventing ball progression in 2019/20.

The first thing you need is a benchmark figure to measure how well a side is closing down the opposition.

There's only been nine matches played by each Premier League team to date and they may have played a bunch of sides who aren't that good or willing to play out from the back, so we need to find a set of figures that reflect this possible imbalance of intent and talent.

Let's take Manchester United. They've played nine teams, Chelsea, CP, Leicester, Newcastle, Southampton, WHU, Arsenal, Wolves & Liverpool.

Those teams, in turn have also played nine teams (except Arsenal, who play tonight), that's 80 teams of which nine are Manchester United.

That's almost guaranteed to include every Premier League team at least once and makes up a decent sample of around 70-80 games depending upon how you slice it.

We therefore took those 71 non Manchester United matches played by Manchester United's opponents and looked at the "risk/reward" ball progression via both passes and ball carries for 100 pitch segments.

For each segment we calculated the average NS xG gained (or lost) per 100 pass & carry attempts. That was our baseline for United's opponents progression against a broad selection of opponents this season.

Then we repeated the exercise, but for these sides in their matches against Manchester United, and ran a heat map to see where on the field these teams were finding it difficult to progress the ball against United and where they were having an easier time compared to their benchmark numbers against the rest of their opponents.

This is what it looks like (ignore the numbers for now).
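The benchmark comparison can be sketched roughly as follows. This is a minimal illustration, assuming each pass or carry attempt has already been tagged with a pitch segment id and an NS xG change; the function names and sample values are hypothetical.

```python
from collections import defaultdict


def nsxg_per_100(actions):
    """Average NS xG gained (or lost) per 100 attempts, per pitch segment.

    `actions` is an iterable of (segment_id, nsxg_change) pairs,
    one pair per pass or carry attempt.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, change in actions:
        totals[segment] += change
        counts[segment] += 1
    return {seg: 100.0 * totals[seg] / counts[seg] for seg in totals}


def versus_baseline(against_team, baseline):
    """Per-segment rate against one team minus the same opponents' baseline rate."""
    return {
        seg: against_team[seg] - baseline[seg]
        for seg in against_team
        if seg in baseline
    }


# Made-up attempts in two segments (segment ids 0-99 are an assumption).
baseline = nsxg_per_100([(45, 0.02), (45, 0.00), (12, 0.01), (12, 0.01)])
vs_united = nsxg_per_100([(45, 0.00), (45, 0.01), (12, 0.02), (12, 0.02)])
diff = versus_baseline(vs_united, baseline)
```

A negative difference for a segment means the opponents progressed the ball less well there than their benchmark, a positive one means they did better.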


The red areas are where United's opponents are progressing the ball at lower levels against United than they've managed as a group against a basket of 71 other Premier League sides. Blue, they're doing better.

It's a pretty stark and clear picture of where on the field United have been making it difficult for their opponents to get the ball into more dangerous areas. Firstly, beginning in front of their opponent's own box and then aggressively in front of United's own. They aren't too fussed about targeting wide positions on halfway and not too good(?) at stopping runs or passes from the bye-line & in the box.

Here's Everton and they do harry the opposition, but it's a much more chaotic process, with very little structure, especially compared to United's disciplined approach.


And finally, here's Aston Villa.


There's no overt closing down of the opposition until they reach the box, at which point it seems to become all hands to the pump.


Wednesday, 2 October 2019

Passing Risk Reward in the Premier League

The availability of richer data sources has naturally led to an interest in passing and ball progression.

The generally quoted passing metrics still gravitate towards event data such as goal attempts and actual scores as the major framework.

Passes that lead to a potential goal scoring attempt predominate in most current passing metrics and little has been done to differentiate between the contribution made by individual players involved in these possession chains.

In contrast, we've broken down the value of each pass attempted by referencing how likely a possession anywhere on the pitch has historically been to lead to a goal, whether or not the possession ultimately results in an attempt on goal.

This so-called non shot xG metric not only allows a route to value every ball progression, be it a pass or a carry, but also quantifies individual involvement, rather than sharing the credit equally between all those participating in the possession.

However, as often is the case in football metrics, only one side of the ball has been investigated.

Each pass attempt comes with a risk and reward.

The player attempting the pass has custody of a valuable team resource, namely the non shot xG value for possession of the ball at that precise position on the field.

The potential reward in making a progressive pass is to advance the ball to a more dangerous area of the field.

And the ever present risk is the cost of a turnover. The passing team lose the NS xG value they had by owning the ball and the opponents gain their own NS xG by taking possession of the ball.

Weighing a player's NS xG ledger is problematic, but one way to express the risk reward balance of a player's passing performance is to add up the NS xG value of every progressive pass he completes and compare this to the sum of the NS xG he loses through incomplete passes, along with the NS xG gained by the opponent taking possession of his errant attempts.

For example, in the nascent Premier League season, Matteo Guendouzi's completed open play progressive passes have been received at areas on the field that total 6.69 NS xG.

On the minus side, his picked off pass attempts have "lost" Arsenal 1.67 NS xG. This is made up of loss of pitch position for Arsenal and the combined NS xG value for the opponent based on where possession is won.

Overall, and without regard for pass volume or minutes played, Guendouzi has a net positive 5.02 NS xG for Arsenal in 2019/20.
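A minimal sketch of that ledger is shown below. The dictionary keys are hypothetical names chosen for illustration, and the three sample passes are invented; only the Guendouzi totals come from the figures quoted above.

```python
def pass_ledger(passes):
    """Return (gained, lost, net) NS xG for a list of pass attempts.

    Each pass is a dict with hypothetical keys:
      completed  - True if a team mate received the ball
      own_before - passing side's NS xG at the pass origin
      own_after  - passing side's NS xG where the pass was received
      opp_after  - opponent's NS xG where they won the ball (failed passes)
    """
    gained = 0.0
    lost = 0.0
    for p in passes:
        if p["completed"]:
            # Only progressive passes count towards the reward side.
            if p["own_after"] > p["own_before"]:
                gained += p["own_after"]
        else:
            # A turnover costs what the side had plus what the opponent gains.
            lost += p["own_before"] + p["opp_after"]
    return gained, lost, gained - lost


sample = [
    {"completed": True, "own_before": 0.01, "own_after": 0.03},
    {"completed": True, "own_before": 0.03, "own_after": 0.02},
    {"completed": False, "own_before": 0.02, "opp_after": 0.005},
]
gained, lost, net = pass_ledger(sample)

# The season totals quoted in the text give the same net figure.
guendouzi_net = round(6.69 - 1.67, 2)
```

Here the second sample pass is completed but moves the ball to a lower value area, so it adds nothing to the reward side.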

This puts him top of the Arsenal "risk/reward" passing charts and we feel is a much better single figure metric to describe a player's involvement in progressing his side towards the opponents goal.

Not only does it quantify individual involvement and utilise every pass attempted, it also penalises reckless or sloppy execution that leads to a change of possession.

Here are the current pass risk/reward numbers for players at all 20 Premier League clubs who have made a minimum number of attempts.








Saturday, 14 September 2019

Game State and Blocked Shots.

I've written a fair bit about game state and how it impacts on how a side approaches a match as time elapses and occasionally the score line changes.

I don't use score differential to define "game state". Instead, I use a measure of how well each team is faring based on their pre game expectation.

This can be defined as the expected points based on the current score and time elapsed or the expected success rate of a team, again when measured against a pre kick off baseline. The choice is entirely up to you.
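One possible way to put a number on this is sketched below. It assumes goals for each side arrive as independent Poisson processes at fixed per-90 rates, which is an assumption made for this illustration and not necessarily the model used here. Game state is then the current expected points minus the expected points at kick off.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_points(rate_for, rate_against, goals_for=0, goals_against=0,
                    minutes_played=0, max_goals=10):
    """Expected league points from the current score and time elapsed.

    rate_for and rate_against are assumed goals-per-90 rates (hypothetical inputs).
    """
    remaining = (90 - minutes_played) / 90
    mean_for = rate_for * remaining
    mean_against = rate_against * remaining
    points = 0.0
    for f in range(max_goals + 1):
        for a in range(max_goals + 1):
            prob = poisson_pmf(f, mean_for) * poisson_pmf(a, mean_against)
            if goals_for + f > goals_against + a:
                points += 3 * prob
            elif goals_for + f == goals_against + a:
                points += prob
    return points


def game_state(rate_for, rate_against, goals_for, goals_against, minutes_played):
    """Current expected points minus the pre kick off expectation."""
    pre_match = expected_points(rate_for, rate_against)
    now = expected_points(rate_for, rate_against,
                          goals_for, goals_against, minutes_played)
    return now - pre_match


# A strong favourite and its opponent, level at 0-0 after 70 minutes.
favourite_state = game_state(2.0, 0.8, 0, 0, 70)
underdog_state = game_state(0.8, 2.0, 0, 0, 70)
```

With these made-up rates the favourite's game state is negative and the underdog's is positive, even though the score differential is zero for both.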

The advantage of this approach is primarily when the game is tied (which it is for a fairly significant portion of most matches). Instead of counting offensive production for both sides at this score differential, there's usually a clear indication of which of the two teams is happier with the stalemate and which is not.

You also get a gradual movement of game state that incorporates the often omitted variable of time elapsed.

It's intuitive as to what might happen as game state ebbs and flows over the course of a match, as unhappy teams perhaps become more risk taking in order to change the current status quo, while pregame underdogs are forced or choose to attempt to bank their above expectation gains by becoming more defensive.

One slight problem with this approach is that it assumes a relatively balanced competitive edge between competing teams and further assumes that those needing to change the current scoreline are capable of attempting to do so.

Not to be harsh, but it's difficult to envisage a situation where Manchester City felt the need to protect a lead against say Newcastle or where Newcastle were technically able to up their attacking intent against the champions.

So often the presence of clearly superior teams can skew conclusions. "Possession leads to wins" arose largely because better sides also had high levels of possession, but the possession was a byproduct of other things they did, rather than the primary driver of their results.

Remove Barca etc from the data and the relationship between possession and wins tended to disappear.

Therefore, firstly here's why "zero goal differential" (the game is level) shouldn't be regarded as a single game state.



Here's a sample of matches from the 2018/19 Premier League, involving games where one of the Big 6 wasn't playing. Thus the games weren't particularly one-sided from the outset.

Initially, I've simply counted the shot volume from regular play for teams when the score differential is zero (the game is level). The vertical axis records my version of changing game state; a larger negative value indicates that a team is doing badly compared to the expectation at kickoff.

Typically, this may be when a home favourite is level a fair way into the game and a points expectation that may have been 1.75 expected points at 3 o'clock has fallen back towards one point as the clock ticks on towards 5.

Those above the blue score differential line of zero are doing better than they hoped for. They might have expected to average less than a point from such a game, but they are edging closer and closer to a point, with a possibility of nicking all three.

Each point represents a goal attempt and it's clear that the lion's share is being taken by the disgruntled favs.

If we re-examine our intuition, it's likely that if the beneficiaries of the stalemate aren't taking that many shots in the match, they're doing things to prevent the ones at the other end going in.

Learning from the likes of Pulis and Dyche, that will likely include blocking shots.

Next I built a simple xG model (just location & type), but also included the game state factor, not just at zero goal differential but at all score differentials, to see if it told us anything about the likelihood that a shot would be blocked.

I eliminated games where a red card had been shown, for obvious reasons.

The bottom line was that game state was a significant factor in correlating with whether an attempt was blocked or not, along with location and shot type. And the larger the decrease in a side's pre-match expectation when the attempt was taken, the more likely it became that the shot was blocked.

In short, without the superstar teams, run of the mill games appear to follow the "hold what we have" and "this is disappointing, let's crack on" mentality.

This is one route to improve the much criticised problem of single xG races, where one team scores early and then drops anchor, but whether it is a universal improvement to a predictive model is a question of over fitting the past and potentially screwing up the future.

Wednesday, 11 September 2019

Rugby World Cup Simulation

World Cups have been like London buses this year and the rugby union version kicks off in a week or so.

It's live and complete on terrestrial TV in the UK, with plenty of huge mismatches in the opening group games, before eight teams (who could be fairly accurately predicted beforehand) contest the really interesting knockout run to the Webb Ellis Trophy on November 2nd.

However, that's not to say that the group matches don't hold any intrigue. There are at least two tier one teams in each of the four groups and while they'll be expected to steamroller the lower grade group opponents, the outcomes of these elite matchups will have a huge bearing on how the pairings for the knockout phase pan out.

Therefore, if you want to chart the likelihood of a team's route to the final being paved with Southern hemisphere behemoths, a tournament simulation is the easiest method out there.

You'll need a ratings system to kick off with, assuming you're shunning the merry-go-round that has been the world rankings. Ireland are the current leaders, having recently displaced Wales, who had just displaced New Zealand, who themselves had displaced South Africa....ten years ago.

So the world rankings, following a decade of stagnation have suddenly become volatile.

Let's make our own, instead.

I took the last 20 matches for all participants, and produced an attacking and defensive rating, based around match scores and opponent quality.

New Zealand are the tournament's most potent attack; they'll score around 14 more points against an average team than another average team would manage. Wales, courtesy of rugby league knowhow, have the best defence.

Next you need a way to simulate game outcomes.

The big clash of the group stages sees favourites New Zealand take on South Africa. After matching up the respective attacking and defensive ratings for each team, the model expects the All Blacks to average around 28.5 points and S Africa 23.5.

New Zealand are favoured by five points and there's likely to be 52 total points.

If we look at the spread of points scored and allowed by each side over the last year or so, we can produce a distribution of points that describes each team's likely scoring pattern in this game. We'll then draw a value randomly from this distribution for each team to simulate a single match scoreline and then repeat the process thousands of times.

After adding a few tweaks to mimic the largely redundant bonus points system rugby insists on employing and ensuring that each drawn score from the distributions is a "rugby score" (no scoring a grand total of four points etc), we just repeat for every group game, add up the total points won in the group, follow the draw format and find the winner.
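A stripped down sketch of the single match step might look like this. It assumes normally distributed scores with a made-up standard deviation of 10 points and a simple rule for nudging impossible totals to a nearby valid one; neither assumption is taken from the model described above, and bonus points and the group tables are left out.

```python
import random

# Totals of 1, 2 and 4 cannot be built from 3, 5 and 7 point scores.
# The replacement values are an assumption made for this sketch.
INVALID_TO_VALID = {1: 0, 2: 3, 4: 3}


def draw_score(mean, sd, rng):
    """Draw one team's score and make sure it is a possible rugby total."""
    raw = max(0, round(rng.gauss(mean, sd)))
    return INVALID_TO_VALID.get(raw, raw)


def simulate_match(mean_a, mean_b, sd=10.0, n_sims=10_000, seed=1):
    """Simulate one fixture many times and summarise the outcomes."""
    rng = random.Random(seed)
    wins_a = wins_b = draws = 0
    margin_total = 0
    for _ in range(n_sims):
        a = draw_score(mean_a, sd, rng)
        b = draw_score(mean_b, sd, rng)
        margin_total += a - b
        if a > b:
            wins_a += 1
        elif b > a:
            wins_b += 1
        else:
            draws += 1
    return {
        "a_win": wins_a / n_sims,
        "b_win": wins_b / n_sims,
        "draw": draws / n_sims,
        "avg_margin": margin_total / n_sims,
    }


# New Zealand (28.5 expected points) against S Africa (23.5).
result = simulate_match(28.5, 23.5)
```

Repeating this for every group game and then following the draw gives each side's chance of lifting the trophy.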


This is how the simulations shake out.

Four sides have a double figure percentage chance of lifting the trophy: New Zealand and S Africa for the south, and England and Wales for the north, with New Zealand looking a vulnerable favourite.

Tuesday, 2 July 2019

Quantifying the Value of Every Pass

I've written about passing models over the last couple of years and posted passing maps for individual players and teams recently. So here's a quick overview of the passing model upon which those maps are based, how it was developed and how they might be useful.

The model is derived from location and time stamped Opta data for every pass attempt. The model has been built in conjunction with Infogol, but as yet it isn't part of the data available on the Infogol app.

I was keen to use familiar units for the passing model, therefore all values for successful or unsuccessful passes are expressed in expected goals.

I've purposely avoided such things as distance gained, as this often leads to arbitrary definitions for "key passes".

It also breaks down entirely when you approach the penalty area, not only in terms of scaling, but also assigning value to a backward pass that actually adds value to a side if it is completed. (Think pull backs from the goal line, a "progressive" pass can easily go backwards).

The baseline values are the likelihood that possession at any position on the field will end with a goal and is taken from historical data.

Therefore, if passing from one point to another improves the likelihood of a goal, the successful pass is quantified as the change in this likelihood.

Because the unit of measurement is how likely, historically, possession is to turn into a goal, it doesn't require a goal attempt to ultimately be made at the culmination of the move.

This is a huge advantage over passing models that are based solely around attempts being taken because every pass attempt is counted (a player is not reliant on success or failure further down the passing chain).

It also makes the calculation of speed of attack much more relevant to the actual threat present (advancing the ball ten yards in a couple of seconds from the final third causes much more of a threat than advancing the ball 20 yards in half the time from your own penalty area).

Finally, because the units relate to aggregated historical outcomes of possessions, we can quickly give a value to any point of the field, which is not the case if the point value is based on the expected goals of an actual goal attempt from that position.

And so because a goal attempt isn't required the units are designated as non shot expected goals to differentiate them from shot based xG.

To keep things simple, the following non shot xG passing maps omit other actions, such as carries or dribbles, and take no account of time of possession or any likely passing skill differential.

The following maps simply record any successful, "progressive" pass (by which I mean any pass that advanced the likelihood of a team scoring) made by a player during the last Premier League season.

The maps are simply conditional formatting in Excel, on a 10X10 grid, overlaid with a pitch. 10X10 is used for the convenience of Opta's x,y pitch locations which run from 0-100, lengthwise and widthwise.

The darker the conditional formatting, the more NS xG has been gained from successful passes from that location, either by small gains with large passing volume, large gains with lower passing volume or a combination of the two.
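For anyone who would rather build the grid in code than in a spreadsheet, the binning can be sketched as follows. It assumes each pass is stored as a (start_x, start_y, nsxg_gain) tuple on the 0-100 coordinate scale; that tuple layout is a hypothetical choice for this example.

```python
def grid_cell(x, y, size=10):
    """Map 0-100 pitch coordinates to a (row, col) cell on a size x size grid."""
    step = 100 / size
    # A coordinate of exactly 100 is folded into the last cell.
    col = min(int(x // step), size - 1)
    row = min(int(y // step), size - 1)
    return row, col


def nsxg_gain_by_origin(passes, size=10):
    """Sum the NS xG gained by successful progressive passes, by starting cell."""
    grid = [[0.0] * size for _ in range(size)]
    for x, y, gain in passes:
        if gain > 0:  # keep only passes that improved the side's NS xG
            row, col = grid_cell(x, y, size)
            grid[row][col] += gain
    return grid


# Three invented passes: two progressive from the same cell, one backward.
demo = nsxg_gain_by_origin([(55.3, 12.0, 0.02), (58.0, 15.0, 0.01), (70.0, 50.0, -0.01)])
```

Swapping the start coordinates for the end coordinates gives the matching map of where the passes were received.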

It's easy to show the passing distribution through other plots.

Here's England's newly capped Declan Rice's successful progressive NS xG gains for WHU in 2018/19.


This represents the starting point of every successful pass.

The plot is best used in conjunction with video analysis, but you can quickly see that Rice's sphere of influence is concentrated broadly in front of the back four and across the line, but he also delivers an impressive range of threatening passing options mid way inside the opposition half and just leftfield.

The next thing we'd like to know is where these passes end up, so the following plot illustrates where on the field this improvement in NS xG production from Rice via his passes is distributed and received by a team mate.

Overall, Rice's progressive passes are received around 10% further upfield than their points of origin. He spreads the ball wide, as noted by the darker areas on the flanks either side of halfway and towards the final third. And he finds a team mate on the edge of the box, but doesn't appear to be a predominant passer of the ball into the box (particularly if we strip away set plays).

Rice appears to be an active and productive passer over around three quarters of the playing area, but may not be fully appreciated because he rarely plays a pass that may be considered an assist.

By contrast, here's a much more attacking NS xG passing profile from Manchester City's midfielder, David Silva. A darling of the highlights reel.



Unlike Rice, Silva rarely ventures into his own half to begin build up play. The starting point for his progressive passing is a hot spot just outside the left edge of the opposition penalty area, although he does occasionally drift to the opposite side of the box.



The end point of his passing is again strongly centred around the left side of the field, but deep into the opposition box. He sticks rigorously to the left sided channels and relatively shuns pass attempts to the right side of the box from his team's perspective.

Finally, for now, we can also show where a player is showing up as the recipient of a progressive pass.


Once again his fondness for linking up with a team mate in the dangerous left hand side of the area is shown, firstly by the darker formatted green area just inside the left flank of the area on the plot and secondly in an actual example from a game.


This just scratches the surface of how these plots, maps and quantified valuing of passes can be useful in assessing a side, or a player. It is particularly welcome because it removes the highlight reel aspect that blights player assessment (particularly on YouTube immediately following a transfer). We can see from the heat maps if creative passing into particular areas of the field is a largely consistent player trait....or if the exceptional pass that perhaps resulted in a goal was a once in a lifetime fluke.

This post has concentrated on successful, progressive passes that gain NS xG, but the approach can also be applied to unsuccessful attempts to measure risk and reward. The probability of a pass being completed can also be added, and we can look at ball retention plots to see which players excel at retaining the ball for others to make the decisive progressive deliveries.

Rice and Silva obviously play different midfield roles in widely differing teams, but their respective importance and discipline in playing a role in those two systems becomes much more apparent once we look at their passing contribution as a whole.





Tuesday, 11 June 2019

The Best & Worst Passers in the 2018/19 Premier League.

This is essentially just a data drop of the passing abilities for every player who made at least 600 pass attempts in the last Premier League season, based on a non shot passing model.

Here's our approach.

Every inch of the pitch has a non shot expected goal value associated with it based on the likelihood a side will eventually score from that field position.

So it's very low if you have possession near your own goal, much higher if you possess the ball inside the opposition box.

Successfully passing the ball from one point to another leads to a change in NS xG.

If you have the ball on the edge of your own box and roll a pass five yards forward to a defensive midfielder, you get credited for improving the side's NS xG, but not by much. Repeat the move on the edge of the opponent's box and you'll get a fair bit more.

Knock the ball backwards and your side "loses" NSxG, but at least you keep the ball.

Give away possession, for example as a defender accidentally passing to an opponent near your own goal, and you lose a combination of the NS xG you had and the NS xG your opponent gains.

Similarly, try and fail with a tricky pass inside the opponent's final third and you lose the fairly substantial NS xG your side had, along with the much smaller NS xG the opposition has gained.

This has led to three definitions for types of passes, two successful and one not.

Firstly, successful, creative passes that improve a team's NS xG.

Then, successful, backward passes that retain the ball, but "lose" NS xG.

And finally, unsuccessful passes that turn over possession.
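The three definitions can be sketched as a small classifier. The category labels and argument names are hypothetical, and treating a completed pass with no change in NS xG as "retained" is an assumption made for this example.

```python
def classify_pass(completed, own_before, own_after=None, opp_after=None):
    """Return (category, nsxg_change) for a single pass attempt.

    own_before - passing side's NS xG at the pass origin
    own_after  - passing side's NS xG where a team mate received the ball
    opp_after  - opponent's NS xG where they won the ball
    """
    if not completed:
        # Lose the NS xG you had, plus the NS xG your opponent gains.
        return "turnover", -(own_before + opp_after)
    change = own_after - own_before
    if change > 0:
        return "creative", change
    # Possession is kept, but the side's NS xG has not improved.
    return "retained", change


creative = classify_pass(True, 0.010, own_after=0.030)
retained = classify_pass(True, 0.030, own_after=0.020)
turnover = classify_pass(False, 0.050, opp_after=0.004)
```

The invented turnover above is the "tricky pass in the final third" case: most of the cost is the NS xG the passing side gave up, with a much smaller amount gained by the opposition.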

These are further normalised for position played.

A defender will have a very different average profile in each category, compared to an attacking midfielder and the metric is also normalised to 100 passing attempts to put players who play for a possession poor team on a more level footing with Manchester City.

Here's an example.



From left to right. The average Premier League full back adds 0.64 non shot xG per 100 passing attempts by way of successful, creative passes. TA-A added 1.026 NS xG/100, an improvement of 0.386 on the average full back.

Backward, successful passes where NS xG was "lost", but possession was retained mirrored the average experience of a full back.

An average full back actually lost 0.8 NS xG / 100 via turnovers. TA-A did slightly worse, losing 0.894, but this is a function of the risk/reward balance. He is given free rein to get into advanced positions, and the reward is well worth the extra risks taken.
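The arithmetic behind those differentials is straightforward. The helper names below are hypothetical; the figures are the ones quoted above for TA-A and the average full back.

```python
def per_100(total_nsxg, attempts):
    """Normalise a season NS xG total to a rate per 100 passing attempts."""
    return 100.0 * total_nsxg / attempts


def differential(player_rate, position_average):
    """Player's per 100 rate minus the average for his position."""
    return player_rate - position_average


# Creative passes: 1.026 for TA-A against 0.64 for the average full back.
creative_diff = round(differential(1.026, 0.64), 3)
# Turnovers are losses, so both rates are entered as negative numbers.
turnover_diff = round(differential(-0.894, -0.8), 3)
```

So TA-A is 0.386 NS xG per 100 attempts better than the average full back creatively, and 0.094 worse on turnovers.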

Here's the differentials for every player who made at least 600 passing attempts for all 20 clubs last season.

They've been normalised for position, but many are a product of the role they are asked to play and the stylistic approach of the team they represent.








Figures such as these cannot tell the entire story, pass volume in particular will be hugely relevant, but we can take a lot from the tables.

For instance, there are the different roles of goalkeepers. Those who play out from the back, such as Alisson & Ederson, added below average creativity, but are well above average when preventing turnovers.

Similarly, van Dijk is no more than an averagely creative passing centre back, but again the systematic demands of the team do not require him to be more adventurous. His main aim is to largely play unadventurous ball to slightly advanced players and again, not turn the ball over, which is reflected in his well above average turnover numbers.

Manchester City's adherence to keeping the ball is shown again by the turnover figures, with the perhaps significant exception of Sane, who is poor at retaining the ball, with little above average creativity to compensate.

Passing volume ensures that their relatively unexceptional creativity, De Bruyne aside, invariably overwhelms an opponent.

And finally, the departing Hazard is a rare beast, who not only is above average creatively for his position, but also avoids the often boom or bust cycle by looking after the ball exceptionally well.

There are plenty of players who show above average creativity, but pay a relatively high price with turnovers.