Thursday 26 December 2019

State of Play 2020

Liverpool’s bilingual mastermind behind the team’s meteoric rise to dominate club, domestic, European and now world football is gradually gaining a higher media profile.

Not Jurgen Klopp, although he has played a part in the Reds’ success, but Dr Ian Graham, their current director of research.

Ian’s recent appearances in both the spoken and written media have not only highlighted the importance of an integrated approach to squad building, one that uses data alongside more traditional methods, they have also given a small glimpse into the analytical methods employed.

The latest profile landed courtesy of and described some fundamentals of Liverpool’s analytical philosophy.

One particularly resonated with Infogol’s approach of quantifying every footballing action in the same currency of goals or more specifically x goals.

The idea that every action, be it a pass, tackle or long throw changes the likelihood that a side will ultimately score isn’t a new concept.

It was probably first introduced into the public analytical domain by Dan Altman in his whistle stop OptaPro presentation in 2015 and hints of such models have been recently emerging from Opta itself and Twelve football.

Such a non-shot xG model also powers Infogol’s “Team of the Week”.

The gradual migration, at least inside the industry, from a purely chance based evaluation to a more holistic one somewhat mirrors the earlier transition from merely counting shots, as exemplified by total shot ratios from 2008 to a more informative, location based xG model, subsequently.

However, creating such non-shot models that quantify every on-field action is not a simple task. The granular data required to build non-shot models dwarfs that needed to create TSR, which itself was rudimentary compared to that required to create a proficient xG model.

These leaps in data driven evaluation present a dilemma for the aspirations of public and hobbyist analysts, an area that provided much of the driving force behind the early explosion in football analytics.

Latterly, monetization of ideas and a larger appetite for quantitative metrics to supplement opinion driven insight in the media and clubs, has swept many of those same hobbyists behind a non-disclosure paywall.

Less co-operation, dwindling numbers, the limited availability of adequate data and the need for diverse technical skills to process that raw data appear to have stifled the growth of football metrics in the purely public arena.

At the risk of falling victim to one of Twitter’s sloganized insults, “back in the day, metrics didn’t last long before they were improved upon or supplanted altogether”. suggested that Ian’s weapons grade model might be broadly replicated by current, readily available and much quoted metrics, such as xG Chain (I’ll let you google the definition).

Succinctly, the metric rewards every participant in a move that ends in a goal attempt with that chance’s entire xG.

The distribution of goodies can seem churlish, for example, by giving far less individual credit to the three Middlesbrough players who swept nearly the length of Stoke’s defensive transition to score a low probability winner on Friday night, than it would to a marginally involved square ball en route to a multiple passing move that ends with a tap in from six yards.

More crucially it completely omits actions that aren’t concluded by a created chance.

To test’s optimism, I compared Infogol’s non-shot ball progression via passes and carries to the much-touted gold standard of xG Chain.

To avoid confusion over units, I’ve simply ranked the xG Chain and the non-shot ball progression for each player in the recent Merseyside derby and then compared a player’s rank in one metric with his rank in the other.
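For anyone wanting to repeat the exercise, here is a minimal sketch of the rank comparison in Python. The five pairs of values are made up purely for illustration (they are not the derby numbers), and the helper names `ranks` and `spearman` are my own. The Spearman shortcut formula used here assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (highest) downwards; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    out = [0] * len(values)
    for position, index in enumerate(order, start=1):
        out[index] = position
    return out


def spearman(rank_a, rank_b):
    """Spearman rank correlation via the shortcut formula (no ties assumed)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical values for five players, one entry per player in each list.
xg_chain = [1.20, 0.15, 0.90, 0.40, 0.05]
progression = [0.80, 0.70, 0.10, 0.30, 0.20]

chain_rank = ranks(xg_chain)
prog_rank = ranks(progression)

# Absolute difference in pecking order for each player.
rank_gaps = [abs(a - b) for a, b in zip(chain_rank, prog_rank)]
rho = spearman(chain_rank, prog_rank)
```

A `rho` close to one means the two metrics broadly agree on the pecking order, while a value near zero means they are telling different stories.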

It starts off quite well. Sadio Mane, who was outstanding on the night, ranks top in both. But then, much like Stoke’s trip to Middlesbrough, things take a turn for the worse.

Shaqiri ranked an impressive 2nd overall in ball progression, but a lowly 16th in xG Chain, whereas Origi rates highly by the latter, but much less so in the former.

Overall, a third of the players have double digit ranking differences between their pecking order in both metrics. There are some agreements, but the relationship between the two metrics is generally weak.

Extend the study to every game played last season and this tenuous correlation between the two metrics remains.

One of the strengths of the early analytics movement was the ability to sift mere statistical trivia (team Y has recorded X when player Z plays, immediately springs to mind) from useful, if imperfect evaluations that convey insight and can be used to both evaluate and project future performance.

A great example of the latter is Dan Kennett’s recent Alisson tweet, which used big chances to highlight the keeper’s importance to Liverpool, both in the past and possibly in the future.

Save rates when faced with Opta’s Big Chances can be framed as a very good proxy for a more exhaustive and granular, post shot xG2 modelling of a keeper’s saves and goals allowed.

Dan’s tweet was selective, but also carefully constructed enough to capture the keeper’s core attributes. Current retweets are approaching around 10 billion!

That should be the benchmark for widely used metrics, and player contribution figures such as xG Chain fail that test on numerous counts.

It fails to differentiate individual contribution, omits large swathes of creditable actions and thus fails to correlate well with more exhaustive modelling of a similar player process.

The challenge for the public arena as we enter the roaring 20’s is to come up with constant improvements to substandard and potentially misleading measures... and be more like Dan.

Tuesday 29 October 2019

Liverpool by One.

Old style goals based analysis hardly gets a run out nowadays with everyone arguing xG strawmen. So, let’s go the goals route to see if Liverpool’s record in single goal margin wins is “knowing how to win”, “unsustainable” or “about what you’d expect”.

Liverpool won 10 games by a single goal margin last season. That’s a lot, but well below the single season record held by Manchester United of 16 in 2012/13 and 2008/09.

United’s number of single goal wins in the subsequent seasons fell to five and eight respectively (although something more impactful may have also occurred in 2013/14). Their points tally fell as well, by 25 points in 2013/14 and by 5 in 2009/10.

To dilute the Fergie/Moyes effect, let’s look at the average record in the next season of teams who won 10 or more games by a single goal margin.

There are over 90 of them during the 20 team history of the Premier League and 80% of those had fewer wins by the narrowest of margins during their next Premier League season. 74% also saw their points total fall.

These teams who edged lots of close matches one season shed around 10% of their points in the next season.

Initially, it’s not looking too rosy for Liverpool’s ability to sustain these narrow wins.

However, there’s another factor to consider.

Single goal wins, on average account for 41% of a side’s Premier League points total, but in our sample of 90+ teams who won 10 or more, 80% of them accrued more than 41% of their points from such victories.

Everton won 76% of their 59 points in 2002/03 from single goal wins and then tried their very best to get relegated in 2003/04 as their “luck” in narrow games returned to earth and they won just 39 points.

In Liverpool’s case in 2018/19, one goal margin wins only accounted for 31% of their 97 points. Therefore, their ten such wins places them in a group of sides who typically regress, but the percentage of total points they win in this manner is entirely atypical of that group.

To see where Liverpool stand as being adept at winning single goal margin games, we need to look at their underlying goals record.

In 2018/19 they scored 89 and conceded 22. Taking the Poisson route, that’s consistent with winning nine games by a single goal over 38 games. They won, as we’ve seen, ten, hardly a worryingly large over-performance.
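Here's a rough sketch of one way to take that Poisson route. It assumes goals for and against are independent Poisson variables, with the season averages (89/38 and 22/38) used as the per-game rates for every match. That's a simplification of my own for illustration, as it ignores opponent strength and venue, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the scoring rate is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def prob_win_by_one(lam_for, lam_against, max_goals=15):
    """Chance of scoring exactly one more goal than the opponent."""
    return sum(
        poisson_pmf(k + 1, lam_for) * poisson_pmf(k, lam_against)
        for k in range(max_goals)
    )


games = 38
lam_for = 89 / games       # Liverpool's goals scored per game, 2018/19
lam_against = 22 / games   # Liverpool's goals conceded per game, 2018/19

expected_one_goal_wins = games * prob_win_by_one(lam_for, lam_against)
print(round(expected_one_goal_wins, 1))
```

Under those assumptions the expected number of single goal wins comes out at roughly nine, in line with the figure quoted above.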

You can lump Liverpool in with a group of teams who have achieved good things, partly as a result of “knowing how to win” (Leicester 2015/16 spring to mind, 14 single goal wins where nine would have been a more equitable return), but unlike most of these sides, the Reds have the underlying numbers to deserve their record.

Expect a few more 2-1’s between now and May.

Monday 21 October 2019

Closing the Door.

One of the most fun aspects of football data analysis is when the team you're part of derives some exciting new metrics from the raw data that allow you to look at old problems in a new light.

Some real heavy data lifting has been put into deriving our Non Shot expected goals model. So first a quick recap on what it does.

Whenever the ball is moved around the pitch there is a likelihood of scoring from each location it finds itself in. We express this value as non shot xG and the difference between these values when an action is completed is the change in NSxG via that action.

There's also a "risk/reward" aspect for when you concede possession.

Finally, each team has (nearly always) a different NSxG for the same pitch location, because one major input is the distance to your opponent's goal.

We've mainly looked at passing and ball carrying, so far, quantifying the differing importance to your side of moving the ball five yards out of your own penalty area or five yards into your opponent's. But there's an obvious extension of this that flips the focus and examines how well a team prevents an opponent progressing the ball.

This isn't just by making passing difficult, it's also by making it harder or easier for opponents to carry the ball forward.

It used to be called closing a player down, it's called any manner of terms nowadays.

Here's how sides are faring in preventing ball progression in 2019/20.

The first thing you need is a benchmark figure to measure how well a side is closing down the opposition.

There's only been nine matches played by each Premier League team to date and they may have played a bunch of sides who aren't that good or willing to play out from the back, so we need to find a set of figures that reflect this possible imbalance of intent and talent.

Let's take Manchester United. They've played nine teams, Chelsea, CP, Leicester, Newcastle, Southampton, WHU, Arsenal, Wolves & Liverpool.

Those teams, in turn have also played nine teams (except Arsenal, who play tonight), that's 80 teams of which nine are Manchester United.

That's almost guaranteed to include every Premier League team at least once and makes up a decent sample of around 70-80 games depending upon how you slice it.

We therefore took those 71 non Manchester United matches played by Manchester United's opponents and looked at the "risk/reward" ball progression via both passes and ball carries for 100 pitch segments.

For each segment we calculated the average NS xG gained (or lost) per 100 pass & carry attempts. That was our baseline for the progression of United's opponents against a broad selection of opponents this season.

Then we repeated the exercise, but for these sides in their matches against Manchester United and ran a heat map to see where on the field these teams were finding it difficult to progress the ball against United and where they were having an easier time compared to their benchmark numbers against the rest of their opponents.
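As a rough illustration of the comparison, here's a short Python sketch. It assumes each pass or carry has already been reduced to a pitch segment index (0 to 99) and an NS xG change; the handful of actions below are invented and the function names are hypothetical, so treat it as a sketch of the bookkeeping rather than the actual pipeline.

```python
def per_100(actions, n_cells=100):
    """Average NS xG gained (or lost) per 100 attempts in each pitch segment.

    actions: iterable of (cell_index, nsxg_change) pairs.
    Segments with no attempts are returned as None.
    """
    totals = [0.0] * n_cells
    counts = [0] * n_cells
    for cell, delta in actions:
        totals[cell] += delta
        counts[cell] += 1
    return [100 * t / c if c else None for t, c in zip(totals, counts)]


def difference(vs_team, baseline):
    """Segment by segment gap between the two sets of per-100 figures."""
    return [
        None if a is None or b is None else a - b
        for a, b in zip(vs_team, baseline)
    ]


# Invented actions: (segment, NS xG change) for the same group of opponents.
baseline_actions = [(0, 0.01), (0, 0.03), (1, -0.02), (1, 0.04)]
vs_united_actions = [(0, 0.01), (0, -0.01), (1, 0.05)]

baseline = per_100(baseline_actions)
vs_united = per_100(vs_united_actions)
diff = difference(vs_united, baseline)
```

A negative entry in `diff` marks a segment where the opponents progressed the ball less well against United than against everyone else, a positive entry marks one where they did better.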

This is what it looks like (ignore the numbers for now).

The red areas are where United's opponents are progressing the ball at lower levels against United than they've managed as a group against a basket of 71 other Premier League sides. Blue, they're doing better.

It's a pretty stark and clear picture of where on the field United have been making it difficult for their opponents to get the ball into more dangerous areas. Firstly, beginning in front of their opponent's own box and then aggressively in front of United's own. They aren't too fussed about targeting wide positions on halfway and not too good(?) at stopping runs or passes from the bye-line & in the box.

Here's Everton and they do harry the opposition, but it's a much more chaotic process, with very little structure, especially compared to United's disciplined approach.

And finally, here's Aston Villa.

There's no overt closing down of the opposition until they reach the box, at which point it seems to become all hands to the pump.

Wednesday 2 October 2019

Passing Risk Reward in the Premier League

The availability of richer data sources has naturally led to an interest in passing and ball progression.

The generally quoted passing metrics still gravitate towards event data such as goal attempts and actual scores as the major framework.

Passes that lead to a potential goal scoring attempt predominate in most current passing metrics and little has been done to differentiate between the contribution made by individual players involved in these possession chains.

In contrast, we've broken down the value of each pass attempted by referencing how often a possession anywhere on the pitch has historically led to a goal, whether or not the possession ultimately results in an attempt on goal.

This so called non shot xG metric not only allows a route to value every ball progression, be it a pass or a carry, but also quantifies individual involvement, rather than sharing the credit equally between all those participating in the possession.

However, as often is the case in football metrics, only one side of the ball has been investigated.

Each pass attempt comes with a risk and reward.

The player attempting the pass has custody of a valuable team resource, namely the non shot xG value for possession of the ball at that precise position on the field.

The potential reward in making a progressive pass is to advance the ball to a more dangerous area of the field.

And the ever present risk is the cost of a turnover. The passing team lose the NS xG value they had by owning the ball and the opponents gain their own NS xG by taking possession of the ball.

Weighing a player's NS xG ledger is problematic, but one way to express the risk reward balance of a player's passing performance is to add up the NS xG value of every progressive pass he completes and compare this to the sum of the NS xG he loses through incomplete passes, along with the NS xG gained by the opponent taking possession of his errant attempts.

For example, in the nascent Premier League season, Matteo Guendouzi's completed open play progressive passes have been received at areas on the field that total 6.69 NS xG.

On the minus side, his picked off pass attempts have "lost" Arsenal 1.67 NS xG. This is made up of loss of pitch position for Arsenal and the combined NS xG value for the opponent based on where possession is won.

Overall, and without regard for pass volume or minutes played, Guendouzi has a net positive 5.02 NS xG for Arsenal in 2019/20.
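The arithmetic behind that ledger is simple enough to sketch in a few lines of Python. The two season totals are the Guendouzi figures quoted above; the single turnover example and the function names are hypothetical and only there to show how the cost of one lost pass might be built up.

```python
def turnover_cost(own_nsxg_lost, opponent_nsxg_gained):
    """Cost of one intercepted pass: what you held plus what they now hold."""
    return own_nsxg_lost + opponent_nsxg_gained


def net_risk_reward(completed_gain, turnover_loss):
    """Net NS xG: reward from completed passes minus the cost of turnovers."""
    return completed_gain - turnover_loss


# Season totals quoted in the post for Guendouzi.
guendouzi_net = net_risk_reward(completed_gain=6.69, turnover_loss=1.67)

# Hypothetical single turnover: 0.020 NS xG held, 0.015 handed to the opponent.
one_bad_pass = turnover_cost(own_nsxg_lost=0.020, opponent_nsxg_gained=0.015)

print(round(guendouzi_net, 2))
```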

This puts him top of the Arsenal "risk/reward" passing charts and we feel it is a much better single figure metric to describe a player's involvement in progressing his side towards the opponent's goal.

Not only does it quantify individual involvement and utilise every pass attempted, it also penalises reckless or sloppy execution that leads to a change of possession.

Here are the current pass risk/reward numbers for players from all 20 Premier League teams with a minimum number of attempts.

Saturday 14 September 2019

Game State and Blocked Shots.

I've written a fair bit about game state and how it impacts on how a side approaches a match as the time elapses and occasionally the score line changes.

I don't use score differential to define "game state", instead I use a measure of how well each team is faring based on their pre game expectation.

This can be defined as the expected points based on the current score and time elapsed or the expected success rate of a team, again when measured against a pre kick off baseline. The choice is entirely up to you.

The advantage of this approach is primarily when the game is tied (which it is for a fairly significant portion of most matches). Instead of counting offensive production for both sides at this score differential, there's usually a clear indication of which of the two teams is happier with the stalemate and which is not.

You also get a gradual movement of game state that incorporates the often omitted variable of time elapsed.

It's intuitive as to what might happen as game state ebbs and flows over the course of a match, as unhappy teams perhaps become more risk taking in order to change the current status quo, while pregame underdogs are forced or choose to attempt to bank their above expectation gains by becoming more defensive.

One slight problem with this approach is that it assumes a relatively balanced competitive edge between competing teams and further assumes that those needing to change the current scoreline are capable of attempting to do so.

Not to be harsh, but it's difficult to envisage a situation where Manchester City felt the need to protect a lead against say Newcastle or where Newcastle were technically able to up their attacking intent against the champions.

So often the presence of clearly superior teams can skew conclusions. "Possession leads to wins" arose largely because better sides also had high levels of possession, but the possession was a byproduct of other things they did, rather than the primary driver of their results.

Remove Barca etc from the data and the relationship between possession and wins tended to disappear.

Therefore, firstly here's why "zero goal differential" (the game is level) shouldn't be regarded as a single game state.

Here's a sample of matches from the 2018/19 Premier League, involving games where one of the Big 6 wasn't playing. Thus the games weren't particularly one-sided from the outset.

Initially, I've simply counted the shot volume from regular play for teams when the score differential is zero (the game is level). The vertical axis records my version of changing game state, where a larger negative value indicates a team that is doing badly compared to the expectation at kickoff.

Typically, this may be when a home favourite is level a fair way into the game and a points expectation that may have been 1.75 expected points at 3 o'clock has fallen back towards one point as the clock ticks on towards 5.

Those above the blue score differential line of zero are doing better than they hoped for, they might have expected to average less than a point from such a game, but they are edging closer and closer to a point, with a possibility of nicking all three.

Each point represents a goal attempt and it's clear that the lion's share is being taken by the disgruntled favs.

If we re-examine our intuition, it's likely that if the beneficiaries of the stalemate aren't taking that many shots in the match, they're doing things to prevent the ones at the other end going in.

Learning from the likes of Pulis and Dyche, that will likely include blocking shots.

Next I built a simple xG model (just location & type), but also included the game state factor, not just at zero goal differential, but at all score differentials to see if it told us anything about the likelihood a shot would be blocked or not.

I eliminated games where a red card had been shown, for obvious reasons.

The bottom line was that game state was a significant factor in correlating with whether an attempt was blocked or not, along with location and shot type. And the larger the decrease in a side's pre-match expectation when the attempt was taken, the more likely it became that the shot was blocked.

In short, without the superstar teams, run of the mill games appear to follow the "hold what we have" and "this is disappointing, let's crack on" mentality.

This is one route to improve the much criticised problem of single xG races, where one team scores early and then drops anchor, but whether it is a universal improvement to a predictive model is a question of over fitting the past and potentially screwing up the future.

Wednesday 11 September 2019

Rugby World Cup Simulation

World Cups have been like London buses this year and the rugby union version kicks off in a week or so.

It's live and complete on terrestrial TV in the UK, with plenty of huge mismatches in the opening group games, before eight teams (who could be fairly accurately predicted beforehand) contest the really interesting knockout run to the Webb Ellis Cup on November 2nd.

However, that's not to say that the group matches don't hold any intrigue. There are at least two tier one teams in each of the four groups and while they'll be expected to steamroller the lower grade group opponents, the outcomes of these elite matchups will have a huge bearing on how the pairings for the knockout phase pan out.

Therefore, if you want to chart the likelihood of a team's route to the final being paved with Southern hemisphere behemoths, a tournament simulation is the easiest method out there.

You'll need a ratings system to kickoff with, assuming you're shunning the merry-go-round that has been the world rankings. Ireland are the current leaders, having recently displaced Wales, who had just displaced New Zealand, who themselves had displaced South Africa....ten years ago.

So the world rankings, following a decade of stagnation have suddenly become volatile.

Let's make our own, instead.

I took the last 20 matches for all participants, and produced an attacking and defensive rating, based around match scores and opponent quality.

New Zealand are the tournament's most potent attack, they'll score around 14 more points against an average team than another average team would manage, and Wales, courtesy of rugby league knowhow, have the best defence.

Next you need a way to simulate game outcomes.

The big clash of the group stages sees favourites New Zealand take on South Africa. After matching up the respective attacking and defensive ratings for each team, the model expects the All Blacks to average around 28.5 points and S Africa 23.5.

New Zealand are favoured by five points and there's likely to be 52 total points.

If we look at the spread of points scored and allowed by each side over the last year or so, we can produce a distribution of points that describes each team's likely scoring pattern in this game. We'll then draw a value randomly from this distribution for each team to simulate a single match scoreline and then repeat the process thousands of times.

After adding a few tweaks to mimic the largely redundant bonus points system rugby insists on employing and ensuring that each drawn score from the distributions is a "rugby score" (no scoring a grand total of four points etc), we just repeat for every group game, add up the total points won in the group, follow the draw format and find the winner.
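Here's a stripped down sketch of the single match step in Python. The 28.5 and 23.5 point expectations are the ones quoted above, but the choice of a normal distribution with a standard deviation of nine points is my own placeholder rather than the fitted distribution, and bonus points, group tables and the knockout draw are all left out.

```python
import random

# Totals that cannot be built from penalties (3), tries (5) and conversions (2).
IMPOSSIBLE_SCORES = {1, 2, 4}


def draw_score(mean, sd, rng):
    """Draw one team's points, redrawing anything that isn't a rugby score."""
    score = max(0, round(rng.gauss(mean, sd)))
    while score in IMPOSSIBLE_SCORES:
        score = max(0, round(rng.gauss(mean, sd)))
    return score


def simulate_match(mean_a, mean_b, sd=9.0, n=20000, seed=1):
    """Return (win, draw, loss) frequencies for team A over n simulated games."""
    rng = random.Random(seed)
    wins = draws = 0
    for _ in range(n):
        a = draw_score(mean_a, sd, rng)
        b = draw_score(mean_b, sd, rng)
        if a > b:
            wins += 1
        elif a == b:
            draws += 1
    return wins / n, draws / n, (n - wins - draws) / n


# New Zealand v South Africa, using the expected points quoted in the post.
nz_win, draw, sa_win = simulate_match(28.5, 23.5)
```

Repeating this for every group game, totting up the table and then following the draw gives the tournament percentages.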

This is how the simulations shake out.

Four sides with a double figure percentage chance of lifting the trophy, New Zealand, S Africa for the south and England and Wales for the north, with the former looking a vulnerable favourite.

Tuesday 2 July 2019

Quantifying the Value of Every Pass

I've written about passing models over the last couple of years and posted passing maps for individual players and teams recently. So here's a quick overview of the passing model upon which those maps are based, how it was developed and how they might be useful.

The model is derived from location and time stamped Opta data for every pass attempt. The model has been built in conjunction with Infogol, but as yet it isn't part of the data available on the Infogol app.

I was keen to use familiar units for the passing model, therefore all values for successful or unsuccessful passes are expressed in expected goals.

I've purposely avoided such things as distance gained, as this often leads to arbitrary definitions for "key passes".

It also breaks down entirely when you approach the penalty area, not only in terms of scaling, but also assigning value to a backward pass that actually adds value to a side if it is completed. (Think pull backs from the goal line, a "progressive" pass can easily go backwards).

The baseline values are the likelihood that possession at any position on the field will end with a goal and are taken from historical data.

Therefore, if passing from one point to another improves the likelihood of a goal, the successful pass is quantified as the change in this likelihood.
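In code, the idea looks something like the sketch below. It uses a 10x10 grid over Opta style 0-100 coordinates, but the grid values are invented placeholders that only vary with distance up the pitch, not the historical figures from the model. A real surface would also vary across the width of the pitch, which is what lets a pull back from the goal line gain value.

```python
GRID = 10


def cell(x, y):
    """Map 0-100 pitch coordinates to a (row, column) on the 10x10 grid."""
    col = min(int(x // 10), GRID - 1)
    row = min(int(y // 10), GRID - 1)
    return row, col


# Hypothetical chance that possession in each column ends in a goal,
# running from a side's own goal line (left) to the opponent's (right).
COLUMN_VALUE = [0.002, 0.003, 0.005, 0.008, 0.012,
                0.018, 0.028, 0.045, 0.080, 0.150]
NSXG = [[COLUMN_VALUE[c] for c in range(GRID)] for _ in range(GRID)]


def pass_value(start, end):
    """Change in non shot xG from completing a pass between two points."""
    r0, c0 = cell(*start)
    r1, c1 = cell(*end)
    return NSXG[r1][c1] - NSXG[r0][c0]


forward_ball = pass_value((55, 50), (85, 30))   # into the final third
back_pass = pass_value((55, 50), (35, 50))      # retains the ball, loses value
```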

Because the unit of measurement is how likely historically, possession is to turn into a goal, it doesn't require a goal attempt to ultimately be made at the culmination of the move.

This is a huge advantage over passing models that are based solely around attempts being taken because every pass attempt is counted (a player is not reliant on success or failure further down the passing chain).

It also makes the calculation of speed of attack much more relevant to the actual threat present (advancing the ball ten yards in a couple of seconds from the final third causes much more of a threat than advancing the ball 20 yards in half the time from your own penalty area).

Finally, because the units relate to aggregated historical outcomes of possessions, we can quickly give a value to any point of the field, which is not the case if the point value is based on the expected goals of an actual goal attempt from that position.

And so because a goal attempt isn't required the units are designated as non shot expected goals to differentiate them from shot based xG.

To keep things simple, the following non shot xG passing maps omit other actions, such as carries or dribbles, take no account for time of possession or any likely passing skill differential.

The following maps simply record any successful, "progressive" pass (by which I mean any pass that advanced the likelihood of a team scoring) made by a player during the last Premier League season.

The maps are simply conditional formatting in Excel, on a 10x10 grid, overlaid with a pitch. 10x10 is used for the convenience of Opta's x,y pitch locations which run from 0-100, lengthwise and widthwise.

The darker the conditional formatting the more NS xG has been gained from a successful pass from that location. Either by small gains, but large passing volume, large gains and lower passing volume or a combination of the two.

It's easy to show the passing distribution through other plots.

Here's England's newly capped Declan Rice's successful progressive NS xG gains for WHU in 2018/19.

This represents the starting point of every successful pass.

The plot is best used in conjunction with video analysis, but you can quickly see that Rice's sphere of influence is concentrated broadly in front of the back four and across the line, but he also delivers an impressive range of threatening passing options mid way inside the opposition half and just leftfield.

The next thing we'd like to know is where these passes end up, so the following plot illustrates where on the field this improvement in NS xG production from Rice via his passes is distributed and received by a team mate.

Overall Rice's progressive passes are received around 10% further upfield than their points of origin. He spreads the ball wide, as noted by the darker areas on the flanks either side of halfway and towards the final third. And he finds a team mate on the edge of the box, but doesn't appear to be a predominant passer of the ball into the box (particularly if we strip away set plays).

Rice appears to be an active and productive passer over around three quarters of the playing area, but may not be fully appreciated because he rarely plays a pass that may be considered an assist.

By contrast, here's a much more attacking NS xG passing profile from Manchester City's midfielder, David Silva. A darling of the highlights reel.

Unlike Rice, Silva rarely ventures into his own half to begin build up play. The starting point for his progressive passing is a hot spot just outside the left edge of the opposition penalty area, although he does occasionally drift to the opposite side of the box.

The end point of his passing is again strongly centred around the left side of the field, but deep into the opposition box. He sticks rigorously to the left sided channels and relatively shuns pass attempts to the right side of the box from his team's perspective.

Finally, for now, we can also show where a player is showing up as the recipient of a progressive pass.

Once again his fondness for linking up with a team mate in the dangerous left hand side of the area is shown, firstly by the darker formatted green area just inside the left flank of the area on the plot and secondly in an actual example from a game.

This just scratches the surface of how these plots, maps and quantified valuing of passes can be useful in assessing a side, or a player. It is particularly welcome because it removes the highlight reel aspect that blights player assessment (particularly on YouTube immediately following a transfer). We can see from the heat maps if creative passing into particular areas of the field is a largely consistent player trait... or if the exceptional pass, that perhaps resulted in a goal, was a once in a lifetime fluke.

This post has concentrated on progressive, NS xG gaining successful passes, but the approach can also be applied to unsuccessful attempts to measure risk reward. The probability of a pass being completed can also be added and we can also look at ball retention plots to see which players excel at retaining the ball for others to make the decisive progressive deliveries.

Rice and Silva obviously play different midfield roles in widely differing teams, but their respective importance and discipline in playing a role in those two systems becomes much more apparent once we look at their passing contribution as a whole.

Tuesday 11 June 2019

The Best & Worst Passers in the 2018/19 Premier League.

This is essentially just a data drop of the passing abilities for every player who made at least 600 pass attempts in the last Premier League season, based on a non shot passing model.

Here's our approach.

Every inch of the pitch has a non shot expected goal value associated with it based on the likelihood a side will eventually score from that field position.

So it's very low if you have possession near your own goal, much higher if you possess the ball inside the opposition box.

Successfully passing the ball from one point to another leads to a change in NS xG.

If you have the ball on the edge of your own box and roll a pass five yards forward to a defensive midfielder, you get credited for improving the side's NS xG, but not by much. Repeat the move on the edge of the opponent's box and you'll get a fair bit more.

Knock the ball backwards and your side "loses" NSxG, but at least you keep the ball.

Give away possession, for example as a defender accidentally passing to an opponent near your goal, and you lose a combination of the NS xG you had and the NS xG your opponent gains.

Similarly, try and fail with a tricky pass inside the opponent's final third and you lose the fairly substantial NS xG your side had, along with the much smaller NS xG the opposition has gained.

This has led to three definitions for types of passes, two successful and one not.

Firstly, successful, creative passes that improve a team's NS xG.

Then, successful, backward passes that retain the ball, but "lose" NS xG.

And finally unsuccessful passes that turn over possession.

These are further normalised for position played.

A defender will have a very different average profile in each category, compared to an attacking midfielder and the metric is also normalised to 100 passing attempts to put players who play for a possession poor team on a more level footing with Manchester City.

Here's an example.

From left to right. The average Premier League full back adds 0.64 non shot xG per 100 passing attempts by way of successful, creative passes. TA-A added 1.026 NS xG/100, an improvement of 0.386 on the average full back.

Backward, successful passes where NS xG was "lost", but possession was retained mirrored the average experience of a full back.

An average full back actually lost 0.8 NS xG / 100 via turnovers, TA-A did slightly worse, losing 0.894, but this is a function of the risk/reward balance. He is given free rein to get into advanced positions, but the reward is well worth the extra risks taken.
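
The arithmetic behind those differentials can be sketched as below. The per 100 rates are the ones quoted in this post; the season totals passed to `per_100` are hypothetical, chosen only to show the scaling, and the helper names are my own.

```python
def per_100(total_ns_xg: float, attempts: int) -> float:
    """Scale a season total to a rate per 100 passing attempts."""
    return 100.0 * total_ns_xg / attempts


def differential(player_rate: float, position_avg: float) -> float:
    """Player rate minus the average for his position."""
    return player_rate - position_avg


# Figures quoted in the post (NS xG per 100 passing attempts).
FULL_BACK_AVG_CREATIVE = 0.64
FULL_BACK_AVG_TURNOVER_LOSS = 0.80
TAA_CREATIVE = 1.026
TAA_TURNOVER_LOSS = 0.894

creative_diff = differential(TAA_CREATIVE, FULL_BACK_AVG_CREATIVE)
# Losses are stored as positive numbers, so flip the sign: a bigger loss
# than average should show up as a negative differential.
turnover_diff = -differential(TAA_TURNOVER_LOSS, FULL_BACK_AVG_TURNOVER_LOSS)

# Hypothetical season totals, only to illustrate the per 100 scaling.
example_rate = per_100(total_ns_xg=20.52, attempts=2000)

print(round(creative_diff, 3), round(turnover_diff, 3), round(example_rate, 3))
```

The creative differential reproduces the 0.386 quoted above, and the turnover differential comes out slightly negative, which is the "slightly worse" in the text.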

Here are the differentials for every player who made at least 600 passing attempts for all 20 clubs last season.

They've been normalised for position, but many are a product of the role they are asked to play and the stylistic approach of the team they represent.

Figures such as these cannot tell the entire story, pass volume in particular will be hugely relevant, but we can take a lot from the tables.

For instance, there are the different roles of goalkeepers. Those who play out from the back, such as Alisson & Ederson, add below average creativity, but are well above average at preventing turnovers.

Similarly, van Dijk is no more than an averagely creative passing centre back, but again the systematic demands of the team do not require him to be more adventurous. His main aim is largely to play unadventurous balls to slightly advanced players and again, not turn the ball over, which is reflected in his well above average turnover numbers.

Manchester City's adherence to keeping the ball is shown again by the turnover figures, with the perhaps significant exception of Sane, who is poor at retaining the ball, with little above average creativity to compensate.

Passing volume ensures that their relatively unexceptional creativity, De Bruyne aside, invariably overwhelms an opponent.

And finally, the departing Hazard is a rare beast, who not only is above average creatively for his position, but also avoids the often boom or bust cycle by looking after the ball exceptionally well.

There are plenty of players who show above average creativity, but pay a relatively high price with turnovers.

Wednesday 15 May 2019

Non Shot Passing Profile for Liverpool 2018/19

Over the season, we've slowly introduced a non shot xG model in this blog.

We assign the likelihood that a goal will be scored (or conceded) by a team in possession at any location on the field.

Successfully advancing or turning the ball over at another position on the pitch changes the non shot xG for the possession and the difference between the two points can be used to quantify the on field action.

This framework can be used however the ball is moved, but an obvious single application is to evaluate passing and the resulting risk reward.

The approach sidesteps the need for a shot to be attempted to assign a value to an action, differentiates between safe passing with little purpose and includes a huge chunk of data that was previously ignored.

You can generally differentiate between two types of passing actions, one that advances the ball into a more dangerous position and one that moves the ball backwards to recycle a move.

These can obviously be further divided into successful and unsuccessful actions.

Therefore, at its broadest we can split a player's non shot passing contribution into value added and lost by successful or unsuccessful attempts to progressively move the ball into a more dangerous area. And similarly, NS xG "lost" by a successful backward pass, where possession is maintained, and potentially more harmfully, NS xG actually lost when unsuccessfully passing the ball towards one's own goal.
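
One way to picture that four way split is as a simple tally. This is only a sketch: the tuple layout, the bucket names and the sample passes are all invented for illustration.

```python
from collections import defaultdict


def bucket(intended_gain: float, completed: bool) -> str:
    """Label a pass by direction (in NS xG terms) and outcome."""
    direction = "forward" if intended_gain > 0 else "backward"
    outcome = "completed" if completed else "failed"
    return f"{direction}_{outcome}"


def passing_profile(passes):
    """Sum the NS xG change in each bucket.

    Each pass is (intended_gain, completed, actual_change), where
    intended_gain is the NS xG change had the pass been completed and
    actual_change is what really happened (negative for turnovers).
    """
    totals = defaultdict(float)
    for intended_gain, completed, actual_change in passes:
        totals[bucket(intended_gain, completed)] += actual_change
    return dict(totals)


# Hypothetical passes, not real data.
sample = [
    (0.020, True, 0.020),     # progressive pass, completed
    (0.030, False, -0.045),   # progressive pass, turned over
    (-0.010, True, -0.010),   # recycled backwards, possession kept
    (-0.005, False, -0.060),  # backward pass intercepted near own goal
]
profile = passing_profile(sample)
print(profile)
```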

If we incorporate minutes played and overall team style, we may begin to identify important contributors and ways that a side attempts to move the ball around the field.

Here's Liverpool's Premier League season from 2018/19.

I've highlighted NSxG gained & lost from forward passes & that "lost" by successfully recycling the ball away from the opponent's goal.

The passing performance of the players broadly splits into 4 separate categories.

Keita & Henderson take a back seat to the players in groups 2 & 4 when creating dangerous completed passes, but do frequently recycle the ball backwards.

Henderson has contributed 5% of the NS xG gained by Liverpool from a forward pass & accounted for 8% of the recycled, backward NS xG.

Group 2 are most active creatively, but do turn the ball over a lot. Although, that inevitably comes with the territory in which they operate and so you assume the two columns are an acceptable trade off.

Someone has to be entrusted with turning a good situation into a great one, even at the cost of losing the ball to an opponent.

Group 3 accumulate the lowest amount of improvement in NS xG, presumably by beginning moves from relatively deep areas and VvD aside, being relatively unadventurous.

The final group 4 are also fairly creative, operating in areas where even a short, completed pass can have a relatively large effect on NS xG and again the trade off is that often a large chunk of NS xG with which they have been entrusted can be quickly lost.

This group also retains possession, but cedes NS xG through laying the ball back from advanced areas of the field.

We might assume that these figures are the benchmark requirement for each position or group in the current Klopp side.

Wednesday 6 March 2019

Title Winners Aren't Becoming More Dominant Over Time.

Are the title winning teams in the Premier League getting more dominant because they're getting so much richer?

It seems a logical conclusion to draw given that Manchester City won the league with an unprecedented 100 points in 2017/18.

That obviously makes them the highest points per game team in 20 team Premier League history, but without context, such figures are largely meaningless.

Taking the points per game high point as a selective cutoff point is invariably going to furnish any number of apparently positive trendlines, but without taking a deeper look at how the league as a whole has evolved over a period of time, they too are context-less trivia.

The first 20 team Premier League season in 1995/96 had 98 draws, by 2017/18 the number was 99. But singular seasons may hide an upward or downward trend and this appears to be the case with drawn matches and by extension the total points that were won in a whole season.

The 1990s averaged 104 draws per season compared to just 92 for the comparable number of most recent Premier League campaigns.

Here's what this means for the average number of points won by sides in each Premier League season since 1995/96.

There has been a steady upward trend for the average number of points won by all Premier League teams since the beginning of the 20 team era, as draws have tended to decrease, therefore reducing the number of matches where just two points are won compared to those where three are gained.
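
The link between draws and the size of the points pot is simple arithmetic. A 20 team league has 380 matches, a decisive game hands out three points and a draw only two, so a quick sketch (the function name is mine) looks like this.

```python
TEAMS = 20
MATCHES = TEAMS * (TEAMS - 1)  # 380: everyone plays everyone home and away


def average_points(draws: int) -> float:
    """Average points per team in a 20 team season with `draws` draws."""
    decisive = MATCHES - draws
    total_points = 3 * decisive + 2 * draws
    return total_points / TEAMS


for draws in (104, 98, 92):
    print(draws, average_points(draws))
```

Going from 104 draws to 92 lifts the average side from 51.8 to 52.4 points, so the shift is real but modest.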

So are the top teams taking a bigger share of this expanded points pot, which may indicate that they are being more dominant than their predecessors were?

One way to look at this context corrected view is to see how remote the representative of each finishing position has become from the average points won by a side in a particular season.

Manchester City in 2017/18 were 2.5 standard deviations above the league average points won that season. But it's a level of dominance that was very similar to that attained by Chelsea in 2004/05, Arsenal in 2003/04 and Manchester United in 1999/2000.
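
The "standard deviations above the league average" figure is a plain z score. Here's a minimal sketch, assuming the population standard deviation over all the points totals in a season; the mini table is invented purely to show the call and is not a real league table.

```python
from statistics import mean, pstdev
from typing import Sequence


def points_z_score(team_points: float, league_points: Sequence[float]) -> float:
    """Standard deviations above (or below) the league average."""
    return (team_points - mean(league_points)) / pstdev(league_points)


# Hypothetical six team table, not real data.
table = [90, 70, 55, 45, 40, 30]
champions_z = points_z_score(table[0], table)
print(round(champions_z, 2))
```

Because the score is relative to that season's mean and spread, it already accounts for a league where more points are being handed out overall.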

Here's the plot of how far from the average points all 20 finishing positions have been since 1995/96.

OK, it's messy. But it's fairly easy to see that the title winners aren't powering upwards in an ever improving arc. In fact it pretty much flatlines and might even be encouraged to dip downwards if we wanted to be "creative".

Here's an easier on the eye trendline for each final position.

Once you add the context of the points gathering environment over time, Man City 2017/18 are just a bump in the road and not part of a general trend. None of the top three finishing positions have been shown to have improved their dominance over the rest of the league.

There's been a slight uptick for 4th to 7th placed sides, a down tick for 7th to 12th. Then everyone holds station, until the two worst teams become slightly more competitive over time, but still go down.

Thursday 21 February 2019

The Name Game.

Sports analytics, not just football (or soccer), has always had a problem when naming its metrics (see what I mean).

Corsi, TSR, Pythagorean and expected goals may work fine in a closed environment, but try sticking those terms into the mainstream and you're immediately on the back foot.

Jeff Stelling's rant wouldn't have been half as effective if he'd had to say "Chance quality, what's that!"

Anyway, we've already embarked on a second phase of attaching names to a brand new raft of models and performance indicators, except this time everyone's going to be scratching their heads about what it is that we're actually talking about.

Anyone who's ever posted an xG figure will be familiar with the "X get Y for their xG, why the difference" but the rise of the NS xG model will take that to new heights.

Shot based xG models (actually shots, headers and other body parts) all share a core set of inputs (location, type) and any additions simply move the dial slightly, but the steady onset of so called "Non Shot xG" models may lead to comparisons between models that bear very little relationship to one another.

538 has a NS xG model, defined thus:

Non-shot expected goals is an estimate of how many goals a team could have scored given their nonshooting actions in and around their opponent’s penalty area.

Infogol has a NS xG model, but ours is based on the expected outcome of possession chains.

They currently share a name, but nothing else.

In an increasingly monetized situation it is understandable that some are reluctant or unable to share detailed descriptions of each model's makeup.

But, even if we can't avoid falling into the trap of using less than intuitive language to name commonly used metrics (as happened with xG), we perhaps should steer clear of using catch all terms, such as NSxG to describe future modelling efforts.

538's model appears to be event based, ours is possession based, so it's probably best to include this additional piece of information when presenting any NSxG models in the future. 

Thursday 31 January 2019

A Non Shot Addition to the xG Family

Shot based expected goals models can tell us a lot about a match by extending the sample size from around three for actual goals to well into double figures for goal attempts.

But they are event based descriptions of a match and don't always tell the whole story of a match.

The weakness of event based models, be they attempts, final third entries or touches in the box, is, rather obviously, that these events have to occur for them to be registered, often in the most competitively contested region of the field.

Non shot xG models can fill the void that sometimes exists by examining such things as possession chains and the probabilistic outcome that may occur between two teams of known quality.

Last night Liverpool drew 1-1 at home to Leicester.

The hosts, depending on your view point, were unlucky not to win because, "Leicester defended well", "Atko reffed the game poorly" or "Liverpool weren't themselves".

Shot based xG universally gave the match to Leicester. They created better chances and had a larger total shot based xG than the title contending Reds.

Here's Infogol's shot map from last night. Leicester created a couple of decent chances. Liverpool were restricted to attempts from distance.

However, if we look at the potential return for each team based on where and how frequently they began attacks against each other, combined with the typical outcome of such possession in expected goals terms and the talent based differential at completing or suppressing passes or dribbles, the balance of "probabilistic" power shifts.

Liverpool shaded the non shot xG assessment by 2.4 to 1.1.

They had the ball frequently enough, beginning in sufficiently advanced areas to have scored a likely two or three goals, with a penalty thrown in for good measure.

Leicester would have typically replied once.

So why was it just 1-1?

Just plain randomness? An early goal that caused Liverpool to cruise somewhat in a similar way to the return game earlier in the season. A clever Leicester game plan that frustrated Liverpool with a packed defense and a bit of luck from the officials.

There's no correct answer, but there are tools, both event and possession based that can add clarity and suggest areas of investigation.

Tuesday 29 January 2019

Simulating Post Game Outcomes with a Non Shot xG Model.

First there was xG, ExpG, expected goals, chance quality or whatever you wished to call it.

Then we simulated the shooting contest to create a likelihood and range of possible scores.

Next we added the different scoreline probabilities to arrive at a post game chance of the shooting contest ending as a win or a draw.
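
For anyone who hasn't seen these two steps spelled out, here's a minimal sketch of a shot by shot simulation. It assumes every attempt is an independent event with a scoring chance equal to its xG (which ignores related opportunities in the same move), and the shot lists are made up for illustration.

```python
import random


def simulate_match(home_shots, away_shots, n_sims=20000, seed=1):
    """Treat every shot as an independent coin flip weighted by its xG.

    Returns the share of simulations ending as home win, draw, away win.
    """
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        home_goals = sum(rng.random() < xg for xg in home_shots)
        away_goals = sum(rng.random() < xg for xg in away_shots)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Hypothetical xG values for each side's attempts, not real match data.
home, draw, away = simulate_match([0.35, 0.10, 0.05], [0.08, 0.07, 0.30, 0.04])
print(home, draw, away)
```

Counting how often each scoreline turns up across the simulations gives the range of possible scores, and adding them up by result gives the win, draw and loss chances.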

Undeniably these approaches help to illuminate the story of a single game, but there are occasions when a shot based approach can mislead.

Game state, (the combination of time remaining, scoreline and the talent differential of the two teams), can sometimes lead to a side prioritising winning the game as opposed to maximising the number of goals they may score.

The obvious example of these game state effects might be a side leading by a single goal deep into stoppage time heading for the corner flag, rather than the opposition penalty area or the reverse where a trailing team attempts a speculative long range effort instead of choosing to progress the ball and perhaps losing it before they can shoot.

Therefore, a simple xG tally can sometimes become distorted by attempts that aren't taken and attempts that perhaps shouldn't have been.

Non shot xG models may provide a partial solution to this occasional disconnect between xG totals and an eye witness account of a game.

Instead of using goal attempts when assessing the performance of each team, possessions are the chosen currency in a non shot chance quality model.

Non shot xG models aren't too concerned with how a team chooses to use their possession.

Instead they take a weighted midline between the situations where scoring a goal is the main aim and when alternatively preserving a lead is paramount.

A side who isn't being overwhelmed by a trailing opponent can therefore still build up non shot xG credit by claiming a fair share of possessions in varying areas of the pitch... even if they don't choose to convert them into actual goal attempts that would register in a shot based xG framework.

In short, a side may go shot-less for the final half hour in a game they lead, but still be largely in control of managing the advantageous scoreline.

Earlier this season, Liverpool went to Huddersfield and won 1-0 with a Salah goal in the 24th minute.

Huddersfield "won" the shot based xG contest 0.9 to 0.6 and whether you want to simulate every chance (some of Liverpool's were related opportunities) or simply run the relative xG totals through a Poisson model, you'll find that shot based xG thinks that Huddersfield were more likely to win the actual game than Liverpool.

The 1X2 splits are around 40/35/25.
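
The Poisson route can be sketched in a few lines. It assumes each side's goals follow independent Poisson distributions with the xG totals as the means, and it caps the scoreline grid at ten goals a side, which loses a negligible amount of probability at these totals.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_1x2(home_xg: float, away_xg: float, max_goals: int = 10):
    """Home win, draw and away win chances from two independent Poissons."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Huddersfield (home) 0.9 xG, Liverpool (away) 0.6 xG, as quoted above.
home, draw, away = poisson_1x2(0.9, 0.6)
print(round(home, 2), round(draw, 2), round(away, 2))
```

Run on the 0.9 and 0.6 totals this lands close to the splits quoted above, at roughly 41/36/23.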

So this is one of those occasions when shot based xG thinks the wrong team won, although it is blind to the superior team holding an early lead.

However, a possession based, non shot model, which values every possession and doesn't need a goal attempt to trigger a plus for either team, sees things rather differently.

Liverpool's possessions were, on average around 15% more valuable than Huddersfield's.

I only vaguely remember watching the match, but I didn't get the impression that Liverpool were very lucky to win, nor that, if needed they wouldn't have turned their superior possession chains into more chances.

If we now simulate the likelihood of each side turning their possessions into goals (with no regard for tactical, game state related nuances), Liverpool now win a non shot simulation 44% of the time compared to just 26% for Huddersfield.

There is no right answer when looking at who deserved a win or a loss, and while shot based xG offers one probabilistic opinion, as they say others are available and sometimes they will disagree.

Friday 25 January 2019

Putting Together a Possession Based Non Shot Model

I've previously written about non shot based models as an alternative to purely shot based xG, as well as a way of incorporating the 90+% of onfield actions that are omitted in the former.

A valid criticism of shot based models is that a goal attempt needs to be registered before expected goals tallies can be increased.

However, it is intuitively realised that continued incursions deep into an opponent's half are dangerous, even if a shot isn't forthcoming and a dangerous ball that is played across the face of goal also carries a non recorded level of threat.

Similarly, a penalty kick gives a disproportionately large xG figure, particularly when compared to numerous other passes into the box that don't result in a reckless lunge and a favourable ref.

An alternative approach might be to count attack based events, such as final third passes or progressive runs and relate these to a likelihood of scoring. But this seems rather arbitrary and lacking a framework.

Our approach is to select a consistent unit to describe the model that is analogous to a goal attempt and we've chosen a possession.

We then need an equivalent figure to the expected goal figure for an attempt made on goal. And just as a shot based xG model is driven by the probability of scoring with a shot/header given a variety of identifiable parameters, we have used the likelihood that a possession will result in a goal.
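
At its crudest, the likelihood that a possession results in a goal could be estimated by counting, as in the sketch below. This is a deliberately simplified stand in and not our model: the zone labels and possession records are invented, and a real version would need far more data and the skill adjustments discussed further down.

```python
from collections import Counter


def goal_rate_by_zone(possessions):
    """Share of possessions starting in each zone that end in a goal.

    possessions: iterable of (start_zone, ended_in_goal) pairs.
    """
    started = Counter()
    scored = Counter()
    for zone, ended_in_goal in possessions:
        started[zone] += 1
        if ended_in_goal:
            scored[zone] += 1
    return {zone: scored[zone] / started[zone] for zone in started}


# Hypothetical possession records, not real data.
records = [
    ("own_third", False), ("own_third", False), ("own_third", False),
    ("own_third", False), ("middle_third", False), ("middle_third", False),
    ("final_third", True), ("final_third", False),
]
rates = goal_rate_by_zone(records)
print(rates)
```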

Shot or header location are the primary factors in a shot based xG model, but modellers have shied away from such things as finishing skill and/or goal keeping prowess, as the proliferation of statistical noise often swamps any signal.

However, in the more event rich environment of passes and ball progressions we may be more confident in including such skill differentials into a non shot model, without straying too far into xG2 shot based territory.

Anyone who watched Burton's second leg game with Manchester City couldn't fail to be swayed by the obvious individual and technical ability on show from City compared to their hosts. And the implied level of goal threat was much higher when City gained possession compared to the Brewers in a similar pitch location.

Therefore, in constructing a non shot based model, as well as such familiar universals as location, we also incorporate factors which identify both above average proficiency in passing as well as in disrupting passes or carries.

Here's a table I posted at the end of last season, showing the level of over or under performance for Premier League teams in pass completion and pass disruption.

It's notable that Man City were the best at completing passing sequences and suppressing opponents' attempts.

We now have an assembly of ingredients to produce a non shot equivalent to the purely shot based model.

Above is a game by game summary of the non shot xG differential for Manchester City in 2017/18.

Unsurprisingly, a team committed to possession and passing excellence, with high quality players almost always creates a possession environment that gives them a superior non shot xG differential.

And here's a game by game tally for Liverpool in 2017/18

Together with a shot based approach, a non shot model can perhaps add nuance to the balance of power between two sides, based on the frequency, location of possessions and pre game skill differentials of the sides, as well as exploring, via a shot based xG model, the now familiar occasions where a goal attempt was generated.

Thursday 17 January 2019

A Non Shot Expected Goals Look at the UCL Group Stages.

The last post looked at quantifying the increased contribution made by players attempting progressive passes based on the improvement in non shot expected goals via completing a pass and the likelihood that an average passer is able to successfully make such a pass.

We've been building non shot xG models for a few years, so let's take a look at how possession & passing ability can be redefined in terms of non shot xG from this season's UCL group games.

Once you have a NS xG framework you can look at the risk/reward of every attempted pass by quantifying the improvement in NS xG should the pass be completed.

This can be further combined with the likelihood a pass is completed against the risk of losing the initial NS xG you owned and handing NS xG to the opposition should they take possession.

To simplify the post, I'll just look at the reward side of the bargain and aggregate the expected value of a completion in NSxG units for all progressive passes attempted by the 32 UCL group teams and compare that value to the actual value of the completions they made.
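
The reward side of that comparison can be sketched as follows. It assumes each attempted pass comes with an average completion probability and the NS xG value of the target location; the tuple layout, function names and the three sample passes are hypothetical.

```python
def expected_reward(passes):
    """What an average passing side would expect from these attempts.

    Each pass is (p_complete, ns_xg_if_completed, completed).
    """
    return sum(p_complete * value for p_complete, value, _ in passes)


def actual_reward(passes):
    """NS xG actually banked: only completed passes count."""
    return sum(value for _, value, completed in passes if completed)


# Hypothetical progressive passes, not real data.
attempts = [
    (0.90, 0.01, True),   # easy ball to the centre circle, completed
    (0.30, 0.10, False),  # difficult ball into the box, failed
    (0.50, 0.05, True),   # pass into the final third, completed
]
expected = expected_reward(attempts)
actual = actual_reward(attempts)
print(round(expected, 3), round(actual, 3))
```

Summed over every progressive pass a side attempts, the gap between the two totals is the over or under performance against an average passing team.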

This will quantify how often a side had possession in a dangerous area of the field and if, through better passers and/or receivers they outperformed an average passing team.

We'll also take a look at the value of passes allowed into dangerous areas and whether a side managed to reduce that value by making it difficult for opponents to complete passes compared to an average defence.

The defensive side of the ball is often ignored or described entirely in terms of completed actions, such as tackles or interceptions, with little context.

The "Attacking Reward from Progressive Passes NSxG" column is the model's average expectation that a progressive pass results in a possession somewhere on the field.

Playing a forward pass out of defence to the centre circle is very likely to be completed, but the value of the possession in the centre circle won't be that large.

Playing the ball into the opponent's penalty area, dependent upon the origin of the pass, won't be as easy to complete, but will result in a relatively large NS xG value if it is.

Overall, if an average team was willing and able to attempt the pass attempts of Real Madrid in the group phase, they would expect to accrue a cumulative 74.2 NSxG over the six games.

Real actually gained 77.9 NSxG.

So they made lots of dangerous pass attempts (although they did also recycle the ball backwards) and over performed the average model by about 5% (77.9 against 74.2) based on actual completions.

Porto was one of the better defences. They allowed sides to make progressive passes worth a model value of 39.4 NS xG and restricted the completions to further depress the actual value to 36 NS xG over the six games.
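
The efficiency figures are just the ratio of actual to model NS xG. A quick sketch using the Real Madrid and Porto numbers quoted above (the function name is mine):

```python
def efficiency(actual_ns_xg: float, model_ns_xg: float) -> float:
    """Actual NS xG as a multiple of the average team expectation."""
    return actual_ns_xg / model_ns_xg


# Totals quoted in the post for the six group games.
real_attack = efficiency(77.9, 74.2)    # NS xG gained vs model expectation
porto_defence = efficiency(36.0, 39.4)  # NS xG allowed vs model expectation

print(round(real_attack, 3), round(porto_defence, 3))
```

A ratio above 1 is good news in attack, while a ratio below 1 is what you want from a defence.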

The best offensive and defensive performers, in terms of NS xG accrued or allowed, along with above average efficiencies are shown in blue, underperformers in red.

Attack and defensive numbers are correlated, particularly from a possession standpoint. As Swansea showed, possession can be a purely defensive strategy. So it makes sense to look at the attacking and defensive differentials, along with the performance of the 32 teams in the group phase.

Real Madrid had a net positive NSxG differential of +44.2 in topping group G and Crvena Zvezda a whopping -57.8 in propping up group C.

Real got the ball often into dangerous positions with above average efficiency and restricted the ability of opponents to do the same at league average efficiency.

This is a step towards quantifying progressive passes, rather than simply counting final third completions etc. It unsurprisingly tallies with actual performance and provides a framework to produce possession chain based evaluations of past and future games that isn't entirely reliant upon a shot based approach.