
Tuesday, 2 July 2019

Quantifying the Value of Every Pass

I've written about passing models over the last couple of years and recently posted passing maps for individual players and teams. So here's a quick overview of the passing model upon which those maps are based, how it was developed and how the maps might be useful.

The model is derived from location and time stamped Opta data for every pass attempt. The model has been built in conjunction with Infogol, but as yet it isn't part of the data available on the Infogol app.

I was keen to use familiar units for the passing model, therefore all values for successful or unsuccessful passes are expressed in expected goals.

I've purposely avoided such things as distance gained, as this often leads to arbitrary definitions for "key passes".

It also breaks down entirely when you approach the penalty area, not only in terms of scaling, but also in assigning value to a backward pass that actually adds value to a side if it is completed. (Think of pull backs from the goal line: a "progressive" pass can easily go backwards.)

The baseline values are the likelihood that possession at any position on the field will end with a goal, and are taken from historical data.

Therefore, if passing from one point to another improves the likelihood of a goal, the successful pass is quantified as the change in this likelihood.
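As a minimal sketch of that idea, the snippet below values a completed pass as the difference between two lookups on an NS xG surface. The `ns_xg` function here is a made-up toy surface, not the historical baseline described above, and the coordinates are assumed to be Opta style 0-100 values with the opposition goal at x = 100.

```python
def ns_xg(x: float, y: float) -> float:
    """Toy (hypothetical) likelihood that possession at (x, y) ends in a goal."""
    # Value rises towards the opposition goal (x = 100) and the centre (y = 50).
    centrality = 1.0 - abs(y - 50.0) / 50.0
    return 0.002 + 0.05 * (x / 100.0) ** 3 * (0.5 + 0.5 * centrality)


def pass_value(start, end):
    """Value of a completed pass: the change in NS xG between the two points."""
    return ns_xg(*end) - ns_xg(*start)


# A forward pass through the middle gains NS xG.
forward = pass_value((60, 50), (85, 50))

# A pull back from near the goal line travels backwards in x,
# yet still gains NS xG on this toy surface because it ends more centrally.
pull_back = pass_value((98, 90), (88, 55))

print(f"forward: {forward:+.4f}, pull back: {pull_back:+.4f}")
```

The pull back case is the reason the value is defined on likelihood rather than on distance gained: direction of travel and change in threat are not the same thing.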

Because the unit of measurement is how likely, historically, possession is to turn into a goal, it doesn't require a goal attempt to ultimately be made at the culmination of the move.

This is a huge advantage over passing models that are based solely around attempts being taken because every pass attempt is counted (a player is not reliant on success or failure further down the passing chain).

It also makes the calculation of speed of attack much more relevant to the actual threat present (advancing the ball ten yards in a couple of seconds from the final third causes much more of a threat than advancing the ball 20 yards in half the time from your own penalty area).

Finally, because the units relate to aggregated historical outcomes of possessions, we can quickly give a value to any point of the field, which is not the case if the point value is based on the expected goals of an actual goal attempt from that position.

And so because a goal attempt isn't required the units are designated as non shot expected goals to differentiate them from shot based xG.

To keep things simple, the following non shot xG passing maps omit other actions, such as carries or dribbles, and take no account of time of possession or any likely passing skill differential.

The following maps simply record any successful, "progressive" pass (by which I mean any pass that advanced the likelihood of a team scoring) made by a player during the last Premier League season.

The maps are simply conditional formatting in Excel, on a 10x10 grid, overlaid with a pitch. A 10x10 grid is used for convenience, because Opta's x,y pitch locations run from 0 to 100, lengthwise and widthwise.

The darker the conditional formatting, the more NS xG has been gained from successful passes from that location, either by small gains and large passing volume, large gains and lower passing volume, or a combination of the two.
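The aggregation behind such a map can be sketched in a few lines. This is only an illustration of the binning step: the pass tuples are invented, and the `(start_x, start_y, gain)` layout is an assumption rather than the format of the underlying Opta feed.

```python
def cell(x, y):
    """Map 0-100 pitch coordinates to a (row, col) cell on a 10x10 grid."""
    col = min(int(x // 10), 9)  # x = 100 falls in the last column
    row = min(int(y // 10), 9)  # y = 100 falls in the last row
    return row, col


def origin_heatmap(passes):
    """Sum NS xG gained per origin cell, keeping only progressive passes."""
    grid = [[0.0] * 10 for _ in range(10)]
    for x, y, gain in passes:
        if gain > 0:  # progressive = the pass improved NS xG
            row, col = cell(x, y)
            grid[row][col] += gain
    return grid


# Hypothetical completed passes: (start_x, start_y, NS xG gained)
passes = [
    (45, 50, 0.004),
    (47, 52, 0.006),
    (82, 30, 0.020),
    (100, 100, 0.010),
    (30, 40, -0.003),  # backward pass, excluded from the map
]
grid = origin_heatmap(passes)
```

Each cell total is what the conditional formatting shades: two small gains from the same cell add up in exactly the same way as one larger gain.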

It's easy to show the passing distribution through other plots.

Here's England's newly capped Declan Rice's successful progressive NS xG gains for WHU in 2018/19.


This represents the starting point of every successful pass.

The plot is best used in conjunction with video analysis, but you can quickly see that Rice's sphere of influence is concentrated broadly in front of the back four and across the line, but he also delivers an impressive range of threatening passing options mid way inside the opposition half and just leftfield.

The next thing we'd like to know is where these passes end up, so the following plot illustrates where on the field this improvement in NS xG production from Rice via his passes is distributed and received by a team mate.

Overall Rice's progressive passes are received around 10% further upfield than their points of origin. He spreads the ball wide, as noted by the darker areas on the flanks either side of halfway and towards the final third. And he finds a team mate on the edge of the box, but doesn't appear to be a predominant passer of the ball into the box (particularly if we strip away set plays).

Rice appears to be an active and productive passer over around three quarters of the playing area, but may not be fully appreciated because he rarely plays a pass that may be considered an assist.

By contrast, here's a much more attacking NS xG passing profile from Manchester City's midfielder, David Silva. A darling of the highlights reel.



Unlike Rice, Silva rarely ventures into his own half to begin build up play. The starting point for his progressive passing is a hot spot just outside the left edge of the opposition penalty area, although he does occasionally drift to the opposite side of the box.



The end point of his passing is again strongly centred around the left side of the field, but deep into the opposition box. He sticks rigorously to the left sided channels and relatively shuns pass attempts to the right side of the box from his team's perspective.

Finally, for now, we can also show where a player is showing up as the recipient of a progressive pass.


Once again his fondness for linking up with a team mate in the dangerous left hand side of the area is shown, firstly by the darker formatted green area just inside the left flank of the area on the plot and secondly in an actual example from a game.


This just scratches the surface of how these plots, maps and quantified valuing of passes can be useful in assessing a side, or a player. It is particularly welcome because it removes the highlight reel aspect that blights player assessment (particularly on YouTube immediately following a transfer). We can see from the heat maps if creative passing into particular areas of the field is a largely consistent player trait, or if the exceptional pass that perhaps resulted in a goal was a once in a lifetime fluke.

This post has concentrated on progressive, NS xG gaining successful passes, but the approach can also be applied to unsuccessful attempts to measure risk and reward. The probability of a pass being completed can also be added, and we can look at ball retention plots to see which players excel at retaining the ball for others to make the decisive progressive deliveries.

Rice and Silva obviously play different midfield roles in widely differing teams, but their respective importance and discipline in playing a role in those two systems becomes much more apparent once we look at their passing contribution as a whole.





Tuesday, 11 June 2019

The Best & Worst Passers in the 2018/19 Premier League.

This is essentially just a data drop of the passing abilities for every player who made at least 600 pass attempts in the last Premier League season, based on a non shot passing model.

Here's our approach.

Every inch of the pitch has a non shot expected goal value associated with it based on the likelihood a side will eventually score from that field position.

So it's very low if you have possession near your own goal, much higher if you possess the ball inside the opposition box.

Successfully passing the ball from one point to another leads to a change in NS xG.

If you have the ball on the edge of your own box and roll a pass five yards forward to a defensive midfielder, you get credited for improving the side's NS xG, but not by much. Repeat the move on the edge of the opponent's box and you'll get a fair bit more.

Knock the ball backwards and your side "loses" NSxG, but at least you keep the ball.

Give away possession, for example as a defender accidentally passing to an opponent near your own goal, and you lose a combination of the NS xG you had and the NS xG your opponent gains.

Similarly, try and fail with a tricky pass inside the opponent's final third and you lose the fairly substantial NS xG your side had, along with the much smaller NS xG the opposition has gained.

This has led to three definitions for types of passes, two successful and one not.

Firstly, successful, creative passes that improve a team's NS xG.

Then, successful, backward passes that retain the ball, but "lose" NS xG.

And finally unsuccessful passes that turn over possession.
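The three categories and their bookkeeping can be sketched as follows. The function name, the category labels and the example values are all hypothetical, and a completed pass with exactly zero change is filed under "retained" purely as a convention for this sketch.

```python
def classify_pass(start_value, end_value, completed, opp_value_if_lost=0.0):
    """Return (category, NS xG change) for a single pass attempt.

    start_value: NS xG of the passing side at the pass origin
    end_value: NS xG of the passing side at the target, if completed
    opp_value_if_lost: NS xG the opponent holds where they win the ball
    """
    if not completed:
        # A turnover costs what you had plus what the opponent now has.
        return "turnover", -(start_value + opp_value_if_lost)
    change = end_value - start_value
    if change > 0:
        return "creative", change
    # Possession kept, but NS xG is "lost" (or unchanged).
    return "retained", change


print(classify_pass(0.010, 0.030, completed=True))
print(classify_pass(0.030, 0.020, completed=True))
print(classify_pass(0.050, 0.120, completed=False, opp_value_if_lost=0.004))
```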

These are further normalised for position played.

A defender will have a very different average profile in each category compared to an attacking midfielder. The metric is also normalised to 100 passing attempts, to put players who play for a possession poor team on a more level footing with those at Manchester City.

Here's an example.



From left to right. The average Premier League full back adds 0.64 non shot xG per 100 passing attempts by way of successful, creative passes. TA-A added 1.026 NS xG/100, an improvement of 0.386 on the average full back.

Backward, successful passes where NS xG was "lost", but possession was retained mirrored the average experience of a full back.

An average full back actually lost 0.8 NS xG / 100 via turnovers, TA-A did slightly worse, losing 0.894, but this is a function of the risk/reward balance. He is given free rein to get into advanced positions, but the reward is well worth the extra risks taken.

Here are the differentials for every player who made at least 600 passing attempts for all 20 clubs last season.
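The arithmetic behind these comparisons is a rate per 100 attempts followed by a subtraction of the positional average. The sketch below uses the full back figures quoted above; the helper names are invented, the season totals passed to `per_100` are made up, and treating turnover losses as negative numbers is a sign convention chosen here for illustration.

```python
def per_100(total_ns_xg, attempts):
    """Normalise a season total to NS xG per 100 pass attempts."""
    return 100.0 * total_ns_xg / attempts


def differential(player_rate, position_avg):
    """Player rate minus the average rate for the same position."""
    return player_rate - position_avg


# Figures quoted in the text for a full back (NS xG per 100 attempts).
creative_diff = differential(1.026, 0.64)    # about +0.386
turnover_diff = differential(-0.894, -0.8)   # about -0.094 (losses as negatives)

# Hypothetical season totals, only to show the normalisation step.
example_rate = per_100(total_ns_xg=12.0, attempts=1500)

print(f"{creative_diff:+.3f} {turnover_diff:+.3f} {example_rate:.2f}")
```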

They've been normalised for position, but many are a product of the role they are asked to play and the stylistic approach of the team they represent.








Figures such as these cannot tell the entire story, pass volume in particular will be hugely relevant, but we can take a lot from the tables.

For instance, there are the different roles of goalkeepers. Those who play out from the back, such as Alisson & Ederson, add below average creativity, but are well above average at preventing turnovers.

Similarly, van Dijk is no more than an averagely creative passing centre back, but again the systematic demands of the team do not require him to be more adventurous. His main aim is to largely play unadventurous ball to slightly advanced players and again, not turn the ball over, which is reflected in his well above average turnover numbers.

Manchester City's adherence to keeping the ball is shown again by the turnover figures, with the perhaps significant exception of Sane, who is poor at retaining the ball, with little above average creativity to compensate.

Passing volume ensures that their relatively unexceptional creativity, De Bruyne aside, invariably overwhelms an opponent.

And finally, the departing Hazard is a rare beast, who not only is above average creatively for his position, but also avoids the often boom or bust cycle by looking after the ball exceptionally well.

There are plenty of players who show above average creativity, but pay a relatively high price with turnovers.

Wednesday, 15 May 2019

Non Shot Passing Profile for Liverpool 2018/19

Over the season, we've slowly introduced a non shot xG model in this blog.

We assign the likelihood that a goal will be scored (or conceded) by a team in possession at any location on the field.

Successfully advancing or turning the ball over at another position on the pitch changes the non shot xG for the possession and the difference between the two points can be used to quantify the on field action.

This framework can be used however the ball is moved, but an obvious single application is to evaluate passing and the resulting risk reward.

The approach sidesteps the need for a shot to be attempted to assign a value to an action, differentiates between safe passing with little purpose and includes a huge chunk of data that was previously ignored.

You can generally differentiate between two types of passing actions, one that advances the ball into a more dangerous position and one that moves the ball backwards to recycle a move.

These can obviously be further divided into successful and unsuccessful actions.

Therefore, at its broadest we can break down a player's non shot passing contribution into value added and lost by successful or unsuccessful attempts to progressively move the ball into a more dangerous area. And similarly, NS xG "lost" by a successful backward pass, where possession is maintained, and, potentially more harmfully, NS xG actually lost when unsuccessfully passing the ball towards one's own goal.

If we incorporate minutes played and overall team style, we may begin to identify important contributors and ways that a side attempts to move the ball around the field.

Here's Liverpool's Premier League season from 2018/19.


I've highlighted NSxG gained & lost from forward passes & that "lost" by successfully recycling the ball away from the opponent's goal.

The passing performance of the players broadly splits into 4 separate categories.

Keita & Henderson take a back seat to the players in groups 2 & 4 when creating dangerous completed passes, but do frequently recycle the ball backwards.

Henderson has contributed 5% of the NS xG gained by Liverpool from a forward pass & accounted for 8% of the recycled, backward NS xG.

Group 2 are most active creatively, but do turn the ball over a lot. Although, that inevitably comes with the territory in which they operate and so you assume the two columns are an acceptable trade off.

Someone has to be entrusted with turning a good situation into a great one, even at the cost of losing the ball to an opponent.

Group 3 accumulate the lowest amount of improvement in NS xG, presumably by beginning moves from relatively deep areas and VvD aside, being relatively unadventurous.

The final group 4 are also fairly creative, operating in areas where even a short, completed pass can have a relatively large effect on NS xG and again the trade off is that often a large chunk of NS xG with which they have been entrusted can be quickly lost.

This group also retains possession, but cedes NS xG through laying the ball back from advanced areas of the field.

We might assume that these figures are the benchmark requirement for each position or group in the current Klopp side.


Wednesday, 6 March 2019

Title Winners Aren't Becoming More Dominant Over Time.

Are the title winning teams in the Premier League getting more dominant because they're getting so much richer?

It seems a logical conclusion to draw given that Manchester City won the league with an unprecedented 100 points in 2017/18.

That obviously makes them the highest points per game team in 20 team Premier League history, but without context, such figures are largely meaningless.

Taking the points per game high point as a selective cutoff point is invariably going to furnish any number of apparently positive trendlines, but without taking a deeper look at how the league as a whole has evolved over a period of time, they too are context-less trivia.

The first 20 team Premier League season in 1995/96 had 98 draws, by 2017/18 the number had risen....to 99. But singular seasons may hide an upward or downward trend and this appears to be the case with drawn matches and by extension the total points that were won in a whole season.

The 1990s averaged 104 draws per season compared to just 92 for the comparable number of most recent Premier League campaigns.

Here's what this means for the average number of points won by sides in each Premier League season since 1995/96.


There has been a steady upward trend for the average number of points won by all Premier League teams since the beginning of the 20 team era, as draws have tended to decrease, therefore reducing the number of matches where just two points are won compared to those where three are gained.
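The link between draws and the points pot is simple arithmetic: a 20 team home and away season has 20 x 19 = 380 matches, a decisive match pays out three points and a draw only two. A minimal sketch, treating both draw averages quoted above as averages over 20 team seasons:

```python
def average_points_per_team(draws, teams=20):
    """Average season points per team for a given number of drawn matches."""
    matches = teams * (teams - 1)        # everyone plays everyone, home and away
    total_points = 3 * matches - draws   # each draw pays 2 points instead of 3
    return total_points / teams


for draws in (104, 92):
    print(draws, average_points_per_team(draws))
# prints:
# 104 51.8
# 92 52.4
```

Twelve fewer draws per season is worth twelve extra points in the pot, or 0.6 points per team on average.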

So are the top teams taking a bigger share of this expanded points pot, which might indicate that they are more dominant than their predecessors were?

One way to look at this context corrected view is to see how remote the representative of each finishing position has become from the average points won by a side in a particular season.

Manchester City in 2017/18 were 2.5 standard deviations above the league average points won that season. But it's a level of dominance that was very similar to that attained by Chelsea in 2004/05, Arsenal in 2003/04 and Manchester United in 1999/2000.
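A sketch of the calculation: subtract the league average from a side's points and divide by the standard deviation of all 20 totals. The points list below is an illustrative table shaped like 2017/18, and should be treated as an assumption and checked against an official final table before being relied upon. Using the sample standard deviation (rather than the population version) is also a choice made here, not something stated in the post.

```python
from statistics import mean, stdev


def sd_above_average(points, team_points):
    """How many standard deviations a points total sits above the league mean."""
    return (team_points - mean(points)) / stdev(points)


# Illustrative final points for 20 sides (assumed, not taken from the post).
season = [100, 81, 77, 75, 70, 63, 54, 49, 47, 44,
          44, 44, 42, 41, 40, 37, 36, 33, 33, 31]

champions = sd_above_average(season, season[0])
print(f"{champions:.2f}")
```

On these illustrative totals the champions come out at roughly 2.5 standard deviations above the mean. Repeating the calculation for every finishing position in every season gives the series plotted below.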

Here's the plot of how far from the average points all 20 finishing positions have been since 1995/96.


OK, it's messy. But it's fairly easy to see that the title winners aren't powering upwards in an ever improving arc. In fact the trend pretty much flatlines and might even be encouraged to dip downwards if we wanted to be "creative".

Here's an easier on the eye trendline for each final position.


Once you add the context of the points gathering environment over time, Man City 2017/18 are just a bump in the road and not part of a general trend. None of the top three finishing positions have been shown to have improved their dominance over the rest of the league.

There's been a slight uptick for 4th to 7th placed sides, a down tick for 7th to 12th. Then everyone holds station, until the two worst teams become slightly more competitive over time, but still go down.

Thursday, 21 February 2019

The Name Game.

Sports analytics, not just football (or soccer) has always had a problem when naming their metrics (see what I mean).

Corsi, TSR, Pythagorean and expected goals may work fine in a closed environment, but try sticking those terms into the mainstream and you're immediately on the back foot.

Jeff Stelling's rant wouldn't have been half as effective if he'd had to say "Chance quality, what's that!"

Anyway, we've already embarked on a second phase of attaching names to a brand new raft of models and performance indicators, except this time everyone's going to be scratching their heads about what it is that we're actually talking about.

Anyone who's ever posted an xG figure will be familiar with the "X get Y for their xG, why the difference" but the rise of the NS xG model will take that to new heights.

Shot based xG models (actually shots, headers and other body parts) all share a core set of inputs (location, type) and any additions simply move the dial slightly, but the steady onset of so-called "Non Shot xG" models may lead to comparisons between models that bear very little relationship to one another.

538 has a NS xG model, defined thus:

Non-shot expected goals is an estimate of how many goals a team could have scored given their nonshooting actions in and around their opponent’s penalty area.

Infogol has a NS xG model, but ours is based on the expected outcome of possession chains.

They currently share a name, but nothing else.

In an increasingly monetized situation it is understandable that some are reluctant or unable to share detailed descriptions of each model's makeup.

But, even if we can't avoid falling into the trap of using less than intuitive language to name commonly used metrics (as happened with xG), we perhaps should steer clear of using catch all terms, such as NSxG to describe future modelling efforts.

538's model appears to be event based, ours is possession based, so it's probably best to include this additional piece of information when presenting any NSxG models in the future. 

Thursday, 31 January 2019

A Non Shot Addition to the xG Family

Shot based expected goals models can tell us a lot about a match by extending the sample size from around three for actual goals to well into double figures for goal attempts.

But they are event based descriptions of a match and don't always tell the whole story of a match.

The weakness of event based models, be they attempts, final third entries or touches in the box, is, rather obviously, that these events have to occur for them to be registered, often in the most competitively contested region of the field.

Non shot xG models can fill the void that sometimes exists by examining such things as possession chains and the probabilistic outcome that may occur between two teams of known quality.

Last night Liverpool drew 1-1 at home to Leicester.

The hosts, depending on your view point, were unlucky not to win because, "Leicester defended well", "Atko reffed the game poorly" or "Liverpool weren't themselves".

Shot based xG universally gave the match to Leicester. They created better chances and had a larger total shot based xG than the title contending Reds.



Here's Infogol's shot map from last night. Leicester created a couple of decent chances. Liverpool were restricted to attempts from distance.

However, if we look at the potential return for each team based on where and how frequently they began attacks against each other, combined with the typical outcome of such possession in expected goals terms and the talent based differential at completing or suppressing passes or dribbles, the balance of "probabilistic" power shifts.

Liverpool shaded the non shot xG assessment by 2.4 to 1.1.

They had the ball frequently enough, beginning in sufficiently advanced areas to have scored a likely two or three goals, with a penalty thrown in for good measure.

Leicester would have typically replied once.

So why was it just 1-1?

Just plain randomness? An early goal that caused Liverpool to cruise somewhat, in a similar way to the return game earlier in the season? A clever Leicester game plan that frustrated Liverpool with a packed defence and a bit of luck from the officials?

There's no correct answer, but there are tools, both event and possession based that can add clarity and suggest areas of investigation.

Tuesday, 29 January 2019

Simulating Post Game Outcomes with a Non Shot xG Model.

First there was xG, ExpG, expected goals, chance quality or whatever you wished to call it.

Then we simulated the shooting contest to create a likelihood and range of possible scores.

Next we added the different scoreline probabilities to arrive at a post game chance of the shooting contest ending as a win or a draw.

Undeniably these approaches help to illuminate the story of a single game, but there are occasions when a shot based approach can mislead.

Game state, (the combination of time remaining, scoreline and the talent differential of the two teams), can sometimes lead to a side prioritising winning the game as opposed to maximising the number of goals they may score.

The obvious example of these game state effects might be a side leading by a single goal deep into stoppage time heading for the corner flag, rather than the opposition penalty area or the reverse where a trailing team attempts a speculative long range effort instead of choosing to progress the ball and perhaps losing it before they can shoot.

Therefore, a simple xG tally can sometimes become distorted by attempts that aren't taken and attempts that perhaps shouldn't have been.

Non shot xG models may provide a partial solution to this occasional disconnect between xG totals and an eye witness account of a game.

Instead of using goal attempts when assessing the performance of each team, possessions may be the chosen currency in a non shot chance quality model.

Non shot xG models aren't too concerned with how a team chooses to use its possession.

Instead they take a weighted midline between the situations where scoring a goal is the main aim and those where preserving a lead is paramount.

A side who isn't being overwhelmed by a trailing opponent can therefore still build up non shot xG credit by claiming a fair share of possessions in varying areas of the pitch, even if they don't choose to convert them into actual goal attempts that would register in a shot based xG framework.

In short, a side may go shot-less for the final half hour in a game they lead, but still be largely in control of managing the advantageous scoreline.

Earlier this season, Liverpool went to Huddersfield and won 1-0 with a Salah goal in the 24th minute.

Huddersfield "won" the shot based xG contest 0.9 to 0.6 and whether you want to simulate every chance (some of Liverpool's were related opportunities) or simply run the relative xG totals through a Poisson, you'll find that shot based xG thinks that Huddersfield were more likely to win the actual game than Liverpool.

The 1X2 splits are around 40/35/25.
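For the simpler of the two routes, running the xG totals through a Poisson, a minimal sketch looks like this. It assumes each side's goals follow an independent Poisson distribution with the xG total as its mean, which ignores the related chances mentioned above, and the function names are invented for the example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def one_x_two(home_xg, away_xg, max_goals=10):
    """Home win, draw and away win probabilities from two xG totals."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Huddersfield (home) 0.9 xG, Liverpool (away) 0.6 xG.
home, draw, away = one_x_two(0.9, 0.6)
print(f"1: {home:.0%}  X: {draw:.0%}  2: {away:.0%}")
```

With totals this low the draw takes a large share of the probability, which is why the home side's edge is real but modest.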

So this is one of those occasions when shot based xG thinks the wrong team won, although it is blind to the superior team holding an early lead.

However, a possession based, non shot model, which values every possession and doesn't need a goal attempt to trigger a plus for either team, sees things rather differently.

Liverpool's possessions were, on average around 15% more valuable than Huddersfield's.

I only vaguely remember watching the match, but I didn't get the impression that Liverpool were very lucky to win, nor that, if needed they wouldn't have turned their superior possession chains into more chances.

If we now simulate the likelihood of each side turning their possessions into goals (with no regard for tactical, game state related nuances), Liverpool now win a non shot simulation 44% of the time compared to just 26% for Huddersfield.
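A possession based simulation of this kind can be sketched as a simple Monte Carlo: every possession is given a probability of ending in a goal, and the match is replayed many times. The possession counts and values below are invented purely to show the mechanics (the away side's possessions are set 15% more valuable), so the output is not expected to reproduce the model's figures.

```python
import random


def simulate_match(home_poss, away_poss, n_sims=5000, seed=1):
    """Estimate home win / draw / away win rates from per-possession goal chances."""
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        # Each possession independently becomes a goal with its own probability.
        home_goals = sum(rng.random() < p for p in home_poss)
        away_goals = sum(rng.random() < p for p in away_poss)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Hypothetical inputs: 100 possessions each, away possessions 15% more valuable.
home_poss = [0.0100] * 100
away_poss = [0.0115] * 100
result = simulate_match(home_poss, away_poss)
print(result)
```

In practice each possession would carry its own value from the NS xG surface rather than a flat figure, and, as noted above, nothing here accounts for game state.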

There is no right answer when looking at who deserved a win or a loss, and while shot based xG offers one probabilistic opinion, as they say, others are available and sometimes they will disagree.

Friday, 25 January 2019

Putting Together a Possession Based Non Shot Model

I've previously written about non shot based models as an alternative to purely shot based xG, as well as a way of incorporating the 90+% of onfield actions that are omitted in the former.

A valid criticism of shot based models is that a goal attempt needs to be registered before expected goals tallies can be increased.

However, it is intuitively realised that continued incursions deep into an opponent's half are dangerous, even if a shot isn't forthcoming, and a dangerous ball that is played across the face of goal also carries a non recorded level of threat.

Similarly, a penalty kick gives a disproportionately large xG figure, particularly when compared to numerous other passes into the box that don't result in a reckless lunge and a favourable ref.

An alternative approach might be to count attack based events, such as final third passes or progressive runs and relate these to a likelihood of scoring. But this seems rather arbitrary and lacking a framework.

Our approach is to select a consistent unit to describe the model that is analogous to a goal attempt and we've chosen a possession.

We then need an equivalent figure to the expected goal figure for an attempt made on goal. And just as a shot based xG model is driven by the probability of scoring with a shot/header given a variety of identifiable parameters, we have used the likelihood that a possession will result in a goal.

Shot or header location is the primary factor in a shot based xG model, but modellers have shied away from such things as finishing skill and/or goalkeeping prowess, as the proliferation of statistical noise often swamps any signal.

However, in the more event rich environment of passes and ball progressions we may be more confident in including such skill differentials into a non shot model, without straying too far into xG2 shot based territory.

Anyone who watched Burton's second leg game with Manchester City couldn't fail to be swayed by the obvious individual and technical ability on show from City compared to their hosts. And the implied level of goal threat was much higher when City gained possession compared to the Brewers in a similar pitch location.

Therefore, in constructing a non shot based model, as well as such familiar universals as location, we also incorporate factors which identify both above average proficiency in passing as well as in disrupting passes or carries.



Here's a table I posted at the end of last season, showing the level of over or under performance for Premier League teams in pass completion and pass disruption.

It's notable that Man City were the best at completing passing sequences and suppressing opponents' attempts.

We now have an assembly of ingredients to produce a non shot equivalent to the purely shot based model.


Above is a game by game summary of the non shot xG differential for Manchester City in 2017/18.

Unsurprisingly, a team committed to possession and passing excellence, with high quality players almost always creates a possession environment that gives them a superior non shot xG differential.

And here's a game by game tally for Liverpool in 2017/18

Together with a shot based approach, a non shot model can perhaps add nuance to the balance of power between two sides, based on the frequency and location of possessions and the pre game skill differentials of the sides, as well as exploring, via a shot based xG model, the now familiar occasions where a goal attempt was generated.

Thursday, 17 January 2019

A Non Shot Expected Goals Look at the UCL Group Stages.

The last post looked at quantifying the increased contribution made by players attempting progressive passes based on the improvement in non shot expected goals via completing a pass and the likelihood that an average passer is able to successfully make such a pass.

We've been building non shot xG models for a few years, so let's take a look at how possession & passing ability can be redefined in terms of non shot xG from this season's UCL group games.

Once you have a NS xG framework you can look at the risk/reward of every attempted pass by quantifying the improvement in NS xG should the pass be completed.

This can be further combined with the likelihood a pass is completed against the risk of losing the initial NS xG you owned and handing NS xG to the opposition should they take possession.
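Put together, the risk/reward of a single attempt is an expected value: the gain if the pass is completed, weighted by the completion probability, minus the loss if it is not. The sketch below is one plausible way to write that down; the function name and all the example numbers are hypothetical.

```python
def expected_pass_value(p_complete, start_value, end_value, opp_value_if_lost):
    """Expected NS xG change from attempting a pass.

    p_complete: modelled probability that the pass is completed
    start_value: NS xG currently held at the pass origin
    end_value: NS xG held at the target if the pass is completed
    opp_value_if_lost: NS xG handed to the opposition on a turnover
    """
    reward = end_value - start_value         # gained on completion
    risk = start_value + opp_value_if_lost   # given up on a turnover
    return p_complete * reward - (1.0 - p_complete) * risk


# A safe pass out of defence: very likely completed, little gained.
safe = expected_pass_value(0.95, 0.010, 0.012, 0.004)

# A risky ball into the box: often lost, but worth a lot when it comes off.
risky = expected_pass_value(0.40, 0.030, 0.120, 0.005)

print(f"safe: {safe:+.4f}, risky: {risky:+.4f}")
```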

To simplify the post, I'll just look at the reward side of the bargain and aggregate the expected value of a completion in NSxG units for all progressive passes attempted by the 32 UCL group teams and compare that value to the actual value of the completions they made.

This will quantify how often a side had possession in a dangerous area of the field and if, through better passers and/or receivers they outperformed an average passing team.

We'll also take a look at the value of passes allowed into dangerous areas and whether a side managed to reduce that value by making it difficult for opponents to complete passes compared to an average defence.

The defensive side of the ball is often ignored or described entirely in terms of completed actions, such as tackles or interceptions, with little context.


The "Attacking Reward from Progressive Passes NSxG" column is the model's average expectation that a progressive pass results in a possession somewhere on the field.

Playing a forward pass out of defence to the centre circle is very likely to be completed, but the value of the possession in the centre circle won't be that large.

Playing the ball into the opponent's penalty area, dependent upon the origin of the pass, won't be as easy to complete, but will result in a relatively large NS xG value if it is.

Overall, if an average team was willing and able to attempt the pass attempts of Real Madrid in the group phase, they would expect to accrue a cumulative 74.2 NSxG over the six games.

Real actually gained 77.9 NSxG.

So they made lots of dangerous pass attempts (although they did also recycle the ball backwards) and over performed the average model by 4% based on actual completions.

Porto was one of the better defences. They allowed sides to make progressive passes worth a model value of 39.4 NS xG and restricted the completions to further depress the actual value to 36 NS xG over the six games.

The best offensive and defensive performers, in terms of NS xG accrued or allowed, along with above average efficiencies are shown in blue, underperformers in red.

Attacking and defensive numbers are correlated, particularly from a possession standpoint. As Swansea showed, possession can be a purely defensive strategy. So it makes sense to look at the attacking and defensive differentials, along with the performance of the 32 teams in the group phase.


Real Madrid had a net positive NSxG differential of +44.2 in topping group G and Crvena Zvezda a whopping -57.8 in propping up group C.

Real got the ball often into dangerous positions with above average efficiency and restricted the ability of opponents to do the same at league average efficiency.

This is a step towards quantifying progressive passes, rather than simply counting final third completions etc. It unsurprisingly tallies with actual performance and provides a framework to produce possession chain based evaluations of past and future games that isn't entirely reliant upon a shot based approach.

Tuesday, 15 January 2019

Quantifying Passing Contribution.

Passing completion models have seeped into the public arena over the last couple of months, mimicking the methodology used in expected goals models.

Historical data is used to estimate the likelihood that a goal is scored by an average finisher based primarily on the shot type and location in the case of expected goals models. And a similar approach is used for passing models.

Historical completion rates based on the origin and type of pass are combined with the assumed target to model a likelihood that a pass is completed. Actual completion rates for players are then compared with the expected completion rate to discern over and under performing passers.

However, this approach omits a huge amount of context when applied to passes.

A goal attempt has one preferred outcome, namely a goal. But the unit of success that is often used in passing models is a completion of the pass and that in itself leaves a lot of information off the table.

How much a completed pass advances a side should also be an integral ingredient of any passing model. Completion alone shouldn't be the preferred unit of success, because it isn't directly comparable to scoring in an expected goals model.

A player can attempt extremely difficult passes that barely advance the team's non shot expected goals tally. For example, a 40 yard square ball across their own crowded penalty area is difficult to consistently complete and the balance of risk and reward for success or failure is greatly skewed towards recklessness.

Completing such passes above the league average would mark that player as an above average passer, but if we include the expected outcome of such reckless passes, we would soon highlight the flawed judgement.

The premier passer of his generation is of course Lionel Messi. It isn't surprising that he would complete more passes than an average player would expect to based on the difficulty of each attempted pass.

But we can add much more context if we include the risk/reward element of Messi's attempted passes.

A full blown assessment of every pass Messi attempted in the Champions League group stages becomes slightly messy for this initial post. Instead I'll just look at the positive expected outcomes of his progressive passes.

150 sampled progressive passes made by Messi during the Champions League group stage have both an expected completion probability and an attached improvement in non shot expected goals should the pass be completed. (NS xG is the likelihood that a goal results from that location on the field, it isn't the xG from a shot from that location).

If we simulate each attempt made by Messi thousands of times based on these average probabilities and the NS xG gain should the pass be completed, we get a range and likelihood of possible cumulative NS xG values.

The most likely outcome for an average player attempting Messi's passes is that they would add between 2.4 and 2.6 non shot expected goals to Barcelona's cause.

The reality for Messi was that he added 3.1 non shot expected goals.

There's around a 10% chance that an average player equals or betters Messi's actual tally in this small sample trial. But it is quantified evidence that Messi may well be a better than average passer of the football.
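The simulation described above can be sketched roughly as follows. This is a minimal illustration, not the model itself: the 150 passes are randomly generated stand-ins (the real completion probabilities and NS xG gains aren't published here), and the function names and the placeholder "actual" tally are my own assumptions.

```python
import random


def simulate_total_nsxg(passes, n_sims=10_000, seed=1):
    """Simulate cumulative NS xG for an average player attempting `passes`.

    `passes` is a list of (completion_probability, nsxg_gain_if_completed).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        total = 0.0
        for p_complete, gain in passes:
            # The pass is completed with the average player's probability.
            if rng.random() < p_complete:
                total += gain
        totals.append(total)
    return totals


def share_at_least(totals, actual):
    """Fraction of simulated tallies that equal or better the actual tally."""
    return sum(t >= actual for t in totals) / len(totals)


# Hypothetical stand-in data: 150 passes with made-up probabilities and gains.
demo_rng = random.Random(0)
passes = [(demo_rng.uniform(0.4, 0.95), demo_rng.uniform(0.005, 0.05))
          for _ in range(150)]

totals = simulate_total_nsxg(passes)
expected = sum(p * g for p, g in passes)
actual = expected * 1.2  # placeholder "actual" tally, for illustration only

print(f"Average-player expectation: {expected:.2f} NS xG")
print(f"Chance an average player matches the actual tally: "
      f"{share_at_least(totals, actual):.1%}")
```

With the real per-pass values in place of the stand-ins, `share_at_least` would give the kind of "around 10%" figure quoted above.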

Monday, 7 January 2019

Are Teams More Vulnerable After Scoring?

One of the joys from the "pencil and paper" age of football analytics was spending days collecting data to disprove a well known bedrock fact from football's rich traditional history.

2-0 = dangerous lead has been a "laugh out loud" moment for those who went on more than gut instinct for decades.

Nowadays, you can crunch a million passes to build a "risk/reward" model and the only limitation is whether or not your laptop catches fire.

Myth busting (or not) perceived wisdom is now a less time consuming, but still enjoyable pastime.

Teams being more vulnerable immediately following a goal turned up on Twitter this week, although I've lost the link, so does it hold water?

Here's what I did.

Whether a team scores in the next 60 seconds depends on a couple of major parameters.

Firstly, a side's goal expectation.

Again not to be confused with expected goals, goal expectation is a term from the pre internet age of football analytics which is the average number of goals a side is expected to score based on venue, their scoring prowess and the defensive abilities of their opponent on the day.

Secondly, how long has elapsed.

Scoring tends to increase as the game progresses.

45% of goals on average arrive in the first half and 55% in the second. So if you want to predict how likely a side is to score based on their initial goal expectation, it will be smaller if you're looking at the 60 seconds between the 12th and 13th minute, compared to between the 78th and 79th.

Therefore, you take the pre game goal expectation for each team and when one team scores you work out the goal expectation per minute from this general decay rate for the other team over the next ten minutes.

Then you work out the likelihood that the "scored on" team scores in each 60 second segment via Poisson etc.

And then you compare that to reality.

The model doesn't "know" one team has just conceded, so if their opponents are really more likely to concede following their goal, the model's prediction will significantly under estimate the expected number of goals compared to reality.
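Here's a minimal sketch of that calculation. The 45%/55% split comes from the figures above, but spreading each half's share evenly across its 45 minutes is my simplifying assumption (the actual decay curve is smoother), and the function names are hypothetical. Stoppage time, celebrations and the halftime crossover are ignored in this version.

```python
import math

FIRST_HALF_SHARE = 0.45  # share of goals arriving in the first half (from the post)


def minute_expectation(goal_expectation, minute):
    """Goal expectation for one minute (1 to 90) of the match.

    Assumption: each half's share is spread evenly over its 45 minutes.
    """
    share = FIRST_HALF_SHARE if minute <= 45 else 1 - FIRST_HALF_SHARE
    return goal_expectation * share / 45


def prob_score_in_minute(goal_expectation, minute):
    """Poisson probability of at least one goal in that 60 second segment."""
    return 1 - math.exp(-minute_expectation(goal_expectation, minute))


def expected_goals_in_window(goal_expectation, conceded_minute, window=10):
    """Expected goals in the `window` minutes after conceding, cut off at 90."""
    last_minute = min(conceded_minute + window, 90)
    return sum(minute_expectation(goal_expectation, m)
               for m in range(conceded_minute + 1, last_minute + 1))


# Example: a side with a pre game goal expectation of 1.5 concedes in minute 30.
print(f"{expected_goals_in_window(1.5, 30):.3f} expected goals in the next ten minutes")
print(f"{prob_score_in_minute(1.5, 79):.4f} chance of scoring in the 79th minute")
```

Summing `expected_goals_in_window` over every conceded goal in the data gives the model's total to compare with the number of replies actually scored.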

There are a few wrinkles to iron out.

The first minute after conceding is going to be taken up with one team doing a fair bit of badge kissing and knee sliding, so it won't last for 60 seconds.

It's also going to be difficult to reply in the sixth minute after conceding if your opponent scores in the 94th minute and the ref has already blown for full time.

There's also the question of halftime crossover, where the 6th minute might actually be 21 minutes after the goal is conceded.

You can deal with these fairly easily.

I took time stamped Premier League data, ran the methodology and found 91 occasions where a side scored within ten minutes of conceding.

(I also split the ten minutes into 60 second segments, but I want to keep this short & more general).

From the model, in that timeframe, you would have expected those teams to score, wait for it... 91 goals, based on when the goal was conceded, how well their attacking potential matched up to the opponent's defensive abilities and allowing for truncated opportunity at the end of the game & through celebration.

There's no need to invoke scoring team complacency or a conceding team's wrath to end up with the scoring feats achieved, at least in the sample of Premier League games I used.

Are Teams Vulnerable After Scoring?

Probably not.

Saturday, 5 January 2019

xG Tables

There's been a lot of interest on Twitter in deriving tables from expected goals generated in matches that have already been played out.

Average expected points/goals are a useful, but inevitably flawed way to express over or under performance in reality compared to a host of simulated alternative outcomes.

Averages of course are themselves flawed, because you can drown in 3 inches........blah,blah.

Here's one way I try to take useful information from a simulation based approach using "after the fact" xG figures from matches already played, that may not be as Twitter friendly, but does add some context that averages omit.

If you have the xG that each side generated in a match, you can simulate the likely outcomes and score lines from that match by your method of choice.

A side who out xG'ed the opponent is usually also going to be the most likely winner, in reality and in cyberspace.

But sometimes Diouf will run 60 yards, stick your only chance through Joe Hart's legs, nick three points and everyone's happy.

It just won't happen very often, but it does sometimes and then the xG poor team get three points and the others get none.

Simulate each game played, add up the goals and points and you now have two tables.

One from this dimension and one that "might" have happened in the absence of game state and free will.
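As one example of a "method of choice", here's a minimal sketch that draws each side's goals from a Poisson distribution with the match xG as its mean and builds a points table from the results. Poisson on the match total is my assumption (simulating every shot individually is a common alternative), and the teams, fixtures and xG figures are made up for illustration.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson distributed goal count using Knuth's method."""
    threshold = math.exp(-lam)
    goals, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return goals
        goals += 1


def simulate_table(fixtures, rng):
    """Play every fixture once and return {team: points}."""
    points = {}
    for home, away, home_xg, away_xg in fixtures:
        home_goals = poisson_draw(rng, home_xg)
        away_goals = poisson_draw(rng, away_xg)
        points.setdefault(home, 0)
        points.setdefault(away, 0)
        if home_goals > away_goals:
            points[home] += 3
        elif away_goals > home_goals:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points


# Hypothetical fixtures: (home, away, home xG, away xG).
fixtures = [
    ("Strong", "Weak", 2.4, 0.6),
    ("Weak", "Strong", 0.8, 1.9),
    ("Strong", "Mid", 1.8, 1.0),
    ("Mid", "Weak", 1.4, 0.9),
]

rng = random.Random(42)
tables = [simulate_table(fixtures, rng) for _ in range(2000)]
for team in ("Strong", "Mid", "Weak"):
    average = sum(t[team] for t in tables) / len(tables)
    print(f"{team}: {average:.2f} simulated points on average")
```

Each entry in `tables` is one alternative season, which is what the comparison with the real table needs.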

It's easy and most readily understood to then compare the points Stoke got in reality to the points the multiple Premier League winners got in this alternative reality.

But it might be better if instead we compared the relative positions and points of each team in this simulation to the reality of the table.

I do that and repeat the process for every one of the 1,000's of simulations using each side's actual points haul in relation to each of their 19 rivals as the over/under performing benchmark.

This is what the 2017/18 season looked like in May based on counting the number of times a side's actual position and points in the table relative to all others was better than an xG simulation.
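A rough sketch of that counting step is below. The exact benchmark isn't spelled out here, so this uses one possible reading: a side's average points margin over its rivals, compared between the real table and each simulated table. The function names and the tiny three team tables are purely illustrative assumptions.

```python
def mean_margin(table, team):
    """Average points gap between `team` and each of its rivals in one table."""
    rival_points = [pts for name, pts in table.items() if name != team]
    return table[team] - sum(rival_points) / len(rival_points)


def share_sims_beating_actual(actual_table, simulated_tables, team):
    """Fraction of simulations in which the side's relative showing beat reality."""
    actual = mean_margin(actual_table, team)
    better = sum(mean_margin(sim, team) > actual for sim in simulated_tables)
    return better / len(simulated_tables)


# Made-up tables for illustration only.
actual_table = {"A": 10, "B": 5, "C": 3}
simulated_tables = [
    {"A": 12, "B": 5, "C": 3},
    {"A": 6, "B": 7, "C": 4},
    {"A": 10, "B": 5, "C": 3},
    {"A": 9, "B": 2, "C": 2},
]

share = share_sims_beating_actual(actual_table, simulated_tables, "A")
print(f"Simulations that outdid team A's actual season: {share:.0%}")
```

A low share means the side rarely did better in the simulations than it did in reality, which is the sense in which it over performed.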


Top two overperformed, 3rd and 5th did what was expected, 4th and 6th under performed in reality.

Only 15% of the time did the xG simulation throw up a Manchester City season long performance that outdid their actual 2017/18 season.

The model might have undervalued City's ability to take chances or prevent goals, or they might have been lucky, for instance scoring late winners and conceding late penalties to teams who can't take penalties.

So when you came to evaluate City's 2018/19 chances, you may have taken away that they were flattered by their position, but concluded that the likely challengers were so far behind that they were still by far the most likely winners.

Man United, De Gea, obviously.

Liverpool, 4th but perhaps deserved better. Too far behind City to be a genuine title threat, unless they sort out the keeper & defence.

Burnley, score first, pack the defence and play a hot keeper, bound to work again.

Huddersfield, 16th was a buoyant bonus they didn't merit.

Relegated trio: Swansea and Stoke pretty much got what they deserved. WBA, without actually watching them much last season, looked really hard done by. If you're going for the most likely bounce straight back team, it was the Baggies.

All of this comment was made in our pre season podcast.

You can use this approach for goals scored/allowed to see where the problems/regression/hot/cold might be running riot, plus simulations and xG are just one tool of many.