Friday, 21 September 2018

A Brief History of Non-Shot xG Models.


There are lots of new metrics turning up from non-shot models.

Normal xG is relatively straightforward.
The variables used may differ between models, but there is a core similarity based around shot type and location.

But as more and more “NSxG” models appear it is becoming apparent that one person’s NSxG model can be a completely different beast to someone else’s.

Here are my broad definitions of what I mean when I use these terms, based around the models we have developed at Infogol.

1)    Non-Shot xG

As the name suggests, shots, or more generally attempts at goal, do not hold a position of importance in a NSxG model.

They are simply another data point.

Possession, rather than goal attempts, is central to this approach and the outcome variable is whether a goal was scored.

Possession of the ball deep in your own territory will have a relatively small NSxG value because many more such possessions will end with possession being turned over than a goal being scored.

Possession closer to the opponent’s goal is more likely to result in a goal and therefore will have a higher generic NSxG.

The pitch will be defined by a NSxG framework whereby every position on the field will have a NSxG value for the team in possession and the team attempting to take possession.

This is partly analogous to a normal xG probability map, but it is unlikely that the NSxG value will be the same as the xG value for the same position on the pitch.

2)    Change in NSxG

Hopefully self-explanatory. The difference (positive or negative) in NSxG terms between one position on the field and another.

3)    A team’s NSxG value for a match.

Both NSxG and xG are attempting to describe the process a side has achieved in attempting to produce a favourable outcome.
Namely scoring more goals than they concede.

Both are expressed in expected goals, although one method (xG) looks at a limited subset of events that occurred in the match (goal attempts) and the other (NSxG) looks at every event that occurred, accumulated into separate possession chains.
They are entirely different models, albeit with the same ultimate aim of describing the events of a football match.

NSxG and (shot based) xG values should be broadly similar when summed together for a single game, although the NSxG contains much more granular information than a xG model and so small variations should be expected (and even hoped for).

The measured unit in xG is the expected goals value at the point of the goal attempt.

The measured unit in NSxG is the expected goals value at the initiation of each possession.

4)    NSxG risk / reward.

When a player attempts to move the ball from one field position to another, there exists the combined reward of keeping possession and improving (or the cost of reducing) the NSxG value of the possession at that point in the individual possession chain.

If we include the likelihood that the action will be successful based on either an average passing or ball progression model, we can determine if the action will have a positive or negative expectation from the view point of an average team.

We can further see if certain teams are taking more risky, negative expectation passes or actions, but because they have a repeatable over-performance in completing these actions they are turning negative expectation moves into positive expectation ones.
This ultimately adds context to possession data.
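As a rough illustration of the risk / reward idea, here is a minimal sketch of the expectation of a single attempted action. The function name, the inputs and the way a turnover is costed (you give up what you held and hand the opponent the value of their new possession) are my assumptions for the example, not the actual Infogol implementation.

```python
def pass_expectation(p_complete, nsxg_start, nsxg_end, opp_nsxg_if_lost):
    """Expected net NSxG change for the side attempting the action.

    p_complete       -- chance the action succeeds (hypothetical model output)
    nsxg_start       -- NSxG of the possession before the action
    nsxg_end         -- NSxG of the possession if the action succeeds
    opp_nsxg_if_lost -- NSxG the opponent would own after a turnover
    """
    gain_if_completed = nsxg_end - nsxg_start
    # Assumption: a turnover costs the NSxG held plus the NSxG handed over.
    loss_if_failed = -(nsxg_start + opp_nsxg_if_lost)
    return p_complete * gain_if_completed + (1 - p_complete) * loss_if_failed


# Made-up numbers: the same risky pass out of defence, attempted by an
# average side (70% completion) and by a side that completes it 90% of the time.
average_side = pass_expectation(0.7, 0.005, 0.02, 0.08)
skilled_side = pass_expectation(0.9, 0.005, 0.02, 0.08)
print(average_side < 0, skilled_side > 0)
```

With these illustrative inputs the pass is a negative expectation play for the average side and a positive one for the side with the higher completion rate, which is the kind of repeatable over-performance described above.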

5)    NSxG Timelines.

Using the cumulative accumulation of shot based xG for each side as the match progresses has its uses, but also its critics.

Shots at goal account for less than 2% of game events, whereas many dangerous moves may stall just before an attempt is made.

Therefore, a NSxG approach that incorporates every possession may reveal more about how the match played out.

Simulations, while not immune to score effects, add another layer of information, indicating how likely it is that the match is either currently level or being led by one of the teams.

If we use goal attempts and their xG to simulate these likely states, we often only have around 30 simulation points.

By using NSxG we can increase not only the wealth of match data that is included, but also increase the simulation points by looking at every possession, rather than just every goal attempt.
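A minimal sketch of that kind of simulation is below. It assumes each possession (or goal attempt) can be treated as an independent trial that produces a goal with probability equal to its expected goals value. That is a simplification, and the function and argument names are hypothetical rather than taken from our own code.

```python
import random


def simulate_outcomes(team_a_values, team_b_values, n_sims=10000, seed=0):
    """Return (P(team A ahead), P(level), P(team B ahead)).

    Each input is a list of per-possession (or per-shot) expected goals
    values. Assumption: trials are independent, so score effects are ignored.
    """
    rng = random.Random(seed)
    a_ahead = level = b_ahead = 0
    for _ in range(n_sims):
        a_goals = sum(rng.random() < p for p in team_a_values)
        b_goals = sum(rng.random() < p for p in team_b_values)
        if a_goals > b_goals:
            a_ahead += 1
        elif a_goals == b_goals:
            level += 1
        else:
            b_ahead += 1
    return a_ahead / n_sims, level / n_sims, b_ahead / n_sims


# Made-up example: many small possession values rather than a few shots.
print(simulate_outcomes([0.02] * 100, [0.01] * 80, n_sims=2000))
```

Feeding in the values recorded up to a given minute, rather than for the whole match, gives the likely state of the game at that point of the timeline.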

6)    Player Ratings

Shot based xG majors on attacking players and playmakers.

NSxG incorporates the small but frequent gains made by players further down the supply chain and can also be used to show how a side's effectiveness changes if an efficient ball circulator (who may not accrue much positive NSxG) is absent.

This allows a gateway into isolating the on-ball contribution made by all players to creating or preventing goals being scored.

7)    Example
12th August 2017 
xG Brighton 0.67 Manchester City 2.24

NSxG for all possessions, including ones leading to own goals.
NSxG Brighton 0.79 Manchester City 1.97

Timeline.
A dominant performance from Manchester City to open their title winning 2017/18 season. There was only a 13% chance that they lose the game based on possession chains.

Kevin De Bruyne most influential player in the match.

Monday, 14 May 2018

Non-Shot xG Passing Stats. The Complete Picture.


The 2017/18 Premier League season is now a wrap and you’ll be bombarded with end of season advanced stats, both team based and for individuals.

Mostly, these figures will largely confirm what we intuitively know. 

Kevin De Bruyne may not have come close to Mo Salah’s goal output, both actual and expected, but he contributed massively to Manchester City’s creative avalanche with outrageous passing ability.

The gradual advent of pass based, non-shot expected goals models is beginning to highlight the contribution of those creative players who often provide the raw material for the scorers to bask in the celebratory spotlight.

However, many of these interpretations have exclusively concentrated on the positive contributions made by attempting to advance the ball, while ignoring the cost when a player’s misplaced pass leads to a turnover.

Possession comes with responsibility as well as opportunity and while a completed pass rightly causes an uptick in expected goals fortunes for a side and a player, there is always a price to pay if the ball instead ends up at the feet of the opposition.

Infogol’s non-shot passing model gives an expected goals figure to every possible possession location on the field of play, but it will be different from the perspective of the two teams.

Possession on the edge of your own box will be worth very little in terms of non-shot expected goals, but would be hugely valuable if possession switched to your opponents.

So a misplaced pass that turns over possession deep in your own half will lose your side the tiny expected goals valuation that went along with that possession, but will also hand a much larger chunk of NS xG to your rivals.

The cost of losing that possession would be significant.

Similarly, lose possession deep in your opponent’s half and you are conceding the hard won NS xG earned by progressing deep into opposition territory, and you’ll also hand over the small amount of NS xG associated with opposition possession in their own half.

Just as we can tally the positive contributions made by players, we can also see what their misplaced passes cost their side.

It is inevitable that KDB will lose possession for his side in valuable areas; it is the natural cost of the high tariff passes he often attempts. But ignoring these entries on the debit side of the creative ledger omits the realistic representation of football as experienced by those who watch the full 90 minutes rather than just the highlight reel.

To give a flavour of the much more rounded picture an NS model can convey, here’s a breakdown of the percentage of team passing creativity owned by players from the 2017/18 season, but also balanced by the percentage of team NS xG lost by misplaced passes that belong to the individual.

Top 10 Defenders.



Bottom 10 Defenders





Top 10 Midfielders



Bottom 10 Midfielders




Top 10 Strikers (+ Wayne).



Bottom 10 Strikers



Here’s the top and bottom 10 list of players that compares the amount of good things their passes have contributed against the times when their passing radar has gone astray.

They’ve been sorted by position, because the opportunity to create or make mistakes is largely driven by where you play. I’ve also compared the player’s importance to his side.

For example, Aaron Cresswell’s passes have contributed 17.5% of West Ham’s total positive change in non-shot xG and he has been responsible for 10.5% of the NS xG the Hammers have lost due to misplaced passes.

At the other end of the scale, Benteke’s passes have contributed 2.7% of Palace’s positive NS xG from passing, but he’s given away 10.3% of his side’s total generosity to their opponents.
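The two percentages quoted for each player can be tallied along the following lines. This is only a sketch: the input format (a player name plus a signed change in NS xG, with negative values reserved for misplaced passes) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def passing_ledger(passes):
    """Return {player: (% of team NS xG gained, % of team NS xG lost)}.

    passes -- iterable of (player, delta_nsxg). Assumption: positive deltas
    come from completed passes, negative deltas from misplaced passes.
    """
    gained = defaultdict(float)
    lost = defaultdict(float)
    for player, delta in passes:
        if delta > 0:
            gained[player] += delta
        elif delta < 0:
            lost[player] += -delta
    total_gained = sum(gained.values())
    total_lost = sum(lost.values())
    ledger = {}
    for player in set(gained) | set(lost):
        pct_gained = 100 * gained[player] / total_gained if total_gained else 0.0
        pct_lost = 100 * lost[player] / total_lost if total_lost else 0.0
        ledger[player] = (pct_gained, pct_lost)
    return ledger


# Made-up team of two passers.
example = [("A", 0.3), ("B", 0.1), ("A", -0.05), ("B", -0.15)]
print(passing_ledger(example))
```

In this toy example player A owns most of the credit side and a smaller share of the debit side, while player B is the reverse.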

I’ve included Rooney as a striker just to give him a suitable Premier League send off.

Wednesday, 2 May 2018

Non-Shot xG Models

This blog's been rather quiet of late, mainly due to my writing over at Pinnacle, alongside working since 2016 as the Football Product Manager at Timeform, an analytics, content & data company.

So while the bulk of my output appears on these two sites, TPoG does give me the chance to prime some of the new stuff we've developed.

This week on the Infogol site, we revealed the work we've been doing to develop a non-shot xG model. The post can be read HERE

NSxG isn't a new concept. The idea's been around in other sports, such as the NFL, for decades, but the fluid nature of football/soccer has made such models very data hungry & time consuming to run on a humble works computer.


I'll use this post to throw in some random thoughts about our NS xG and highlight the advantages and similarities to the more readily seen chance based xG models.

What's NS xG?

NSxG gives a value to every possession in every area of the playing field. It's most usefully expressed in expected goals and describes the likelihood that a possession will eventually turn into a goal.

If you've got the ball deep in your own half, the chance of that possession developing into a goal is tiny. If you've the ball in your opponent's penalty area, it's a lot more.

How can NSxG be Used?

In much the same way as shot based xG, namely to evaluate players and teams, but in the former case it's much more inclusive.

If you successfully move the ball from your own box to your opponent's with one raking pass, you'll personally (along with the receiver) get the credit for the improvement in NSxG associated with the pass.

More realistically, if you competently move the ball ten yards upfield, you'll get a small uptick in NSxG. Do it consistently and you might even be ranked as the best at beginning deep lying moves in the Premier League.

What About Mistakes?

There's risk and reward with every pass attempt. Unintentionally pass to the opposition instead of your deep lying playmaker and you're handing the opponents a fairly big chunk of NSxG, while giving up the small amount you owned prior to the pass.

So it can be used to Evaluate Defensive Actions? 

Yes, break up an attack with a tackle or interception and you can cost out the benefit by just summing the pre and post event NSxG for both teams.

What About Backward Passes that Find a Team Mate?

They'll lose NSxG, for the player making the pass, but they can be classified separately and might reveal the required role of the player or the tactical mode a side has slipped into, perhaps when defending a lead.

It's a harsh system that penalizes a player for taking the kick off.

Can It Only Be Used for Passes? 

No, it can be applied to any recorded action, running with the ball burns calories and gradually ticks up the change in NSxG (provided you're running in the right direction).

Who Benefits from an NSxG Model?

Players who don't regularly provide a key pass or get onto the end of lots of chances. If you're the one breaking up the opposition's midfield passing or tasked with circulating the ball you've been bypassed by attacking event based expected goals.

NSxG shows everyone what you do.

Can You Show That Players or Teams Over or Under Perform a NSxG Model?

Easily. Build your baseline model around the entire Premier League and you can estimate not only the worth of advancing the ball from A to B, but also how often an average Premier League side would expect to successfully achieve the pass or run.

Then you just see how a particular team/player fares compared to the league average.

Is it Better than Normal xG? 

Not really better, just different. Usual xG does really well at rating teams, but less well at picking out individual contribution or mistakes.

If you've helped craft a sublime move that goes the length of the pitch only for a team mate to fall over his or her own feet and lose the ball, you'd like some credit (& perhaps a black mark against your clumsy colleague, especially if he or she makes a habit of it).

Any Examples?

Here's the Liverpool 4 Manchester City 3 game from January broken down by the pass related NSxG for all the players.


There's a lot of numbers, so it's colour coded, blue is good, red is not so, although the jury is still out on the final column.

First numerical column is the cumulative increase in NSxG by each player's successful passes.

The Ox, Firmino and Mane showing up well. Gomez perhaps a surprise being so prominent? (I don't watch much Liverpool). Mo would show up more, I assume if we included the pass receiver as well, rather than just the passer.

De Bruyne unsurprisingly topping City's numbers, with Otamendi stepping up to help with the game chasing.

Next column is the NSxG "lost" by successful backward passes. Just ball re-circulation really.

Third column is the cumulative net gain through disrupting the opposition's passes. The Ox was definitely up for it that day.

Last column's a bit of a conundrum. It's NSxG lost by a player through misplaced or broken up passes.

You have to ask do you want to penalise your most talented players who try the most difficult passes, such as De Bruyne and the Ox (again).

If you don't have the red in column four, you may not have the blue in column one. Although they might ultimately harm the team by their extravagant pass choices.

It's all risk/reward and passing with purpose.

Here's a week later at the Liberty.

Liverpool losing 1-0 to Swansea.


30/70 possession in favour of Liverpool.

Liverpool's defenders stepping up to kick start many of their attacks. Lots of Liverpool passes going astray, but not particularly because of direct Swansea intervention. Ox putting in a similar performance, while Firmino struggled to find a teammate, though not for lack of trying.

Anyone shirking? Not really for me to say, substitutions included.

So Who's the Best Passing Team in the Premier League?

Manchester City.

Proof?

OK, definition of best passing side. One that makes valuable passes and completes them at well above the league average rates.

That's Manchester City.



Just a summary plot here.

We've combined the cumulative increase in NSxG with the under or over performance in the rate at which these passes are completed.

Manchester City's cumulative, successful passes increased their NSxG by 13% more than you would expect an average side to achieve if they were attempting the same passes Manchester City are inflicting on the opposition.

Huddersfield's successful passes increased their NSxG by 10% less than the average expectation if you had Mr Premier League Average doing your passing. Basically, they aren't very good at passing in areas where it matters more.
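For anyone wanting to see the shape of that calculation, here's a minimal sketch. It assumes every attempted pass comes with the NSxG it would add if completed and a league average completion probability from a baseline model. The tuple layout and function name are illustrative only.

```python
def passing_over_performance(attempts):
    """Percentage by which actual NSxG gained beats the average expectation.

    attempts -- list of (completed, nsxg_gain_if_completed, p_avg), where
    p_avg is the (assumed) league average chance of completing that pass.
    """
    actual = sum(gain for completed, gain, _ in attempts if completed)
    expected = sum(p_avg * gain for _, gain, p_avg in attempts)
    if expected == 0:
        raise ValueError("no expected NSxG gain in these attempts")
    return 100 * (actual / expected - 1)


# Made-up side: four identical passes that an average side completes half
# the time, three of which were actually completed.
attempts = [(True, 0.10, 0.5), (True, 0.10, 0.5), (False, 0.10, 0.5), (True, 0.10, 0.5)]
print(round(passing_over_performance(attempts), 1))
```

A positive figure is a Manchester City style over-performance, a negative one a Huddersfield style shortfall.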


Tuesday, 27 February 2018

Hitting the Moving Promotion Target.

One inevitable question at this stage of the season is "what's our target to get automatic promotion/get in the playoffs/avoid relegation/get in the Champions League/finish above Arsenal".

The answer is problematical on quite a few levels, not least the phrasing of the initial question.

Does the questioner want a guaranteed outcome or just a target that makes the outcome more likely than not? The former can only be provided for those already leading the race, so a probabilistic reply seems the most suitable.

There are a couple of easy pitfalls to avoid.

For example if you're interested in the chances of a top six finish, the average points won by the sixth placed side isn't that useful. To finish 6th you simply have to narrowly eclipse the points and goal difference won by the 7th placed side.

And with a breakaway big six, such as in the Premier League, the difference between 6th and 7th can be huge.

But the problems don't stop there.

The target for a top six finish is most likely different for a side that isn't one of the established big six teams. One of the big six may have a slightly down season, but if you're an outsider looking to break into the top six, your target is likely to be higher than that of a founder member of the big six.

Complicated.

Even at this late stage of the season, targets are set under the unique circumstances of this particular season, including the intertwined remaining fixture list played out by teams of varying underlying abilities.

The current points target at which Wolves become more likely than not to gain automatic promotion from the Championship will be different from Fulham's target.

An inferior Fulham team has to overhaul at least three teams currently ahead of them in the table, without being caught by opponents below them, over a fixture list that includes just one immediate rival.

In contrast, Wolves, the best team in the division, can allow one side to overhaul them, whilst playing out a fixture list that includes three (barely) realistic promotion rivals, giving the Old Gold the opportunity to reduce the points gathering potential of Villa, Cardiff and Derby...or the chasing trio the chance to cut into Wolves' lead.

In short, everyone's running their own unique race, with different challenges and different abilities.

Fulham could get promoted automatically with just 83 points, but in 89% of the occasions they reach exactly 83 points it is insufficient to win that prize.

If Wolves disappointingly win just 83 points, they still go up automatically in 66% of the occasions when they end with this final total.

Two identical final totals, but different probabilistic outcomes for the two sides.

If you want a Fulham points target where automatic promotion becomes more likely than not, it's currently 87 points.

As we've seen for Wolves their "breakeven" points tally is just 83 points and if you want virtual certainty of bringing Premier League football back to Molineux the target to aim for is 90.

Even better news for Wolves is that they get at least 83 points in 3999 out of every 4000 league simulations and at least 90 in 95% of trials.
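The league simulation itself isn't shown here, but once you have its output the targets fall out of a simple tally. The sketch below assumes each simulated season has been reduced, for one team, to a pair of final points and whether they went up automatically. The function names and the "lowest total that is better than evens" rule are my shorthand for the example.

```python
from collections import Counter


def promotion_rate_by_points(sim_results):
    """Map each final points total to the share of simulations with promotion.

    sim_results -- iterable of (final_points, promoted) for a single team,
    one entry per simulated season (assumed output of a league simulation).
    """
    seasons = Counter()
    promoted = Counter()
    for points, went_up in sim_results:
        seasons[points] += 1
        promoted[points] += bool(went_up)
    return {points: promoted[points] / seasons[points] for points in seasons}


def breakeven_target(sim_results):
    """Lowest points total at which promotion is more likely than not."""
    rates = promotion_rate_by_points(sim_results)
    better_than_evens = [points for points, rate in rates.items() if rate > 0.5]
    return min(better_than_evens) if better_than_evens else None


# Made-up simulation output for one team.
sims = [(83, False)] * 8 + [(83, True)] + [(87, True)] * 3 + [(87, False)] * 2
print(promotion_rate_by_points(sims), breakeven_target(sims))
```

Running the same tally on each team's own simulated seasons is what produces different targets for the same points total.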


Here's the rest of the "better than evens" targets for the main contenders for promotion or demotion in the Championship.

Sunday, 25 February 2018

Passive & Aggressive Defensive Teams

One of the major drawbacks in quoting counting statistics in football is the varied time of possession enjoyed by teams.

I first wrote about this nearly seven years ago here when describing Stoke's incredibly disciplined approach to defending once you factored in the inordinate amount of time they spent doing it under Tony Pulis in the early days of their soon to be ending Premier League jaunt.

Defensive statistics have always been blighted by failing to account for opportunity.

It is impossible for a Manchester City defender to accumulate the volume of defensive actions made by say a WBA defender, simply because the new champions are only out of possession for around 30% of a typical game and that game only has around 58 minutes when the ball is in play.

WBA, by contrast, are averaging just 40% of the total possession and ceding ~60% to the opposition.

Before we can make any meaningful descriptive attempt at a side's defensive set up, we need to make some kind of attempt to account for the unequal range of possession for each team and the amount of time that the ball spends on the pitch rather than in the stands.

We can also attempt to define where on the field a side is trying to dispossess their opponents.

Some teams are noted for the desire to press opponents higher up the pitch to create a turnover or slow down a developing attack, whereas others are more content to lie deep and only actively engage an opponent once they venture into their final third. 


Vertical distance from your own goal can be slightly misleading. If you challenge an opponent on the centre spot you are slightly closer to your own goal than if the event occurs also on the halfway line, but on the touchline.

All calculations have been made from the point of the challenge to the centre of the defending side's own goal line.
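As a quick sketch of that measurement, assuming coordinates in yards with x measured up the pitch from your own goal line and y across the pitch from one touchline (the 74 yard width is an illustrative value, not a property of the data):

```python
import math


def distance_to_own_goal(x, y, pitch_width=74.0):
    """Straight-line distance from (x, y) to the centre of your own goal line.

    x -- yards from your own goal line
    y -- yards from the touchline
    pitch_width -- assumed pitch width in yards
    """
    return math.hypot(x, y - pitch_width / 2)


# Two challenges on the halfway line of a 115 yard long pitch.
centre_spot = distance_to_own_goal(57.5, 37.0)
touchline = distance_to_own_goal(57.5, 0.0)
print(centre_spot < touchline)
```

The challenge on the touchline is further from goal than the one on the centre spot, even though both are the same vertical distance up the pitch.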


The table above, using Infogol data, counts the number of defensive actions, such as tackles, interceptions and clearances, made by each team after 27 games of the current Premier League campaign.


These have been grouped by distance from the event to the centre of that side's own goal. Finally, these event numbers have been standardised to account for the actual time each side has been without the ball and a figure for defensive actions per 10 minutes of opposition possession has been calculated.
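The standardisation can be sketched as below. It assumes a flat 58 minutes of ball in play time per game and a single season-long share of opposition possession, which is cruder than using the actual time without the ball, and the names are hypothetical.

```python
from collections import Counter


def actions_per_10_minutes(distances, games, opp_possession_share,
                           in_play_minutes=58.0, band_width=10):
    """Defensive actions per 10 minutes of opposition possession, by band.

    distances -- distance in yards from each action to the side's own goal
    games -- number of games played
    opp_possession_share -- fraction of possession held by opponents
    in_play_minutes -- assumed ball in play time per game
    """
    opp_minutes = games * in_play_minutes * opp_possession_share
    bands = Counter(int(d // band_width) * band_width for d in distances)
    return {band: 10 * count / opp_minutes for band, count in sorted(bands.items())}


# Made-up single game: four actions, opponents had half of the possession.
print(actions_per_10_minutes([5, 12, 18, 35], games=1, opp_possession_share=0.5))
```

Each key is the lower edge of a distance band, so 10 covers actions between 10 and 20 yards from the side's own goal.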

For example, Manchester City appears to have made by far the fewest active attempts to disrupt or dispossess an opponent in 2017/18, only making around 16 such attempts per 10 minutes of opponent possession.

So they appear happy to allow teams to circulate the ball, but they do make their most concerted efforts to intercede between 20 and 40 yards from the City goal.

In contrast, Liverpool are much more aggressive at trying to regain the ball, making over twice as many defensive actions per 10 minutes as City, as well as engaging opponents almost once a minute at distances of 50 or more yards from Liverpool's own goal.



The final sparkline plot shows, not only the total volume of defensive actions per 10 minutes of opponent possession, but also where a side is most active in engaging their opponent.

A side's own goal is on the left of the plot, and actions further towards the extreme right of the sparkline took place further away from that goal.

The majority of the top six teams peak their defensive actions between 30 and 40 yards from goal, whereas the majority of the remainder of the league either choose or are forced to defend between 10 and 20 yards from goal.

The most prominent example of a top six team residing in a relegation threatened defensive mindset is Manchester United.

Thursday, 1 February 2018

Manchester City and WBA. The Best in Top Tier History.

The importance of league tables is only absolute after the final game has been played and your side has secured that all important Europa League spot or finished in 17th spot or higher.

For the remainder of the time, but particularly just after mid season, it is your side's position relative to their nearest challengers that is most important.

Watford's current 11th may give the illusion of relative safety, but on closer inspection they are only three points above Huddersfield, who are teetering on the brink of the relegation spots in 17th position.

One way to try to quantify your side's current position is to see how far a side is above or below the relative mediocrity of the average points won by all sides in the season to date, whilst also accounting for the distribution of points, both currently after 25 games and in the past.

Manchester City can rightfully claim to be in the running to become the most dominant title winners in the history of the 20 team top tier.

They are currently 2.56 standard deviations above the current points average per team. Their nearest historical rivals were the Manchester United team of Beckham, Giggs, Keane, Sheringham and the Neville brothers from 2000/01, who were 2.51 SD's above par after 25 games and Chelsea's 2005/06 team (2.50 SD's).

At the bottom, WBA are the "best" 20th placed team ever, being only 1.06 SD's below the average points won by teams so far.
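The measure is just each side's points expressed in standard deviations from the league average at that stage. Here's a minimal sketch, assuming the population standard deviation across all teams is used (the post doesn't specify which flavour of SD) and with a made-up mini league:

```python
import statistics


def points_z_scores(points_by_team):
    """Standard deviations above (+) or below (-) the average points total.

    Assumption: population standard deviation across all teams in the table.
    """
    points = list(points_by_team.values())
    mean = statistics.fmean(points)
    sd = statistics.pstdev(points)
    return {team: (pts - mean) / sd for team, pts in points_by_team.items()}


# Hypothetical three team table, not real data.
table = {"Leaders": 30, "Mid table": 20, "Bottom": 10}
print(points_z_scores(table))
```

Comparing the figure for, say, the 20th placed side across seasons is what allows a "best ever" bottom side to be picked out.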

Likewise, Swansea and Southampton are the most impressive 19th and 18th placed teams, respectively, after 25 games.

The unusually distributed nature of the points won by sides in 2017/18 then begins to catch up with those sides whose position implies relative safety, but the proximity of their rivals suggests otherwise.

Newcastle are the second worst 14th placed side in top tier history by this measure, as are Watford in 11th and Burnley in 7th.


Here's the rest of the teams. We've got the strongest bottom four ever in relation to the average points won by a side after 25 games, along with the weakest and most vulnerable mid table teams, again in top tier history.

Monday, 22 January 2018

After the Shot xG2

Expected goals has variously been defined by advocates and opponents respectively as a more accurate summary of what "should" have happened on the pitch or a useless appendage to the final scoreline, that is neither useful nor enlightening.

The first description is perhaps too overtly optimistic for a "work in progress" that is evolving into a useful tool for player projection and team prediction.

Whereas the second, less flattering description, may also stand up to some scrutiny, particularly if the supporters of the stat ignore the uncertainty intrinsic in its calculation, while the detractors may be blithely ignorant of such limitations.

Both camps are genuinely attempting to quantify the true talent levels of players and teams in a format that allows for more insightful debate and, in the case of the nerds, one that is less prone to cognitive bias.

The strength of model based opinion is that it can examine processes that are necessary for success (or failure), drawing from a huge array of similar scenarios from past competitions.

And in doing so without straying too far down the route from chance creation to chance conversion (or not), so that the model avoids becoming too anchored in the specifics of the past, rendering any projections about the future flawed.

Overfitting past events is a model's version of eye test biases, but that shouldn't mean we throw out everything that happens, post chance creation for fear of producing an over confident model that sticks immutably to past events and fails to flexibly project the future.

It's no great stretch to model the various stages from final pass to the ball crossing the goal line (or not).

Invariably, the process of chance creation alone has been prioritised as a better predictor of future output and post shot modeling has remained either a neglected sidetrack or merely the niche basis for xG2 keeper shot stopping.

But if used in a less dogmatic way, mindful of the dangers of over fitting, the "full set" of hurdles that a decisive pass must overcome to create a goal (or not) may become a useful component in an integrated approach that utilises both numeric and visual clues to deciphering the beautiful game.

Let's look at chances and goals created from set pieces and corners.


Here's the output from two expected goals models for chances and on target attempts conceded by the current Premier League teams in the top flight since early 2014.

The xG column is a pre shot model, typically used to project a side's attacking or defensive process, that uses accumulated information, but is ignorant of what happened once contact with the ball was made.

The xG2 column is based entirely upon shots or headers that require a save and uses a variety of post shot information, such as placement, power, trajectory and deflections. Typically this model would be the basis for measuring a keeper's shot stopping abilities.

A superficial overview of the difference between the xG allowed from set pieces and actual goals allowed leads to the by now familiar "over or under performing" tag.

Stoke had been transformed into a spineless travesty of their former defensive core at set plays, conceding both chunks of xG and under performing wantonly by allowing 42 actual goals against 37 expected.

There's little disconnect between the Potters' xG2, that examines those attempts that needed a save, but the case of Spurs & Manchester United perhaps shows that deeper descriptive digging may provide more insight or at least add nuance.

Tottenham allowed a cumulative 29.6 xG, conceding just 23 goals.

We know from keeper models that Lloris is generally an excellent shot stopper and the xG2 model confirms that, along with the ever present randomness, the keeper's reactions are likely to have played a significant role in defending set play chances.

In allowing 23 goals, Lloris faced on target attempts that were worth just over 31 goals to an average keeper.

Spurs conceded chances worth 29.6 xG, but looked at in terms of xG2 this value rises to 31.3. So, still mindful of randomness, Spurs' defenders might have been a little below par in suppressing the xG2 attempts that came about from the xG chances they allowed, but Lloris performed outstandingly to reduce the level of actual goals to just 23.

Superficially, Manchester United appears identical.

As a side they allowed 37.6 xG, but just 32 actual goals. We know that De Gea is an excellent shot stopper, therefore in the absence of xG2 figures we might assume he performed a similar service for his defence as Lloris did for his.

However, United's xG2 is just 33.1 and the difference between this and the actual 32 goals allowed is positive, but relatively small compared to Lloris at Spurs.
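The comparison is simple arithmetic on the three figures quoted for each side. A small sketch, with key names that are just my labels for the gaps being discussed:

```python
def set_piece_breakdown(xg, xg2, goals):
    """Split a side's set piece over/under performance into two steps."""
    return {
        # How the on target attempts compare with the chances allowed.
        "xg2_minus_xg": xg2 - xg,
        # How actual goals compare with the on target attempts faced.
        "xg2_minus_goals": xg2 - goals,
        # The familiar headline over/under performance.
        "xg_minus_goals": xg - goals,
    }


spurs = set_piece_breakdown(xg=29.6, xg2=31.3, goals=23)
united = set_piece_breakdown(xg=37.6, xg2=33.1, goals=32)
for name, side in (("Spurs", spurs), ("Man Utd", united)):
    print(name, {key: round(value, 1) for key, value in side.items()})
```

For Spurs the gap between xG2 and goals is roughly 8.3, against roughly 1.1 for United, whereas United's xG2 sits about 4.5 below their xG allowed and Spurs' sits about 1.7 above it. The headline figures are similar, but the routes to them are not.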

By extending the range of modeling away from a simple over/under xG performance we can begin to examine credible explanations for the outputs we've arrived at.

Are United's defenders exerting so much pressure, even when allowing attempts consistent with an xG of 37.6, that the power, placement, etc. of those on target efforts are diluted by the time they reach De Gea?

Are the attackers themselves under performing despite decent xG locations? (Every xG model is always a two way interaction between attackers and defenders).

Is it just randomness or is it a combination of all three?

Using under and over performing shorthand is fine. But we do have the data to delve more into the why, and taking this xG and xG2 data driven reasoning over to the video analysis side is the logical, integrated next step.

Monday, 15 January 2018

Arsenal Letting in Penalties Doesn't Defy the Odds.

Arsenal fans have been getting hot under the collar about penalties.

Penalty kicks have either been awarded (against Arsenal) when they shouldn't have been, not awarded (to Arsenal) when they should have been or, when they have been conceded, they've gone in, a lot.

The latter has spawned the inevitable trivia titbit.


There's nothing wrong with such trivia as fuel for the banter engine between fans, but almost inevitably they quickly become evidence for an underlying problem that exclusively afflicts Arsenal.

Cue the Daily Mail "why is Arsenal's penalty saving record so poor"

So let's add some context.

We're into familiar selective cutoff territory, where you pick a starting point in a sequence to make a trend appear much more extreme than it actually is.

As you'd probably guess, Arsenal saved a penalty just prior to the start of the run.

They also saved one Premier League penalty in each of the preceding two seasons, two more per season if you go back two more campaigns and, obligingly, opponents' penalty takers also missed the target completely on a handful of other occasions.

If you shun the exclusivity of the Premier League, Arsenal keepers made penalty saves in FA Cup shootouts and induced two misses in Community Shield shootouts, the latter as recently as 2017.

Over the history of the Premier League, 14% of penalties have been saved by the keeper. The remainder have gone wide, hit the post, been scored or an attempt has been made to pass the ball to a team mate. (Arsenal, again)

Arsenal's overall Premier League penalty save rate is also 14%.

So you should ask if we're simply seeing a random streak that was likely to happen to someone, not necessarily Arsenal, over the course of Premier League history.

Arsenal have conceded nearly 100 Premier League penalties, not because they have had dirty defenders, but because they have been ever-present, respected members of the top flight.

Of the current Premier League sides, 17 have had the opportunity to concede a run of 23 consecutive penalty goals.

If we simulate all the penalties faced by each of these teams using a generic penalty success rate, we find that at least one side will have conceded a run of 23 or more consecutive penalty goals during the history of the Premier League in just over half of the simulations.

Letting in penalty after penalty, sometimes up to and beyond 23, is something that is going to have happened slightly more often than not in the top flight, based on save rates.

Arsenal just happen to have had both the opportunity and the luck to have been the Premier League's slightly odds-on reality star winner.
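The kind of simulation described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the figures quoted here: the per-team penalty counts in `penalties_faced` and the conversion rate `P_SCORE` are made-up placeholders (the post gives neither), and the function names are hypothetical. Each penalty is treated as an independent event with the same chance of being scored.

```python
import random


def longest_scoring_run(n_penalties, p_score, rng):
    """Longest run of consecutive scored penalties in one simulated sequence."""
    longest = current = 0
    for _ in range(n_penalties):
        if rng.random() < p_score:  # penalty scored
            current += 1
            longest = max(longest, current)
        else:  # saved or missed, so the run ends
            current = 0
    return longest


def prob_any_team_has_run(penalties_faced, p_score, run_length,
                          n_sims=2_000, seed=1):
    """Share of simulations in which at least one team concedes
    `run_length` or more penalty goals in a row."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(longest_scoring_run(n, p_score, rng) >= run_length
               for n in penalties_faced):
            hits += 1
    return hits / n_sims


# ASSUMPTION: illustrative penalty counts for 17 teams, not real data.
penalties_faced = [98, 95, 92, 90, 88, 85, 80, 75, 70,
                   65, 60, 55, 50, 45, 40, 35, 30]
# ASSUMPTION: generic chance that a penalty is scored, not taken from the post.
P_SCORE = 0.78

estimate = prob_any_team_has_run(penalties_faced, P_SCORE, run_length=23)
print(f"Chance that at least one side concedes 23+ in a row: {estimate:.0%}")
```

With real penalty counts and a conversion rate fitted to Premier League data, the printed figure is the quantity being discussed. The placeholder inputs here will not necessarily reproduce the "just over half" result.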

Friday, 5 January 2018

Making xG More Accessible

When the outputs of probabilistically modelled expected goals met the mainstream media, a soft landing was always very unlikely.

With a few exceptions, notably Sean Ingle, Michael Cox and John Burn-Murdoch, the reaction to the higher media profile of expected goals has ranged from the misguided to the downright hostile and dismissive.

Jeff Stelling's pub-worthy rant on Sky was entirely in keeping with how high the Soccer Saturday bar is set (Stelling can't really think that, though. Can he?).

Meanwhile, the Telegraph's "expected goals went through the roof" critique of Arsenal's back-foot point at home to Chelsea wildly overstated the likelihood of each attempt ending up in the net.

Despite the understandable irritation, much of the blame for the negative reception for xG must lie with our own enclosed community, which created the monster in the first place.

Parading not one, but sometimes two decimal places is often enough to lose an entire audience of arithmophobic football fans, who would otherwise be receptive to the information that xG can be used to portray.

Presenting Chelsea as 3.18 xG "winners" against a 1.33 xG Arsenal team in a game that actually finished 2-2 is an equally clunky and far from intuitive way of presenting a more nuanced evaluation of the balance of scoring opportunities created by each side.

Quoting the raw xG inputs may be fine in peer groupings, such as the OptaPro Forum, but if wider acceptance is craved for the concept of process versus outcome, a less number-based approach must be sought.

When Paul Merson says that "Arsenal deserve to be in front" he's simply giving a valued opinion based on decades of watching and participating in top class football.

And, ironically, when xG quotes Team A as having accumulated more xG than Team B in the first half of a match, it is similarly drawing upon a large, historical data pool of similar opportunities to quantify the balance of play, devoid of any cognitive bias or team allegiance.

Just as a detailed breakdown of Merson's neuron activity required to arrive at his conclusion would be both unnecessary and of very limited interest, merely quoting xG to a wider audience focuses entirely on the "clever" modelling, whilst completely ignoring any wider conclusion that could easily be expressed in football friendly terms.

I've been simulating the accumulated chance of a game being drawn or either team leading based on the individual xG of all goal attempts made up to the latest attempt, as a way of converting mere accumulated xG into a more palatable summary of a game.


Here's the simulated attempt-based xG timeline for Arsenal versus Chelsea.

It plots how likely it is that, say, Chelsea lead after 45 minutes given the xG of each team's attempts.

In this game, it's around a 50% chance that the attempts taken in the first half would have led to Chelsea scoring more goals than Arsenal.

It's around a 40% chance that the game is level (not necessarily scoreless) and around 10% that Arsenal lead.
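For anyone wanting to try this at home, here is a minimal sketch of that kind of simulation. It assumes each attempt is an independent event scored with probability equal to its xG. The shot values in `chelsea_xg` and `arsenal_xg` are invented for illustration and are not the attempts from this match, and the function names are hypothetical rather than anything used at Infogol.

```python
import random


def game_state_probabilities(team_a_xg, team_b_xg, n_sims=20_000, seed=1):
    """Simulate every attempt so far and return the share of simulations
    in which team A leads, the game is level, or team B leads."""
    rng = random.Random(seed)
    a_leads = level = b_leads = 0
    for _ in range(n_sims):
        # Each attempt becomes a goal with probability equal to its xG.
        a_goals = sum(rng.random() < xg for xg in team_a_xg)
        b_goals = sum(rng.random() < xg for xg in team_b_xg)
        if a_goals > b_goals:
            a_leads += 1
        elif a_goals == b_goals:
            level += 1
        else:
            b_leads += 1
    return {"a_leads": a_leads / n_sims,
            "level": level / n_sims,
            "b_leads": b_leads / n_sims}


def plain_english(probs, team_a, team_b):
    """Turn the three probabilities into a decimal-free sentence."""
    labels = {"a_leads": f"{team_a} leading",
              "level": "the game being level",
              "b_leads": f"{team_b} leading"}
    ranked = sorted(probs, key=probs.get, reverse=True)
    return (f"{labels[ranked[0]]} is the most likely current outcome, "
            f"with {labels[ranked[-1]]} the least likely, "
            "based on goal attempts.")


# ASSUMPTION: made-up first-half attempt xG values, not the real match data.
chelsea_xg = [0.35, 0.12, 0.08, 0.30, 0.05]
arsenal_xg = [0.06, 0.04, 0.10]

probs = game_state_probabilities(chelsea_xg, arsenal_xg)
print(plain_english(probs, "Chelsea", "Arsenal"))
```

Re-running the function after each new attempt, with the two lists extended up to that point, gives the three lines of a timeline like the one above.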

So rather than quoting xG numbers to a largely unwilling audience, the game can be neatly summarised, from an xG perspective, in a manner that isn't far removed from the eye test and partly subjective opinion of a watching ex-professional.

"Chelsea leading is marginally the most likely current outcome, with Arsenal leading the least likely, based on goal attempts".

The value of xG is to accumulate process driven information to hopefully make projections that are solidly based, rather than reliant upon possibly poorly processed and inevitably biased, raw opinion based evaluations.

But that shouldn't mean we can't or won't use our data to present equally digestible, but number-based, opinion as to who's more likely to be leading in a single match, and express it in varying degrees of certainty, but in plainer English and without recourse to any decimal points.