Even once-every-two-years football fans could see that England had allowed themselves to be totally dominated by their Italian opponents, and the barrage of match stats that arrived along with shot after Italian shot only reinforced the idea that the Three Lions should have been well beaten in regulation. A 3:1 shot ratio of 27 to 9 had been pushed out to 4:1 by the end of extra time as Italy added 9 more shots without reply, yet the scoreline remained stubbornly at 0-0.
Luck intervenes in all sporting events and its contribution can be keenly felt in single matches. On any given Sunday night, teams can bury one chance when marginally offside while blazing a similar opportunity high over the bar when remaining legal, and vice versa.
So let's imagine that the first 93 minutes on Sunday are a reasonable representation of how an Italy-England match under Roy Hodgson would pan out if the teams met over a limitless series of matches. What reward, on average, would Italy gain from producing such an apparently dominating performance, especially if they kept repeating their superior shot ratio?
In this post I developed a goal expectancy for average shots based on the position on the pitch from which the shot was taken. In addition, the model predicted how likely a shot was to hit the target, requiring a save, or to be blocked by a defensive player. The data used wasn't extensive, comprising shots largely from the top six teams from the EPL and Stoke City. Therefore, the model was likely to be biased towards very efficient teams (despite their regular mid to lower level finishing positions, Stoke have maintained a ruthless level of scoring efficiency in their four seasons in the EPL, without which they would almost certainly have been relegated). In short, the model is likely to represent high scoring teams (again, Stoke would be high scoring if they had more than their paltry possession figures).
Games between highly ranked countries, especially in the knockout stages of a competition, are very likely to be lower scoring than the environment from which our model originates. So it is probable that our conclusions will overestimate the average likelihood of goals in our simulated re-runs of the Italy-England spectacle. But we are only working in ballpark figures anyway, and a crude correction can be made at the end.
I've taken a screenshot from FourFourTwo's Euro2012 stats app, initially showing England's nine shots from Sunday's game, and I've labelled each shot. I've then entered the coordinates of each shot into my regression model to calculate how likely it is that a typical shot from that coordinate will score, miss the goal or be blocked near to source. From the outputs we can firstly get a feel for the quality of the chances. Were England attempting low percentage, long-range efforts or were they carving out clear-cut chances?
The game's best chance overall appears to have fallen to England's Glen Johnson (shot 5) early on in the match. Not only did the chance fall centrally, giving the player the largest shooting area to aim for, but it was also just six yards out. In the sample comprising my model these types of chances, ranging from one-on-ones and crowded goalmouths to tap-ins, are converted 34% of the time, are more likely than not to be on target and are relatively difficult to block. So the most likely outcome was an on-target shot that wasn't blocked, but was saved. And that's what happened.
Without having seen the actual match footage, Rooney's chance (shot 1) would appear to be more likely to have produced a goal based on the regression alone, as it was taken from slightly closer to the goal. However, this effort excellently demonstrates the limitations of a data dump method of analysis that includes limited variables. Rooney had his back to goal, the ball was above and behind him and it required an overhead attempt. So the actual likelihood of a goal is, in reality, far below the estimate derived from a two-dimensional plot of a three-dimensional, fluid event. All models can deceive, as can shot maps.
Expectancy Values for Each of England's Shots Versus Italy, Euro 2012.
England Shot Number | Probability of Shot Being On Target | Probability of Shot Being Blocked | Probability of Goal Being Scored |
1 | 0.53 | 0.17 | 0.35 |
2 | 0.34 | 0.27 | 0.08 |
3 | 0.52 | 0.18 | 0.33 |
4 | 0.26 | 0.37 | 0.05 |
5 | 0.53 | 0.17 | 0.34 |
6 | 0.26 | 0.39 | 0.05 |
7 | 0.41 | 0.21 | 0.14 |
8 | 0.35 | 0.19 | 0.06 |
9 | 0.29 | 0.30 | 0.05 |
Cumulative Probability. | 3.5 | 2.3 | 1.4 |
Rooney's effort aside, the other eight England attempts were more typical of the type of chances that predominate in the dataset from which the model was made. If we add up the probabilities of each of the nine shots resulting in a goal, a block or a forced save, we find that on average, over many repetitions, England would have seen 3 shots on target, 2 blocked and one goal, allowing for the model's failure to capture the difficulty of shot 1 and for the elevated goal environment in which the model was originally constructed. The reality of the night saw just one shot on target, 3 blocked and zero goals.
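The "cumulative probability" row in the table is simply the sum of each column, which gives the expected number of on-target shots, blocks and goals over many repetitions. As a minimal sketch (Python is used here purely for illustration, and the names `england_shots` and `expected_counts` are made up for this example), the unadjusted totals can be reproduced from the nine rows above:

```python
# Per-shot model outputs for England, copied from the table above:
# (P(on target), P(blocked), P(goal)) for shots 1 to 9.
england_shots = [
    (0.53, 0.17, 0.35),
    (0.34, 0.27, 0.08),
    (0.52, 0.18, 0.33),
    (0.26, 0.37, 0.05),
    (0.53, 0.17, 0.34),
    (0.26, 0.39, 0.05),
    (0.41, 0.21, 0.14),
    (0.35, 0.19, 0.06),
    (0.29, 0.30, 0.05),
]


def expected_counts(shots):
    """Sum each probability column to get the expected number of events."""
    on_target = sum(s[0] for s in shots)
    blocked = sum(s[1] for s in shots)
    goals = sum(s[2] for s in shots)
    return on_target, blocked, goals


on_target, blocked, goals = expected_counts(england_shots)
print(f"on target {on_target:.2f}, blocked {blocked:.2f}, goals {goals:.2f}")
```

These raw sums sit close to the table's totals before any allowance is made for the difficulty of shot 1 or for the high scoring environment behind the model.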
Expectancy Values for Each of Italy's Shots Versus England, Euro 2012.
Italy Shot Number | Probability of Shot Being On Target | Probability of Shot Being Blocked | Probability of Goal Being Scored |
1 | 0.42 | 0.13 | 0.08 |
2 | 0.22 | 0.31 | 0.02 |
3 | 0.52 | 0.17 | 0.33 |
4 | 0.45 | 0.21 | 0.21 |
5 | 0.20 | 0.34 | 0.01 |
6 | 0.28 | 0.37 | 0.06 |
7 | 0.49 | 0.16 | 0.23 |
8 | 0.25 | 0.30 | 0.03 |
9 | 0.28 | 0.23 | 0.03 |
10 | 0.27 | 0.23 | 0.03 |
11 | 0.34 | 0.22 | 0.06 |
12 | 0.25 | 0.29 | 0.03 |
13 | 0.27 | 0.36 | 0.05 |
14 | 0.28 | 0.35 | 0.05 |
15 | 0.48 | 0.16 | 0.21 |
16 | 0.44 | 0.17 | 0.14 |
17 | 0.47 | 0.18 | 0.20 |
18 | 0.29 | 0.31 | 0.05 |
19 | 0.27 | 0.18 | 0.02 |
20 | 0.33 | 0.31 | 0.09 |
21 | 0.23 | 0.24 | 0.01 |
22 | 0.27 | 0.27 | 0.03 |
23 | 0.34 | 0.29 | 0.09 |
24 | 0.31 | 0.31 | 0.07 |
25 | 0.26 | 0.37 | 0.05 |
26 | 0.26 | 0.37 | 0.04 |
27 | 0.25 | 0.37 | 0.04 |
Cumulative Probability. | 8.7 | 7.2 | 2.3 |
By contrast with England, Italy positively peppered their opponents' goal with shots, whilst also coming up empty. However, their superficially impressive numbers are bulked up with a large number of long-range, low-expectancy attempts. Shot 3 was positionally on a par with Glen Johnson's best effort, and shots 4, 7, 15 and 17 were inferior, but should be classed as reasonable chances. In addition, two very good chances followed one another after a Hart parry of a Balotelli shot, and such efforts should be treated as a continuation of a single event, thereby slightly depressing the cumulative goal expectancy. The remaining efforts were mainly outsiders running for pride.
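To see why treating a shot and its rebound as a single event lowers the total, note that the second shot can only add a goal if the first one failed. A rough sketch of that arithmetic follows. The two probabilities are hypothetical values chosen for illustration, not the model's figures for the Balotelli shot and its follow-up, and `combined_goal_probability` is a made-up helper name.

```python
def combined_goal_probability(shot_probs):
    """Chance that a single move ends in a goal, given the goal probability
    of each successive shot in that move (a later shot only matters if
    every earlier one failed to score)."""
    p_no_goal = 1.0
    for p in shot_probs:
        p_no_goal *= 1.0 - p
    return 1.0 - p_no_goal


# Hypothetical goal probabilities for a shot and the rebound that follows it.
linked_shots = [0.21, 0.23]

naive_total = sum(linked_shots)                     # counts both shots in full
combined = combined_goal_probability(linked_shots)  # counts the move once

print(f"naive sum {naive_total:.3f}, single-event value {combined:.3f}")
```

The single-event value is always a little below the naive sum, which is the "slight depression" of the goal expectancy mentioned above.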
For those having trouble finding De Rossi's wonderfully struck sliced dipper that hit England's post after 3 minutes, it's shot 6. Unlikely to be on target (technically it wasn't) and quite likely to be blocked, it would have been a most unlikely opening goal, but a great one.
Having made similar adjustments in the case of Italy, their shots would on average be expected to yield 8 on target (actual 6), 7 blocked (actual 12, so well done England) and 2 goals (actual zero, well done Joe Hart).
Italy dominated the game, but England created three very good chances and generally shied away from the kind of speculative long-distance efforts that characterised around three quarters of Italy's attempts. If we round down the most probable scoreline from the cumulative probabilities, in lieu of asking both teams to continually re-enact such an absorbing contest, the quality and quantity of last night's shots suggest that 2-1 to the Azzurri shouldn't have been a surprising result... assuming Italy didn't take the lead and then imitate the reactive approach of their opponents.
Hence the title of the post...
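Rather than rounding the totals, one could also work out the full spread of scorelines implied by the per-shot goal probabilities. The sketch below is only an illustration under stated assumptions: it treats every shot as an independent event, it uses the raw table values with none of the adjustments described above (shot 1, linked shots, the high scoring sample), and the names `goal_distribution`, `ENGLAND_GOAL_PROBS` and `ITALY_GOAL_PROBS` are invented for the example.

```python
# Goal probabilities per shot, copied from the two tables above.
ENGLAND_GOAL_PROBS = [0.35, 0.08, 0.33, 0.05, 0.34, 0.05, 0.14, 0.06, 0.05]
ITALY_GOAL_PROBS = [
    0.08, 0.02, 0.33, 0.21, 0.01, 0.06, 0.23, 0.03, 0.03,
    0.03, 0.06, 0.03, 0.05, 0.05, 0.21, 0.14, 0.20, 0.05,
    0.02, 0.09, 0.01, 0.03, 0.09, 0.07, 0.05, 0.04, 0.04,
]


def goal_distribution(goal_probs):
    """Return a list where entry k is the probability of scoring exactly
    k goals, assuming each shot is an independent event."""
    dist = [1.0]  # before any shot: zero goals with certainty
    for p in goal_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)   # this shot does not score
            new[k + 1] += prob * p       # this shot scores
        dist = new
    return dist


england = goal_distribution(ENGLAND_GOAL_PROBS)
italy = goal_distribution(ITALY_GOAL_PROBS)

italy_win = draw = england_win = 0.0
best_score, best_prob = None, 0.0
for i, p_italy in enumerate(italy):
    for e, p_england in enumerate(england):
        joint = p_italy * p_england
        if i > e:
            italy_win += joint
        elif i == e:
            draw += joint
        else:
            england_win += joint
        if joint > best_prob:
            best_score, best_prob = (i, e), joint

print(f"Italy win {italy_win:.2f}, draw {draw:.2f}, England win {england_win:.2f}")
print(f"Most likely Italy-England scoreline: {best_score[0]}-{best_score[1]}")
```

Because the inputs are unadjusted, the output is best read as a ballpark companion to the rounded totals rather than as a forecast.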
Great work Mark!
For over a year now, I've been hoping to lay my hands on Eredivisie data to compute exactly this type of 'expected goals added' figure. I think this concept has huge potential. It's easy to understand and it answers more questions than rough shot numbers do.
However, the debate surrounding this 'expected goals added' figure will probably be centered on the method behind it. My question is, did you take into account the specific match situation?
For example, a 10 yard shot straight in front of goal would stand a much better chance of resulting in a goal if this shot was the result of a counter attack than if it was produced from a corner...
Hi Sander, I've incorporated match score in more general analysis of shots, but I've only got really limited data that includes both score and shot location. I've used the Guardian Chalkboards as a source, but they're no longer available except for the MLS. Plus, data collection was a fairly time-consuming process.
I agree that the quality and quantity of data is a major stumbling block to improved analysis.
Love your site btw. I'm a complete tactical novice!
Mark