The Power of Goals: July 2012

Tuesday 24 July 2012

Soccernomics and Penalty Shootouts.

Just a quick follow-up to yesterday's post regarding penalty shootouts, especially as it has attracted a large number of views and quite a few emails.

To clarify, the claim that the team taking the first penalty automatically has a 60% chance of winning before a ball is kicked is Soccernomics' claim, not mine. So can people please stop emailing to say it's crazy to suggest that the team shooting first has a 60% chance of winning... I know it is.

Soccernomics have taken, I believe, 129 shootouts in which the team kicking first won 78 of the contests (60.5%) and proposed that the team opening the penalty shootout in future contests will enjoy this advantage.

I do not agree with this interpretation of their figures.

Imagine I toss a fair coin ten times and get 6 heads (60%). Could I claim that the coin is actually biased and that heads will enjoy a 60% chance of appearing in future tosses? Intuitively the answer is no. I can try to confirm this intuition in one of two ways. The first is to assume that the coin is a fair one and calculate how likely I am to get a result at least as far from an even split as 6 heads in 10 tosses with such a coin. The answer is 75%. That's a pretty big possibility that my 6 heads have arisen by chance from a fair coin. The second is to carry on tossing to get more data.
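The 75% figure can be checked with an exact count of coin-toss outcomes. What follows is a minimal sketch, assuming "as extreme" means at least as far from an even split in either direction (a two-sided test); the function name is my own and not something from the original analysis.

```python
from math import comb

def two_sided_p_fair(successes: int, trials: int) -> float:
    """Chance that a fair coin lands at least as far from an even split
    as the observed count, in either direction."""
    observed_gap = abs(successes - trials / 2)
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= observed_gap
    )
    return extreme / 2 ** trials

print(round(two_sided_p_fair(6, 10), 3))  # 0.754
```

Only the 5 heads, 5 tails split is less extreme than 6 heads, which is why the probability is so high.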

All you can say about a series of trials is how likely or unlikely the result is if we assume the coin is fair or has varying degrees of bias. Even with 129 trials and 60% heads, we can't say that this is a 60% biased coin, we can only make claims based on likelihood and probability.

Below I've also taken option two, namely conducting more trials. I've recorded the results of just under 700 penalty shootouts. Some of the data I'd previously collected, but I've updated it over the last day or so. It comprises all World Cup finals, all Euro Championships, the majority of FA Cup ties, Carling Cup ties, Asian Cup matches, ACN games, playoff matches, European club ties and Johnstone's Paint games. Sources are many and varied, ranging from ESPN to newspaper reports, wiki and fan uploaded YouTube video. I've read so many Brentford JP match reports, I'm halfway to becoming a fan! In many cases it is impossible to decide which team shot first, so many games have been omitted.

Success Rate of Teams Kicking First in Penalty Shootouts 1974-2012. Various Competitions.

Number Of Penalty Shootouts.	Number Won By Team Kicking First.	% Won By Team Kicking First.
689	358	52

There's roughly a 1 in 3 chance that a fair 50:50 shootout will produce a result at least as lopsided as 358 wins for the team going first in 689 attempts. Statistical convention suggests that you can start to harbour doubts when you find results that have a 1 in 20 or smaller probability of occurring by chance in a fair device.

However you look at penalty shootouts, a claim for a 60% future success rate for first shooters isn't supported by the numbers.

On to other matters...
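With this many shootouts a normal approximation is a convenient way to check the 1 in 3 figure. The sketch below is one way to do it, assuming a two-sided test and a half-contest continuity correction; neither choice is stated in the post, and the function name is my own.

```python
from math import erfc, sqrt

def approx_two_sided_p(wins: int, contests: int) -> float:
    """Normal approximation to the two-sided fair-coin test, with a
    continuity correction of half a contest."""
    expected = contests / 2
    spread = sqrt(contests) / 2          # standard deviation when p = 0.5
    gap = max(abs(wins - expected) - 0.5, 0.0)
    return erfc(gap / spread / sqrt(2))

p = approx_two_sided_p(358, 689)
print(f"{p:.2f}")  # about 0.32, i.e. roughly 1 in 3
```

Anything well above 0.05 fails the 1 in 20 convention mentioned above, so the larger sample gives no grounds to doubt a fair contest.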

Monday 23 July 2012

Shooting First Is No Advantage in a Penalty Shootout.

According to Soccernomics, the most dramatic ten seconds of a penalty shootout occur when the BBC are back in the studio and ITV are on a commercial break. Namely the coin toss. 60% of the time, the team that takes the first penalty of a shootout goes on to win the contest and the authors have popped up online throughout the later stages of the Champions League and during the Euros to reinforce this point. This nugget of advanced soccer analytics is regularly restated on blogs and websites and Prozone used an article on penalty shootouts to confirm "in accordance with the findings of Soccernomics, the apparent advantage of the team that takes the first penalty."

The claim is certainly an eye catching one. Before we can reach a shootout the game usually must finish stalemated after over two hours of playing time, indicating that both teams are closely matched and therefore it seems reasonable to assume that they possess similar penalty taking and saving skills. The premise stated in the Prozone article is that the pressure of constantly playing catch up adversely affects the team going second. This presupposes that professionals feel stress and that the second team in the shootout are always playing catch up.

Prozone used data from every World Cup and European Championship since 1998, as well as three key club shootouts from this year's Carling Cup final and the Champions League final and semi final, and initially the evidence seems compelling. Two of the three key club shootouts were won by the team shooting first and 14 of the 18 World and European contests were similarly won. That makes 16 out of 21 in total for a 76% success rate, which certainly doesn't seem to be the kind of record you would expect from two equally matched "sides".

If we assume that each team actually had an equal chance of winning the shootout, it's a trivial matter to treat the 21 shootouts as a series of coin tosses and we can calculate how likely it is that a result as extreme as the one presented for the team shooting first by Prozone will occur. That chance is just under 2.7%. Statistical protocol suggests that with a result of this magnitude we can agree that there is some evidence to suggest that a penalty shootout is not a 50/50 proposition between those shooting first and those going second.

However, there are objections to this conclusion. The 1998 cut off is arbitrarily chosen, as is the inclusion of the three key games, and we have much more data that we can draw on. The first shootout in the European Championship Finals was the Panenka final of 1976 and the first World Cup Finals one occurred in the 1982 semi final between West Germany and France. We have now extended the sample size to 37 World and Euro shootouts, of which 20 were won by the first up side. Opening shooters still win more, but the success rate is now just 54%. The chance of a record at least as lopsided as 20 to 17, if each shootout was actually a 50/50 proposition, is now nearly 75%, so there's absolutely no reason to suspect we are seeing anything other than a fair contest.

So far we can initially conclude that the Prozone sample of 16 first shot winners from 21 trials gave some evidence that the competition wasn't fair, but that result wasn't statistically significant at 5% levels if we assumed that the first team to shoot had an inbuilt advantage of just 53%. Furthermore, in larger sample sizes containing all Euro and World Cup shootouts, of which the Prozone sample was just a subset, statistical significance disappeared completely and the success rate for teams going first was highly likely to have been produced by two equally matched teams of penalty takers.

We are now left with Soccernomics' original 60% sample. The number of shootouts contained in their sample is, I believe, 129, an impressive number, but still just a subset of all the penalty shootouts that have been played out. Therefore it is no more valid than Prozone's incomplete subset that supports first shooter supremacy or my more comprehensive sample that suggests we are seeing a fair contest. A 60% strike rate would suggest that on 77 or 78 occasions the contest was won by the team going first and, as with the Prozone sample, this is statistically significant: the results are extreme enough to be inconsistent with a fair fight. But statistical significance disappears if we give the team going first an enhanced win probability of just 51.5%, a far cry from the claimed 60%.
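To see how the evidence weakens as the assumed first-kicker advantage rises above 50%, the tail probability can be recomputed under different win probabilities. This is a sketch under my own assumptions: it reports one tail only (doubling it gives a figure comparable to the two-sided fair-contest values quoted above), it uses 78 wins from 129, and the function name is hypothetical.

```python
from math import comb

def upper_tail(wins: int, contests: int, p_first: float) -> float:
    """P(at least `wins` first-kicker victories) when each shootout is
    won by the first kicker with probability `p_first`."""
    return sum(
        comb(contests, k) * p_first ** k * (1 - p_first) ** (contests - k)
        for k in range(wins, contests + 1)
    )

# Same observed record, three different assumptions about the true advantage.
for p_first in (0.50, 0.515, 0.60):
    print(p_first, round(upper_tail(78, 129, p_first), 3))
```

The same observed record looks less and less surprising as the assumed advantage grows, which is why a sample like this cannot pin the true figure at 60%.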

Just as pertinently, the 129 game sample is probably only a quarter of all shootouts that have been taken in major leagues and competitions and as we have seen subsets can produce results that may disappear in larger ones.

The original claim is headline grabbing and the inclusion of plausible, subjective justifications for an advantage to exist makes the idea appear compelling. It is just the kind of conclusion that advanced soccer statistical analysis should be hoping to discover... but unfortunately, it's not backed up by the numbers.

A follow up post can be read here.

Saturday 21 July 2012

Taking Your Shots At Home and Away.

It's essential to return to old topics once a more substantial quantity and quality of data becomes available. Tempting though it may be to sign off a subject in any field as fully understood, that is rarely the case, and today's conclusions should always be questioned, or sometimes endorsed, with new material.

Home advantage is a widely accepted facet of professional sport and can readily be demonstrated in football. Individual teams may occasionally produce superior away results in comparison to their home form even over an entire season, but this can happen through random chance in large groups of trials and away advantage rarely persists. Factors such as crowd pressure, referee bias and familiarity with the surroundings are plausible reasons why home field advantage may exist, but little hard data exists to back up these theories.

On average, over time home teams outscore their visitors by a ratio of around 1.4:1 in the Premier League and that superiority filters down into the shooting conversion rates and shooting accuracy figures. Last season home sides had 3.5 more shots per game than their opponents and one more shot on target. This shooting disparity is likely to contribute to the size of the advantage, but the reasons behind the existence of the imbalance have to remain speculation. Heightened aggression from the home team as discussed here may lead to a more attacking outlook from the hosts and this, coupled with a deliberate offensive tactical approach, may force the visitors onto the defensive. But whatever the reason, we are increasingly in a position to quantify the effectiveness of each team's shooting attempts and that enables us to begin to peel away at some of the onfield interactions.

Average Shooting Numbers for Home & Away Sides in the EPL 2011/12.

	Shots per Game.	Shots on Target per Game.
Home Sides.	16	5
Away Sides.	12.5	4

Shot numbers add some detail, but by looking at the area of the pitch from where each shot originated and comparing expected conversion figures with actual ones we can begin to understand something about the onfield conditions under which home and away teams take their shots. We saw here that Stoke, who tactically approach most games as if they are the visitors, balance poor shot numbers by creating better than average chances and we can make similar calculations for all away teams.

Over multiple seasons, home sides outshoot away sides by a similar proportion to that seen in 2011/12. The average position of a home shot is a couple of yards inside the box and 7 yards wide of the centre of the goal; away teams are about a yard further out and a yard wider. So away sides are shooting from marginally less advantageous positions. A more useful way to demonstrate both the difficulty and scarcity of away shots compared to home ones is to plot frequency of shots sorted by distance from the goal.

A Shot Map For Various EPL Sides 2009/10-2011/12.

Every team that has played in the EPL since 2009 is represented in the frequency plot and the profiles of the home and away plots are very similar. The first peak comes about 4 yards in front of the penalty spot and the numerical supremacy of the home sides is well demonstrated. Once we move outside the penalty area the plots effectively merge. The peak at around 24 yards from goal can be easily explained by the number of direct free kicks: you can't have a free kick inside the 18 yard line and from much further out a shot quickly ceases to be a realistic option. Shot numbers outside the box are also very similar, indicating that away teams take a disproportionately large share of their shots from longer range.

So initially we can confirm that away sides take fewer shots than home teams and we can add that these shots are on average attempted from further out. This incremental improvement in our knowledge of venue specific shooting tendencies can be added to by further comparing the scoring and accuracy expectations of each shot with the actual outcome.

Expectancy Values versus Actual Values for Home and Away Shots in the EPL 2009-2012.
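A frequency plot of this kind starts from counts of shots grouped by distance. The sketch below shows one way to build those counts; the 2 yard bin width, the function name and the two short lists of distances are all hypothetical and invented purely for illustration, not taken from the data behind the plot.

```python
from collections import Counter

def shots_by_distance(distances_yards, bin_width=2):
    """Count shots in bins of `bin_width` yards, keyed by each bin's lower edge."""
    counts = Counter(int(d // bin_width) * bin_width for d in distances_yards)
    return dict(sorted(counts.items()))

# Hypothetical distances from goal in yards, invented for illustration only.
home = [6, 7, 8, 8, 9, 11, 12, 13, 17, 23, 24, 25]
away = [7, 8, 9, 12, 14, 19, 23, 24, 25, 26]

print(shots_by_distance(home))
print(shots_by_distance(away))
```

Plotting the two sets of counts against the bin edges would give home and away curves that can be compared bin by bin.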

	Expected Goals per 100 Shots.	Expected Shots on Target per 100 Shots.	Expected Blocked Shots per 100 Shots.
Home Sides.	9.3	32	25
Actual Totals Home Sides.	8.7	30	27
Away Sides.	8.4	31	26
Actual Totals Away Sides.	9.1	34	24

Once shot position has been accounted for, home sides would expect to score almost nine and a half goals per 100 attempts. Over a sample spanning nearly 30 different EPL teams and 3 of the most recent seasons, they fell short of that target by over half a goal. They were two shots per 100 shy of the expectation in terms of accuracy and two more shots per 100 were blocked compared to the model's prediction. So home sides underperformed in all three areas. By contrast away teams overperformed the model in each of the three categories. Away sides were expected to score 8.4 goals per 100 judged on shot position (we saw they shoot from marginally poorer positions), but they managed just over nine.

So we can now pull the various threads together. Home sides outshoot their opponents and they create seemingly better chances. However, in reality over thousands of shots they under perform against a non venue specific model. Defensive strategy at the moment plays no part in the positional shot conversion model, so we can speculate that home shots are made more difficult by weight of defensive numbers in and around the shooting area. The reverse may be true of away teams, they perform better than expected with their more limited number of shots. Again we may speculate that home teams commit more to attack, allowing more freedom to visiting attackers when they do get within range.

If visitors choose or are forced into playing a more defensive game, they appear to reap a slight payback through a better than expected overall conversion rate for their goal attempts. They score slightly more, see more shots on target and fewer blocked than predicted by a venue neutral model and with even more detailed data we may soon be able to tell if this approach is close to the optimum strategy to deal with a pumped up home foe.

Thursday 19 July 2012

Measuring The Measurers.

Only a few days to go before the eagerly awaited Olympic football kicks off in London and elsewhere, featuring hybrid versions of top international sides such as Spain, Uruguay, Brazil and Great Britain. Sandwiched as it is between the end of the Euros and the beginning of the new Premier League season, the competition may be a slightly hard sell to a footballing public that hasn't exactly been starved of top class action over recent weeks. Beckham's omission won't have helped to attract the casual fan tuning in and out of the Olympic feast, but a rump of under 23 talent combined with a maximum of three over aged players will ensure that every game holds some interest.

The real challenge though lies at the feet of the sports analysis firms who will try to bring order to this infrequent format by predicting the outcome of the Olympic football games. Predicting the outcome of Premiership games is usually most difficult at the start of a campaign because of the pre season movement of players and management. Longterm statistical trends are tweaked in one direction or another dependent upon how favourably each team's squad strengthening has been received and usually a consensus opinion is reached. In the case of the Olympic competition solid information is limited, few of the teams playing the length and breadth of England and Wales will have a large body of results to draw on in formulating their relative chances and the bulk of some teams, such as Great Britain, won't have played a competitive game together prior to the tournament kicking off.

It's therefore useful to have a method of evaluating the performance of a group of match selections and the forthcoming event, shrouded in a mist of uncertainty, is probably the most challenging of world football stages.

All football matches have three possible outcomes when played over 90 minutes, and predicting whether a match will end in a "home" win, an "away" win or a draw is the lifeblood of football prognostication. Literally hundreds of derived markets have evolved over the last decade or so, ranging from total goals, first scorer to half time result and which team will qualify from a knockout tie, but 1,2,X remains the standard from which football prediction is measured and the baseline from which most of the secondary markets are derived.

Many methods exist to test the measurers, and many are ad hoc and arbitrary. A points scoring system for every correct selection gives no extra credit for correctly predicting an unlikely outcome or for being particularly confident about a particular outcome. Systems based on expected proportions of wins and draws over a series of matches can see poor selections in one direction cancel out equally poor ones in another, leading to the illusion of accuracy.

To cut through the uncertainty we need to judge each model's estimate of a match ending as a home win, draw or away win on the merits of its original confidence in each outcome, once we are in possession of the actual result. One simple method involves taking the square root of the sum of the squared differences between the prediction and the actual outcome; the smaller the result, the better the prediction.

Here's an example from the Euro 2012 Final using the opinions of two Irish bookmakers. Bookmakers' estimations of match odds are always quoted with an extra edge of around 9 or 10%, so I've removed this overround to give a reasonable figure for the bookmakers' estimation of the true chances of Spain, Italy or a stalemate resulting in 90 minutes.

Paddy Power Get Bested by Boyle Sports in the Final of Euro 2012.

Bookmaking Firm.	PP	BS	Match Outcome	Difference Squared for PP	Difference Squared for BS.
Probability of a Spain Win, 90 mins.	0.448	0.460	1	0.305	0.292
Probability of an Italy win, 90 mins.	0.296	0.298	0	0.088	0.089
Probability of a Draw, 90 mins.	0.256	0.242	0	0.066	0.059
Square Root of the Sum of the Squares.				0.676	0.662

Once Spain's 4-0 win in 90 minutes was confirmed, the likelihood of that result occurring collapsed to 1 and the other two possible outcomes became zero. BS were slightly more confident in a Spanish win than were PP, hence when the differences between prediction and reality are squared (to eliminate any confusion with minus signs), totalled and the square root taken, BS return a slightly lower figure. In this one match their estimation of what was likely to occur was marginally more accurate than that of their Irish rivals. Visually, BS were closer to the actual result, a Spanish win in regulation, but this method quantifies the difference and can be used over multiple games and multiple markets such as group qualification and pre tournament winners.

The example is from the world of betting, but increasingly predictive models are being developed for a wide range of footballing events and they require a consistent way to gauge their effectiveness. One of the more difficult games anyone has had to evaluate takes place on Friday, when a scratch GB side take on Brazil... in a friendly. Below I've listed the edge free odds of a series of firms and after the game I'll list who showed the best judgement.
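The calculation in the table can be reproduced in a few lines. This is a minimal sketch using the rounded probabilities shown above; the function name is my own, and because the inputs are rounded the results agree with the table only to within about 0.001.

```python
from math import sqrt

def prediction_error(forecast, outcome):
    """Square root of the summed squared gaps between forecast
    probabilities and the 1/0 indicators of what actually happened."""
    return sqrt(sum((f - o) ** 2 for f, o in zip(forecast, outcome)))

# Order: Spain win, Italy win, draw. Spain won inside 90 minutes.
outcome = (1, 0, 0)
paddy_power = (0.448, 0.296, 0.256)
boyle_sports = (0.460, 0.298, 0.242)

print(prediction_error(paddy_power, outcome))   # roughly 0.68
print(prediction_error(boyle_sports, outcome))  # roughly 0.66
```

A perfect forecast scores 0, and averaging the score over many matches gives a way to compare firms or models across a whole tournament.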

How Firms View the Brazil v GB Olympic Warm Up Game.

Bookmaking Firm.	SB	BFr	SJ	888	BFa	WBX
Brazil Win.	0.513	0.532	0.496	0.537	0.541	0.548
Draw.	0.231	0.212	0.235	0.206	0.187	0.202
GB Win.	0.256	0.255	0.269	0.257	0.272	0.250
Square Root of the Sum of the Squares.	0.597	0.573	0.618	0.569	0.565	0.554
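The edge free probabilities in both tables come from quoted odds with the overround stripped out. The post doesn't say which method was used, so the sketch below assumes the simplest one, scaling the implied probabilities so that they sum to 1. The decimal prices and the function name are hypothetical and not taken from any of the firms listed.

```python
def remove_overround(decimal_odds):
    """Turn quoted decimal odds into probabilities that sum to 1 by
    scaling the implied probabilities proportionally."""
    implied = [1 / price for price in decimal_odds]
    book = sum(implied)                  # above 1 when the odds carry an edge
    return [p / book for p in implied], book - 1

# Hypothetical prices for home win, draw, away win (not taken from any firm).
fair, overround = remove_overround([1.80, 3.60, 4.50])
print([round(p, 3) for p in fair], round(overround, 3))
```

Other ways of spreading the edge across the three outcomes exist and would give slightly different probabilities, particularly for long priced outcomes.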

Possession Is Not 9/10ths of the Law.

"Remember, it's not about how long it is, but what you are able to do with it! Possession of the ball, that is..." quote from 11tegen11, July 19th 2012.
Probably the most intelligent statement you are ever likely to read concerning the use of possession statistics in football.

Possession stats tell you how long a team spent doing "things".

That's it.

They don't tell you what those things were. Teams could have been trying to score, they could have been trying to prevent the opposition from scoring by keeping the ball. Polar opposite objectives. Sides could have been comfortably ahead or they could have been comfortably behind. They don't tell you how good the players were at carrying out their various tasks.

Tactical Approach.
Ability to Carry out the Game Plan.
Game Context.

None of these factors appear in a catch all possession stat and each of them drives it.

Percentage of Possession should be largely ignored.

Wednesday 18 July 2012

Getting Out Shot Isn't Always Bad News.

With the influx of readily available data, shooting efficiency has quickly become the mainstay of much of the statistical analysis of both football teams and individual players. The rate at which a keeper keeps out shots, the ratio of goals to shots for strikers and team conversion rates are now a common and welcome addition for debate. While any advance in the quantity of data is to be welcomed, this all encompassing approach does have its drawbacks. In judging a keeper or a striker simply on his shooting or saving success ratio we are hoping that sheer weight of trials is levelling out the quality of the attempts. The accuracy of our conclusions depends on keepers facing roughly the same overall quality of attempts or players or teams being presented with chances of roughly the same difficulty. An improvement compared to the previous goalcentric models perhaps, but still a not inconsiderable leap of faith.

Euro 2012 well demonstrated that there are numerous ways to play football and teams as diverse as Stoke and Swansea can achieve virtually identical seasonal points tallies in the Premier League by employing polar opposite approaches. By using raw counting methods and applying them to such statistics as team shots, we may capture the general trend, but we are destined to miss almost all of the finer detail that may differentiate between the Midlands' best EPL team of recent times and their Welsh cousin.

Therefore a parallel approach is needed using more granular data or we risk dismissing alternative playing styles as mere outliers or the product of excessive luck. Everyone is comfortable heaping praise on the all conquering Spanish club or country sides, who are perceived as maximizing their talent with a style and philosophy that is both successful and pleasing to the eye. But we still need to try to explain why less attractive teams appear to prosper, especially if their success appears to contradict the new statistics that are currently used to define desirable team talent.

The first step to improving the information gained from shot data is to be able to attribute an expectancy for each shot or header based on the origin of the attempt. How likely is a shot to result in a save, a miss, a block or a goal? This tentative first step is available from Opta supported apps, such as FourFourTwo's Stats Zone, and while data collection is time consuming, the effort is rewarding.

The continued comfortably successful presence of Stoke in the EPL midtable and within the various domestic and European cup competitions regularly provokes reactions ranging from mild bemusement to outright hostility, especially as they are entering their fifth consecutive top flight campaign while appearing to perform poorly in many of the shot based advanced metrics and possession stats. I'm extremely skeptical about the usefulness of possession data, but for now I'll use Stoke to illustrate how shot data can reveal hidden gems about a team's record, but only if we look beyond the general counting stats that are the mainstay.

I've taken every goal attempt propelled and faced by Stoke during the 2010/11 season, their third year back in the top flight. The declining influence of Delap's long throw has been charted here and the goal scoring duties for the campaign lay at the feet and head of £8 million Kenwyne Jones; the largely unheralded £3 million Jon Walters from Championship side Ipswich; Tuncay, a £5 million flop who would leave in the January window; and an ageing and increasingly injury prone Ricardo Fuller. So the striking talent was broadly consistent with that available to EPL teams from midtable and below.

How Many Goals A Slightly Above Average Side Would Expect to Score Given Stoke's Opportunities in 2010/11.

Stoke Attack (480)	Goals.	Shots on Target.	Shots Blocked.
Expectancy.	69	183	117
Actual Performance.	45	135	128
Expectancy per Shot.	0.15	0.38	0.24

My shot conversion model at the moment contains an over representation of good attacking sides, but we are simply using it here to see how Stoke managed to overcome their unbalanced ratio of shots allowed and shots attempted.

Stoke's tactical approach under Pulis has always been one of containment regardless of venue, so the lack of possession is baked into the gameplan. They were outshot in 2010/11 by almost 60 attempts over the season as a whole, but finished 13th with a goal difference of minus two, an extra point away from 10th. The point of origin of each Stoke attempt has been charted and a goal expectancy, accuracy expectation and likelihood of seeing the shot blocked are calculated for each shot. Cumulatively we can then see how many goals an above average attacking team might have scored had they been able to create Stoke's 480 chances from 2010/11.

How Many Goals A Slightly Above Average Side Would Expect to Concede Given Stoke's Opponents' Opportunities in 2010/11.
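The cumulative step described above amounts to adding up a small probability for every attempt. The sketch below illustrates the idea only: the three pitch zones, their probabilities and the function name are all hypothetical, invented for the example, and are not the values or structure of the model used for the tables in this post.

```python
# Hypothetical chance of each outcome by shot origin; these values are
# invented for illustration and are not the post's model.
ZONE_EXPECTANCY = {
    "six_yard_box": {"goal": 0.35, "on_target": 0.60, "blocked": 0.10},
    "penalty_area": {"goal": 0.12, "on_target": 0.40, "blocked": 0.25},
    "outside_box":  {"goal": 0.03, "on_target": 0.30, "blocked": 0.30},
}

def cumulative_expectancy(shot_zones):
    """Add up the per-shot expectancies for a list of shot origins."""
    totals = {"goal": 0.0, "on_target": 0.0, "blocked": 0.0}
    for zone in shot_zones:
        for outcome, chance in ZONE_EXPECTANCY[zone].items():
            totals[outcome] += chance
    return totals

shots = ["six_yard_box", "penalty_area", "penalty_area", "outside_box"]
print(cumulative_expectancy(shots))
```

Run over a full season of attempts, totals like these are what the "Expectancy" rows of the tables represent.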

Stoke Defence (539)	Goals.	Shots on Target.	Shots Blocked.
Expectancy.	59	186	142
Actual Performance.	46	182	156
Expectancy per Shot.	0.11	0.35	0.26

The predicted outcomes of the shots Stoke's defence allowed are then similarly treated and by comparing the two batches of shots we can try to see how Stoke managed to score as many as they conceded despite the large excess of shots aimed at their goal by frustrated opponents.

The most significant figure is the expectancy per shot in each table. When Stoke were shooting it averaged 0.15 per attempt, when they were on the receiving end the figure fell to 0.11. So Stoke were creating fewer shooting or heading opportunities and allowing more, but this was compensated for by their shots on average being more likely to result in goals. They were also more likely to be on target, creating rebound possibilities, and less likely to be blocked judged on where on the pitch the attempts were taken from.

That Stoke's strikeforce didn't manage to match the goal tally from their 480 attempts that was predicted by a model that contains data from the likes of the top five EPL teams shouldn't be surprising. But Stoke scored 45 goals, omitting own goals, in 2010/11 by presenting relatively low numbers of high quality chances to predominantly average or below average Premiership strikers, leading to an average season long total in the mid forties. As a group, the Stoke players who had a goal attempt two years ago weren't that potent, certainly weren't very accurate and had more shots blocked than the top teams... but they were presented with top drawer chances.
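The per shot figures are simply each table's expected goals divided by the number of attempts. A short sketch using the rounded totals from the two Stoke tables follows; the function names are my own. Note that the rounded totals give roughly 0.14 for the attack rather than the 0.15 quoted, presumably because the quoted figure was worked out from unrounded data.

```python
# Totals taken from the two Stoke tables (2010/11).
attack = {"shots": 480, "expected_goals": 69, "actual_goals": 45}
defence = {"shots": 539, "expected_goals": 59, "actual_goals": 46}

def per_shot(unit):
    """Expected goals per attempt for one set of shots."""
    return unit["expected_goals"] / unit["shots"]

def goals_vs_model(unit):
    """Actual goals minus the model's expectation (negative = fewer than expected)."""
    return unit["actual_goals"] - unit["expected_goals"]

for name, unit in (("attack", attack), ("defence", defence)):
    print(name, round(per_shot(unit), 3), goals_vs_model(unit))
```

Either way the gap between the two per shot figures is what matters: Stoke's own attempts came from more promising positions than those they allowed.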

Kenwyne..."...set them up & I'll put some of them away".

An EPL side consists of at least two units and the defence, by contrast, overperforms the model. Even though they had an average of two more shots a game to contend with compared to their own team's offensive output, they only conceded 46 goals. Chances created by the opposition were more difficult opportunities, as shown by an expectancy per shot of 0.11 goals compared to 0.15 goals for the Stoke attack. Given Stoke's setup, that was probably a team effort, starting with Jon Walters and working back down the field. Opponents also couldn't match the model's expectation, undershooting it by 13 goals, and that was probably more down to the back four and keeper. Stoke's defensive legion also made many more blocks than predicted, another constant from teams coached by Tony Pulis.

So in this case the more numerous shots allowed were of a lower chance quality and intervention by Stoke defenders, which naturally is currently beyond the scope of one man analysis, further reduced the scoring levels. Stoke's 480:539 shot ratio appears poor, but the raw numbers obscure the hard work of the defence and a gameplan designed to produce quality chances for a team then lacking in attacking excellence.

Sunday 15 July 2012

Who Were Arsenal's Two Footed Marksmen?

Picking the ball up wide left, your new signing races off down the line, rapidly slowing as he's engaged by a covering defender before a stutter step sees him cut inside his opponent, giving him an unimpeded route into the box. The keeper meets him ten yards from goal, but with the added protection of a narrow shooting angle. Deliberately, your player opens up his body shape, before curling an exquisite right footed chip wide of the keeper's outspread arms and just inside the angle of post and bar. Everyone's delighted, the manager, the players and most importantly, the fans. And then it hits you... the expensive new signing can't kick with his left foot.

Favouring one foot over another is a well known facet of football. Anyone who's played football at any level will recognize that it feels more natural and easier to kick with one foot, usually the right, than the other. It's also well documented that continued practice does enhance your ability to perform a task. Therefore you would expect professional players to be able to overcome the seemingly natural preference for one foot over the other and reduce the ability gap between feet through constant repetition.

Many players are listed as either left or right footed, tacitly acknowledging that an ability defined preference exists. So it would be an interesting exercise to see to what extent players default to their natural side when attempting to perform actions that require skill and precision and how and by how much their competence falls away if they attempt a similar skill with their weaker foot.

The more extreme the spread of a particular talent within a group, the more pronounced the difference is likely to be between preferred and shunned feet in individual players and so shooting ability is a prime candidate to test. Martin Keown, in his newspaper column late last year highlighted the improvement he has seen in Robin van Persie's right foot. From arriving at Arsenal in 2004, in Keown's opinion the Dutchman had progressed from being almost exclusively left footed to become a genuine two footed goal scoring threat.

Preference should be very easy to demonstrate. While some split second chances will require a player to hit the shot with the foot it falls on, more often a player has time to work the ball onto his favourite side and dead ball attempts are almost universally taken with the "good" foot.

A Rare Pedidextrous Sportsman.

Five Arsenal players took a substantial number of shots at goal in 2010/11 and while it is entirely possible that none of them will still be at the Emirates come kickoff 2012/13, they provide a reasonable large sample to test for "footedness". Nasri, Fabregas, Arshavin, Walcott and van Persie comprise the group and together they attempted over 350 shots at goal and just shy of 80% of those were made with their designated "good" foot. Each of the five players shot more with their natural foot and indeed if Rosicky and Wilshere were added to the mix the lopsided nature of the splits would have been even more pronounced. Rosicky had 24 attempts on goal, of which 23 were with his right foot. His sole left footed effort was blocked at Molineux. Wilshere was even more extreme with 25 left footed shots compared to another sole, blocked right foot one.

The element of choice makes these findings unsurprising, although the apparent fanaticism of Wilshere and Rosicky for one foot may be surprising. Injury has unfortunately meant that the former hasn't kicked a ball in anger at all this season and Rosicky merely equaled his 2010/11 left footed tally in 2011/12. Of more interest is how effective each of the five Arsenal stars who were prepared to shoot with either foot actually was.

Once again I've used this model to estimate how likely a shot is to result in a block, a goal or a shot on target. If we compare the expectancies with reality, while accepting all the usual caveats concerning sample size and model accuracy, and split the sample by "good" and "weak" feet we can perhaps see how much drop off in production occurs between the two.

The Two Footed Robin van Persie, 2010/11.

v Persie, Left Foot (69 attempts)	Goals.	Shot on Target.	Shot Blocked.
Expectancy From Model.	6.6	23.8	16.8
Actual Number	12	30	10

v Persie,Right Foot (13 attempts)	Goals.	Shot on Target.	Shot Blocked.
Expectancy From Model.	1.9	5.4	2.8
Actual Number.	5	8	1

There certainly appears to be tentative evidence that supports Martin Keown's visual opinion that van Persie is comfortable with either foot in his later Arsenal career. It's easy to be seduced by one outstanding goal with a player's weaker foot into believing that it represents a more general improvement, but Keown's judgement appears sound. Van Persie has an understandable preference to shoot with his natural left foot and he is well ahead of the model's prediction for an average shooter. His shots produce more goals with either foot than an average expectation once shot origin is considered, they are also more accurate and fewer are blocked. That van Persie is a better than average finisher is a given, but he maintains his superiority regardless of the foot propelling the ball.

The Not So Two Footed Andrei Arshavin, 2010/11.

Arshavin,Right Foot (49 attempts)	Goals.	Shot on Target.	Shot Blocked.
Expectancy From Model.	3.6	15.7	12.4
Actual Number	5	19	8

Arshavin,Left Foot (21 attempts)	Goals.	Shot on Target.	Shot Blocked.
Expectancy From Model.	2.1	7.7	4.6
Actual Number.	1	7	5

The evidence is less compelling in the case of Arshavin. He's above average in terms of goals, shots on target and evading blocked efforts with his favourite right, but below the line in all three categories with his left. These more ambiguous results continue as we move through Arsenal's midfield from 2010/11. Walcott posts van Persie like figures with his right boot, but is only just above average in accuracy and output with a left boot that positively attracts blocks from defenders. Nasri partly mirrors Walcott's performance and Fabregas was a slightly below average shooter with his preferred right and even more disappointing with his weaker left over his last season with the Gunners.

Individual player numbers hold more headline grabbing interest than generalisations over a larger group, although the former are often less reliable and the latter prove to be more predictive. Therefore, it seems sensible to combine the data from our group of numerically prolific marksmen in an attempt to gauge the size of the difference between feet.

Arshavin, Walcott, Nasri, Fabregas and van Persie Efforts Combined & Sorted By Preferred Foot.

Arsenal 5, Preferred Foot (291 attempts)	Goals.	Shot on Target.	Shot Blocked.
Expectancy From Model.	23.3	96.7	74.8
Actual Number	34	112	66

Arsenal 5, Weaker Foot (76 attempts)	Goals.	Shot on Target.	Shot Blocked.
Expectancy From Model.	8.2	28	17.4
Actual Number.	11	34	21

We've already seen that there exists an 80:20 split in shot numbers, and shots by the group's stronger feet outperform average predicted scoring rates by over 45%. This shouldn't surprise. At the moment, to maximize numbers, the model comprises shots by strikers, midfielders and defenders, and we are dealing with Arsenal's elite scorers here. Shots with the weaker foot outperform the goal model by a smaller margin, around 34% (11 goals against an expectancy of 8.2).
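For anyone who wants to check the arithmetic behind those percentages, here is a minimal sketch using only the figures from the two combined tables. The dictionary layout and the function name `outperformance` are my own choices for illustration, not part of the shot model itself.

```python
# Figures copied from the combined "Arsenal 5" tables above.
combined = {
    "preferred foot": {"attempts": 291, "expected_goals": 23.3, "actual_goals": 34},
    "weaker foot": {"attempts": 76, "expected_goals": 8.2, "actual_goals": 11},
}


def outperformance(actual, expected):
    """Return how far actual goals exceed the model's expectancy, as a fraction."""
    return actual / expected - 1.0


total_attempts = sum(row["attempts"] for row in combined.values())

for foot, row in combined.items():
    share = row["attempts"] / total_attempts
    over = outperformance(row["actual_goals"], row["expected_goals"])
    print(f"{foot}: {share:.0%} of attempts, {over:+.0%} against the goal model")
```

The same ratio can be applied to the on target and blocked columns, with the usual warning that 76 weaker foot attempts is a thin sample.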

Accuracy is similar in both subsets, but significantly fewer shots are blocked when players shoot with their natural foot compared to their unnatural side. Speculatively, this may be because "wrong" foot shots lack power, giving blockers more time to intervene. Certainly, there is a reluctance to shoot from extreme distance with this group's weaker foot, perhaps an admission that such attempts are likely to be fruitless. Shooting distance averages 16 yards in this weaker category, but extends to over 20 when attempts are made with the stronger foot.

Overall, the limited data would appear to confirm intuition. Players have a weaker foot and it should be possible in time to estimate the size of this deficiency for different player positions, given more extensive data. The individual splits suggest that some players are able to improve their weaker side and partly bridge the performance gap, while maintaining a preference. This may make them more unpredictable when faced with a shooting opportunity and consequently more dangerous.

In his Arsenal career so far, Tomas Rosicky virtually never shoots with his left boot, so only he knows how good or bad his unfavoured side is. Maybe he should be prepared to use his left more frequently before his predictability becomes common knowledge in the EPL.

Saturday 14 July 2012

How Wenger Rang The Changes in 2010/11.

Once the starting eleven walk on to the pitch at kickoff time, the manager has already played a large part of his hand. "In game" tactical rearrangements and of course substitutions are his only recourse if the game situation turns against him or if he wants to consolidate a strong position. We saw here and here how the course of a season's worth of games was altered by the introduction of substitutes for teams such as Chelsea and Stoke. By mapping the expected points situation for sides pre and post their phase of substitutions, we gained a broad picture of how successful (or lucky) AVB and Tony Pulis had been with their tinkering.

Of course the introduction of one or a group of replacements being immediately followed by an upturn in game fortune doesn't necessarily mean that all of the credit should be immediately heaped on the substitutes. Nor should they shoulder all the blame should things go from bad to even worse. Credit or debit should be shared between the 90 minute players as well as the manager for deciding to make adjustments. So in this follow up post, I've tried to isolate some of the individual attacking contributions made by the Arsenal substitutions employed by Wenger during the 2010/11 league season.

Much of the easily available data can be used to give a general picture of Wenger's approach to substitutions, but it's probably best not to read too much into these numbers because they may be determined by a multitude of different and competing factors. Injury replacements are obviously enforced, but game situation, depth of squad and upcoming games may also partly determine gameday rotations. Unused bench duty is also a favoured punishment for player misdemeanors by at least one EPL boss.

Instead I'll use Wenger's 2010/11 substitution tendencies merely to set the scene. 107 players were substituted into Arsenal's 2010/11 EPL season, so that's an average of just over 2.8 a game. 61% were midfielders, 33% strikers and only 6% defenders. Game position didn't really seem to change Wenger's average response: he was as likely to sub in attackers or midfielders to protect a lead as he was to chase a deficit. The average time of a substitution was just past the 70th minute and the most active 10 minute segment was from the 80th to the 89th minute. 6.5% of the subbed players received the bad news whilst eating their half time orange. Bendtner (14), Rosicky (13) and Arshavin (12) were the most frequently called upon replacements.

70th Minute.......All Change.

Goal tally is the most recognised measure of individual offensive production, but it's also a very crude one, especially over limited sample sizes. Goal attempts allow a slightly more precise estimation of a player's contribution to his side's attacking intentions and this approach can be bettered still if we can factor in the quality of chances a substitute is producing in combination with his teammates.

Nineteen different substitutes of which 16 were either midfielders or strikers managed only three goals for Wenger in 2010/11, one each from Vela, Arshavin and Walcott. However, if we look instead at the goal attempts of the group, we now have 66 events to analyse.

Goal Attempts of Arsenal Subs, 2010/11, Sorted as Strikers, Midfielders and Defenders.

Player (mins played as sub) 2010/11.	Goal Attempts.	Cumulative Goal Expectancy.	Cumulative Shot Accuracy Expectancy.	Actual No. of Goals.	Actual No. of Shots on Target.
Bendtner (281)	8	1.1	3.2	0	2
Chamakh (350)	8	1.3	3.2	0	2
Denilson (217)	6	0.3	1.7	0	1
E-Thomas (13)	0	0	0	0	0
van Persie (173)	2	0.1	0.4	0	1
Vela (48)	3	0.3	1.0	1	0
A Diaby (70)	3	0.4	1.2	0	0
Arshavin (347)	7	0.5	2.3	1	2
Eboue (70)	0	0	0	0	0
Fabregas (98)	3	0.2	0.8	0	0
Nasri (16)	2	0.1	0.6	0	0
Ramsey (15)	0	0	0	0	0
Rosicky (341)	12	1.2	4.0	0	4
Song (13)	1	0.3	0.5	0	0
Walcott (198)	9	0.5	2.8	1	3
Wilshere (90)	2	0.2	0.7	0	1
Djourou (66)	0	0	0	0	0
Gibbs (22)	0	0	0	0	0
Squillaci (70)	0	0	0	0	0

At once we see how little information is conveyed by a "goals only" approach. Even simply adding the number of shots on target and total goal attempts begins to enhance the picture of the impact made by each substitute. If we record the pitch coordinates from which each individual shot originated, we can compare the figures to the goal expectancy model described here and calculate a goal expectancy and accuracy expectancy for each effort. Cumulative totals for each individual player's attempts can begin to be used to estimate a longterm average expectation should the player's 2010/11 shooting exploits prove typical of their future performance.

Naturally goal attempts are very limited for individual players. It would be extremely optimistic to expect Song's eyecatchingly high goal expectancy per minute to continue in larger samples, based as it is on just 13 minutes of substitute action, during which he conjured up one shot with a comparatively high chance of success. Similarly, Vela out performed his goal expectancy with one goal from three shots that had a cumulative predicted expectancy of well below half a goal. His figures would almost certainly regress with increased attempts.

The most prolific individual shooters, namely Rosicky, Chamakh and Bendtner, are seeing their shot accuracy figures trending towards their predicted numbers, but their goal totals are stuck firmly on zero and the figures above do illustrate the problems involved when analysis is done on relatively small sub samples. We can add definition to opinion by summing goal expectancies from a dozen or so shots, but trying to derive certainty from only a handful of individual efforts is ultimately self defeating. To be able to talk with more authority we need to pool the substitutes' performances together and compare their performance as a group to that of Arsenal's starters over roughly the same time scale.

The Offensive Output of Arsenal's Starters over the Last 20 Minutes compared to that of their Subs.

	Cumulative Goal Expectancy for All Shots.	Actual Number of Goals.
Shooting Record of Arsenal Subs. 2010/11	6.3	3
Shooting Record of Arsenal Starters 70th Min Onwards. 2010/11.	18.3	18

Arsenal's subs underperform their goal expectancy from their 66 attempts by nearly half despite their fresh legs, while the starters perform very close to their expectation over a roughly comparable portion of the match, namely the last 20 minutes when subs are most likely to also be active. If this kind of split performance repeats itself over larger Arsenal populations and is seen in samples of other teams, it may provide evidence that the attacking output from substitutes fails to match that of the starters who play through the entire game. They may be subs for a reason, or it may be a quirk of this particular set of samples.
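One rough way to gauge the "quirk" possibility is to ask how often a group would score this few goals if the model's expectancy were exactly right. The sketch below treats the goal total as a Poisson variable with a mean equal to the cumulative goal expectancy. That Poisson shortcut is my assumption for illustration (each shot really has its own probability), and it is not how the figures in the table were produced.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


# Cumulative goal expectancy and actual goals, copied from the table above.
subs_expected, subs_goals = 6.3, 3
starters_expected, starters_goals = 18.3, 18

# Chance of scoring this few goals or fewer if the expectancy were spot on.
p_subs = poisson_cdf(subs_goals, subs_expected)
p_starters = poisson_cdf(starters_goals, starters_expected)

print(f"Subs: P(goals <= {subs_goals}) = {p_subs:.3f}")
print(f"Starters: P(goals <= {starters_goals}) = {p_starters:.3f}")
```

A small probability for the subs would hint at something more than bad luck, while a middling one would sit comfortably with the quirk explanation.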

Goal expectancy and accuracy estimates are just the start in assessing individual player contribution; the quality of assists is the logical next step. This post should be looked on as a methodology to begin to sort the good player from the not so good and habitual substitutes would logically fall into the latter category. The results of this preliminary trial would suggest that Wenger well knew where his goal generating talent lay in 2010/11.

Data for this post was variously gathered from chalkboards and 442's EPL iPad app, with especial thanks to OptaPro.

Tuesday 10 July 2012

Spotting Genius is Easy, But what about The Rest.

Nearly forty years on and it is still a vivid memory. The blue shirted Swedish defender appears to be in control of the situation. Notwithstanding the instant control, the attacker seems to have few options. He's got the ball wide on his team's left flank, about ten yards from his opponents' goal line, but he's facing back down the touchline towards his own half of the pitch. His only real choice is to lay the ball back to a supporting colleague or at best lift a hopeful inswinging cross with his right foot into the congested area. And for a split second he seems to have chosen the latter. The defender sticks out a half hearted left leg to block the expected response. He's induced the cross from the attacker and from that angle and with no real intended target it's almost certain to be cleared or claimed by his keeper.

But then it happens. Instead of the cross, the attacker deftly caresses the ball back through his own legs with his right instep. Instantly he pivots on his standing, left leg and sprints after his own pass towards the now undefended byeline, showing only his black number 14 on his Orange shirt and a blur of attacking intent to his bemused marker.

Ladies and gentlemen, the Cruyff Turn.

Genius.

A moment of supreme skill as well as theatre and recognisable as such to anyone who saw it. Unfortunately, it's not always as easy to separate the skill from the mere mundane in football as the talent shows itself in minute improvements in efficiency of passing, quickness of brain or variation in power or placement.

Increasingly analysis is turning to rate statistics to attempt to classify a pecking order for talent based footballing actions. Every transfer target now comes with his numbers attached: shot conversion rate, cross conversion rate, pass conversion rate, the list goes on. So it's vitally important that we have some level of confidence in these types of figures.

How, why and even if the quoted statistics are causatively correlated to match outcome would appear to be the most obvious course of the initial investigation and that should be followed by how much strength we should attach to players who have recorded impressive or not so impressive records from limited sample sizes. However, there is one stage we need to evaluate before these processes are even applied to the raw numbers.

The most fundamental question, and one that is rarely asked, is "Is the factor we are measuring even a skill?", or are we simply seeing random fluctuations in performance rates that are entirely down to chance? Once we have confirmed our intuition and found evidence that we are indeed looking at a skill, we next need to know how much of a skill it is. Only then can we begin to know how many observations we need to make of a group of players before that skill begins to shine through.

Solving the "is it a skill and how much of a skill is it?" problem would seem straightforward. The natural starting point is the player's raw conversion rate. So imagine you've identified an attribute that you've seen on a pitch and you want to purchase a player who displays that attribute. I don't want to confuse things by picking an obscure skill that may or may not be almost entirely luck based, so we'll just call it "The Attribute".

The success rate at completing our particular attribute in a group of similar players averages exactly 10%. Let's imagine you've got a list of 50 prospective purchases and their success rate over their last 100 occasions of attempting to perform your team's desired footballing task. Below I've listed the top 5 and bottom 5 performers.

How Many Successes Were Recorded by Players Performing "The Attribute".

Player.	Number of Successes in 100 Attempts.	Success Rate.	Price ?
1st Ranked.	17	17%
2nd Ranked.	16	16%
3rd Ranked.	16	16%
4th Ranked.	15	15%
5th Ranked.	15	15%

46th Ranked.	6	6%
47th Ranked.	6	6%
48th Ranked.	5	5%
49th Ranked.	5	5%
50th Ranked.	3	3%

So who do you buy? If you're Manchester City or Chelsea and money is no problem, you buy one of the top five, probably driving up the price to an unrealistically high level. If you're Stoke you pick up number 48 on a free transfer, possibly with a history of off field problems.

The catch is that the group, of which the above table is just a part, was actually generated randomly. I set the success rate to 10% and, over 100 different observations for 50 different players, this is the range of outcomes that appears solely through chance. If you broke the bank to buy the top ranked player, then you're out of luck because he doesn't have any skill when it comes to excelling at the attribute, because no one has. The table makes it appear that skill is involved because that is the kind of distribution of successes and failures that people expect to see if skill is a factor. Player 1's 17 successes came about entirely by chance, as did Player 50's mere three successes.
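The experiment is easy to repeat. Below is a minimal sketch of the same idea: 50 players, 100 attempts each, and an identical 10% chance of success for everyone. The function name, the seed and the use of Python's `random` module are my own choices for illustration, so the exact numbers will not match the table above, but the general shape of the rankings should look familiar.

```python
import random


def simulate_players(n_players=50, n_attempts=100, success_rate=0.10, seed=2012):
    """Return success counts, best first, for players who all share one true rate."""
    rng = random.Random(seed)
    return sorted(
        (
            sum(rng.random() < success_rate for _ in range(n_attempts))
            for _ in range(n_players)
        ),
        reverse=True,
    )


counts = simulate_players()
print("Top 5:   ", counts[:5])
print("Bottom 5:", counts[-5:])
```

Change the seed and a different "star" rises to the top every time, which is rather the point.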

Fortunately, it's possible to take a mathematical look at the spread of the distribution of successes and failures for a group of players and be able to tell if it has likely arisen through chance alone or if it has been skewed by external factors that could be attributed to a varying level of skills within the group. Applying these methods to the tabulated numbers above confirms that I generated the distribution randomly.
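As a hedged illustration of what such a check can look like, the sketch below compares the observed variance of the success counts with the variance a binomial process would produce if every player shared the group's average rate. This is one common approach and not necessarily the exact method used for this post. The "mixed" group, with half the players at 5% and half at 15%, is entirely hypothetical and is only there to show what a genuine spread of talent does to the ratio.

```python
import random
from statistics import mean, pvariance


def variance_ratio(successes, n_attempts):
    """Observed variance of success counts divided by the binomial variance
    expected if every player shared the group's average success rate."""
    p = mean(successes) / n_attempts
    expected_variance = n_attempts * p * (1 - p)
    return pvariance(successes) / expected_variance


def simulate(rates, n_attempts, rng):
    """One success count per player, each player having his own true rate."""
    return [sum(rng.random() < r for _ in range(n_attempts)) for r in rates]


rng = random.Random(14)
n_attempts = 100

# Everyone has the same 10% rate, as in the randomly generated table.
no_skill = simulate([0.10] * 50, n_attempts, rng)
# Hypothetical group with real differences in talent: half at 5%, half at 15%.
mixed = simulate([0.05] * 25 + [0.15] * 25, n_attempts, rng)

print("No skill ratio:", round(variance_ratio(no_skill, n_attempts), 2))
print("Mixed skill ratio:", round(variance_ratio(mixed, n_attempts), 2))
```

A ratio close to one is what chance alone tends to produce. A ratio well above one suggests that something other than luck is spreading the players out.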

However, if I use the same methods to look at such attributes as a player's ability to create chances that are converted in open play, I find that the distribution of chances that are turned into goals doesn't resemble a random distribution. Another factor, such as a combination of skills spread between the provider and the scorer, is present. If we further look at the ability of players to provide clear cut chances for their team mates, we find that not only does that appear to be a talent, but it is a much rarer talent than any other that I have currently looked at.

This was intended as an excuse to post a photo of my Cruyff shirt and discuss chance creation. Instead it's turned into a post on the steps needed to validate, analyse and produce reliable and credible static player rate statistics that are beginning to flood the blogosphere. Knowing the spread of talent within a group of players is vital to giving you an idea about the reliability of your conclusions over a limited number of observations. In short, you need to know if you're likely buying a lucky or a talented player and how much of your desired skill is down to player talent.

I'll deal with chance creation later, but at least I kept in the Cruyff reference.

Sunday 8 July 2012

.............and Statistics. What Park Ji-Sung's 15 Yard Shot Conversion Rate Really Tells Us.

The world of football was stunned yesterday, when word leaked out of QPR's audacious swoop for Manchester United's Park Ji-Sung. OK I'm exaggerating somewhat, but it was certainly a newsworthy event and the transfer briefly trended on Twitter. One tweet caught my eye when ESPN declared that "the biggest drop off was his finishing" and backed the view up with his stats in graphical form from 2010/11 and 2011/12. Five goals from seven non blocked shots in the former season compared to one goal from five in the latter. So compelling evidence that QPR had bought a player whose finishing had well and truly "dropped off" over the previous two years?

Park Ji-Sung.

The counter argument quite naturally should start with the fact that headline conclusions are being made on the evidence of very small sample sizes. Seven attempts in 2010/11 and five in the following season. Samples of that size are going to see conversion rates bounce around considerably just through random chance. In writing the previous posts that looked at randomness in footballing rate statistics I found that the proportions of chance and skill converged after around 30 shots for individuals. So a player could produce the kind of numbers in single figure attempts that Park Ji-Sung posted through a lot of luck and no drop off in true talent or shot converting ability between the two seasons. His high strike rate in 2010/11 on such a small sample size is almost certain to regress towards the mean in future years.
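To put a rough number on how much bounce to expect, here is a small sketch that works out exact binomial probabilities for the two seasons. The "true" conversion rates in the loop are purely hypothetical values picked for illustration, since nobody knows Park's real underlying rate, and the helper names are my own.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent attempts."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more successes in n attempts."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def prob_at_most(k, n, p):
    """Probability of k or fewer successes in n attempts."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


# Hypothetical true conversion rates for unblocked close range shots.
for true_rate in (0.30, 0.40, 0.50):
    hot = prob_at_least(5, 7, true_rate)  # a season like 5 goals from 7
    cold = prob_at_most(1, 5, true_rate)  # a season like 1 goal from 5
    print(f"rate {true_rate:.0%}: P(5+ from 7) = {hot:.3f}, P(<=1 from 5) = {cold:.3f}")
```

Neither season needs a change in talent to explain it. A single unchanging rate can produce both a hot and a cold spell when the attempts are in single figures.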

However, there's also one statistical finesse that appears to have been used here. Park's conversion rate has been restricted to shots from within 15 yards and has excluded efforts that were blocked. The latter proviso appears to be down to an individual data supplier's particular policy, so we'll ignore that for the moment. But the choice of a 15 yard cut off point for shots is curious.

Why not 18 yards or 12? There's an 18 yard line and a 12 yard penalty spot after all. Or why not use all of Park's goal attempts, 17 in 2010/11 and 12 in 2011/12?

Park's Shot Record 2010/11 Sorted By Distance.(Red Shots <15 yards,Old Gold >15 yards).

Opponent.	Shot Outcome.	Probability of Shot Being On Target.	Probability of Shot Being Blocked.	Probability of a Goal.
Wolves	On Target	0.48	0.12	0.15
Blackpool	Goal	0.51	0.16	0.29
Blackburn	Goal	0.50	0.18	0.30
Arsenal	Goal	0.47	0.20	0.24
Blackburn	Blocked	0.46	0.19	0.22
Wolves	Blocked	0.43	0.21	0.16
Wolves	Goal	0.43	0.22	0.18
Wolves	Goal	0.41	0.21	0.13
Tottenham	Off Target	0.35	0.18	0.06
Fulham	Blocked	0.38	0.21	0.10
Fulham	On Target	0.41	0.25	0.17
Arsenal	Blocked	0.32	0.22	0.05
Wolves	Off Target	0.27	0.20	0.02
Chelsea	On Target	0.22	0.21	0.01
Man City	Blocked	0.26	0.26	0.03
Tottenham	Off Target	0.26	0.31	0.04
Wolves	Blocked	0.21	0.28	0.01

Cumulative Probability		6.6 Compared to 6 Actual Shots on Target	3.5 Compared to 6 Actual Blocks.	2.2 Compared to 5 Actual Goals.

Above are all of Park's shots from 2010/11, including all of his blocked efforts, which should be ignored because they don't seem to have formed part of the ESPN tweet. They have been sorted by distance, with the closest shots listed first. The shots against teams marked in red are from within 15 yards and they appear to be the shots that are being described by ESPN. So in the 2010/11 data the selected end point of 15 yards comes almost immediately after his furthest goal.
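The cumulative goal figure is nothing more exotic than the sum of the individual shot probabilities. As a quick sketch, here are the goal probabilities and outcomes from the 2010/11 table typed into two lists, closest shot first. The variable names are mine, and because the tabulated probabilities are rounded to two decimal places the total may differ slightly from one built on unrounded model output.

```python
# Per shot goal probabilities from the 2010/11 table, closest shot first.
goal_probs = [0.15, 0.29, 0.30, 0.24, 0.22, 0.16, 0.18, 0.13, 0.06,
              0.10, 0.17, 0.05, 0.02, 0.01, 0.03, 0.04, 0.01]

# Matching outcomes from the same rows of the table.
outcomes = ["On Target", "Goal", "Goal", "Goal", "Blocked", "Blocked",
            "Goal", "Goal", "Off Target", "Blocked", "On Target", "Blocked",
            "Off Target", "On Target", "Blocked", "Off Target", "Blocked"]

expected_goals = sum(goal_probs)
actual_goals = outcomes.count("Goal")

print(f"Expected goals from {len(goal_probs)} shots: {expected_goals:.1f}")
print(f"Actual goals: {actual_goals}")
```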

The shots marked in old gold originate further than 15 yards from goal and the first "orange" shot against Fulham drew a blank and didn't make it into the set that "demonstrates" Park's potency because it was taken a couple of inches outside of the chosen cut off point. By chopping the data end point at 15 yards some unproductive efforts are omitted that would have reduced his conversion rate and made his 2010/11 season appear less impressive in its raw unregressed form.

This kind of slice and dice approach to data not only reduces sample size, it also leads to some very misleading conclusions. For example if I move Park's end point even closer to goal at the six yard line, his conversion rate becomes 0 from 1 or zero percent!
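The effect of sliding the end point is simple to reproduce. The table doesn't list the actual shot distances, so the sketch below uses the row order (closest shot first) as a stand in and moves the cut off one shot at a time. That substitution is my own simplification, as is the function name, and blocked shots are dropped to mirror the convention used in the tweet.

```python
# Outcomes from the 2010/11 table, closest shot first.
outcomes = ["On Target", "Goal", "Goal", "Goal", "Blocked", "Blocked",
            "Goal", "Goal", "Off Target", "Blocked", "On Target", "Blocked",
            "Off Target", "On Target", "Blocked", "Off Target", "Blocked"]


def conversion_up_to(shot_outcomes, n_closest):
    """Goals and unblocked attempts among the n_closest shots."""
    unblocked = [o for o in shot_outcomes[:n_closest] if o != "Blocked"]
    return unblocked.count("Goal"), len(unblocked)


# Cut offs: the single closest shot, the shots that appear to fall inside
# 15 yards, one shot further out, and then every attempt of the season.
for cutoff in (1, 10, 11, len(outcomes)):
    goals, attempts = conversion_up_to(outcomes, cutoff)
    print(f"closest {cutoff:2d} shots: {goals} from {attempts} ({goals / attempts:.0%})")
```

Nothing about the player changes between those lines of output, only where the line is drawn.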

Park's Shot Record 2011/12 Sorted By Distance. (Red Shots <15 yards,Old Gold >15 yards).

Opponent.	Shot Outcome.	Probability of Shot Being On Target.	Probability of Shot Being Blocked.	Probability of a Goal.
Everton	On Target	0.49	0.15	0.22
Wigan	On Target	0.46	0.20	0.22
Wigan	Goal	0.46	0.20	0.21
Man City	Off Target	0.37	0.18	0.07
Arsenal	Off Target	0.40	0.22	0.12
Norwich	Blocked	0.32	0.19	0.03
Wigan	Off Target	0.40	0.25	0.16
Arsenal	Goal	0.34	0.22	0.06
Liverpool	Off Target	0.31	0.24	0.05
Tottenham	On Target	0.35	0.28	0.10
Blackburn	Blocked	0.25	0.30	0.03
Norwich	Blocked	0.28	0.32	0.05
Cumulative Probability		4.4 Compared to 5 Actual Shots on Target	2.8 Compared to 3 Actual Blocks.	1.3 Compared to 2 Actual Goals.

The ability to demonstrate whatever you choose by pre selecting a convenient end point is further highlighted when we look at Park's "drop off" season in 2011/12. If the same 15 yard line is chosen, happily for the preferred narrative the line falls a couple of feet in front of his goal against Arsenal, eliminating it from the 2011/12 sample set. Therefore, ESPN can report his 2011/12 conversion rate from 15 yards as 1 from 5 (20%), instead of the 2 from 6 (33%) if they'd gone the extra two and a bit feet requiring inclusion of his small part in United's 8-2 rout of the Gunners.

I've no problem with data like this being used as part of a "this is what happened" narrative. But small sample sizes such as were used here can tell us very little about a player's decline or improvement from one season to the next and once convenient end points start being drawn to exaggerate an already unconfirmed difference between one season's performance and the next...........well.
