Friday 30 November 2012

Defending Your Corner. The Anatomy of a Stoke Goal.

Unlocking the secrets of a football match can be made slightly easier by concentrating on the important actions, such as shots and saves for individual players and set pieces, including free kicks and corners for teams. Set plays by their premeditated nature offer a relatively consistent level of defensive and attacking opportunity and by looking at the effectiveness of teams against a variety of different opponents we may be able to start to characterize what constitutes good set play defence and attack.

Manchester City demonstrated how important turning corners into goals is to a team's season. They did so generally, through the 15 Premiership goals they scored in the immediate aftermath of a corner kick over the season as a whole, and particularly through the vitally important late season winner against title rivals United and the 92nd minute equaliser at the climax of the title race at home to QPR.

The champions also greatly improved their scoring rate from corner kicks compared to the previous year, perhaps indicating that there is profit to be had from identifying and implementing a successful set piece strategy.

One team which has long understood the importance of dealing with corner kicks at both ends of the field is Stoke City. Their opening goal against West Ham was clearly the product of work put in on the training field, with Gary Neville describing Walters' strike as "one of the goals of the season". Others preferred to highlight the perceived illegality which preceded the strike. Blocking opponents to provide clear space for a team mate isn't restricted to football. Sending dummy runners in front of the intended ball receiver is now well established in rugby, and physically preventing attackers from getting a clear run in football has even survived the introduction of extra officials in the European competitions. In short, referees have drawn a line in the sand for acceptable behaviour at corner kicks which may not be fully reflected in other areas of the pitch, and all teams exploit that leeway to the maximum of their ability.

Measuring corner kick success does require a certain degree of subjectivity. For example, six of Manchester City's set play goals came directly from first headed contact with the corner and three others from assists. Other scores came from extended play following a short corner or a ball deliberately played to a central area just outside the edge of the penalty box. Therefore it is sensible to divide corners into situations: short corners, where possession is guaranteed, and occasions where the ball is played directly into the box, where the chance of keeping possession is lower but any first contact made with the cross comes in a more threatening area.

The availability of x, y coordinates for match events has allowed us to chart match incidents, attempt to define success and quantify success rate for different areas of the pitch. In 2011/12 Stoke won over 150 corner kicks in the EPL. As you'd expect for a side blessed with such an aerial threat, very few kicks were played short or to the edge of the box. The average position for the targeted area was just north of the six yard line and around a foot west of the penalty spot in the direction of the near post.

In short, Stoke were aiming for the area in the box where a clear header is increasingly likely to produce a goal. As this post shows, the further you get from the goal, the less likely you are to score with a header compared to a shot, but inside the six yard box the situation is reversed and headers are an extremely potent option.

In this initial analysis, I've recorded all of Stoke's corner kicks and used a Stoke player making first contact with the cross as a broad definition of a successful kick. As a comparison I've included the "success" record for all the other EPL teams when launching corners into the Stoke box. We are looking at a combination of Stoke's corner taking ability matched to their opponents' ability to defend the ball and vice versa.

I'm going to try to avoid overuse of figures in this post, but I will use them to verify my intuition that Stoke have worked very hard at defending and attacking corners over the last two completed seasons. The line of the six yard box appears to be the crucial area where headers start to lose potency. Corners are also likely to see the six yard box more heavily defended than is the case in open play, so while conversion rates are likely very high deep into the six yard box, the likelihood that an attacking player will get the first touch quickly falls away to around 3% or less as he nears the goal line in the case of corners.

The starkest comparison between Stoke defending and attacking corners is in their ability to get the first touch to corners hit centrally to the six yard line. They have around a 1 in 5 chance of beating the defence to the ball, whilst restricting their opponents to barely half that success rate.

Stoke's Opening Goal from a Corner against Swansea.

Although the goal mouth action following a corner appears very haphazard, much of the initial movement and delivery is well rehearsed. Stoke's attacking and defensive ratios may be unsustainably high but, as a team, they are undeniably well versed in creating and denying space at corners.

The above photo shows Stoke's opening goal at home to Swansea, this season. Glenn Whelan has already executed the first part of a successful corner by accurately hitting an inswinging corner at pace, centrally to the edge of the six yard box. The pace of the cross ensures that the Swansea keeper is reluctant to leave his line, but just to be sure, Steven N'Zonzi (1) takes just enough of his ground to make staying put appear a better option. N'Zonzi's height also enables him to hinder Vorm's ability to clearly see the action as it unfolds in front of him.

Huth (2) is just one of Stoke's incoming runners. Walters is off camera to the left, and Huth's imposing size accounts for two Swans defenders. Shawcross has already run corner side of the near post to draw another defender out of the six yard box. Despite his heading prowess, he has been used here as a decoy runner, or as insurance should the corner be under hit.

The decisive action has occurred far post. Adam (3) is partly there to act as a blocker for target Crouch (4). Crouch has run his marker in towards the far post before peeling back into the middle of the box where the ball is going to be delivered. His marker has anticipated a far post run, but now finds himself blocked off by Adam, who merely has to stand his ground to give Crouch an unmarked free header, which he duly dispatches. N'Zonzi is ready to become active should the ball be blocked on the line.

It's great when a plan comes together, as it did here and against West Ham as illustrated by Neville, and each of the seven Stoke players had an important part to play in freeing Crouch for a clear header.

Stoke Defend a Swansea Corner....

....and another.

Defensively the Stoke players are intent on staying close to their opponents. Shawcross and Wilson in the first shot and Shawcross and Cameron in the second are engaging their marked opponents to such an extent that challenging for the ball becomes almost secondary. Walters and Wilson in the second photo are also physically imposing themselves on the two most likely recipients of the Swansea cross. In short, physicality, denying strikers the freedom to make runs and a 6'5" keeper go some way to explaining Stoke's ability to prevent their opponents from enjoying the same amount of successful first contact with corner kicks which they themselves enjoy.

These are idealized examples of Stoke City dealing with corners at both ends of the pitch, but the stats clearly indicate that they are likely to be above average in both abilities. Their opponents have acknowledged the futility of attacking the heart of Stoke's six yard box directly, by taking around 20% of corners short. Stoke, by contrast, took fewer than 4% of their corners short last term.

Pulis has long relied on a mean defence to make the most of his team's limited goal scoring, and goals from corner kicks regularly account for a significant proportion of their total. Scoring rates of a goal a game aren't uncommon for Pulis led sides. This season is no exception, and 15% of the strikes have come directly from corners.

The numbers can help to identify trends in team quality, but often game footage can then add the meat to the story. Stoke are adept at creating and denying space, and the hard work, often on the edge of legality, rarely appears explicitly in the currently recorded data.

Wednesday 21 November 2012

Poisson, Predictions and a Tense Last Ten Minutes.

Often the bigger the incentive is to get things right, the better the final results, and the potential for you to lose money in an enterprise inflates that incentive. One "hidden" resource in the field of football analytics is provided by the gambling industry, who regularly produce sporting predictions that are occasionally skewed by weight of money, but more often provide a readily decipherable estimate of an event's true likelihood of occurring. Therefore, this post is written with a betting slant, but it is centrally applicable to the field of football analytics.

Everyone takes for granted the opportunity to bet on a sporting event "in running". However, it is worth remembering that it is a relatively new concept and as such the betting tools developed to describe such events are similarly under developed. Rewind a decade and once the first whistle was blown or the stalls opened, then the betting shutters were slammed tightly shut. Nowadays the betting carries on unabated.

Football is an obvious vehicle for in running wagers and that has created a need to predict match probabilities under many different combinations of scoreline and time elapsed. Aggregating many seasons' worth of historical data does a reasonable job of describing the general case, but is clearly lacking when applied to specific team matchups.

The major problem with this type of model is biased sampling. Poorer teams playing superior teams are more likely to find themselves trailing, say 2-0 after 45 minutes. So the sample used to predict the likely game outcome from this position will contain an over representation of poor sides and they will go on to perform in accordance with the wider gap in quality over the remainder of the game. Using this biased, general case to predict how Manchester City may perform should they trail 2-0 at halftime to the likes of Stoke will greatly underestimate the possibility of a Blue comeback. The chances of Manchester City storming back for a win in such circumstances are over twice those seen generally.

So if aggregated models have such a major flaw, what are we to use? The data revolution has enabled predictions to be made using vast amounts of different inputs, but this approach has produced a counter movement, where simplicity of design and input is thought to produce results of equal merit. A simple goal based model, using the outputs of a Poisson calculation on a team's average goal expectancy to calculate the probability of each side scoring an exact number of goals in a match, has been well described on numerous websites since the late 90s.

This approach too has flaws, such as under prediction of draws and failure to account for a lack of independence between the expected scoring rates of both sides. However, these flaws are well understood and, because Poisson has long been used in football prediction, they have been extensively addressed.

I'll assume everyone has a passing knowledge of the Poisson approach to modelling football matches, but for the casual reader, the distribution allows an estimation of the likelihood of a team scoring exactly 0, 1, 2, 3 goals and so on, given we expect that team to score an average of say 1.6 goals in such a game. To fully appreciate how we can use the Poisson approach to begin to build an in running calculator we first need to grasp the concept of goal expectation.

When we say that a side has a goal expectation of 1.6 goals, we are saying that if today's game were to be repeated over and over again, the average number of goals we would expect our team to score would be 1.6. Sometimes they wouldn't score at all, sometimes they would score 6. The most likely outcome would be a score of exactly one, followed by two. But over a long period of repeats, the average would trend towards our best estimate of 1.6.
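For readers who like to see the numbers, here is a minimal Python sketch of that calculation. The function name `poisson_pmf` is my own label for illustration; the only input taken from the text is the goal expectation of 1.6.

```python
import math


def poisson_pmf(goals: int, goal_expectation: float) -> float:
    """Chance of scoring exactly `goals` when the long run average is `goal_expectation`."""
    return math.exp(-goal_expectation) * goal_expectation ** goals / math.factorial(goals)


goal_expectation = 1.6  # the example figure used in the text

# List the chance of each exact score from 0 to 6 goals.
for goals in range(7):
    print(f"{goals} goals: {poisson_pmf(goals, goal_expectation):.3f}")
```

Running it shows a single goal as the most likely outcome, with two goals next and a blank a little further back, while a six goal haul is possible but rare.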

The most important thing we need to appreciate is how this goal expectancy decays over the 90+ minutes of a match. The average 1.6 goals per game figure decays because of time elapsed. Goals already scored or conceded may tweak the average slightly in one direction or another as a result of competing, scoreline dependent, tactical rearrangements, but a glut of early goals doesn't significantly alter our pregame goal expectancy.....only the passage of time can do that.

The rate at which a team's goal expectancy declines isn't constant. More goals are scored on average in the second half than the first as teams become more urgent in their efforts to score and fatigue leads to more space. The split is around 44% of goals in the first half and 56% in the second, and the decay can be adequately described by a power equation of the following form.

Remaining Goal Expectancy = Initial Goal Expectancy x (Proportion of Time Remaining) ^0.84

Imagine Stoke is expected to score an average of 1 goal in a particular match, West Ham away on Monday night, perhaps. By halftime when the proportion of time remaining is very close to 0.5, the remaining goal expectancy can be calculated by inserting these values into the previous formula to give

Remaining Goal Expectancy = 1 x (0.5)^0.84 = 0.559 of a goal.

0.559 of a goal, you may notice, equates to 56% of the initial goal expectation of 1 goal, which nicely fits the observed data. We can repeat this calculation for any minute of the match and also for the opposition. Armed with this information we are just a few repetitive, but simple steps away from being able to describe the likely scoring combinations that will occur in the remainder of the contest.
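The decay formula is easy to wrap in a small function so that it can be reused for any minute and either team. This is only a sketch: the function name and the range check are my own additions, and the 0.84 exponent is the one quoted in the formula above.

```python
def remaining_goal_expectancy(initial_goal_expectancy: float,
                              proportion_of_time_remaining: float,
                              exponent: float = 0.84) -> float:
    """Initial Goal Expectancy x (Proportion of Time Remaining) ^ exponent."""
    if not 0.0 <= proportion_of_time_remaining <= 1.0:
        raise ValueError("proportion_of_time_remaining must lie between 0 and 1")
    return initial_goal_expectancy * proportion_of_time_remaining ** exponent


# Halftime for a side expected to score one goal over the whole match.
at_halftime = remaining_goal_expectancy(1.0, 0.5)
print(f"Remaining goal expectancy at halftime: {at_halftime:.2f}")  # roughly 0.56
```

Because the exponent is below one, the remaining expectancy at halftime is more than half of the starting figure, which is how the formula captures the heavier scoring after the break.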

We'll fast forward to the 80th minute to use this accumulated knowledge to begin to construct a flexible and realistic "in running" prediction model. The West Ham/Stoke game was a fairly common type of Premiership contest, where two reasonably well matched sides were separated on the night by little more than home field advantage. An average expectation at kickoff for Stoke would be that they'd score close to one goal and concede just over 1.4 of a goal to the Hammers. If we insert those numbers into our equation and allow for the likely 4 minutes of added time we could expect Stoke to average 0.22 of a goal to West Ham's 0.30 in the remainder of the match.

If we now fire up the Poisson calculator we can produce probabilities that Stoke and WHU will score exactly 0,1,2,3 goals and so on, in the last 10+ minutes of Monday's game. Those probabilities are listed below.

The Likelihood of Stoke or WHU Scoring an Exact Number of Goals after the 79th Minute.

Team          0 Goals   1 Goal   2 Goals   3 Goals   4 Goals
WHU           0.737     0.225    0.034     0.003     0.000
Stoke City    0.806     0.174    0.019     0.001     0.000

We can now begin to accumulate the score combinations that will lead to a final match outcome, bearing in mind that O'Brien had equalised Walters' opening goal for the visitors and the match was currently stalemated. If, as actually happened, neither side scores, the match ends as a draw and the probability of a 0-0 is given by multiplying 0.737 by 0.806 or the individual probabilities of each side failing to score. That outcome has a probability of 0.594 or around 3 times in every 5. A 1-1 in the final "mini" match will also ultimately lead to a draw, as would a 2-2, 3-3 or 4-4 for the optimistic thrill seekers. If we finally total each of these individual, correct score probabilities, we have the likelihood of the currently tied game ending so at the final whistle.

A similar process generates cumulative probability totals for each correct score that leads to either a City win or a happy Hammers victory.

The Likelihood of Stoke or WHU Gaining any Result from 1-1 after the 79th Minute.

Team          Win    Draw   Loss
WHU           0.22   0.63   0.15
Stoke City    0.15   0.63   0.22
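The totting up of correct scores from a level position can be sketched in a few lines of Python. The remaining goal expectancies of 0.305 for WHU and 0.216 for Stoke are my own back-calculation from the zero goal probabilities of 0.737 and 0.806 quoted earlier (the chance of no goals is e raised to minus the expectancy), so treat them as assumptions rather than the exact inputs used. The function names and the cap of ten goals per side are also my own choices.

```python
import math


def poisson_pmf(goals: int, goal_expectation: float) -> float:
    """Chance of exactly `goals` given an average of `goal_expectation`."""
    return math.exp(-goal_expectation) * goal_expectation ** goals / math.factorial(goals)


def outcome_from_level(home_rate: float, away_rate: float, max_goals: int = 10):
    """Home win, draw and away win chances for the rest of a currently level game."""
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            # Treat the two sides as scoring independently, as the pure Poisson does.
            p = poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Assumed remaining goal expectancies, back-solved from the quoted probabilities.
whu_win, draw, stoke_win = outcome_from_level(home_rate=0.305, away_rate=0.216)
print(f"WHU win {whu_win:.2f}, draw {draw:.2f}, Stoke win {stoke_win:.2f}")
```

The loop does exactly what the text describes by hand: every correct score for the remaining "mini" match is priced up and dropped into one of three buckets.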

The above example is conveniently simplified by a current scoreline of 1-1, but teams can trail or lead, as WHU and Stoke respectively did in this match. However, the process merely becomes slightly more tedious rather than more complex. Stoke's set piece prowess finally reached ground level in the 13th minute when Walters found space in front of decoy runners to crisply dispatch a precisely delivered Whelan corner, an inventive deviation from the Delap assists of old. So if we want to examine the likely match result from say the 34th minute we have to also account for the 1-0 lead held by Stoke.

In this game situation, should Stoke go on to "win" the mini match from the 34th minute onwards, they will bolster their lead and comfortably win the game. In addition, if they merely "draw" the remainder of the match they will also win the entire game because of the 1-0 lead given to them by their mustachioed striker. For the game to end as a draw, WHU must "win" the next 60 minutes by a single goal, and they must win them by two or more to claim all three points.

The Likelihood of Stoke or WHU Gaining any Result from 0-1 after the 33rd Minute.

Team          Win    Draw   Loss
WHU           0.16   0.25   0.59
Stoke City    0.59   0.25   0.16

As the scoreline becomes more lopsided, the combinations that ultimately lead to wins, losses or draws also become more diverse. A team which holds a 2 goal cushion can afford to "lose" the remainder of the contest by a single goal and still claim victory. So the totting up procedure becomes more tiresome, although a spreadsheet helps greatly, but the Poisson process on a suitably decayed goal expectation remains constant.
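In place of a spreadsheet, the same bookkeeping can be sketched in Python by adding the current margin to every remaining scoreline. The inputs here are assumptions of mine chosen to mimic the Stoke example: a 94 minute match with 61 minutes left, pregame expectations of 1.0 for Stoke and 1.4 for WHU, and the 0.84 decay exponent. All function names are hypothetical.

```python
import math


def poisson_pmf(goals: int, goal_expectation: float) -> float:
    """Chance of exactly `goals` given an average of `goal_expectation`."""
    return math.exp(-goal_expectation) * goal_expectation ** goals / math.factorial(goals)


def remaining_rate(pregame_rate: float, minutes_left: float,
                   match_minutes: float = 94.0, exponent: float = 0.84) -> float:
    """Decay a pregame goal expectancy by the proportion of time remaining."""
    return pregame_rate * (minutes_left / match_minutes) ** exponent


def result_probabilities(current_margin: int, team_rate: float,
                         opponent_rate: float, max_goals: int = 12):
    """Win, draw and loss chances for a team currently ahead by `current_margin` goals."""
    win = draw = loss = 0.0
    for team_goals in range(max_goals + 1):
        for opponent_goals in range(max_goals + 1):
            p = poisson_pmf(team_goals, team_rate) * poisson_pmf(opponent_goals, opponent_rate)
            # The existing lead (or deficit) is simply carried into the final margin.
            final_margin = current_margin + team_goals - opponent_goals
            if final_margin > 0:
                win += p
            elif final_margin == 0:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Stoke lead 1-0 with an assumed 61 of 94 minutes still to play.
stoke_rate = remaining_rate(1.0, minutes_left=61)
whu_rate = remaining_rate(1.4, minutes_left=61)
stoke_win, drawn, whu_win = result_probabilities(+1, stoke_rate, whu_rate)
print(f"Stoke win {stoke_win:.2f}, draw {drawn:.2f}, WHU win {whu_win:.2f}")
```

Passing a margin of +2 or -1 instead handles the two goal cushion or a trailing side without any other change, which is the sense in which the process gets more tedious but not more complex.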

As has already been stated, this quick run through does not account for well recognised deficiencies in using Poisson to describe football goal scoring, nor does it allow for the small, but real emphasis shifts that occur as the scoreline changes, but we can test the model's validity by comparing its predictions to the efficient Betfair betting markets.

Below I've plotted the near 100% book prices that were available on Betfair in two minute intervals, along with the predictions from a pure Poisson during Monday night's Stoke West Ham match.

Price & Probability Movements During Stoke's 1-1 Draw At WHU.


0-1, Walters, 13'
1-1, O'Brien, 48'

The Betfair prices and the pure Poisson track each other's progress fairly accurately. The under prediction of the draw, inherent in the Poisson, is well seen up until the WHU equaliser and my allegiance to one of the two sides may also be represented throughout by my choice of initial goal expectations. Also, the slightly increased optimism towards WHU during the half hour where they trailed isn't captured by the "blind" Poisson, but is by the Betfair traders and is also present in actual data.

Tuesday 20 November 2012

Mark Hughes' Perfect Storm.

It's no great secret that football managers generally part company from their teams when on field results fail to match a Chairman's expectations. In this guest post I look at the average performance recorded by sacked managers in the recent past and the level of success a team may reasonably expect to achieve based on their preseason investment.

To read the full post follow this link.

Monday 19 November 2012

Expected Points Graph for Fulham v Sunderland.

Fulham 1 Sunderland 3.

Fulham enjoyed a particularly favourable 2011/12 in terms of red cards. Dismissals aren't very common events, but they do have a considerable effect on individual matches and possibly across seasons. Fulham didn't see red once last term, but they found themselves playing against reduced numbers on four occasions and that imbalance had the potential to gain them around two extra league points over the course of the season.

When we are dealing with such small sample sizes, it's unlikely that Fulham would be able to sustain their impeccable behaviour in a league where the base rate for red cards is just over 3 per team per season. Teams can influence their disciplinary record, but there is also likely to be an amount of randomness associated with marginally mistimed tackles, so the base rate for cards is probably as influential as a side's individual recent record.

Fulham manager Martin Jol believes that his skipper, Brede Hangeland, was unlucky to be shown red against Sunderland, further reasoning that the dismissal was very costly to his side. Hangeland may have slipped just prior to his two footed tackle, so Jol may have a point. He's certainly correct when he bemoans the cost of the card, which came just 31 minutes into the match and was the equivalent of his full strength side conceding a goal. Stoke fans will also be disappointed as Hangeland will now miss Fulham's trip to the Britannia on Saturday and they will be deprived of the opportunity to watch The Cottagers' influential skipper in the flesh.


31', Hangeland, Red Card.
51', Fletcher, 0-1
62', Petric, 1-1
65', Cuellar, 1-2
71', Sessegnon, 1-3

Sunday 18 November 2012

Injury Time, Substitutions & Goal Celebrations.

Injury time, stoppage time or "the fourth official has indicated there will be a minimum of four minutes time allowed" as it is usually described by the stadium announcer at The Britannia is one of the most contentious decisions made by the referee on match day. If your side is trailing, five minutes is never enough to compensate for the interminable time spent by the opposition while taking throws or goal kicks, while three minutes is a fantasy of an official's biased mindset if you are holding a tenuous one goal lead.

The Laws of the Game hand all of the cards to the referee in the stoppage time debate. Time can be added for time lost due to substitutions, assessment and removal of injured players, time wasting and any other cause. Just in case the final point didn't adequately convey the free hand afforded the referee in deciding the amount of stoppage time, the law ends by informing its readers that "time lost is at the discretion of the referee".

Substitutions are the most readily available data points and the least open to individual refereeing interpretation. Nowhere in the Laws of the Game is a specific amount of allowed time mentioned to compensate for a substitution, but 30 seconds has almost universally become an accepted figure. Similarly, the time added, post goal scoring, to allow the players time to perform a choreographed celebration has also semi officially been set at 30 seconds and presumably falls, along with streakers and escaped dogs under the final catch all reason of "any other cause".

We are therefore left with injury assessment and removal, and time wasting as two relatively unknowable causes of stoppage time. The first is difficult to record and is at the discretion of the official and the second is almost entirely at the referee's discretion and also appears to be score and team dependent. Has a player ever been booked for time wasting late in a game when his team is trailing, regardless of the time he dallies over a free kick?

Creating models that describe footballing events where some important factors are missing isn't unusual, and often perfectly useful constructions can be produced using limited inputs. So, using data from the MCFC data dump, I've tried to predict the average amount of injury time a team could expect to experience during the 2011/12 season using only the number of goals and the number of substitutions that occurred in total during their matches.

Use of substitutes varied quite markedly across different teams in 2011/12. Newcastle, Norwich, Everton and Manchester City were among the sides which made full use of their replacements during the season, while Fulham and Blackburn were much more reluctant to ring the changes. Each of these sides featured at the extremes for total match substitutions, with higher overall numbers also being seen in games featuring Arsenal and Wigan and lower numbers in Liverpool and Sunderland matches.

The amount of goal scoring is easier to categorize. Teams at either end of the table generally experience contests with more goals, partly as a result of the top sides recording wide margin wins against the strugglers. Mid table outfits, on average see less goal laden matches. During the present century games played by the top six have seen an average of 103 goals a season, compared to 102 for the bottom six and just 96 for the eight teams in between.

The correlation between average stoppage time and seasonal total match goals and total match substitutions is significant. Above I've plotted the average stoppage time per game against the predicted time derived from a regression for all other teams that uses the two inputs of goals and substitutions. The majority of teams congregate around the line of best fit, but an obvious outlier is Blackburn. They averaged just under 360 seconds of total injury time per game, despite their predicted total from the amount of goals and substitutions in their matches implying that they should have received considerably less.

The reason is fairly easy to spot. Junior Hoilett was stretchered off following an injury time clash of heads with Fulham's Mark Schwarzer in their 1-1 draw, leading to 11 minutes of second half injury time compared to a budgeted three. Those "extra" eight minutes spent assessing and dealing with an injury contribute 13 seconds per game that aren't accounted for in the model. Once we start to account for another obvious contributing factor such as disproportionately large amounts of time lost due to treating injuries, the line of best fit moves more in line with reality and Blackburn cease to be such an extreme outlier.
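The 13 seconds per game figure follows from spreading the eight unexpected minutes across a full league campaign. A quick check in Python, where the 38 game season is my assumption about how the average was taken and the variable names are my own:

```python
# Second half injury time actually played versus the amount budgeted for.
minutes_played = 11
minutes_budgeted = 3
games_in_season = 38  # assumed: a full Premier League season for one club

extra_seconds = (minutes_played - minutes_budgeted) * 60
extra_seconds_per_game = extra_seconds / games_in_season
print(f"Extra stoppage time per game: {extra_seconds_per_game:.1f} seconds")
```

A single long injury delay is therefore enough to move a team's seasonal average by a noticeable amount.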

Unsurprisingly, over a season, the number of game goals, substitutions and injury stoppages appear to be good predictors of stoppage time awarded to Premiership sides.

Too little, too much ?
Of more interest are the predictions spat out by the regression as the number of goals increases. Increased numbers of substitutions lead to more stoppage time, but increased goals over the season actually decrease the amount of stoppage time awarded with substitutions held constant. The data used in the regression has been aggregated over a season, so this trend may not carry over to future seasons or down to individual matches, but it's tempting to speculate that referees are implementing their version of the "mercy rule".

Games with lots of goals are more likely to be one-sided compared to games with fewer scores. Adding less stoppage time than is merited when one team reaches the 90th minute with a three or more goal advantage will hardly ever change the ultimate result. As a ballpark example, an underdog trailing 3-0 to the likes of Manchester United will take something from such a game under 1 time in 100,000 if offered 6 minutes of injury time, and 1 in 38,000 if we extend it to 10 minutes. So a perhaps unconscious effort is being made by referees to put already beaten teams prematurely out of their misery, knowing that the trailing side is almost certainly beyond hope.

The second half of United's 8-2 trouncing of Arsenal saw 6 goals, one dismissal and 5 substitutions, but just three minutes of stoppage time, although in reality a further 23 extra seconds were actually played. Similarly, Wigan 0 Arsenal 4 contained 2 second half goals, a full pack of substitutions and just 2 minutes of advertised stoppage time.

If the best and the worst sides are seeing games prematurely ended, this may explain why both Manchester clubs, along with relegated Bolton, finished in the bottom three for allotted stoppage time in 2011/12, although the often repeated conspiracy theory that the best get more time only when they need it will require more individual game data, hopefully in future posts.

Thursday 15 November 2012

How Do Red Cards Affect A Football Match?

A look at last season's red card winners and losers. It's not just about how many cards you take, it's also about when you take them and if you can induce your opponents to see red. A sending off costs a team points in the long run, so ill discipline loses your team prize money and may ultimately even threaten their Premiership position.

Follow this link for my guest post.

Tuesday 13 November 2012

Super Subs and Selective Cutoff Points.

Not content with calling itself home to one super sub, Manchester inducted a second on Saturday night, when Javier Hernandez, also known as Chicharito accounted for two and a half of the goals that saw off Aston Villa in a night of blood letting for footballing cliches.

First to go, depending upon your viewpoint, was the invincibility or vulnerability of the two goal lead. That was quickly followed by the ability (or not) of the best to come up with needed late goals at will and in doing so Chicharito's 87th minute winner also cemented his lot as another super sub. An early rebuttal of the 90s claim that "you'll win nothing with kids" was hastily shelved following Villa's tame second half demise.

Chicharito was inevitably hailed as a super sub throughout the press. The Daily Mail, The Express, The Independent each dedicated articles to his super sub status and The BleacherReport had to fall back on "uncategorizable" as the defining quality of such a player.

Chicharito. A Talented Striker.
I argued here that super subs arise through a combination of recency bias, small sample size and a failure to account for the richer overall goalscoring environment in which substitutes inevitably play. But there is another selection bias present in the statistics that accompany these articles, one that almost guarantees that such a player's record as a substitute will appear to be much better than his record as a starter.

Gambling touts make extensive use of the technique of using selective cutoff points when describing their profit (or loss) record, invariably starting and/or ending their "fully verified" record with a string of winners. And model builders can also unwittingly fall foul through insufficient out of sample testing of their new toy, leading to a randomly discovered favourable cutoff point in the original sample becoming the precursor to imminent, real time failure.

In sport, the selective cutoff point is a "tool" often used to support a preconceived notion about the relative merits of two players. At its basest you select a period of time during which Player A excelled at the metric of your choice and Player B didn't and then use this biased comparison to demonstrate that A is the superior athlete.

In this post I showed how selectively restricting the goal scoring record of Park Ji-Sung to shots from certain distances artificially inflated the apparent difference in his shot conversion across two different seasons. Selectively setting his scoring record to include only goals scored from within 15 yards of the target included all of his goals in his "up" season, but just some of his strikes in his "down" year.

A similar thing is happening when players are being designated as super subs, although the process is almost certainly being unconsciously applied. Very few expensively bought attacking members of a top EPL team's 25 man squad wake up on a Monday morning to find they have become super subs after a barren weekend from the bench. If a player has bagged two out of his team's three goals as a starter, as Chicharito did for United against Braga as recently as last month, he's even less likely to be dubbed the new David Fairclough.

But score two out of three goals from the bench, as Chicharito did on Saturday evening or two out of two as Dzeko did against WBA at the Hawthorns and the narrative is already written....and just as importantly a biased cutoff point has been set that will guarantee an inflated strike rate from the bench to back up the story.

Understandably and apparently diligently, the super sub's club career statistics are then used. However, in combination with all the other flaws, a biased cutoff point, immediately following a game where the player has performed an outstanding and atypical example of the identifying trait, has been applied to seal the "proof".

The drip, drip of unreliable numbers gradually cements the myth.

Monday 12 November 2012

The Five Best Passes During Week 4 of the Europa League.

Passes are the life blood of a football match. The vast majority of them are completed and serve to either maintain possession before a more ambitious assault is unleashed on the opposing goal or to protect a position of strength and prevent the opposition from launching an attack of their own. But it is the spectacular and ambitiously successful passes that grab most of the headlines, especially if they result in a goal scoring  chance. Often the player who provides the chance shares with the scorer a substantial amount of the credit for a goal.

The amount of difficulty involved in completing a pass is dependent on many variables. Defensive pressure on both the passer and the intended target will partly determine whether or not a pass can be easily completed, as will the area of the pitch from where the pass originates and the intended destination. Players can be expected to complete the majority of passes they attempt deep in their own half, partly through lack of intensive opponent pressure and through prudent choices of intended recipients.

The longer the distance of a pass and the deeper into opposition territory and more central the target, then the more difficult it is to complete the pass. By pooling large numbers of passes from different areas of the pitch and recording success rates for these passes, we can begin to develop a model to predict how likely an average passer would be to complete a particular, individual pass.
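As a rough illustration of the pooling idea, the sketch below groups passes by where they start and where they are aimed, then records the share completed in each group. The zone labels and the handful of pass records are invented purely for the example and are not taken from the passing model itself, which also accounts for factors such as distance.

```python
from collections import defaultdict

# Hypothetical pass records: (start zone, target zone, completed?).
# Zone names and outcomes are made up for illustration only.
passes = [
    ("own half", "own half", True),
    ("own half", "own half", True),
    ("own half", "own half", True),
    ("own half", "own half", True),
    ("own half", "opposition half", True),
    ("own half", "opposition half", False),
    ("opposition half", "penalty area", True),
    ("opposition half", "penalty area", False),
    ("opposition half", "penalty area", False),
    ("opposition half", "penalty area", False),
]


def completion_rates(records):
    """Pool passes by (start, target) zone and return the share completed in each pool."""
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for start, target, ok in records:
        attempts[(start, target)] += 1
        completed[(start, target)] += int(ok)
    return {key: completed[key] / attempts[key] for key in attempts}


rates = completion_rates(passes)
for (start, target), rate in sorted(rates.items()):
    print(f"{start} -> {target}: {rate:.0%} completed")
```

With enough real passes in each pool, the pooled rate becomes an estimate of how often an average passer completes that type of pass.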

Passes are therefore not created equal, either in their difficulty of execution or in their influence on the outcome of a match. However, in this season's UEFA Europa League a simple five yard pass between defenders as they run time off the clock is as valuable as a 40 yard pin pointed through ball to open the scoring. Please visit to read about the initiative to provide a day of schooling for young people, worldwide.

The campaign is supported by Western Union and world footballing legend Patrick Vieira.

To support the program, I've used my passing model described here to quantify the five most difficult passes that led directly to goals during week four of the Europa League. As is traditional I'll list the five in reverse order.

Number 5. Diego Capel. Sporting Lisbon (vs Genk).

Capel's assist was typical of the kind of low percentage pass that can bring high rewards. The penalty area is quite naturally well defended and to create an inviting chance often requires pin point accuracy to drop the ball close enough to the six yard box to increase the chances of a goal, but not close enough to invite the keeper to claim an easy catch. Breaking down the right wing, he chose to cut back onto his left foot to provide an inswinging far post cross for van Wolfswinkel. That gave him more margin for error in finding his striker, but required his team mate to put most of the power onto the header. An excellent fast counter attacking goal from a team who were down to ten men at the time.

Number 4. Szabolcs Huszti. Hannover (vs Helsingborgs). 

A constant feature of difficult passes is that they are aimed into the penalty area from distance and Huszti's  lofted outswinging delivery was a perfect example of the art. The aerial route reduces the number of potential defensive interventions, but increases the amount of time defenders and keepers have to converge on the ball....unless the ball is hit with pace. The harder the ball is hit, the less accurate it becomes, but Huszti executed direction and pace to perfection.

Number 3. Fininho. Metalist Kharkiv (vs Rosenborg).

Fininho started this goal build up with a neat nutmeg out on the left wing. This was the longest crossfield ball so far, but it was hit wingwards and towards the right hand edge of the box, an area that was likely to be less populated by defenders than the heart of the penalty box. It was partly an attempted assist and partly a ball designed to change the point of the attack. Taison had stayed wide to accept the pass and the defence drifted out to meet him. However, instead of controlling the ball, he smashed an unstoppable shot high into the net from a narrowing angle. Not quite the most difficult pass on show on Thursday, but by some way the most unlikely goal.

Number 2. Gareth Bale. Tottenham Hotspur (vs Maribor).

Another excellent left footed delivery from the flanks. Bale took advantage of a momentary stumble by the Maribor defender. But he still had to curl the ball around his desperate, attempted recovery and find Defoe's feet, central to the goal and at the edge of the six yard box with great accuracy. The pace of the ball also meant that Defoe merely had to steer the ball into his choice of corners with the keeper tempted, but powerless to intervene.

The Top Five Passes from Europa League, Week Four.

| Team | Minute | Scorer | Passer | Chance Of Pass Being Completed | Chance Of Pass Being Converted |
|---|---|---|---|---|---|
| Club Brugge | 14 | Trickovski | Donk | 22% | 16% |
| Tottenham | 22 | Defoe | Bale | 35% | 18% |
| Metalist Kharkiv | 4 | Taison | Fininho | 35% | 2% |
| Hannover | 3 | Diouf | Huszti | 38% | 21% |
| Sporting Lisbon | 64 | van Wolfswinkel | Capel | 39% | 11% |

Drum roll....

Number 1. Ryan Donk. Club Brugge (vs Newcastle).

The wide margin winner for the pass of the round. Pass completion is made easier if the recipient can create a passing angle for the passer, either through movement or by inviting the pass to be made into space. Diagonal running makes for an easier pass and conversely passing when the passer, defender and striker are almost perfectly in line significantly increases the tariff.

Donk had 10 Newcastle players in front of him, Trickovski deep and a defender directly in line when he attempted a pass resembling a desperation Hail Mary from the NFL. His margin for error was tiny. Under hit the ball and a defensive clearance was almost a certainty, over hit the pass and the keeper/sweeper came into play.

The ball had to land perfectly in stride to be collected by Trickovski's vertical run, but the execution was precise and the rewards were large when the striker arrived at the edge of the box, in a central position with the Newcastle defence all behind him, save for an exposed keeper, who he duly beat.

An outstanding pass, a fine finish and a worthy winner.

Friday 9 November 2012

Edin Dzeko Is Not a Super Sub.

Super sub is an uneasy and often unwelcome crown to wear, with the juxtaposition of a superlative with a faint hint of failure. Liverpool's David Fairclough provided the benchmark by which all other lethal replacements are measured, making his name in the late 70's and early 80's during an era when domestic substitutes were singular and always wore the number twelve shirt. 37 goals from 92 starts and 18 from 62 from the bench hint at the lopsided nature of his goalscoring exploits that ensure he is fondly remembered by football fans from the seventies and not just on Merseyside. Ultimately his misfortune was to straddle the careers of first Toshack and Keegan and then Dalglish and Rush.

The attractive narrative of the super sub is easy to appreciate. Dramatic match winning strikes live long in the memory and most substitutes manage to stay on the pitch to feature in many such goals. Ole Gunnar Solskjaer's 93rd minute winner against Bayern Munich in the Nou Camp in 1999, twelve minutes after his introduction, Fairclough's reputation making winner in the 84th minute against St Etienne in an earlier incarnation of the same competition and Moses' late header a few days ago are just a few such efforts that understandably eclipse the much more numerous failed substitutions. "Not so super sub" doesn't quite have the same headline appeal.

Bosnian, Edin Dzeko is the current poster boy for the super sub with an eye catching recent run of scoring form from the bench, combining quantity with game changing late strikes. You can read a typical appreciation of Dzeko's supposed qualities on the BT Footballing Website written by journalist Rob Smyth.

Helpfully the statistics appear to back up the narrative that Dzeko is much more effective as a late, game changing introduction rather than as a starter. He's made over 40 starts for Manchester City since his arrival, comprising over 3,700 playing minutes during which he's scored 19 times at a rate of a goal every 195 minutes. Contrast this with his forays from the bench, when his 30+ substitute appearances, lasting just over 600 minutes, have yielded 12 goals at a rate of a goal every 52 minutes.

Undeniable evidence?

Unfortunately, the evidence fails on two crucial counts. Firstly, "super sub Dzeko" is playing in a very different goal scoring environment than is "starting Dzeko". Scoring in football becomes almost imperceptibly more frequent as a match progresses, with 45% of the goals coming before the interval and 55% after half time. We can demonstrate these different scoring environments by looking at a typical breakdown of goals for the first ten minutes of an EPL game (when "starting Dzeko" will almost certainly be on the field) and the final ten minutes (when "super sub Dzeko" will be present).

The first ten minutes is more likely to be goalless than the last ten and the former is also significantly less likely to contain exactly one or two goals than is the final ten minutes of a game. As a substitute, Dzeko averaged 18 minutes per game, so he was consistently playing when goal scoring was approaching a peak, not just for him but also for the team as a whole. Therefore, to look at his scoring exploits from the bench within the context of his scoring environment we need to compare his scoring record with that of Manchester City as a team for the time he was on the pitch as a substitute. And also repeat the exercise for Dzeko as a starter.

| | As A Starter | As A Sub |
|---|---|---|
| Goals By Dzeko | 19 | 12 |
| Total City Goals Over Same Timescale | 74 | 32 |
| Dzeko's Goals As % of Total | 26% | 37% |

Initially the evidence still appears strong, although not quite as extreme as the figures based on raw minutes per goal in Dzeko's role as a starter and as a sub. As a substitute Dzeko scores 37% of the goals that City score while he is present in that role, but just 26% when he starts. However, we are now faced with a second problem if we try to take these figures at face value. Dzeko's super sub status is based on just 600 minutes of playing time, a fifth the size we've used to measure his scoring rate as a proportion of City's overall record as a starter. And small samples often lead to extreme, but unreliable estimates.

Dzeko's 600+ minutes as a sub is almost exactly equivalent to the number of minutes he played as a starter in his first eight games of the 2011/12 season. During those matches he scored 7 of the 17 goals recorded by City, 41% of their goals. A number in excess of his super sub strike rate, but a poor indicator of his career figure of 26% based on a larger sample size. If his hot start to 2011/12 wasn't indicative of his career figures, shouldn't his 37% strike rate as a sub, collected under very similar playing time, also carry a caveat?
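To put a rough number on how unreliable the smaller samples are, here is a minimal sketch that computes Dzeko's share of City's goals in each role and wraps an approximate 95% Wilson interval around it. Treating each City goal as an independent trial that Dzeko either scores or doesn't is a simplifying assumption of this sketch, as is the choice of interval. The goal counts are the ones quoted in this post.

```python
import math


def goal_share(player_goals, team_goals):
    """Share of the team's goals scored by the player."""
    return player_goals / team_goals


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half


# (Dzeko goals, City goals while he was on the pitch), taken from the text.
samples = {
    "starter, career": (19, 74),
    "substitute": (12, 32),
    "starter, first eight games of 2011/12": (7, 17),
}

for label, (goals, team) in samples.items():
    low, high = wilson_interval(goals, team)
    share = goal_share(goals, team)
    print(f"{label}: {share:.0%} of City's goals (roughly {low:.0%} to {high:.0%})")
```

Under these assumptions the interval around the substitute figure is much wider than the one around the starter figure and the two overlap, which is the sample size caveat expressed in numbers.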

Dzeko obviously doesn't relish the tag of "Super Sub", but the good news is that sample size and differing goal environments indicate that he probably isn't one anyway (even if they exist). In fact for a more extreme example of the "art", he need merely look across his own attacking line to Sergio Aguero, who has scored over 50% of his team's goals as a substitute on an even smaller sample size of 300 minutes. 

In short, City have an embarrassment of attacking riches and if you cut the sample small enough extreme results are bound to appear. Add a persuasive narrative and you've recreated a long lost story from the seventies.

Thursday 8 November 2012

Shot Analysis Of Manchester City 2 Ajax 2.

48 hours after Manchester City failed to beat Ajax in the second match of their Dutch double header in the Champions League, the inquest into City's "failure" continues unabated. However, as in politics, football followers can occasionally predict what they wish for rather than what they suspect may happen.

The more prolonged and league based the contest, the more likely it becomes that the best teams will make it through to the very latter stages of the competition and the hybrid league/knockout format of the UCL goes part way to assisting the progress of the better sides in Europe's premier competition. However, it is the initial group seeding process that most helps the giants of European club football and as Simon Gleave brilliantly demonstrates in his latest tweets from the Scoreboard Journalism blog, City have been handed an extremely tough task as fledgling European campaigners.

The UK betting industry may have been immune to the "Mitt" factor, calling Barack as an 80%+ favourite before the polls closed, but they have undoubtedly included a small patriotic premium in their view of City's chances in a far from straightforward Group D, leading to an inflated overall expectation for the 2011/12 English champions.

Entwining the relative merits of different leagues from different countries has become easier as the scope of European club competition has rapidly expanded. Interlocking formlines between the likes of Chelsea and Barcelona can be readily extended to the lesser lights of the Premiership, who can only dream of entertaining the Catalan giants on a wet Wednesday night somewhere in the Midlands, but regularly compete against Chelsea. It is therefore a small step to equate the talent levels of Ajax with those of Everton or at a push an under performing Arsenal.


11', de Jong, 0-1
17', de Jong, 0-2
22', Ya Ya Toure, 1-2
74', Aguero, 2-2.

Notwithstanding the high, if unrealistic hopes for City in the competition, they were quite rightly favoured to beat Ajax on Tuesday night at the Etihad. However, they suffered a dramatic reversal of roles as two close range de Jong strikes, one with his foot and one with his head, gave the visitors similar expected points levels to those enjoyed by their hosts at kick off. A Toure goal brought the teams to within touching distance almost immediately and then the clock ticked in Ajax's favour as each team enjoyed bouts of possession dominance.

Aguero levelled the match with around 20 minutes of playing time remaining, but a stalemate was now much more likely than it had been at the start and despite the understandable exposure given to Balotelli's dramatic fall in the box after 93+ minutes, it's possible that the game was even over when the "foul" was committed. Certainly a team which relies on a last kick penalty for a win hasn't made full use of the previous 94 minutes.

The Likelihood of Shots from the Manchester City/Ajax UCL Game Resulting in A Goal.

| Player | Minute | Goal Probability | Outcome |
|---|---|---|---|
| Eriksen | 3 | 0.03 | Blocked |
| Aguero | 5 | 0.05 | Off Target |
| Zabaleta | 7 | 0.12 | Off Target |
| de Jong | 10 | 0.25 | GOAL |
| Nastasic | 11 | 0.09 | On Target |
| de Jong | 17 | 0.21 | GOAL |
| Ya Ya Toure | 22 | 0.15 | GOAL |
| Zabaleta | 30 | 0.11 | On Target |
| de Jong | 33 | 0.01 | Off Target |
| Boerrigter | 37 | 0.02 | Off Target |
| Ya Ya Toure | 39 | 0.13 | On Target |
| Nastasic | 50 | 0.11 | Off Target |
| Ya Ya Toure | 53 | 0.03 | Blocked |
| de Jong | 56 | 0.02 | On Target |
| Eriksen | 62 | 0.04 | Off Target |
| Barry | 65 | 0.04 | Off Target |
| de Jong | 71 | 0.04 | On Target |
| Aguero | 73 | 0.06 | GOAL |
| Balotelli | 79 | 0.13 | On Target |
| Dzeko | 79 | 0.05 | Off Target |
| de Jong | 84 | 0.06 | Off Target |
| Kompany | 86 | 0.07 | Off Target |
| Eriksen | 91 | 0.04 | On Target |
| Fischer | 92 | 0.07 | Off Target |

Manchester City Cumulative Expected Goals: 1.1
Ajax Cumulative Expected Goals: 0.8

Looked at through shot statistics, a draw may have been a fair outcome. Each side managed about a dozen attempts, but many were from distance and as such were likely to be successful around one time in 20.

De Jong was the star performer for Ajax, scoring twice and also topping the number of attempts for either side. He got on the end of the two clearest chances of the night and converted both of them, although he also benefited from a reluctance from the City defence to mark him at set pieces or close him down from distance. The projected success rates for shots account for pitch position, but at the moment average the likely defensive pressure. So, and as visual evidence confirms, de Jong's chances were slightly easier than the generic probabilities imply.

De Jong's first goal came courtesy of a deflected attempt from Moisander. The defender's header was itself a solid chance from a corner and can be recorded as either an assist or an opportunity, but it does highlight the particular problems City had when defending corners early in Tuesday's game. Although to draw permanent defensive traits from such little evidence is perhaps premature.

His second strike also demonstrates the difficulty of defending headers from inside the six yard box as discussed here. City certainly erred by losing de Jong in the box, but it's unlikely that a defender on the post would have had any more joy at preventing the goal than Joe Hart had armed with the advantages of being the keeper. As a further aside, corner takers who "fail to beat the first man" are actually trying to hit the area of maximum reward, an area so ruthlessly exploited by de Jong. So criticism of their failure should be tempered by knowledge of their intentions.

Ya Ya Toure pulled a goal back with the game's third most likely opportunity, so a night of fluctuating fortunes came about because chances that were most likely to produce a goal actually did produce goals.

Aguero's equaliser again demonstrates the need to begin to assess the impact of defenders on shot probability. He had all of Ajax's defenders behind him when he took his shot from just inside the box, compared to the more likely scenario where such attempts have to navigate their way through a crowded penalty box. He also profited from Vermeer adopting the occasional Jan Jongbloed approach to shot stopping. In short Aguero's chance almost certainly had a higher likelihood of success than the average effort from that distance.

Cumulatively the reasonably large number of chances created by each side is diluted by the large number of attempts being from distance. City, as you would expect given that they were both at home and trailing for large portions of the match accumulated a higher goals expectation, but Ajax almost matched them through higher chance quality and an effective counter attacking tactic, especially later on in the game.

If this game was mindlessly played out time and time again on a spreadsheet, with no regard for current score and using the shooting data as the basis for your model, City would win 42% of the reenactments, Ajax 26% and 32% of the matches would end in the same outcome as Tuesday night's actual reality.
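One way to reproduce this kind of replay without a spreadsheet is sketched below. Rather than simulating thousands of matches it computes the exact distribution of goals for each side, assuming every shot is an independent event with the goal probability listed in the shot table, and ignores game state entirely. The assignment of shots to teams is my own reading of the player names, and because the listed probabilities are rounded to two decimal places the output should land close to, but not exactly on, the percentages quoted above.

```python
# Goal probabilities per shot, copied from the shot table (rounded values).
city_shots = [0.05, 0.12, 0.09, 0.15, 0.11, 0.13, 0.11,
              0.03, 0.04, 0.06, 0.13, 0.05, 0.07]
ajax_shots = [0.03, 0.25, 0.21, 0.01, 0.02, 0.02,
              0.04, 0.04, 0.06, 0.04, 0.07]


def goal_distribution(shot_probs):
    """Exact distribution of goals, treating each shot as an independent trial."""
    dist = [1.0]  # dist[k] is the probability of exactly k goals so far
    for p in shot_probs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1 - p)   # this shot is not scored
            nxt[goals + 1] += prob * p     # this shot is scored
        dist = nxt
    return dist


def match_outcomes(home_probs, away_probs):
    """Return (home win, draw, away win) probabilities from the two shot lists."""
    home = goal_distribution(home_probs)
    away = goal_distribution(away_probs)
    home_win = draw = away_win = 0.0
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            if h > a:
                home_win += ph * pa
            elif h == a:
                draw += ph * pa
            else:
                away_win += ph * pa
    return home_win, draw, away_win


city_xg = sum(city_shots)
ajax_xg = sum(ajax_shots)
city_win, draw, ajax_win = match_outcomes(city_shots, ajax_shots)

print(f"Expected goals: City {city_xg:.1f}, Ajax {ajax_xg:.1f}")
print(f"City win {city_win:.0%}, draw {draw:.0%}, Ajax win {ajax_win:.0%}")
```

Summing each list also recovers the cumulative expected goals figures for the two sides.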

Saturday 3 November 2012

A Predictive Pythagorean For Football.

If baseball is the sport to which all other flavours of analytics can trace back their origins, then the most widely recognisable product of the sabermetrics movement is the Pythagorean Expectation. Elegantly simple and possessing a format that is instantly recalled by anyone who has taken maths at even the most rudimentary of levels, Bill James' equation allows a team's season long achievements to be seen with some of the unrepeatable, luck driven outcomes removed from the table.

In its rawest form a baseball team's runs scored and allowed record is examined and, mindful of the part that luck may play in scoring or conceding over various timescales, an expected win/loss record is produced that may differ from reality, especially in cases where a team has managed to record an unsustainably large number of narrow victories. The aim is to produce an expected record that may be more indicative of a team's true ability level, and therefore future expected performances, than their actual record that may be partly a product of good or bad fortune.

As with many such developments the initial insight was immensely valuable and through refinement and input from the wider sabermetrics community Pythagorean expectation has become an extremely useful tool in the evaluation of team ability.

Other sports inevitably developed their own version of James' contribution and where basketball and American Football led, Association Football eventually followed. James initially suggested 2 as his exponent of choice (hence the Pythagorean name), but just as his initial attempt has undergone much change, a straight conversion from the diamond to the football pitch wasn't really possible. The most obvious problem is that draws aren't a feature of baseball, but football positively revels in producing them. So an approach based around success rate, where draws account for half a win, was required in football.

Other potential problems for the formula existed in both sports. Scoring environment is a subtle, but important factor in producing results in both baseball and football. A football team which plays in contests where a below average number of goals are scored compared to the league average will see more draws than a team which plays with a more expansive approach. So inclusion of the goal environment for individual sides somewhere in the exponent is desirable for any sport choosing to develop a Pythagorean approach.
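For readers who haven't met it, the general shape of the calculation is goals for raised to an exponent, divided by the sum of goals for and goals against each raised to the same exponent. The sketch below implements only that general form. The exponent is left as a plain parameter because the goal environment adjustment used for football is not spelled out here, and the example goal totals are hypothetical.

```python
def pythagorean_success_rate(goals_for, goals_against, exponent=2.0):
    """General Pythagorean form: GF^x / (GF^x + GA^x).

    The default exponent of 2 is James' original baseball suggestion.
    A football version would use a different, goal environment dependent
    exponent, which is not specified here and is left to the caller.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Hypothetical season totals, for illustration only.
goals_for, goals_against = 60, 40

for x in (1.0, 1.5, 2.0):
    rate = pythagorean_success_rate(goals_for, goals_against, exponent=x)
    print(f"exponent {x}: expected success rate {rate:.3f}")
```

Raising the exponent pushes the expected success rate further from 50% for the same goals record, which is why the choice of exponent matters.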

Finally, aspects of a particular sport that are well understood, but transient, may alter scoring or conceding rates in one season, but may be absent or substantially different in subsequent ones. Unusually large numbers of red cards, for example, may result in a team's seasonal goal scoring records being correctly interpreted by a Pythag approach, but the team may improve next season through nothing more than better behaviour or a more even distribution of fouls.

Using Pythag in football is perfectly possible. It's really duplicating the goal difference approach where teams who have inflated points totals compared to those typically expected for a similar level of goal difference are labelled as over achievers who have probably got lucky in a large number of close games and will fall to earth sooner rather than later.

As a tool it also appears to make intuitive sense by downgrading those who appear to benefit by winning more than their fair share of close matches, while inflating the prospects of teams who appear capable of better if they had received a little more luck in how their scoring and conceding sequencing occurred. That alone, however, merely gives us an alternative opinion regarding the quality of teams. Our next step is to see if these opinions help to give an improved view of the future compared to some other measurement.

The aim of the Pythagorean expectation for a team is to reduce the effect of non reproducible, luck driven events and other sports routinely use a team's Pythag from one season to better predict their actual record in an upcoming campaign. Therefore to test the value to football I calculated the Pythag using an exponent that incorporates goal environment for every EPL team during the continuous "38 games era", comprising the last 17 completed seasons. I then plotted these figures against the number of points gained by the teams in the EPL during the following season. As a comparison I repeated the process, but used just actual points gained in both seasons.

In total 340 teams recorded a Pythagorean expectation in the EPL and survived to compete in the division in the next season, so there is a survivor bias in the sample, although movement is less pronounced than in lower leagues where the best are also removed from the sample in subsequent seasons. 166 teams had Pythagorean points expectations that were in excess of their actual total for that season and 174 teams overperformed in reality. The average over or under performance in each case was around 3.5 points.

The plot using actual points in both seasons was very similar to the Pythag plot (shown above), but Pythag had a stronger correlation, with respective r^2 of 0.64 and 0.57 suggesting that "luck" corrected points totals are the more valuable predictor of future performance levels than mere raw results.
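The comparison itself is simple to set up. The sketch below computes r^2 between a predictor from one season and points in the following season, once for Pythagorean points and once for actual points. The five data points are invented placeholders, so the printed values say nothing about the real result; only the method is being illustrated.

```python
import math


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return (cov / math.sqrt(var_x * var_y)) ** 2


# Placeholder figures for five imaginary teams, NOT real EPL data.
pythag_points = [40, 52, 61, 75, 83]       # Pythagorean expectation, season one
actual_points = [45, 48, 66, 70, 88]       # actual points, season one
next_season_points = [42, 50, 60, 77, 80]  # actual points, season two

print(f"Pythag vs next season: r^2 = {r_squared(pythag_points, next_season_points):.2f}")
print(f"Actual vs next season: r^2 = {r_squared(actual_points, next_season_points):.2f}")
```

Run over the real team seasons, whichever predictor gives the higher r^2 explains more of the variation in the following season's points.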

Individual cases are always interesting, if rarely indicative of a trend as a whole and the "unluckiest" team in the sample was Manchester City in 2003/04. They amassed just 41 points in finishing 16th against a Pythagorean expectation of 53 and in 2004/05 they lived up to their Pythag expectation from the previous year by gaining 52 points in finishing 8th.

Neighbours, United were the sample's highest flying overachievers in 1999/00 when they amassed 91 actual points against a goals for and against derived expectation of just 80 and like City in the next season they gravitated towards their previous year's "deserved" points total by gathering exactly 80 points in retaining their title.

Team Performance Against Pythagorean Expectation Over The Last 17 EPL Seasons.

| Team | Number Of Over Performing Seasons | Number Of Under Performing Seasons |
|---|---|---|
| Arsenal | 9 | 8 |
| Aston Villa | 5 | 12 |
| Birmingham | 4 | 3 |
| Blackburn | 7 | 8 |
| Bolton | 6 | 7 |
| Charlton | 5 | 3 |
| Chelsea | 8 | 9 |
| Derby | 4 | 3 |
| Everton | 4 | 13 |
| Fulham | 4 | 7 |
| Leeds | 6 | 3 |
| Leicester | 3 | 4 |
| Liverpool | 3 | 14 |
| Manchester City | 6 | 6 |
| Manchester United | 13 | 4 |
| Middlesbrough | 5 | 8 |
| Newcastle | 7 | 9 |
| Stoke | 3 | 1 |
| Sunderland | 6 | 5 |
| Spurs | 9 | 8 |
| WBA | 1 | 5 |
| WHU | 10 | 4 |
| Wigan | 6 | 1 |
| Wolves | 3 | 1 |

The temptation is great to use Pythagorean expectation to try to identify persistently over or under achieving sides and to try to identify factors such as particularly astute managers or well balanced teams as crucial factors in producing more points for your scoring record. The narrative is appealing and baseball particularly has gone down this route. So for those interested I've tabulated the number of over and under performing seasons recorded by teams with four or more EPL seasons to their name since the league went exclusively to 20 teams.

It's also tempting to be immediately struck by Manchester United's over populated positive column. However, the majority of sides are within touching distance of parity between good and bad and many such as Wigan have achieved a lopsided record under multiple managers. Also survivor bias exists within the sample. Many "good/lucky" managers may have dropped off the radar if their tenure began with a "bad/unlucky" initial run of seasons. A fate that, anecdotally, very nearly befell Sir Alex had Mark Robins not grabbed a winning FA Cup goal against Forest well before the start of the Premiership adventure. Overall evidence of a persistent ability to cheat your goal scoring records is fairly weak.
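A quick way to gauge how surprising a lopsided row in the table is would be to ask how often pure chance produces it. The sketch below assumes, purely for illustration, that every season is an independent coin flip between over and under performing, which is roughly in line with the near even 174/166 split in the sample, and computes the chance of a record at least as lopsided as a 13 to 4 or 14 to 3 split over 17 seasons.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Splits over 17 seasons, matching the most lopsided rows in the table.
for seasons_one_way in (13, 14):
    chance = prob_at_least(seasons_one_way, 17)
    print(f"{seasons_one_way} or more of 17 in one direction: {chance:.4f}")
```

Any single figure looks small, but the table contains 24 teams and a lopsided record in either direction would catch the eye, so a few extreme rows are less remarkable than each one appears in isolation.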

Pythagorean estimates are useful tools, especially as a means to include goal environments into the equation and their versatility can be extended to a match by match seasonal analysis, but genuine and persistent over or under achievers are likely to be a very rare beast. Teams which are out performing their goal scoring and conceding records are probably just enjoying a slice of good fortune.

Thursday 1 November 2012

Away Goals In The Uefa Champions League.

Ever wondered how costly it is when a home side allows the visiting team to grab an away goal in the first leg of a UCL knockout tie ? Then check out my guest post at by following this link  .

Lots of great posts on a variety of sports and a multitude of subjects are added daily, written by some of the best bloggers and writers on the net. Check the site out.