Thursday 28 February 2013

Robin van Persie. Shot Conversion at Arsenal and Manchester United.

Van Persie's transfer from Arsenal to Manchester United, arguably at the peak of his goal scoring talents, enables us to look at his conversion rates at both clubs and to speculate on the possible reasons for any differences in the two figures. In this guest post I compare his cumulative goal expectation for every shooting attempt at each club over the last two seasons.

Tuesday 26 February 2013

Using the MCFC Data to Define Successful Playing Style.

The initiative last year by Manchester City to release a season's worth of game based stats, in addition to a granular play by play break down of their Premiership match at home to Bolton from the 2011/12 season, has provided a rich vein of data for the football analytics community to work with. Of the two, the former is easier to work with because it was released in Excel format, was sorted by player and contains details on nearly 200 individual in match actions. Once these player actions are sorted by game and summed it is possible to produce an extensive record of the events that took place in every Premiership game from last season. Websites such as Joe B's football data had previously extended the data available from mere goals to include shots and cards and the City release has raised the bar considerably.

The second release, comprising every event from a single match, appeared in XML format, which required a degree of expertise to reconfigure, but yielded exceptional information, including x,y co-ordinates and time stamps for each event, enabling such things as passing sequences to be easily recorded. In short, the latter release is considerably more detailed, but restricted to a single game, and the former has more general averaged information, but its net is cast over an entire season.

The choice of City and Bolton was well made, as it highlighted the almost polar opposite approaches currently seen in top flight football. City's method relies on passing and possession, compared to a Bolton side which typically played with much shorter passing chains. The release of such comprehensive data is understandably a one off event, but we can use the initial, more general release to create an approximation of the more detailed release to cover such events as passing sequences.

The general release contains a column for each side's total number of passes and also for events which typically end a passing sequence, such as tackles and fouls, or on a more positive note, goals and shots at goal. By dividing the former by the total number of sequence ending events we can obtain a figure which should give an indication of the sides which enjoyed longer passing chains both over the season and in each individual match of the 2011/12 campaign.
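As a minimal sketch of the calculation described above, the function below divides a side's total passes by the number of sequence ending events. The function name and the season totals in the example are hypothetical, chosen only for illustration, and are not figures from the MCFC release.

```python
def passes_per_sequence(total_passes, sequence_ending_events):
    """Approximate a side's average passing-chain length.

    total_passes: all passes attempted by the side.
    sequence_ending_events: count of events assumed to end a passing
    sequence (for example tackles, fouls, goals and shots at goal).
    """
    if sequence_ending_events <= 0:
        raise ValueError("need at least one sequence-ending event")
    return total_passes / sequence_ending_events


# Hypothetical season totals, for illustration only.
example = passes_per_sequence(total_passes=18000, sequence_ending_events=4500)
print(round(example, 2))
```

The same function can be applied to a single match by passing that game's totals instead of the season's.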

Average Number of Passes per Passing Sequence Sorted from Longest To Shortest. 2011/12.

Rank. Team.
1 Swansea.
2 Manchester City.
3 Manchester United.
4 Arsenal.
5 Spurs.
6 Chelsea.
7 Fulham.
8 Liverpool.
9 Wigan.
10 Norwich.
11 Everton.
12 Wolves.
13 WBA.
14 Newcastle.
15 Sunderland.
16 Aston Villa.
17 Bolton.
18 QPR.
19 Blackburn.
20 Stoke City.

Conventional wisdom is confirmed. Swansea pass the ball a lot, while Stoke are a long ball side, where sequences end very quickly. The top four also occupy places in the top six, confirming their ability and desire to complete long passing sequences. What we are seeing is a representation of how each team predominately played their football last season, with passing teams at the head of the table and those teams which were less able or not tactically required to retain the ball at the bottom.

Using the MCFC data I have a hopefully accurate approximation of the average length of passing chains made by each side in all 380 matches from last season. If we ignore game states for a later post, there is a strong positive correlation between the number of consecutive passes made by Arsenal and the likelihood that they won the match. Arsenal probably led through the use of sustained passing sequences and then, particularly against weaker opponents, kept that lead by similar ball control tactics. Similar significant, individual match correlations hold for Manchester United, Chelsea and Spurs, but interestingly not for City, in a season where set play goals played a notable part in them lifting the title.

Another omission is table topping Swansea. There was no significant correlation between increased passing possession and an increased likelihood that they won a particular game. However, there is a significant connection between increased passing sequences and Swansea not losing a game.

United, Chelsea, Spurs and Arsenal were the teams who possibly used extensive passing as their primary match winning approach. City were less clear cut, relying also on set pieces as a source of goals, a route that may have dried up this term. Swansea's style, by contrast, appears to be used primarily as a defensive tactic, as they attempt to keep possession in less threatening areas of the pitch. They used passing in the EPL in 2011/12 as a means to not lose.

At the "bottom" of the passing table, Bolton are the only side with a significant correlation between their passing tendency and match results. The shorter their passing sequences, the more likely it was they won the match. It is tempting to think that a preferred route one approach gave way to Bolton's opponents allowing the relegated team a more leisured approach if a direct assault saw Bolton trailing rather than leading.

Stoke and Swansea. Two peas in a pod.

Stoke were the "route one" team most similar to Swansea in how their preferred style related to match result. The Potters' preferred approach, where prolonged passing sequences were rare, correlated strongly with them not losing. Swansea tried to defend by keeping the ball and it was a bonus if a scoring opportunity arose and Stoke used the long ball to keep the ball as far away from their goal as possible and again were happy to take a goal if the chance arose. In short, they had polar opposite approaches, but near identical tactical philosophies, whereby their on the ball style appeared to be used initially as an extension of their defensive ambitions and the final win/loss/draw records for both teams were almost identical, with draws proliferating.

The MCFC data dump is a great resource that can be made even more productive with a small amount of effort, but care must be taken not to assume that every team is trying to achieve the same goals via the same methods. Excellent ball retention may be essential to some teams, but of little importance to others.

Friday 22 February 2013

Converting Chances Is A Skill.

One of the recurring themes of this blog is an attempt to illustrate that the kind of stats that are usually used to quantify how good or bad a team or individual is are almost always an imperfect snapshot of their true abilities. In this guest post I used as large a sample size as is currently available of shots saved by team keepers to distinguish between teams which habitually signed excellent keepers and teams which did not.

We can use a similar approach to investigate the shooting abilities of Premiership sides over the last ten completed seasons. The number of goals scored by teams has obviously varied over that period. But title contending sides invariably have scored more goals than perennial strugglers or yo yo sides who have regularly swapped Premiership football for a less demanding existence in the Championship. The likely reasons for these differing goal tallies include more successful teams creating more chances and then being more proficient at converting them than are lesser teams. One current debate surrounds the extent or indeed the very existence of the latter cause. In short, is the goalscoring talent distribution so narrow in the Premiership that there is little or no difference between the conversion abilities of individual strikers and by extension the teams they play for?

Natural random variation will exist even in trials where we know with absolute certainty the expected success rate. A simulated, fair coin toss won't always produce 1,000 heads from 2,000 trials, but will bunch around that average. I've just run a mere 20 such trials and have seen a low of 949 heads and a high of 1046. Therefore, if we compare a football team's strike rate from a large number of goal attempts, we should expect to see such variation, even if the strike rate of each team is identical.
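The coin toss experiment is easy to reproduce. The sketch below runs 20 trials of 2,000 simulated fair tosses and reports the lowest and highest head counts. The seed and function names are arbitrary choices for illustration, so the exact low and high will differ from the figures quoted above.

```python
import random


def heads_in_trial(n_tosses, rng):
    """Count heads in n_tosses simulated fair coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))


def run_trials(n_trials=20, n_tosses=2000, seed=0):
    """Run several independent trials and return the head count of each."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [heads_in_trial(n_tosses, rng) for _ in range(n_trials)]


counts = run_trials()
print("low:", min(counts), "high:", max(counts))
```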

The average conversion rate for teams which attempted at least 2,000 shots at goal in the EPL since 2002 was 11.3% or roughly one goal every nine attempts. 19 teams have passed this milestone of 2,000 attempts, with Manchester United topping the attempts table at nearly 6,000 shots and Birmingham creeping in at the bottom with a few hundred more than 2,000.

From a purely visual stand point, we can simulate a typical spread of conversion rates for each of the 19 teams, if we assume that each has an identical true converting talent of 11.3%. As expected even in this state of absolute parity, some teams appear more efficient than others simply because of random variation. One team, Spurs, in this single simulation required an average of 10 shots to score once and WBA and Birmingham were apparently the league's most lethal finishers, needing just over 8 shots to score.
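A simulation of this kind can be sketched as follows. Every team is given the same 11.3% true conversion rate and each shot is treated as an independent trial. The team labels and shot totals are hypothetical placeholders, not the real figures for the 19 sides, and the seed is arbitrary.

```python
import random

LEAGUE_RATE = 0.113  # league average conversion rate quoted in the post


def simulate_goals(shots, rate, rng):
    """Simulate goals from a number of shots, each scoring with probability rate."""
    return sum(rng.random() < rate for _ in range(shots))


def shots_per_goal_table(shot_counts, rate=LEAGUE_RATE, seed=1):
    """Return simulated shots needed per goal for each team."""
    rng = random.Random(seed)
    table = {}
    for team, shots in shot_counts.items():
        goals = simulate_goals(shots, rate, rng)
        table[team] = shots / goals if goals else float("inf")
    return table


# Hypothetical shot totals, for illustration only.
shot_counts = {"Team A": 6000, "Team B": 4000, "Team C": 2300}
for team, spg in shots_per_goal_table(shot_counts).items():
    print(team, round(spg, 2))
```

Even with identical talent, the printed figures scatter around nine shots per goal, and the scatter is wider for the teams with fewer attempts.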

If we compare this idealized, egalitarian fantasy with the reality of the last ten EPL seasons, we find that Arsenal, Manchester United and Chelsea are the three most efficient converters of chances and Wigan, Sunderland and WBA are the least proficient. Just as importantly, the spread of shots required to score once is wider than our single simulation, ranging from 7 shots per goal for the very best to over 11.5 for Wigan.

The identity of the best and worst teams, along with the wider spread of shots required to score once, goes part of the way to implicating causes other than mere random chance in the conversion rates actually observed in the EPL over a decade. Arsenal, for example, have scored 153 more goals from their 5271 shots than would be expected of a side converting at the league average.

We can more formally calculate the size of these other "skill" factors that appear to be present in the real life EPL and use the answers to regress towards the mean each side's conversion rate to obtain a better estimate of their true conversion rates. And the difference persists. The top sides do continually buy players who can convert chances at higher rates than their counterparts in less successful sides. The difference in the number of shots needed to score once ranges from 7 for the best to 11 or more for the worst, even among relatively successful and regular members of the elite flight. If we include the likes of Derby the skill gap widens even more, although the shrinking sample size increases the likely variation in shot quality.
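The post does not spell out the regression method, so the sketch below uses one common approach as an assumption: estimate the spread of true talent as the observed variance in conversion rates minus the variance expected from binomial chance alone, then shrink each side's rate towards the league mean by the resulting weight. The first record is derived from the Arsenal figures above (5271 shots and roughly 153 goals more than a league average side); the other two records are hypothetical.

```python
def regress_to_mean(records, league_rate):
    """Shrink observed conversion rates towards the league rate.

    records maps team -> (goals, shots). This is an assumed
    method-of-moments approach, not necessarily the one used in the post.
    """
    rates = {team: goals / shots for team, (goals, shots) in records.items()}
    n_teams = len(records)
    # Spread actually observed around the league rate.
    observed_var = sum((r - league_rate) ** 2 for r in rates.values()) / n_teams
    # Average spread expected from random variation alone.
    noise_var = sum(
        league_rate * (1 - league_rate) / shots for _, shots in records.values()
    ) / n_teams
    talent_var = max(observed_var - noise_var, 0.0)

    regressed = {}
    for team, (_, shots) in records.items():
        team_noise = league_rate * (1 - league_rate) / shots
        weight = talent_var / (talent_var + team_noise)
        regressed[team] = league_rate + weight * (rates[team] - league_rate)
    return regressed


records = {
    "Arsenal (derived)": (749, 5271),
    "Team B (hypothetical)": (420, 3700),
    "Team C (hypothetical)": (200, 2300),
}
for team, rate in regress_to_mean(records, league_rate=0.113).items():
    print(team, round(1 / rate, 2), "shots per goal after regression")
```

If the observed spread is no wider than chance alone would produce, the weight falls to zero and every side is pulled all the way back to the league rate, which is the outcome described for the simulated league.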

If we regress the simulated trial by the amount required by the distribution of randomly generated conversion rates, we find each "team" in a league decided entirely by luck needs a virtually identical 9 shots to score once. In short, the reality of the EPL over ten years is much more consistent with scoring being part skill, part random variation.

Wednesday 20 February 2013

Do Early Risers See Less Clean Sheets?

Non league Luton Town equalled a near 100 year old record by playing in the FA Cup sixth round on Saturday. Their feat was overshadowed on two fronts. Firstly, Luton had been members of the top tier of English football as recently as 1992, so their presence and progress in England's premier cup competition was hardly unusual. The thought of Luton as a non league side still requires a quick mental double check for many football fans. And secondly, by pairing them with Millwall, media coverage inevitably focused on their notorious meeting in 1985 when football violence was at its most extreme. Understandably, the police insisted on an early kick off time, reasoning that crowds are more docile earlier in the day. This tactic, combined with a massive police presence and a present nearly thirty years removed from the dark days of the mid '80s, led to fewer than ten arrests.

Read Sean Ingle's excellent piece on the 1985 game here.

The game last weekend went largely to form. Visiting Millwall made their vastly superior league position count, as a late goal from N'Guessan added to two first half goals to give the Lions a comfortable passage to the quarter finals. The final 3-0 scoreline, however, did little to settle the debate surrounding early kick offs in general. One season can see such games produce soporific goal droughts (in unrelated news, Fulham entertain Stoke in next Saturday's early match), while at other times, early risers are treated to a goal feast.

If kick off time is a component of how goal laden games are likely to be, it is unlikely to be the major factor. The quality of each side greatly determines the number of goals you are likely to see scored, with the number rising, albeit gradually, as the game becomes more lopsided. If you pitch a typical Premiership title winning team at home to a typical relegation candidate, then the chance of seeing three or more goals in the game rises well above 50%. But reverse the venue and allow home advantage to bring the sides closer together in terms of ability and the possibility of three or more goals in the game recedes to below 50%.

So it is desirable to include a component for team quality in any study undertaken to try to isolate factors which may influence match goal totals. I've used each team's success rate over the previous 34 games to account for team quality, together with the proportion of the day which has already elapsed at kick off time for every Premiership match played over the last five completed seasons. Typically EPL games have just greater than 2.5 goals scored on average, so a sensible choice is for two goals or less to represent low scoring matches and three goals or more for high scoring ones.
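The inputs described above can be sketched as three small helpers. The function names are hypothetical, and the half credit for a draw in the success rate is an assumption carried over from the definition used elsewhere on this blog.

```python
def kickoff_day_fraction(kickoff):
    """Proportion of the day already elapsed at a kick off time given as 'HH:MM'."""
    hours, minutes = (int(part) for part in kickoff.split(":"))
    return (hours * 60 + minutes) / (24 * 60)


def is_high_scoring(total_goals):
    """True for three goals or more, False for two goals or less."""
    return total_goals >= 3


def success_rate(wins, draws, games=34):
    """Full credit for a win, half credit for a draw, over the previous games."""
    return (wins + 0.5 * draws) / games


print(kickoff_day_fraction("12:00"))  # a noon kick off
print(round(kickoff_day_fraction("20:00"), 3))  # an 8 pm kick off
print(is_high_scoring(3), is_high_scoring(2))
```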

Respective pregame team quality does provide a statistically significant indication of whether a game will tend towards the higher or lower end of the scoring spectrum and confirms that more lopsided games produce more overall goals. If we now add the component of kick off time, it does appear to tweak the prediction towards earlier kick offs having slightly more goals. A noon kick off between a typical top six side hosting a mid table team sees about 54% of games go over two goals compared to just 51% of games kicking off at 8 pm. These results are consistent with results from Omar's 5 Added Minutes blog. However, as Omar also found, the time component of the regression isn't statistically significant. The effect is very likely due to chance.

It is only when we begin to look at clean sheets and start time that we may see an effect of the widely accepted cause of differing levels of athletic capability being associated with different times of the day. Clean sheets are intimately connected to team ability and even a standard Poisson approach to expected goal rates generates a reasonable fit to reality.
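Under a standard Poisson approach, the chance of a clean sheet is simply the probability that the opponent scores zero goals given their goal expectation, and the chance of a high scoring game follows from the combined expectation. The sketch below assumes goals are independent Poisson events; the goal expectations in the example are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the goal expectation is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def clean_sheet_probability(opponent_goal_expectation):
    """Chance that the opponent fails to score."""
    return poisson_pmf(0, opponent_goal_expectation)


def prob_three_or_more(total_goal_expectation):
    """Chance of three or more goals in the match."""
    return 1.0 - sum(poisson_pmf(k, total_goal_expectation) for k in range(3))


# Hypothetical goal expectations, for illustration only.
print(round(clean_sheet_probability(1.0), 3))
print(round(prob_three_or_more(2.6), 3))
```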

A regression of the pregame quality of each opponent produces similar estimates for the frequency of clean sheets in matches, but adding a coefficient for kickoff time alters that frequency around the average. Early kickoffs are less likely to see a home clean sheet than are later ones, even after team quality is accounted for. Unlike total goals, in the case of home clean sheets the kickoff time coefficient is statistically significant. The effect is unlikely to have arisen through chance.

If the effect is real, it is tempting to attempt to explain why it arises. Early kick offs tend to be televised proportionally more often for the domestic market, so a player may be keener to claim a late consolation goal in a losing cause to impress the watching millions. Alternatively, footballing skills have been shown to be more developed later in the day, so we may be seeing an effect that tells us something about the interplay between defensive and attacking play. Or it may just be a quirk of this sample of nearly 2,000 matches.

An athlete's daily body clock undoubtedly influences performance. The slow starting "morning" teams from the NFC West visiting a "mid afternoon" Atlanta team in the NFL playoffs, provided a graphic illustration of the wider, general struggles experienced by teams crossing multiple time zones to fight their battles. EPL teams never experience a domestic time zone premium, but two teams contesting a match where both are at their athletic peak, may produce a slightly different type of contest to one where both are nearer to their daily trough. And that difference may be picked up in the final scoreline.

Friday 15 February 2013

Shot Conversion Rates, Time Spent Leading and Red Cards.

Parity is a largely alien concept for the very best teams in Europe's major football leagues. American sporting structures such as the NFL embrace the concept of equality of opportunity, even if a few teams occasionally manage to remain successful for longer periods of time, through fair means or foul. A quick head count shows that in this century virtually every NFL side has made the post season and the largely unremarkable NFC West has seen all four of its teams reach the Super Bowl. Any Given Sunday translates quite nicely to Any Given Season.

By contrast the Premiership has been the preserve of a largely unchanging group of four teams, headed with consistent predictability by Manchester United. Their season long success rate is currently over two standard deviations above league average and this indicator of supremacy over their rivals rarely drops below one and a half times better than par. Similarly, Real Madrid won La Liga last term with a win or draw success rate which was almost 2.5 standard deviations greater than league average, with Barcelona following a respectful 2 sd's back as runners up. Spain has increasingly become a two horse race since 2007/08 and the best of Spain is becoming more dominant in their sphere than are the best of England in theirs.

Talent is the obvious defining factor which separates the top teams from the rest. If luck were a major contributor between reasonably matched sides, we would expect to see greater churn among the top finishers. Since 2000, nine different NFL sides have lifted the Super Bowl, with only the Patriots, who combined astute coaching with Spygate, having triumphed more than twice. Over the same time span, the Premiership has been won by just four teams, including seven times by United, and that sequence of exclusivity will remain intact this season.

Identifying and grading talent requires copious amounts of data if we are to attempt to separate the output due to randomness from that due to skill. The goal scoring exploits of Ronaldo and Messi in Spain and van Persie in England, combined with the large monetary value placed on their services, appear to indicate that scoring ability is an area where skill proliferates. Scoring efficiency, and hence goals, goes a long way to producing a successful team.

We can try to separate the great scorers from the merely good by various means. Quantifying the expected number of goals scored by a striker and by extension a team, based on shot location, can begin to tell us much about the team or individual quality. However, the approach is very data intensive.

Accumulating large numbers of shots without including positional data can also provide excellent information. The hope is that sheer weight of numbers leads to a similar overall quality of opportunity, especially at a team level. Each attempt on goal can then be treated as a trial which is either successful because a goal is scored or not. By reference to both sample size and average shot conversion rates across the league we can then see if the different team conversion rates differ by more than would be expected purely by random chance. We may choose to assign any difference to non random factors, such as skill or the lack of it and tentative initial studies appear to show that increased shooting efficiency is present within successful teams and sought after strikers.
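One simple way to ask whether a team's conversion rate differs from the league by more than chance would allow is to treat every shot as a binomial trial and express the team's goal tally as a number of standard deviations from the league average expectation. This is a sketch of that idea rather than the exact test used in the studies mentioned, and the team totals in the example are hypothetical.

```python
import math


def conversion_z_score(goals, shots, league_rate):
    """Standard deviations between actual goals and the league-average expectation.

    Assumes each shot is an independent trial scoring with probability
    league_rate, so the goal count is binomial.
    """
    expected_goals = shots * league_rate
    sd_goals = math.sqrt(shots * league_rate * (1 - league_rate))
    return (goals - expected_goals) / sd_goals


# Hypothetical team: 300 goals from 2,000 shots in a league converting at 11.3%.
z = conversion_z_score(goals=300, shots=2000, league_rate=0.113)
print(round(z, 2))
```

A value well beyond two in either direction is hard to attribute to random variation alone, although it cannot by itself say whether the cause is finishing skill or the quality of chances created.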

Broad, season long trends are of course useful, but we can try to gain a more intimate understanding of the dynamics of a football match by looking at data from a game level to see how teams cope with the inevitable changes in game state that occur from match to match. If clinical finishing is a skill, a player may demonstrate that talent more effectively in different game states. A team may have taken the lead because they have more efficient scorers, but they then become even more efficient as their skill players are able to operate in a scoring environment where the opposition are prioritizing attack over defence.

EPL teams play a near identical schedule (they can't obviously play themselves, so United have an easier schedule than QPR), but within this relatively unbiased fixture list, a side will experience a much more varied in game state. Even the most committed of defensive, bus parking exercises will eventually have to give way to more adventure if the scoreline dictates and that opens up play at their defensive end of the pitch.

We can try to demonstrate the effect of game state on likely conversion rate in a single game by plotting game state against shot conversion rate. On a match by match basis game state accounts for 24% of the total variance in conversion rate. If we express this in a way that is more applicable to a real life match situation, should a team increase its game state by one standard deviation of the league average, then their conversion rate would, on average, increase by around 49% of the standard deviation of all game by game shot conversion rates for that particular league. In short and irrespective of the possibly competing correlation directions, the longer you lead, the better your single match game state will tend to be and your shot conversion rate should follow this improvement.
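The step from 24% of variance explained to a 49% shift follows from a property of simple linear regression with one predictor: the slope in standard deviation units equals the correlation coefficient, which is the square root of R^2. A short check, assuming the relationship is positive as the text describes:

```python
import math


def standardized_slope(r_squared):
    """Change in the outcome, in standard deviations, for a one standard
    deviation rise in the predictor (single-predictor linear regression,
    positive relationship assumed)."""
    return math.sqrt(r_squared)


slope = standardized_slope(0.24)
print(round(slope, 2))
```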

We can now follow the chain of evidence as to why the very best may be extremely efficient at certain important on field actions, such as shot conversion. They initially purchase talented strikers (different levels of shooting talent appear to exist among strikers), they then find themselves in strong in running positions, which then opens up further their attacking options as their opponents become less defensive. There is, however, another significant, minority factor which contributes to enhancing the strike rate of the very best and that is red cards.

30% of the red cards shown last year were shown to opponents of the big four and when the best sides are given the added advantage of a numerical advantage, their conversion rate increases again. The shot conversion rate for Arsenal, Chelsea and the two Manchester clubs where 11 played 11 was around 14%, but nearly 20% in red card games.

The best appear to have high conversion rates because of great players, favourable match environments and a disciplinary system which rewards the best by more frequently reducing the numbers of the rest. While their season long rates will fluctuate, it is to be expected that on average United especially will maintain a healthy gap between themselves on the summit and the mere also-rans.

Wednesday 13 February 2013

Quick And Easy Game States For Football.

One of the more glaring omissions in attempting to make sense of the huge increase in available football data relates to a lack of context. A priority during one stage of a match may become less so as the game progresses and often the driving force for change will be the game state. The balance between defence and attack will shift with changing scorelines, time remaining and the relative abilities of the competing sides.

In this post here, I looked at how shooting efficiency, frequency and the identity of the shooter and type of goal attempts changed with changing game state. Arsenal's shooting was more efficient, less frequent and more confined to recognised goalscorers when they held a comfortable match position, compared to less efficient, more frequent and more evenly spread among defenders as well as strikers, when they were trying to recover from a losing or drawing position.

Analysing a single team for one season was relatively data intensive, requiring time stamped goal attempts, as well as regular in running calculations of the individual game state positions for the team. Arsenal are of course a successful side, so with a few exceptions, if they are trailing or even simply drawing during a game their current game state will be below their expectations for the game result as a whole. Therefore, they will have the desire, but much more importantly the ability to try to alter their current situation for the better. How they attempt to recover should be reflected in the change in simple in running stats, such as goal attempts or corners won.

Deducing game states for the very best teams is fairly easy without the need to calculate in running goal expectancy for both sides, then relate that to time remaining and current score and compare their current match position with their hopes before kickoff. In short, if they are trailing or drawing, the very best are probably under performing and will be dissatisfied with their current game position.

However, it is less clear if say Wigan are in an agreeable position or capable of improving their lot by referring solely to the current score. In this post I showed that Wigan are more likely than usual to score if they trail, but more likely to concede than usual if they lead. Losing is obviously bad and therefore encourages sides to try to level the game, partly by increased effort and partly by taking more risks and the same situation applies to their opponents when a side such as Wigan lead. But when the game is level and involves non big four sides, it is much less clear where the incentive to attack or defend currently lies. To estimate which team may be driving for a win and which will be happy with a point, we need to go back to calculating regular game states for both sides.

Short cuts are always welcome, as long as they preserve the essential ingredients of the more labour intensive study. In this post I showed how the pregame supremacy estimates are strongly related to the time a side will expect to spend leading, drawing and trailing in a match. So, if we use in running success rate, described here, as a proxy for how the game actually went for a particular team and compare it to the pregame supremacy prediction expressed in a similar format, we can produce an informed guess as to how the game panned out for each team through the lens of actual game states compared to pregame aspirations.

For example, last season Blackburn visited Old Trafford in a game that Ferguson would dearly want back. Unsurprisingly, United were strong pregame favourites and were given around an 83% chance of winning and 12% for the draw. In the format of success rate, where a team is given half credit for a draw and full credit for a projected win, that equates to a pregame projected success rate of 0.89. The reality was very different. Blackburn led for almost an hour, the game was level for just over half an hour and United never had the chance to lead, for an in running success rate from United's perspective of 0.19. A comparison of these two figures immediately tells us that United spent much of the time chasing the game and two goals from 27 shots appears to confirm this view.
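Both figures can be reproduced with a few lines. The pregame rate uses the stated probabilities directly. The in running rate is assumed here to be the share of match time spent leading plus half the share spent level, and the minute split of 34 level and 56 trailing is an illustrative guess consistent with the description above, not recorded match data.

```python
def pregame_success_rate(p_win, p_draw):
    """Full credit for a projected win, half credit for a projected draw."""
    return p_win + 0.5 * p_draw


def in_running_success_rate(minutes_leading, minutes_level, minutes_trailing):
    """Time-weighted success rate: full credit while leading, half while level.

    This time-weighted definition is an assumption made for this sketch.
    """
    total = minutes_leading + minutes_level + minutes_trailing
    return (minutes_leading + 0.5 * minutes_level) / total


pregame = pregame_success_rate(p_win=0.83, p_draw=0.12)
# Assumed minute split for United: never led, level for 34, trailing for 56.
actual = in_running_success_rate(0, 34, 56)
print(round(pregame, 2), round(actual, 2), round(actual - pregame, 2))
```

The gap between the two numbers is the quantity of interest: a large negative value marks a side that spent the match well below its pregame expectation.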

As with Arsenal, this case is self evident, but the method allows us to tease apart the likely flow of attack and defensive contests in much closer match ups. This approach of comparing expectation with reality, may provide a quick, but reasonably representative way to add game state context to a multitude of stats, ranging from shot and save percentage to proportion of corners, without sacrificing the merits of the more detailed method involving repeated, team specific calculations.

To test this model, I looked to see if the league as a whole follows the Arsenal trait of having more frequent, but less accurate attempts overall in matches where they are likely playing catch up from their pregame expectations. I plotted shooting efficiency against the amount of deviation in actual in running success rate compared to pregame hopes and the trend appears to be present league wide. When likely trailing against expectation, in general, shots are less efficient, presumably as attempts become more speculative, against more concentrated defences and from less able striking talent. R^2 is 0.17, which is huge for data points comprising individual games. R^2 is a hostage to sample size: when the sample behind each data point is small, random variation predominates, so R^2 doesn't always need to be large. It too must be given context. Which is where we started this post.

Tuesday 12 February 2013

The Value of an Away Goal In Madrid.

The Champions League returns from a winter break with the first round of the much anticipated knockout ties beginning this week. The recent domination of the competition by the richest and most powerful European countries is again reflected in the make up of the draw of the last sixteen. Traditionally the competition has fallen to sides from either Spain, England, Germany or Italy. You have to go back to 2003/04 to find a winner, Porto, which didn't play in one of the big four leagues, followed by a similar gap to 1994-95 when Ajax lifted the trophy for Holland. Indeed, the last eight finals have been exclusively the preserve of this small elite, with eight finalists coming from England, three each from Spain and Italy and two from Germany.

This year, 11 of the 16 survivors from the pre Christmas group stages hail from either Spain, England, Italy or Germany. It is therefore a simple task to pick out the three outstanding ties of the round, as clubs from the big four countries meet head to head. Arsenal v Bayern Munich and AC Milan v Barcelona should provide a great two legged spectacle, but much of the focus will fall on the meeting of Real Madrid and Manchester United, Ronaldo versus Rooney and Ferguson versus Mourinho. Both sides have enjoyed impressive performances in Europe's premier club competition over the last two decades. Madrid's three titles eclipse United's two, but the English side has seen much more success in the last ten seasons, winning the trophy and appearing as beaten finalists, while their rivals on Wednesday night have constantly failed to progress through the knockout phases.

Recent history notwithstanding, Real Madrid are favoured to progress from the tie. They are given about a 60% chance of qualifying for the last eight and their quality is also reflected in the overall tournament odds. They have a 17% chance of being crowned kings of Europe, behind tournament favourites Barcelona, but well in front of their immediate opponents, Manchester United, the bookmakers' fifth best choice with little more than a 7% chance.

The format of a competition can often influence the likelihood of the best sides ultimately succeeding. A prolonged group phase based on individual home and away games, where two of the four teams progress, virtually ensures that a high proportion of the best sides emerge intact for the knockout stages. Once the knockout stages start, the two legged nature of the ties also plays towards favouring the stronger teams. However, the treatment of away goals is an often neglected aspect of the Champions League format and it can have a major impact on the ultimate outcome of a tie.

Chelsea progressed to the final last season with a 3-2 aggregate win against a much more fancied, and some would say more accomplished Barcelona side. But the victory owed much to the clean sheet kept by Chelsea at Stamford Bridge and the away goal scored by Ramires just before half time in Spain. Torres' last second, tie winning goal merely rubber stamped a result that was already assured by the doubling of the away goal already scored in the game.

United find themselves in a less extreme version of Chelsea's match with Barcelona. They are underdogs in the tie, so it is an interesting exercise to see how vigorously Ferguson should try to score one or more away goals and equally how wary Real Madrid should be in case they concede what may appear to be a mere consolation goal, but in the wider context of the tie may prove to be worth much more.

Real Madrid host the first leg and so will be strongly favoured to take a lead to Manchester. However, the current Champions League format applies an away goals rule, whereby away goals count double if the tie is level at full time in the second leg and then again at full time of extra time. So we can use the relative merits of United and Madrid to compare each team's chances of progressing if Madrid take identical winning margins into the second leg, but do so having conceded differing numbers of away goals.

A simple 1-0 win for Real Madrid would, unsurprisingly, make them very strong favourites to progress. A draw of any kind in the return leg at Old Trafford would then be sufficient, as would a win. Defeat by exactly a single goal, provided Madrid themselves scored (a 2-1 or 3-2 United win, for example), would also see them progress under the away goals rule. Only a repeat of the 1-0 scoreline, this time in United's favour after 90 minutes, would be sufficient to take the game to extra time. Extra time would then give Mourinho's side the option of winning in the extra 30 minute period, drawing the extra time period by a score other than 0-0 and thus progressing again through away goals or, if extra time remains scoreless, Ronaldo would have the opportunity to win the game for his side in a penalty shootout.

In short, by keeping a clean sheet on Wednesday night, even if victory is by the narrowest of margins, Real Madrid give themselves a chance of progressing in every scenario where they lose in Manchester by an identical margin to their victory in the Bernabeu.
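The scenarios described above can be checked mechanically. What follows is a minimal sketch of the away goals rule as this post describes it, not an official implementation. The function name `resolve_tie` and the labels "A" (first leg hosts, here Real Madrid) and "B" (second leg hosts, here United) are illustrative assumptions.

```python
def resolve_tie(first_leg, second_leg, extra_time=None):
    """Decide a two legged tie under the away goals rule.

    first_leg  = (A goals, B goals), played at A's ground.
    second_leg = (B goals, A goals) after 90 minutes, played at B's ground.
    extra_time = (B goals, A goals) scored in extra time only, or None.
    Returns "A", "B", "extra time" or "penalties".
    """
    a_home, b_away = first_leg
    b_home, a_away = second_leg
    if extra_time is not None:
        # Extra time is played at B's ground, so A's goals there are away goals.
        b_home += extra_time[0]
        a_away += extra_time[1]

    a_total = a_home + a_away
    b_total = b_away + b_home
    if a_total != b_total:
        return "A" if a_total > b_total else "B"
    # Level on aggregate: the side with more away goals goes through.
    if a_away != b_away:
        return "A" if a_away > b_away else "B"
    return "extra time" if extra_time is None else "penalties"


# Madrid win 1-0 at home, then lose 2-1 away: through on away goals.
print(resolve_tie((1, 0), (2, 1)))
# Madrid win 1-0 at home, then lose 1-0 away: extra time is needed.
print(resolve_tie((1, 0), (1, 0)))
# Madrid win 2-1 at home, then lose 1-0 away: United go through.
print(resolve_tie((2, 1), (1, 0)))
```

The sketch leaves it to the caller to pass `extra_time` only when the 90 minute scores actually required it.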

How A United Away Goal would Alter Madrid's Chances of Qualifying for the Last Eight.

1st Leg Real Madrid Scoreline.  Chance of Real Madrid Progressing.
1-0 78%
2-1 68%
3-2 62%

However should United keep the margin of defeat the same, but manage to grab an away goal, some of the winning margins which would have seen United triumph in the second leg, but be eliminated by an away goal or two are now stacked up to United's advantage. If United lose 2-1 in Spain instead of say 1-0, a 1-0 win at Old Trafford would suffice for Ferguson's side. That scoreline no longer heralds a minimum of an extra 30 minutes and multiple further opportunities for the second leg visitors.

High scoring encounters make for great entertainment, but they are invariably bad news for the first leg home team in a competition which uses the away goals rule. A cautious approach from the hosts should be expected.
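One way to explore how a first leg scoreline shifts the balance of a tie is to treat second leg goals as independent Poisson counts. This is only a sketch under stated assumptions and is not the model behind the percentages quoted in this post, which is not described here. The goal expectations (`mu_b` for the second leg hosts, `mu_a` for the visitors), the scaling of extra time to a third of a match and the 50/50 shootout are all hypothetical choices, so the output should not be expected to match the table.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """Probability of exactly k goals when the expected number is mu."""
    return exp(-mu) * mu ** k / factorial(k)


def decide(a1, b1, b2, a2):
    """1.0 if A is through, 0.0 if B is through, None if still level.

    a1, b1: first leg goals (A at home). b2, a2: second leg goals (B at home).
    """
    a_total, b_total = a1 + a2, b1 + b2
    if a_total != b_total:
        return 1.0 if a_total > b_total else 0.0
    if a2 != b1:  # level on aggregate, so compare away goals
        return 1.0 if a2 > b1 else 0.0
    return None


def chance_a_progresses(first_leg, mu_b=1.4, mu_a=1.3,
                        max_goals=10, shootout_a=0.5):
    """Chance that A (first leg hosts) progress, given the first leg score.

    mu_b and mu_a are assumed second leg goal expectations (hypothetical).
    Scorelines above max_goals per side are ignored as negligible.
    """
    a1, b1 = first_leg
    goals = range(max_goals + 1)
    chance = 0.0
    for b2 in goals:
        for a2 in goals:
            p_90 = poisson_pmf(b2, mu_b) * poisson_pmf(a2, mu_a)
            result = decide(a1, b1, b2, a2)
            if result is None:
                # Extra time: 30 minutes, so a third of the 90 minute rates.
                result = 0.0
                for x in goals:
                    for y in goals:
                        p_et = (poisson_pmf(x, mu_b / 3)
                                * poisson_pmf(y, mu_a / 3))
                        outcome = decide(a1, b1, b2 + x, a2 + y)
                        result += p_et * (shootout_a if outcome is None
                                          else outcome)
            chance += p_90 * result
    return chance


for score in [(1, 0), (2, 1), (3, 2), (0, 0), (1, 1), (2, 2)]:
    print(score, round(chance_a_progresses(score), 2))
```

Whatever goal expectations are chosen, the ordering is the same as in the post: for a fixed first leg margin, each extra away goal conceded lowers the first leg hosts' chance of progressing.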

How Different Drawn Scorelines Change the Dynamics of a Two Legged Tie.

1st Leg Real Madrid Scoreline.  Chance of Real Madrid Progressing.
0-0 57%
1-1 46%
2-2 37%

The same pattern is seen in drawn games. A goalless first leg wouldn't be a huge blow to Mourinho because it would mean that any draw after regulation time in Manchester would either prolong the tie or hand outright victory to his side. His side would probably still be favoured to win the tie outright. However, that position quickly deteriorates as the first leg result sees more goals, even though the hosts aren't beaten. At 2-2 after the first 90 minutes, only the unlikely 3-3 or above stalemates are good enough to see Madrid through after 180 minutes or more of football and all other drawn scorelines either see United through or still very much alive in the tie.

An adventurous approach from the visitors in the first leg often reaps rich reward at the conclusion of the tie.

Wednesday 6 February 2013

The Cult Of Passing.

Law 10 of association football is quite straightforward and states that the winner of the match is the team scoring most goals. If both teams score an equal number of goals or no goals are scored, then the game is drawn. There is no mention of extra credit for prolonged bouts of possession or passing. Therefore, although both possession and passing are important descriptive aspects of the game, they are merely a means to an end and that end is scoring a goal.

It has been recognized that possession is much less important than what teams do with that possession. The game winner can easily lose the possession battle and often equality of possession is not reflected on a lopsided final scoreboard.

With the demise of possession, it would appear that passing has now replaced it as the desirable stat of choice.

Few would expect Barcelona to raid the Liberty Stadium in an attempt to prise Leon Britton away from Swansea as a natural replacement for Xavi, but as recently as last week Britton's passing percentage was again being used as evidence of his supremacy in this most unnatural of match ups. Britton does most of his admirable passing a good Dan Biggar conversion away from the goal and conversely, Xavi operates in a much more elevated pitch position. Yet still the comparison between the two persists. Without context, the comparisons are meaningless and with context, the differences between the two players become apparent.

Having confused a creative, central midfielder with one who merely plays the simple pass, often from no deeper than the middle third, passing is now also being further used to characterize whole teams. The best teams, in general, tend to pass more often and with greater efficiency and these two attributes are highly correlated. If we now plot either passing frequency or efficiency against a measure of success, such as points per game, the case for passing appears complete. The better you pass, the more points per game you win.

The circle is complete. Passing is an admired art form practiced by the elite, they reap the rewards of excelling at the skill, while poor passing sides languish in mid table or worse. The implication is clear: to improve, sides must become better passers of the ball. However, as with the repeating Xavi versus Britton standoff, the team passing debate lacks all important context and also suffers in the case of some teams from a confusion of correlation versus causation.

In the previous post I introduced in running success rate, a shortcut to adding context to a single match result. It's an offshoot of season long success rate, where the percentage of wins and half draws replaces the arbitrary system of three points for a win and one for a draw. A goalless game sees both teams stalemated for 90 minutes leading to a success rate of 0.5 each ((90/2)/90). In similar vein a team winning 2-1 after conceding an early goal and then responding with two late goals would have an in running success rate of near zero, compared to near one for a side which had an identical 2-1 win, but led from the first minute to the last.

In running success rate adds information in a single number about the time a team spent either trailing, winning or drawing in a single game and that helps to add context to the stats each team recorded over that game, including passing stats.
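The calculation can be sketched from a list of goal times. This is a minimal illustration, assuming that time is credited in whole minutes from kick off, that the minute a goal is scored marks the change of game state and that any stoppage time is handled through `match_length`. The function name and the goal times in the examples are hypothetical.

```python
def in_running_success_rate(goals, match_length=90):
    """Return (home rate, away rate) for a single match.

    goals is a list of (minute, side) pairs, side being "home" or "away".
    Each side is credited with the minutes it led plus half the minutes
    the match was level, divided by the length of the match.
    """
    home_score = away_score = 0
    home_led = away_led = level = 0.0
    last_change = 0

    def credit(minutes):
        nonlocal home_led, away_led, level
        if home_score > away_score:
            home_led += minutes
        elif away_score > home_score:
            away_led += minutes
        else:
            level += minutes

    for minute, side in sorted(goals):
        credit(minute - last_change)  # time spent in the previous game state
        if side == "home":
            home_score += 1
        else:
            away_score += 1
        last_change = minute
    credit(match_length - last_change)  # from the last goal to full time

    home_rate = (home_led + level / 2) / match_length
    away_rate = (away_led + level / 2) / match_length
    return home_rate, away_rate


# A goalless draw: both sides finish on 0.5.
print(in_running_success_rate([]))
# A 2-1 home win after an early concession and two late goals (made up times).
print(in_running_success_rate([(2, "away"), (88, "home"), (89, "home")]))
```

The two rates always sum to one, so a single number per match is enough to describe both teams.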

If we now plot in running success rates for individual games for each club, as a proxy for the differing game states and scorelines each team experienced against their passing efficiency above or below their normal average in those games, we can start to see how important passing is to each team's ultimate aim of winning league points.
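As a rough numerical companion to such a plot, the relationship for one team can be summarised with a correlation coefficient. The sketch below assumes a plain Pearson correlation between a team's in running success rate and its pass completion relative to its own average. That is an illustrative choice rather than the method used for this post, and the three games in the example are invented purely to show the call.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def passing_vs_game_state(games):
    """Correlate game state with passing for one team.

    games is a list of (in running success rate, pass completion) pairs,
    one per match. Pass completion is first expressed relative to the
    team's own average over those matches.
    """
    rates = [rate for rate, _ in games]
    completions = [completion for _, completion in games]
    average = sum(completions) / len(completions)
    deviations = [completion - average for completion in completions]
    return pearson(rates, deviations)


# Invented figures, NOT real match data: passing improves as game state worsens.
toy_games = [(0.20, 0.75), (0.50, 0.70), (0.80, 0.66)]
print(passing_vs_game_state(toy_games))
```

A negative value would correspond to a team that passes better than usual in games it spends mostly behind, and a positive value to the reverse.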

As usual, Stoke are the poster child for atypical behaviour during their time in the EPL and once again they buck the Barcelona driven expectation that passing the ball well should bring expected rewards, because the reverse appears to be true for Stoke. The Potters passed the ball more efficiently in games where their in running success rate was at its poorest. The expected good passing premium doesn't apply to Stoke and on this occasion it doesn't apply to others either. Teams who pass better than they usually do when they are more likely to be in losing positions, indicated by low in running success rates, included from 2011/12, Aston Villa, Sunderland, WBA, Fulham, Newcastle, two of the three relegated sides, QPR and....Swansea.

It's easy to weave a plausible, but at best only half true explanation for this reversal of the Barca model, but the assumption that better passing always gets its reward and that the arrow of causation always goes from better passing towards better results would seem to be flawed.

Stoke absorb pressure, then try to get into scoring position with low expectation, long passes. If they get ahead they then try to hold that position by absorbing even more pressure and use long balls as an escape route. So in advantageous game positions their passing efficiency starts low and stays low. If they go behind, they aren't suited or particularly able to pass their way through a now more defensively set up opponent, who are also happy to allow Stoke time on the ball in non threatening areas. Consequently, Stoke see more of the ball than usual in the middle third when their in running success rate is poor, but they are able to rack up higher, but ineffectual passing efficiency rates because of this concession by their opponents. In short, game situation sometimes drives Stoke's passing stats, rather than vice versa.

The Barca passing model doesn't fit Stoke's approach to winning points and it doesn't fit the approach of other teams, including Swansea whose incessant passing would appear to be less effective at breaking down more defensively minded, first scoring opponents. A model relating passing to points is too simplistic to capture the diverse way in which teams attempt to get the best from their available talent. A diversity which Law 10 encourages by making goals the only reckoner.

A Tale Of Three Draws.

This post relates to three games played last mid week, but it also serves the purpose of introducing a way to add context to a single match scoreline, particularly in terms of how long teams are behind, in front or level during a single match. Look out for a (very) quick follow up post.

Football cliche met football myth over the recent series of weekday Premiership matches as various teams relinquished or surrendered that most dangerous of two goal leads. It is certainly rare for a team to fail to see out a two goal advantage and rarer still for three games out of a ten match Premiership schedule to feature such comebacks. But rare events do occur and sometimes they clump together.


Stoke v Wigan.
1-0, Shawcross, 23'
2-0, Crouch, 48'
2-1, McArthur, 51'
2-2, Di Santo, 61'

Arsenal v Liverpool.
0-1, Suarez, 5'
0-2, Henderson, 61'
1-2, Giroud, 65'
2-2, Walcott, 67'

Reading v Chelsea.
0-1, Mata, 45'
0-2, Lampard, 66'
1-2, Le Fondre, 87'
2-2, Le Fondre, 90+4'

The expected points graphs above show the magnitude of the task faced by first Wigan, then Arsenal and finally and most improbably, Reading. Chelsea were the strongest favourites of the three teams which took a two goal lead and the lateness of Reading's response meant that their in running expected points total had almost disappeared off the foot of the chart and still remained tiny even after Le Fondre's first strike in the 87th minute.

Three points for a win compared to just one point for a draw, invariably makes the trailing side appear as "winners", even though the spoils are shared and much of the good work of the teams which scored first is forgotten in the dramatic excitement of the comeback.

Stoke led Wigan for 42 minutes at the Britannia, Chelsea led Reading for 50 minutes and Liverpool gave their supporters real hope of taking all three points for over an hour. And it will be the supporters of those three teams who probably viewed the final stalemate with most disappointment. We can quantify the in running match situation by crediting the time spent leading to each team and also sharing between opponents the time spent at level terms.

Match.                Team.       Time Spent Leading.   Time Spent Drawing.   In Running Success Rate.
Stoke v Wigan         Stoke       42'                   55'                   0.72
                      Wigan       0                     55'                   0.28
Arsenal v Liverpool   Arsenal     0                     32'                   0.16
                      Liverpool   65'                   32'                   0.84
Reading v Chelsea     Reading     0                     47'                   0.24
                      Chelsea     50'                   47'                   0.75

Liverpool had the highest "in running" success rate at 0.84, so the draw and the loss of two points at full time was probably more keenly felt on Merseyside than in the Potteries, where Stoke only led for 42' in recording a success rate of 0.72.
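The rates in the table can be recomputed from its two time columns. This is a minimal sketch, assuming each team's rate is the minutes spent ahead plus half the minutes spent level, divided by the total minutes played, and that the side which never led spent its opponent's leading time behind. The names `success_rate` and `matches` are illustrative.

```python
def success_rate(leading, level, trailing):
    """In running success rate from minutes spent leading, level and trailing."""
    total = leading + level + trailing
    return (leading + level / 2) / total


# (side that led, side that trailed, minutes led, minutes level),
# taken from the table above.
matches = [
    ("Stoke", "Wigan", 42, 55),
    ("Liverpool", "Arsenal", 65, 32),
    ("Chelsea", "Reading", 50, 47),
]

rates = {}
for leader, chaser, ahead, level in matches:
    rates[leader] = success_rate(ahead, level, 0)
    # The chasing side never led, so the leader's time ahead is its time behind.
    rates[chaser] = success_rate(0, level, ahead)

for team, rate in rates.items():
    print(f"{team}: {rate:.3f}")
```

Under those assumptions the recomputed figures agree with the table to within 0.01, the small differences being down to rounding.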

It is tempting to consider drawn games as evenly fought contests, where the spoils are rightfully shared. But as these three rather extreme examples show, often one team holds the advantage for a considerable portion of the game. Match situation can often determine how sides approach the remainder of the contest and this can have implications for stats, such as where and how effectively teams make their passes. In a follow up post, I'll use in running success rates for individual matches to see if such figures can help us to better understand pass completion rates and help to explain why trailing sides sometimes become much better at passing the ball.