Thursday 30 January 2014

Chelsea v West Ham. Not Quite A Typical 0-0.

In this blog, I've taken two slightly different approaches when looking at how game state alters the way teams balance their risk and reward approach to periods during a match. You can either primarily look at the best sides, which allows you to assume that a tied scoreline is almost always unsatisfactory for them and therefore scoreline can largely be used as a proxy for game state. Or more recently I've concentrated on games that remain scoreless and therefore, pregame odds can be used as a proxy for the overall game state experienced by each side.

On Wednesday night both of these approximations aligned, as near-top Chelsea beat near-bottom West Ham 39-1 in shots but only drew 0-0 on the scoreboard.

Chelsea were unsurprisingly strong pregame favourites, the hosts having about an 80% chance of winning and a 15% chance of drawing, for an overall match success rate of 87.5% (wins plus half of the draws). So the long-term points expectancy for the title challengers from such a match was 2.55 league points: they would win 80% of the games, gaining themselves 3 points, and draw for a single point 15% of the time, leaving 5% left over for a shock away win.
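The arithmetic behind those two pregame figures can be sketched in a few lines. This is only an illustration of the sums quoted above; the function names are my own, and the success rate is assumed to be wins plus half of the draws, which is what reproduces the 87.5% figure.

```python
def expected_points(p_win: float, p_draw: float) -> float:
    """Long-run league points per match: 3 for a win, 1 for a draw."""
    return 3.0 * p_win + 1.0 * p_draw


def success_rate(p_win: float, p_draw: float) -> float:
    """Win probability plus half the draw probability (assumed definition)."""
    return p_win + 0.5 * p_draw


if __name__ == "__main__":
    # Chelsea's approximate pregame chances from the text.
    p_win, p_draw = 0.80, 0.15
    print(round(expected_points(p_win, p_draw), 2))  # 2.55
    print(round(success_rate(p_win, p_draw), 3))     # 0.875
```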

The longer the game remained a stalemate, the further the expected reality on the night fell away from this hoped-for average expectation. Goal expectation decays relatively slowly at first. As the clock ticked into the 40th minute, Chelsea could still expect to take around 2.3 points from West Ham, even though they had yet to make a breakthrough. Their points expectation after 40 minutes was still over 90% of what it had been at kick off. So there hadn't been particular cause to panic through the bulk of the first half.



Above, I've plotted (in red) the rate and extent of the decline in Chelsea's points expectation from its pregame level. 40 minutes in, as already mentioned, it had fallen to 90% of the original value at kickoff. But by the 90th minute, it had very nearly halved to 1.33 expected league points.

Superimposed on the graph I've included the goal expectation for Chelsea from each of their 39 goal attempts, based on the actual x,y location from where the attempt was taken. I've grouped the attempts into 10 minute slots.

Data from a single match is inevitably choppy, and sides perhaps play in spurts of increased effort rather than smoothly and gradually cranking up the pressure. However, Chelsea's intent on making a breakthrough appears to increase in tune with the decline of their game state. They produced enough individual goal attempts to find the net an average of nearly 3.3 times over the 90 minutes, and they threatened more in general as the game wore on. Such was the extent to which WHU became entrenched that Chelsea's goal expectation from their actual shots taken in the second half was nearly three times its value from before the break.

The 39/1 shot ratio was exceptional. As noted here, superior sides, on average also have the lion's share of shots when a game ends 0-0. But typically, a side as superior as Chelsea are compared to WHU in terms of league placing should only claim around 72% of the total shots taken in the game.

Other splits, however, were more typical, although they still indicate the above-normal rate at which Chelsea may have chased and WHU hunkered down. Chelsea had 70% of the crosses compared to an expected 63%, and 74% of the total passes against an expected 62%.

Although the massive shot differential takes all the headlines, and other attacking ratios also indicate the severe imbalance between the mix of capability and intent on show from Chelsea and West Ham, the underdogs did have a minor "success" in the way clearances were divided on the night. Such large favourites might be expected to account for only around 20% of the clearances made in the game, but WHU managed to force Chelsea into making over 30% of Wednesday's total.

Statistics for a single game are often determined as much by the in game situations each side is presented with as by the relative gap in quality. Mourinho's side played a similarly limited side in Stoke on Sunday in the FA Cup. They didn't quite dominate The Potters in terms of shots, as they had done WHU, but they didn't need to, following Oscar's first half goal. After that it was just a case of keeping Stoke's relatively well disguised attempts at equalising at arm's length and picking away with the regular opportunities that were available.

The six goal attempts inflicted on WHU in injury time alone simply weren't called for on Sunday against a Stoke side still playing possession football in their own half. A team produces a combination of what it can and what it needs to do and very occasionally, when the expected doesn't happen, one of those ratios goes off the scale.


Tuesday 28 January 2014

How Rugby League Teams Win and Lose

This blog occasionally covers sports other than football, so following on from Sunday's look at the relative abilities of NFL and rugby union kickers, I've plucked some low hanging fruit from union's first cousin, rugby league.

For those more familiar with union because of the extensive media attention it receives compared to league, I'll run through some of the differences.

Union and league split from a common ancestor at the dawn of organised sport and while the present day sports are visibly related, the differences are far deeper than simply the head count. 15 players for union, two fewer for league.

The tackle area is where the deepest division has occurred. Union allows for players to compete for possession, whereas league quickly ends the play, invariably allowing the team in possession to continue their attack and instead relies on a tackle count, similar to the four downs in the NFL, as a way of turning the ball over to the defence.

Scrums are contested, sometimes interminably in union, but they merely provide a means to restart the game and free up space in league.

The method of scoring is where a shared heritage is most evident. Tries and kicks are identical under both codes, although inevitably the points awarded vary.

It has been off the field where the biggest interaction and division between the two closely related sports has occurred, especially in the modern era. League embraced professionalism by paying their players, while union maintained an amateur ethos, in theory, if not entirely in practice.

Therefore, movement of talent, primarily from union to league in return for a wage and a threatened lifetime ban was a feature of the two sports in the latter part of the last century. However, once professionalism became inevitable in union as well, the talent drain was largely reversed.

Home unions paid out large sums of money to attract league talent, such as England's pursuit of Andy Farrell, although the skill premium from the deal largely skipped a generation to his son, Owen. On the pitch, at least.

As with NFL and union, kicking allows for the most natural comparison between the two codes of rugby, assuming that data is available.

Stephen Myler came from a long line of distinguished league players, but swapped Widnes Vikings and Salford City Reds not for the Saints of St Helens, but for the Saints of Northampton in union's Aviva Premiership, where his kicking ability and game management with the boot are perhaps better suited.

Northampton's Stephen Myler prepares to impose league bred tackling on the Lions' captain.
Myler the union kicker versus Myler the league player will have to wait until I've collected enough data. In the meantime, data for rugby league is as rare as it is for union, although the small amount available on the net does have relatively large counts, at least for individual teams, and much of it can be converted to rate statistics.

For example, the number of clean line breaks per carry or the more familiar conversion rates for kickers.

If we assume that sample size can at least partly smooth out the lack of detailed information, such as pitch position and opposition strength, we can attempt to answer three important questions relating to such data.

Firstly, is there a difference in the rate figures for on-field actions across teams? Next, how do these figures correlate with success over a season? And finally, are the figures broadly repeatable for sides from season to season?

For example, over the last five completed seasons, Super League sides attempted a combined 378,500 carries, producing 10,553 clean breaks. Overall, a league average 2.8% of carries resulted in dangerous, clean line breaks.

Even if each side was equally adept at making clean breaks, the percentage of such breaks wouldn't necessarily be exactly 2.8% per team over a season. Random variation and different numbers of carries would combine to give a spread where transiently "lucky" sides were above the average and "unlucky" ones ended up below par.

So a simulation of the percentage of line breaks obtained from typical carry numbers, assuming each side was equally talented, will still show a spread of results.

However, the reality from the previous 5 Super League seasons shows an even bigger spread compared to such simulations. Therefore, we can probably conclude that making a clean line break in rugby league is a talent that isn't shared equally across all 14 Super League teams.
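As a rough sketch of the equal-talent comparison described above, the snippet below works out how much spread pure chance would produce if every side broke the line at the league average rate. It is not the simulation actually used for this post. The carry and break totals come from the text, but the even split of carries across 14 teams and five seasons is my own simplifying assumption, and the function names are hypothetical.

```python
import math
import random

TOTAL_CARRIES = 378_500   # five-season total quoted in the text
TOTAL_BREAKS = 10_553     # five-season total quoted in the text
TEAM_SEASONS = 14 * 5     # assumption: 14 teams in each of five seasons

LEAGUE_RATE = TOTAL_BREAKS / TOTAL_CARRIES
# Assumption: carries are split evenly across every team-season.
CARRIES_PER_TEAM_SEASON = TOTAL_CARRIES // TEAM_SEASONS


def binomial_sd(p, n):
    """Standard deviation of an observed rate from n trials at true rate p."""
    return math.sqrt(p * (1.0 - p) / n)


def simulate_equal_talent(p, carries, team_seasons, seed=0):
    """Simulated break rates when every side shares the same true rate p."""
    rng = random.Random(seed)
    rates = []
    for _ in range(team_seasons):
        breaks = sum(1 for _ in range(carries) if rng.random() < p)
        rates.append(breaks / carries)
    return rates


if __name__ == "__main__":
    sd = binomial_sd(LEAGUE_RATE, CARRIES_PER_TEAM_SEASON)
    rates = simulate_equal_talent(LEAGUE_RATE, CARRIES_PER_TEAM_SEASON, TEAM_SEASONS)
    print(f"league rate {LEAGUE_RATE:.4f}, chance-only sd {sd:.4f}")
    print(f"simulated range {min(rates):.4f} to {max(rates):.4f}")
```

If the spread of the real team rates is clearly wider than the chance-only figure, that is the signature of a genuine difference in talent.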

The best exponents of this skill over the last five seasons were the 2012 Wigan Warriors, with a likely true clean break rate, once random variation is stripped away, of just under 9 per 200 carries. The 2010 Castleford Tigers were the poorest, with just over 3 per 200 carries.

The skill or lack of it is also reasonably persistent from one season to the next. The R^2 value for year n against year n+1 for the percentage of line breaks per carry is around 0.25, so around 25% of the variance in clean break rates survives from one year to the next.

Lastly, the skill is also reasonably strongly correlated to success over a season. A side with a greater percentage of clean breaks can also expect to have greater numbers of wins.

In short, producing clean breaks in rugby league is a talent that is unevenly spread across Super League sides, it partly persists across seasons and correlates well with success. Overall this suggests the rather obvious conclusion that players that can produce line breaks are a valued rugby league asset.

We can use this technique to look at other on field events. Offloads, for example, where a player passes to a colleague as he is being tackled can produce big gains in territory as defenders are committed to the tackle area and may relax in anticipation of the tackle being completed.

The spread of the rates at which teams pass out of the tackle is certainly wide enough over the last five seasons to suggest that the observed rates aren't the product of natural variation around a common mean. Tactically, or through desperate necessity, some teams appear to attempt this high reward, but high risk, play at higher rates than others. It is a tactic that sides stick with from season to season, but it is also completely uncorrelated to success or failure in terms of match results over the season.

Below, I've listed a few other match events from rugby league and how they each match up in terms of likely skill differential, repeatability and correlation to success.

Rate of On Field Actions. Size of the Rate Difference Seen Between Sides. Reproducible Over Seasons. Correlates to Winning/Losing.
Missed Tackles. Very Large. Reasonably Strong. Fair Correlation to Losing.
Runs From Dummy Half. Very Large. Reasonably Strong. Uncorrelated.
Offloads in Tackle. Large. Strong. Uncorrelated.
Clean Breaks. Reasonably Large. Reasonably Strong. Strong Correlation to Winning.
Goal Kicking. Small. Weak. Uncorrelated.

Preventing line breaks and the importance of an extremely resilient defence appears to be highlighted by the correlation of missed tackles to losing and clean breaks to winning. It is also worth noting that the most significant league to union cross over of recent years was the success enjoyed by Shaun Edwards, primarily as a defence coach, first at Wasps and later with Wales. His ideas have transformed the teams he has been involved with.

Although there will always be ebb and flow, the influence of league over the way union has developed in recent years would appear to be significant.

Sunday 26 January 2014

NFL Kickers Versus Rugby Kickers.

The 2013 NFL season pauses for breath with the Pro Bowl, before two former AFC West rivals, Denver and Seattle, face off in Super Bowl XLVIII. NFL Commissioner Roger Goodell also used the week's gap between the season's most meaningless game and the most important to propose the elimination or replacement of the kicked extra point following a touchdown. Unlike penalty kicks in football, extra points are getting much too easy to execute. Only five were missed this season and conversion rates have easily topped 99% over recent seasons.

Fans and coaches alike appear to be in general agreement with Goodell (kickers are less keen), although alternatives are naturally both numerous and varied. One, inevitably given both sports' common origin, harks back to rugby, where the kick is taken on a line back from where the ball is touched down. Logistically, and from a game play view, applying rugby's solution to the NFL's kicking problem would create problems of its own.

A touchdown scored at the pylon would require a kick from the touchline. The optimum distance from the posts for such kicks chosen by the rugby kickers is around 25 yards, so the long snapper in the NFL would have to get longer and half the line would line up out of bounds. Most pertinently, the rate of conversion from the touchline in rugby is just below 50%, giving an expected points value of below half a point for such attempts. 2 point conversions (if they were retained) are converted at a similar rate, so their expected points would be over twice that of a kick.

A team would never attempt the more difficult kick.
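The comparison is simple expected value, and can be sketched as below. The 50% figures are illustrative stand-ins for "just below 50%" and "a similar rate" in the text, not measured values, and the function name is my own.

```python
def expected_points(success_prob: float, points: int) -> float:
    """Expected points from an attempt worth `points` when successful."""
    return success_prob * points


# Illustrative rates only: the text gives "just below 50%" for a touchline
# kick and "a similar rate" for the two point conversion.
TOUCHLINE_KICK_RATE = 0.50
TWO_POINT_RATE = 0.50

kick_value = expected_points(TOUCHLINE_KICK_RATE, 1)
two_point_value = expected_points(TWO_POINT_RATE, 2)

if __name__ == "__main__":
    print(f"touchline kick: {kick_value:.2f} expected points")
    print(f"two point try:  {two_point_value:.2f} expected points")
```

With similar success rates, the option worth two points carries about twice the expected return, which is why the harder kick would go unused.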

A rugby crossover is therefore unlikely. However, it does rekindle the debate over which sport has the best kickers. The initial variables are relatively close. Rugby and gridiron posts are of almost identical dimensions. The balls are of similar weight, although an NFL ball is more designed for its primary use as a missile. Special kicking balls possibly redress this aerodynamic disadvantage and such balls are only lightly greased when Tony Romo is acting as holder.

Up to now, we have a relatively level playing field. The major advantage given to kickers in the NFL over their rugby playing cousins is they always kick from a relatively central position. Union and league insist on conversions being taken from the aforementioned line from the point where the ball is grounded and penalties are awarded where the infringement occurred.

Therefore, overall conversion rates of around 70% for rugby kickers aren't directly comparable to the 80+% of (non extra point) field goals successfully converted by the NFL's 32 regular kickers.

Gone in 60 seconds. Dan Biggar kicks another 3 points for the Ospreys.
In order to eliminate this bias caused by the difficult kicking angles faced by rugby's kickers, I looked just at penalties and conversions attempted within a couple of yards of the centre of the posts. Kicks were drawn from both hemispheres, across club and international fixtures, and included all of rugby's current best footballers: Carter, Halfpenny, Farrell, Wilkinson etc.

NFL data comprised field goals from the last four completed seasons. The majority of extreme distance kicks from the NFL came with seconds remaining either in a half or a game. 24 of the longest 30 attempts had 15 seconds or less on the clock, so I eliminated these, much in the way Hail Marys should perhaps be struck from the passing stats. "S Janikowski, 76 yard field goal is NO GOOD" is hardly a typical play, even for the Raiders.

Above I've plotted the expected conversion rate by distance (not yard line, in the case of the NFL) for both sets of kickers. Once we allow that rugby's raw conversion rate is depressed by the more difficult range of kicks, the respective conversion rates practically converge.

Kicking indoors at the Superdome or at mile high altitude, may be easier than kicking on a wet night at Rodney Parade, but large sample sizes appear to wash out any advantage, especially when Lambeau is included.

Rugby takes a minor lead as the distance between posts and ball increases, but sample size does become patchy towards fifty yards. NFL teams perhaps become wary that a failed attempt gives their opponents excellent field position and rugby can call on the exceptional contribution of Toulon's Leigh Halfpenny.

So with the Super Bowl imminent, along with the start of the Six Nations, both sets of kickers can probably call for an honourable draw. As for the beleaguered extra point, we should really put up with season upon season of predictability just to occasionally witness the unfortunate John Carney do THIS!

Wednesday 22 January 2014

A Use for 0-0's

Sooner or later, anyone who regularly watches football will be treated to a scoreless match. From a spectating point of view, especially for the neutral, 0-0's are often an unwelcome addition to the footballing experience, but from a statistical standpoint, stalemates may provide a valuable baseline for the complex, but increasingly relevant, subject of game states.

It is quite natural that a side may change tactics based on their current needs and scoreline at a particular phase of a match. A cricket team with wickets aplenty in hand on the fifth day of a test match, and the winning total tantalizingly in sight, may take risks to reach it, at least until falling wickets induce a more cautious, draw orientated approach. Such bursts of accelerated scoring, usually also involving increased numbers of falling wickets, are easily spotted in a sport like cricket, where every potentially scoring action is individually recorded.

In football this ebb and flow in the interactions between teams is less easy to define. Unlike many sports, such as cricket, American football and baseball, where "goal" prevention and scoring occur in defined periods of play, scoring and attempting to prevent being scored against occur simultaneously in football. Retaining possession in football can be either an attacking or a defensive action.

Tactical adjustments based around game state, therefore are likely to be as real in football as in other sports, but even overt changes to a more possession based/risk averse approach when leading may be difficult to spot across a match, especially if the opposition quickly changes their own game state by rapidly equalising a go ahead score.

Stoke and Cardiff Prepare to Serve Up a Statistically Interesting 0-0.
Game state is being increasingly used in football analytics and inevitably the phrase may have different interpretations across different sites. In this blog I have described game state principally as the interaction between the team quality of each side taking part in the match, the current scoreline, the time remaining and any dismissals that may have transpired due to red cards. As a consequence, accurately calculating the game state over even a single match, requires constant re-calculation. Some of the inputs may remain relatively constant, but time elapsed is always moving forwards towards full time.

Thus, goalless games, especially where we have more detailed statistical breakdowns of on field actions, provide the easiest doorway to how sides react in certain game states. A side, especially a talented one, often only shows us part of what they are capable of, tempered by what they needed to do, especially if they recorded a fairly comfortable win. For example, anecdotally, 2-0 victories increased in international football when the cast and spread of team quality increased in the 1990s, as good teams adopted risk averse strategies in the face of relatively unknown, but probably inferior, opponents.

If we stick with 0-0 games, with no red cards, played between teams of known quality, the only major contributor to changing game state that remains is time elapsed. In short, there are no major peaks or falls in game state across the 90+ minutes caused by reckless tackles or deflected 30 yarders. Therefore, how game state progresses for such contests is almost entirely a function of the quality differential between the sides at kick off.

The plot above shows how dominant teams were in terms of collecting their share of the total attacking touches of the ball made in the penalty area, during 0-0 matches from 2011/12. The pregame success rate defines how balanced the match was expected to be prior to kick off and red card matches have been omitted. To anchor an example in reality, Spurs would currently have about an expected 0.8 success rate prior to kicking off at home to Stoke.

The trend is well defined: superior pre game sides had the lion's share of attacking touches inside the penalty box recorded across the 90 minutes. For example, the line of best fit gives a side with a pre game predicted success rate of 0.8 an average of 75% of the game's attacking penalty box touches. The longer the game remains stalemated, the more the game state turns against the pre-match favourite, merely through the ticking of the clock, and this sustains their efforts to deliver passes and touches in the dangerous area of the penalty box.

The trend in 0-0's spills over into other statistical categories. Superior teams on matchday, on average, enjoyed majority shares of shots, chances created, dribbles, crosses, final 3rd touches and blocked efforts, allied to reduced levels of clearances compared to their inferior opponents.

In short, these historical rates indicate what level of on field actions a typical EPL side is likely to record in a 0-0 match, where the talent gap between the teams is readily known, without intervention from other major game state changing factors, such as goals or dismissals.

If we now wish to see the direction these shared proportions take as factors other than simply time elapsed combine to change the overall game state experienced by each side, we can look at the next lowest match result. Namely, games decided by a single goal.

A single goal victory will improve the average game state of the superior side compared to an identical match that remains scoreless. And similarly, data from single goal defeats should be characteristic of how matchups perform under poorer game states than those experienced in 0-0 games.

How Proportion of Penalty Box Touches Changes by Result & Match-up.

Pre Game Expected Success Rate. 0-0. 1-0 Win (Better Game State). 1-0 Loss (Poorer Game State).
0.8 75% 67% 76%
0.75 71% 63% 72%
0.7 66% 60% 69%
0.65 62% 56% 65%
0.6 58% 53% 62%

Above, I've charted the proportion of penalty box touches derived from the line of best fit from plotting graphs for matches that ended in single goal wins and defeats, as well as goalless draws.

In games where the favoured team won by a single goal, their proportion of touches in the area declined compared to the baseline figures derived from a 0-0 result, whether through their opponents becoming more adventurous or themselves more cautious. Where the team lost 1-0, they were good enough and needy enough to force an increase in their share of such touches compared to the baseline numbers.

As an example, the best fit for a team with a pre-game expected success rate of 0.7 sees their share of touches in the box fall to a low of around 60% when they win 1-0 and scoring becomes less of a priority for part of the match, reach a high of 69% when they are chasing a one score deficit, and sit anchored at 66% when neither side finds the net.
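To make the size of those shifts easier to see, here is a small sketch that subtracts the 0-0 baseline from the single goal win and loss figures. The percentages are copied from the table above, read as 0-0, then 1-0 win, then 1-0 loss; the structure and names are my own.

```python
# (pre-game success rate, share at 0-0, share in 1-0 win, share in 1-0 loss)
TOUCH_SHARE = [
    (0.80, 75, 67, 76),
    (0.75, 71, 63, 72),
    (0.70, 66, 60, 69),
    (0.65, 62, 56, 65),
    (0.60, 58, 53, 62),
]


def shifts_from_baseline(rows):
    """Percentage point change versus the 0-0 baseline for each match-up."""
    result = {}
    for rate, base, win, loss in rows:
        result[rate] = (win - base, loss - base)
    return result


if __name__ == "__main__":
    for rate, (win_shift, loss_shift) in shifts_from_baseline(TOUCH_SHARE).items():
        print(f"{rate:.2f}: win {win_shift:+d} pts, loss {loss_shift:+d} pts")
```

On these figures the drop when protecting a lead is larger than the rise when chasing one, at every level of pre-game superiority listed.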

There's nothing new in these conclusions. It has been established that a side that performs poorly by their usual standards over a season, tends to accumulate more products of the attacking football they must undertake to rectify matters than they do in better times. Corners are a prime example. But the use of the 0-0 match as a handy baseline may restore a bit of (statistical) love to a usually underwhelming extravaganza.

Monday 20 January 2014

Crystal Palace. Just Like Watching Stoke City.

The sighting of a Pulis led side that was out shot, out possessed and out passed, yet still managed to take all three points was hardly a new experience for the 2,000+ Stoke fans making the trip to Selhurst Park on Saturday. This tactical wrinkle, based firmly around allowing poor sides the ball until they run out of limited ideas, before succumbing to either a set piece score or a mental error worked as well against Pulis' former side as it had against their exasperated opponents during his memorable time spent in charge at the Britannia Stadium.

Post match reaction of fans from the Potteries, following a largely dour and unattractive 1-0 defeat, ranged from the ironic "How can they watch that every week?" to "I'd take him back tomorrow". All told, not a great away day, as the bottom of the table contracted even further.

It hasn't taken Pulis long to install his preferred approach and the most obvious change can be seen in the frequency at which the players are now making passing attempts compared to their rate of passing first under Ian Holloway and also under interim coach, Keith Millen.

Average Time Elapsed Between Passes Under Holloway/Millen and Under Pulis.

Player. Mins/Pass Before Pulis. Mins/Pass With Pulis.
Dean Moxey 3.0 4.8
Mile Jedinak 1.8 2.4
Damien Delaney 3.3 4.2
Danny Gabbidon 3.3 5.6
Joel Ward 2.4 3.1
Marouane Chamakh 2.4 2.7
Kagisho Dikgacoi 2.1 2.9
Dwight Gayle 2.0 3.7
Jason Puncheon 1.9 3.2
Adrian Mariappa 3.9 3.7
Barry Bannan 2.3 2.9
Cameron Jerome 3.8 4.1
Yannick Bolasie 4.0 4.5

Virtually every Palace player who has played substantial time in 2013/14 under Ian Holloway/Keith Millen and now under Pulis has seen their playing time adjusted passing rate subsequently contract. It is always possible that random variation across sample sizes, combined with different opponent strength, can produce similar effects. However, so extreme has the change been across all players that, coupled with Pulis' previous preferences, there can be little doubt that there has been a major shift in emphasis. The chance of both sets of passing statistics being drawn from a common distribution is remote. The differences are significant.

Palace players are making fewer passes per minute under Tony.
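One quick way to sanity check the claim from the table alone is a sign test: if nothing had changed, each player would be about as likely to pass more often as less often. This is a swapped-in simple check, not necessarily the test used for this post, and it ignores sample sizes and opponent strength. The figures are the minutes per pass from the table above.

```python
from math import comb

# Player: (mins/pass before Pulis, mins/pass with Pulis), from the table.
MINS_PER_PASS = {
    "Dean Moxey": (3.0, 4.8),
    "Mile Jedinak": (1.8, 2.4),
    "Damien Delaney": (3.3, 4.2),
    "Danny Gabbidon": (3.3, 5.6),
    "Joel Ward": (2.4, 3.1),
    "Marouane Chamakh": (2.4, 2.7),
    "Kagisho Dikgacoi": (2.1, 2.9),
    "Dwight Gayle": (2.0, 3.7),
    "Jason Puncheon": (1.9, 3.2),
    "Adrian Mariappa": (3.9, 3.7),
    "Barry Bannan": (2.3, 2.9),
    "Cameron Jerome": (3.8, 4.1),
    "Yannick Bolasie": (4.0, 4.5),
}


def sign_test_p_value(successes: int, n: int) -> float:
    """One-sided P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# A player counts as "slower" if he waits longer between passes under Pulis.
slower = sum(1 for before, after in MINS_PER_PASS.values() if after > before)
p_value = sign_test_p_value(slower, len(MINS_PER_PASS))

if __name__ == "__main__":
    print(f"{slower} of {len(MINS_PER_PASS)} players pass less often under Pulis")
    print(f"one-sided sign test p-value: {p_value:.4f}")
```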

Saturday's match also produced a study of the type and frequency of shots at goal typically demanded by Mark Hughes at Stoke and Tony Pulis at Palace. Despite being comprehensively out shot by 17 to 12, Palace still amassed a slightly better cumulative goals expectation when x,y shot location was accounted for. 1.2 expected goals for the hosts compared to 1.1 expected goals for Stoke.

The culprit and recurring theme for Hughes coached teams was an abundance of long range efforts from the visitors. 12 of Stoke's 17 efforts, taken from distance, individually had a (much) less than 8% chance of producing a goal, compared to just 6 for Palace. The hosts also dominated the high return efforts closer to goal, another Pulis trait carried over from his time at Stoke. Three Palace efforts had an individual goal expectancy of 20% or greater, a chance taking area where Stoke drew a matchday blank.

Three of Palace's high value opportunities came by way of a triple Jack Butland save, an indication of Pulis' fine eye for a keeper that often deflects attention from his more scatter gun approach to recruiting successful outfield talent during his tenure at Stoke. Therefore, if we wish to evaluate the fairness of the result in light of the shots attempted and conceded by each side, we need to acknowledge that, even in this partly artificial probabilistic reconstruction of Saturday's classic, Palace could only have scored once from this intimately related barrage of rebounds and shots.

Despite the larger footballs in use at Stoke, Assaidi is providing an unsustainable number of goals from distance.
Once the shot locations are evaluated, related events identified and outcomes simulated, the actual result falls well within the bounds of likely outcomes. Palace's slightly superior overall goal expectation is eroded enough by Butland's triple save to make Stoke narrowly the more likely to win a prolonged simulation of the scoring chances. A 35% win probability based on shots taken for the visitors barely edges the 33% for the hosts.

A 1-1 draw was the most common scoreline seen in the simulations, closely followed by the actual 1-0 win recorded on the day by Palace. Stoke then pile in with a couple of higher overall total goal victories to edge Palace overall, but despite 29 total shots, the match was very unlikely to turn into the eight goal thriller that Stoke had fought out with Liverpool during round 21.
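The idea of treating linked shots as a single chance can be sketched as follows. Rather than running a long simulation, this version computes the exact distribution of goals by building it up one chance at a time, so it is a swapped-in technique and not the method used for the figures above. The shot probabilities in the example are hypothetical placeholders, not the real values from Saturday's match, and the function names are my own.

```python
def merge_linked(shots):
    """Collapse linked shots (e.g. a run of rebounds) into one chance.

    At most one goal can result, so the chance of scoring is one minus
    the chance that every shot in the sequence fails.
    """
    miss = 1.0
    for p in shots:
        miss *= 1.0 - p
    return 1.0 - miss


def goal_distribution(chances):
    """Exact P(k goals) for k = 0..n from n independent chances."""
    dist = [1.0]
    for p in chances:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)      # this chance is missed
            nxt[k + 1] += q * p          # this chance is scored
        dist = nxt
    return dist


def match_probabilities(home_chances, away_chances):
    """Return (home win, draw, away win) probabilities."""
    home = goal_distribution(home_chances)
    away = goal_distribution(away_chances)
    home_win = draw = away_win = 0.0
    for i, ph in enumerate(home):
        for j, pa in enumerate(away):
            if i > j:
                home_win += ph * pa
            elif i == j:
                draw += ph * pa
            else:
                away_win += ph * pa
    return home_win, draw, away_win


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    triple_save = merge_linked([0.25, 0.30, 0.20])
    home = [triple_save, 0.10, 0.05, 0.05]
    away = [0.07, 0.07, 0.06, 0.05, 0.05]
    print(match_probabilities(home, away))
```

Merging the rebounds lowers the expected goals for that sequence below the simple sum of the three shots, which is the erosion described above.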

Pulis' relegation avoidance strategy, honed throughout his managerial career, particularly at Stoke, is likely to give Palace a decent shot at staying up, especially against limited opponents such as the current Stoke team, devoid of Fuller-esque pace and guile, against whom his strategy appears to work best.

Saturday 18 January 2014

How Hughes Has Changed Stoke.

When this afternoon Tony Pulis momentarily takes his seat in the home dugout as his new charges Crystal Palace entertain his former employers, Stoke City, he will be much more familiar with the players taking the field for the visitors, rather than for the hosts.

His replacement at the Britannia Stadium, Mark Hughes may have a shared heritage, but his preferred brand of football is far removed from that served up by Pulis in his time at Stoke. The exceptional, but ultimately limited success Pulis achieved during his two spells in the Potteries has been well documented and former skipper Danny Higginbotham describes the route one, possession poor, approach from the inside in this recent article for the Guardian.

Prior to this season, Stoke were outliers in virtually every statistical area as Pulis expertly identified areas of the game where he could eke out a tiny advantage that played to the strengths and more often the limitations of his squad. When Hughes was presiding over QPR, the league's most optimistic distance shooting side, Pulis' less frequent, but higher valued attempts, often executed from deep inside the six yard box, were providing continued Premiership football for the least admired member of that exclusive club.

Evolution, not revolution has been a constant promise, both this term and in Pulis' latter days at Stoke, and Hughes has resisted the temptation to make wholesale signings in the opening half of his first season in charge. It is likely that around eight of the starters for Stoke this afternoon will have been regulars under Pulis and this gives us an opportunity to take a look at the numbers recorded by those players firstly under Pulis and now under Hughes.

Marko Arnautovic, a rare addition to Stoke's 2013/14 squad.
Whilst shots and goals from distance, particularly from the boot of on loan Oussama Assaidi, have been a feature of 2013/14, it is in the area of passing, both in terms of frequency and accuracy, that the biggest change has occurred in the revamped Stoke side.

Under Pulis, individual players were likely to see their rate of passing attempts fluctuate across seasons, simply through natural variation. For example, captain Ryan Shawcross made more frequent pass attempts in 2012/13 than he had done in 2011/12, whereas his central defensive partner, Robert Huth, was broadly consistent across both campaigns. Overall, the common players in each season made a pass every 3 minutes and 9 seconds in 2011/12 compared to one every 3 minutes and 20 seconds a year later. A difference, certainly, but not a significant one. The chance that passes made in those two seasons were drawn from the same tactical pot reaches almost 40%.

Passing Frequency and Accuracy Under Pulis and Hughes.

Player. Minutes/Pass Under Hughes 2013/14. Minutes/Pass Under Pulis 2012/13. Mins/Accurate Pass Under Hughes. Mins/Accurate Pass Under Pulis.
Charlie Adam. 2.0 2.3 2.6 3.2
Peter Crouch. 2.6 2.4 4.1 4.1
Jonathan Walters. 3.4 3.4 4.9 4.7
Ryan Shawcross. 2.6 3.7 3.5 5.7
Geoff Cameron. 2.5 2.9 3.4 4.5
Steven N'Zonzi. 1.6 2.0 1.8 2.4
Marc Wilson. 2.0 3.2 2.6 5.2
Andy Wilkinson. 3.7 3.8 5.1 5.7
Glenn Whelan. 1.7 2.9 1.9 2.5
Robert Huth. 3.0 4.6 3.7 7.1
Matty Etherington. 3.3 4.0 4.0 4.9

If the difference in passing frequency seen in Pulis' final two campaigns is insufficiently extreme to indicate a major shift of emphasis, the same can not be said when comparing Tony's final year with Mark's first. The increased passing frequency currently displayed by Stoke is very unlikely to be a random draw from a typical Pulis season and the same is true of passing accuracy.

In short, under Hughes, Stoke are almost certainly passing more frequently and finding their intended target more often. Things have changed. Glenn Whelan was making a pass every 3 minutes of playing time under Pulis, whereas under Hughes that figure is well below 2 minutes per pass, and the majority of his team mates are showing a similar directional trend.

Of course, Glenn Whelan hasn't suddenly become a much better passer in his declining years. Stoke are simply playing a more conventional passing game, and the average distance of their passes is also significantly shorter this term. Years of propping up the "definitive" tables for passing accuracy have been replaced by respectable midtable mediocrity for Stoke's previously unfairly vilified outfielders.

Tactics often make the statistics.

The first meeting between the sides, prior to Pulis' appointment was a close fought affair. Palace struck the important first goal, but Stoke's expected goals from their superior numerical attempts made them worthy winners, statistically and in reality. However, despite Stoke's apparently safe current position of 12th and Palace's 20th, the meeting is still very much a relegation contest.

The plan was that Stoke would turn into a more attractive passing side, maintaining their status by precariously staying out of the clutches of relegation, while Pulis animatedly prowled his technical area. That scenario becomes reality today... just not quite in the way everyone hoped it would.

Thursday 16 January 2014

Stripping the Luck from Shots from Outside the Box.

Following on from the last post on conversion rates from inside the box, with penalties and headers removed, here's the same analysis for shots from outside the box.

Any set of repeated trials where there are only two possible outcomes, a success or a failure, will show random variation over a series of team repetitions, even if every team has the same true talent and each trial is identical. Toss lots of fair coins, grouped as teams, and some will appear talented and some will appear below average.
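The coin tossing idea can be sketched with a short simulation. This is only an illustration under assumed figures: the 200 shots per team and the 3.5% true conversion rate are hypothetical inputs chosen for the example, not numbers taken from the 2011/12 data.

```python
import random


def simulate_conversion_rates(n_teams=20, shots_per_team=200,
                              true_rate=0.035, seed=1):
    """Return the observed conversion rate of each of n_teams sides
    that all share the same (hypothetical) true conversion rate."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    rates = []
    for _ in range(n_teams):
        # Each shot is an independent success/failure trial.
        goals = sum(1 for _ in range(shots_per_team)
                    if rng.random() < true_rate)
        rates.append(goals / shots_per_team)
    return rates


rates = simulate_conversion_rates()
print(f"lowest {min(rates):.3f}, highest {max(rates):.3f}")
```

Every simulated side has identical talent, so any gap between the lowest and highest observed rate is random variation alone.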

If there is variation in the levels of talent, the spread of the recorded conversion rates is going to be wider than you would expect from a group of results produced by equally talented sides. In the case of a side's ability to convert shots from the outside of the box (quite a large area compared to other competitive team sports), we can hope that the opportunities presented to each side even out with increasing sample size and any deviation from the expected results will be down to different, repeatable levels of team skill.

In short, we are trying to see the amount of variation in conversion rates that is down to team talent, once the ever present random component is removed.

We aren't saying that scoring from distance is entirely luck, because it clearly requires great skill. But we are trying to see, using the available data, if the difference in observed conversion rates for EPL sides is down entirely to luck. In other words, all teams are highly skilled and equally talented and the observed differences are just down to random variation. Alternatively, the spread of the conversion rates across the league, might imply that there is also some degree of true talent differential. In other words, we are looking at the best of the best, but some are slightly better than others.

In the previous post on shots from inside the box, the "extra" deviation that implied a repeatable talent was possibly present, diminished once we removed headers from the sample, because the proportion of headers that make up all attempts within the area greatly varied for sides in the 2011/12 season. Shots from outside the box are almost exclusively from kicks, therefore, this "cleaning up process" by removing headers isn't possible. However, shots directly from free kicks are the closest, atypical group of attempts from outside the box that can be culled from the larger sample to improve the consistency of the trials under investigation.

Firstly, dead-ball shots carry a greater goal threat than open play efforts (a dead ball shot from outside the box is roughly the equivalent of a shot from open play, but half a dozen yards closer to goal). This is partly because the kicker can compose himself before the effort, but also because a side's most talented striker of a ball can be used, rather than a wider selection of players in normal, open play. If you include direct shots from free kicks, there is potential to increasingly distort both the distribution of attempt quality and the make up of the players attempting the shots.

So in this analysis, all open play shots (with the feet) from outside the box in 2011/12, for each of the 20 EPL sides, are used and the strike rates for all 20 teams are recorded. The distribution of these actual 20 team strike rates is then compared to the distribution that would be expected if each side was identically talented at shooting and converting from distance, allowing for the actual number of attempts by each side in 2011/12.

Unlike foot shots from inside the box from the previous post, the range of conversion rates does appear much greater than the expected range from a random, equally talented draw. In short, there appears to be a larger talent component than that seen in shots from inside the penalty area, in the EPL during the 2011/12 season. (Or we have a widely different range of the quality of the opportunities trialed by each side).

Similarly, we don't know if the most efficient converter of shots from outside the box was an exceptionally talented side that got unlucky, a good side that got marginally lucky or a mediocre side that got incredibly lucky. But, over the range of all 20 sides, we can regress the actually recorded conversion rates based on the amount of random variation that appears to be present, in an effort to improve the validity of the numbers.
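One common way to carry out this kind of regression is sketched below. It is an assumption on my part rather than a description of the exact calculation behind the figures in this post: the spread of observed rates is split into a random binomial part and a leftover "talent" part, and each side is pulled towards the league rate by an amount that depends on how many shots it took. The function name and the goal and shot counts in the example are hypothetical.

```python
def regress_rates(goals, shots):
    """Shrink each team's conversion rate towards the league rate.

    goals, shots: per-team totals (same order, same length).
    Assumes a simple method-of-moments split of the variance.
    """
    league_rate = sum(goals) / sum(shots)
    rates = [g / n for g, n in zip(goals, shots)]

    # Spread actually observed across teams.
    observed_var = sum((r - league_rate) ** 2 for r in rates) / len(rates)
    # Spread expected from luck alone, given each team's shot count.
    random_vars = [league_rate * (1 - league_rate) / n for n in shots]
    random_var = sum(random_vars) / len(random_vars)
    # Whatever is left over is attributed to differences in talent.
    talent_var = max(observed_var - random_var, 0.0)

    regressed = []
    for rate, luck_var in zip(rates, random_vars):
        total = talent_var + luck_var
        weight = talent_var / total if total > 0 else 0.0
        regressed.append(league_rate + weight * (rate - league_rate))
    return regressed


# Hypothetical totals for three sides, purely for illustration.
example = regress_rates(goals=[15, 7, 3], shots=[200, 200, 190])
print([round(100 * r, 1) for r in example])
```

A side with few attempts gets a smaller weight and so moves further towards the league rate, and if the observed spread is no wider than luck alone would produce, every side is regressed all the way to the league rate.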

Removing the Random Variation from Conversion Rates from Outside the Box.  

EPL Side 2011/12. Actual Conversion Rate (%). Regressed Conversion Rate (%).
Manchester City. 7.3 6.1
Manchester United. 5.8 5.0
Wigan. 5.1 4.6
Aston Villa. 4.4 4.4
Tottenham. 4.3 4.2
Arsenal. 4.0 4.0
Sunderland. 3.6 3.8
Everton. 3.6 3.6
Swansea. 3.5 3.5
WBA. 3.5 3.5
Newcastle. 3.4 3.5
Bolton. 3.0 3.2
Blackburn. 2.7 3.0
Stoke. 1.6 2.5
Liverpool. 2.0 2.5
Norwich. 1.9 2.5
Wolves. 1.7 2.4
Fulham. 1.8 2.4
Chelsea. 1.7 2.2
QPR. 1.6 2.1

The analysis is far from ideal. Sample size would ideally extend beyond one season and quality of opportunity must still remain a strong candidate for the extended range of conversion rates. However, quantity of opportunity is considered and, as with shots from inside the box, the top four sides are to be found in the top six of converting sides.

Tuesday 14 January 2014

The Premiership Logjams.

Over the last couple of posts, I've been looking at various ways to express the current league and points positions of the Premiership sides in such a way that the proximity and quantity of challengers is partly captured.

The 2013/14 table appears intent on tearing itself in two, with sides contending in historically high numbers to either win the title or escape relegation on the final weekend of the season. I've used standard scores, which tell you how close to the mean performance a team is as measured in the currency of the standard deviation of that particular performance measurement for the league as a whole.

Currently Arsenal have gained 2.29 points per game in a year where the average is a fairly typical 1.39 ppg with a standard deviation of 0.54 ppg. So they are 1.66 standard deviations away from the current league mean that represents major success for the majority or abject failure for the entitled few.
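As a quick check of the arithmetic, a standard score is simply the distance from the mean divided by the standard deviation. The function name below is my own choice for illustration.

```python
def standard_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Arsenal's points per game against the league mean and spread quoted above.
arsenal = standard_score(2.29, 1.39, 0.54)
print(round(arsenal, 2))
```

With the rounded inputs quoted here this comes out at roughly 1.67, in line with the 1.66 given above, the small gap presumably being down to rounding of the inputs.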

In 19 seasons of 38 match Premiership action, Arsenal rank 15th in a table of standard scores of sides that led the table through 21 matches. So the current leaders are far from dominant in historical terms: 14 leaders were further from their league's points per game mean than Arsenal currently are in the 2013/14 iteration. So you would expect the challengers to be fairly close on their heels and that is the case.

At the foot of the table, Pulis' Palace have slipped to the bottom of the pile prior to a reunion with his former side, Stoke. However, the cast of sides genuinely attempting to claim mid table security is traditionally very crowded and this time around just 6 points separate Palace from Hull in tenth. A standard score of one standard deviation below the mean makes Palace the most impressive bottom placed team after 21 games in the history of the 38 game EPL.


In the table above I've ranked the current EPL teams in terms of their current standard scores compared to the historical dominance of the sides that occupied their league position after a similar number of matches during the preceding 18 seasons. I've flipped the y axis so that height denotes dominance in the rankings.

The sides occupying the safe haven of midtable are currently the least dominant crop for their position in EPL history. It really is a case of looking over their shoulders at the towering cluster of statistically very similar sides that occupy positions in the drop zone.

A similar scenario exists for Arsenal. The points gathering achievements of the teams directly below them, coupled with the Gunners' proximity to midtable, should herald a competitive final four months at both ends of the table.

Adding Context to Goals and Chance Creation.

One of the major problems when adding context to the generally collected counting stats that are used to quantify player quality, even in the broadest terms, is the constantly changing match environment in which a team is playing. 

Goals arrive at steadily increasing rates as we approach the later minutes, as trailing teams become more adventurous, leading to chances at both ends of the pitch and fatigue begins to provide more space for the creative players to operate. The steady accumulation of cards, more often by the defensive players also contributes to more frequent scoring. Short term patterns may appear to occasionally indicate otherwise, but over more representative timescales, 45 percent of goals appear before the break and 55 percent after.

As a consequence, a player, such as a regular substitute, playing predominately in the later, more goal laden minutes of a match, can see his scoring rate artificially inflated. If he also gains a positive premium from the more extreme fluctuations that can occur in smaller samples, he may be substantially over rated. 

There is also the likelihood that a player introduced after an hour will have a substantial fitness advantage over the majority of players that have already toiled throughout the game. An investigation of the high energy sprints made by a substitute in the final half hour compared to his usual output when he plays for the whole 90 minutes may be illuminating.

Small sample size can be dealt with by regressing such counting stats towards a league average, but accounting for playing environment also requires an awareness of the kind of figures a player's team mates were recording after he joined them on the pitch. In this post on the scoring rate of Dzeko, both as a sub and a starter, I illustrated that the apparently large difference could probably be accounted for by a combination of small sample size and a richer scoring environment.

One way to visualize the scoring impact of a player is to record the percentage of each minute he has played, so we can see if he was more often present on the pitch when scoring was more likely to be at a high or a low.
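A minimal sketch of how such a profile might be built is shown below. The data layout is an assumption made for the example: each appearance is a hypothetical (minute on, minute off) pair, stoppage time is ignored, and the three appearances listed are invented rather than taken from any real season.

```python
def minute_profile(appearances, team_matches, match_length=90):
    """Fraction of the team's matches in which the player was on the
    pitch during each minute (index 0 is the first minute)."""
    counts = [0] * match_length
    for minute_on, minute_off in appearances:
        # Count every minute between coming on and going off.
        for minute in range(minute_on, min(minute_off, match_length)):
            counts[minute] += 1
    return [count / team_matches for count in counts]


# Hypothetical player: one full game, one start ending on the hour,
# one substitute appearance from the hour, out of four team matches.
profile = minute_profile([(0, 90), (0, 60), (60, 90)], team_matches=4)
print(profile[0], profile[75])
```

In this made up case the early exit and the late introduction cancel out, so the profile is flat at one half across the whole match. A player who is regularly replaced on the hour would instead show a drop over the final third of the plot.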

In his final season at Arsenal, Cesc Fabregas played in around half of Arsenal's available matches and in the EPL his substitute appearances broadly balanced out the occasions where he left the field early, either due to injury, replacement or by way of red card. As a consequence his profile for the proportion of each minute played is relatively equally spread across each of the 90+ individual minutes. He played around half of the available time, with very little bias towards any phase of the match.


By contrast, in the 2012/13 season at Barcelona, he was regularly replaced around the hour mark and therefore played a much reduced percentage of the later minutes of a game compared to the earlier minutes, where universally longterm scoring rates tend to be reduced.

If we repeat the plot for this particular season the dip in the later minutes is apparent and an appreciation of Fabregas' scoring record at Barcelona should be seen in the context of his regular absence later in games. He played more frequently overall, than he had done in his final year at Arsenal, but spent a greater proportion of that time watching the latter stages from the bench in Spain.


Principally, Fabregas is a creator rather than a scorer of goals and while the rate at which chances are created across match time is much less readily available, it is likely to be broadly linked to scoring rate and game state. If chance creation, in general also accelerates in the later stages of a match, then Fabregas' absence in those later stages may under rate his actual performance. 

To attempt to eliminate this potential bias, I've looked at the proportion of total team chances created by Fabregas while he was on the pitch, firstly at Arsenal in 2010/11 and then at Barcelona in 2012/13. 

The rate at which a team creates chances will be influenced by many factors, from tiring defences, to overall team ability and any particular need due to the current game state. The distorting influence of these factors may be partly eliminated by quoting a player's raw counting stats as a percentage of those recorded by the team as a whole while he was on the pitch.

Team and Season. Minutes per Chance Created by Fabregas. Minutes per Chance Created by Team while Fabregas was on the Pitch. Percentage of Total Chances Created by Fabregas while on the Pitch.
Arsenal, 2010/11. 26m 50s 6m 50s 25%
Barca, 2012/13. 48m 20s 9m 20s 19%

Fabregas' importance to Arsenal is readily seen even in an injury curtailed final season. When he was on the field, the Gunners created nearly a chance every seven minutes, with a quarter of them coming from the Spaniard. By contrast, Barcelona were tactically more measured, stretching to a chance created every 9 minutes. As a consequence, there was longer between Fabregas' bouts of creative talent, but he still accounted for nearly 20% of the chances created while he was present, in arguably a more competitively talented environment.
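The percentages in the table follow directly from the two time intervals, reading each interval as minutes and seconds. The helper names below are my own, for illustration only.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes and seconds interval into seconds."""
    return minutes * 60 + seconds


def chance_share(player_interval, team_interval):
    """Share of the team's chances created by the player while on the
    pitch: the team creates one chance per team_interval, the player
    one per player_interval, so the share is the ratio of the two."""
    return team_interval / player_interval


arsenal = chance_share(to_seconds(26, 50), to_seconds(6, 50))
barca = chance_share(to_seconds(48, 20), to_seconds(9, 20))
print(f"Arsenal {arsenal:.0%}, Barca {barca:.0%}")
```

Because both intervals are measured over the same minutes on the pitch, the match environment largely cancels out of the ratio, which is the point of quoting the share rather than the raw rate.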

Saturday 11 January 2014

Avoiding The Drop.

The January transfer window affords a team a final chance to add to their squads to try to achieve their aims at either end of the table. Historical precedent based on current league position, points already accrued and the often ignored proximity and quantity of immediate rivals can give a decent baseline probability of a side avoiding the drop or securing a European or title winning finishing position.

In this guest post I follow up a recent look at the top of the Premiership, by seeing how the current crop of struggling sides might fare over the rest of the campaign.

Tuesday 7 January 2014

Goal Difference and League Points.

One of the most analysed seasons by a side in recent years is the 2011/12 campaign by Newcastle, where Alan Pardew's side narrowly missed out on Champions League football, despite a goal difference of just +5, but with an accumulated points total of 65. It was an improvement from 2010/11, when they had finished below mid table and with a negative goal difference.

Their 2011/12 season begged two questions. Firstly, had the quality of the squad improved sufficiently to merit the seven place jump in finishing position and secondly, could a side that had outscored their opponents per game by the slimmest of margins, expect to amass so many points?

Improved team quality can be answered in hindsight. Tim Krul established himself as the first choice keeper in 2011/12 and new arrivals both attracted, and continue to attract, transfer interest from the likes of Chelsea and Arsenal. So the improved results were probably no false dawn; they were a better side than they had previously been. But of more interest from an analytical viewpoint was the use to which they put the six fewer goals they conceded compared to 2010/11.

The rate at which sides score and concede goals over a season is a clear and obvious indicator of how successful they will be and this can be demonstrated by the strong correlation between goal difference and end of season points accrued. Inevitably, some teams will lie above the line of best fit and some below. Although a side may have a large influence on their scoring rates they are unlikely to be able to call on goals or clean sheets at will. Therefore, the partly random way in which goals in each team's matches arrive may randomly influence the rate at which a team wins or draws games.

During the 38 game Premiership era, nine teams have finished with a goal difference of five as Newcastle did in 2011/12 and the average points gained by all nine teams was 56. Very close to the line of best fit. However, the range of points totals experienced by these sides spanned 19 points, from Newcastle's high of 65 to a low of 46 by Spurs. So Newcastle's lucky season is apparently counterbalanced by Spurs' unlucky distribution in 2007/08.

Similarly, nine sides have also finished with a goal difference of minus five. Again the average points gained, at 46, is close to the line of best fit, and the range between best and worst points totals is 14, again nearly a third of the average. So it isn't uncommon for teams with identical goal differences to record relatively large differences in final points at the extremes.

Newcastle have taken part in seventeen 38 game seasons in the Premiership and their points total has over and under performed against their actual goal difference almost equally. They have over performed in eight seasons and been below par in nine. So there is no reason to suppose that they have a tradition of squeezing out the most from their goal stats, although playing staff and managers will of course change.

We can further look at how Newcastle's distribution of wins and losses in 2011/12 differ from the typical range of winning margins we might expect from a side that has a finishing goal difference of 5. Or more particularly, to account for goal environment, scored at an average rate of 1.47 goals per game and conceded at 1.34 as Newcastle did.
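One way to generate such an expected spread of margins is sketched below. It assumes that goals for and against arrive as independent Poisson processes at the two quoted rates, which is a standard simplification and my own assumption here, not necessarily the method used for the plot that follows.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals at the given average rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def margin_distribution(scored_rate, conceded_rate, max_goals=10):
    """Probability of each goal margin (goals for minus goals against),
    assuming independent Poisson scoring, truncated at max_goals."""
    margins = {}
    for goals_for in range(max_goals + 1):
        for goals_against in range(max_goals + 1):
            p = (poisson_pmf(goals_for, scored_rate)
                 * poisson_pmf(goals_against, conceded_rate))
            margin = goals_for - goals_against
            margins[margin] = margins.get(margin, 0.0) + p
    return margins


# Newcastle's 2011/12 rates, as quoted above, over a 38 game season.
margins = margin_distribution(1.47, 1.34)
for margin in (2, 1, 0, -1, -2):
    print(margin, round(38 * margins[margin], 1))
```

Multiplying each probability by 38 gives the number of games an average side with these scoring rates would expect to finish at each margin, which can then be set against the actual count.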


From the plot, Newcastle managed almost twice the number of two goal victories compared to the average expectation for a team that posts a goal difference of 5 over the season. To maintain that goal difference, they must be involved in more high scoring defeats than usual and the goal trading is illustrated in the larger than expected number of heavy defeats/smaller than expected narrow defeats and lack of wins by four or more goals.

Ideally a side would wish to concede all of their season's goals in a handful of heavy defeats, leaving the goals they did score to gather points at an efficient rate. The "extra" points Newcastle achieved by producing a watered down version of this ideal left them on the edge of qualifying for Europe's premier league and knockout competition.

However, the less heady historical experience of teams which outscored their opponents by a similar margin, also suggested that random variation was more prevalent than intent. Squad turnover and the power law that dictates the length of managerial tenure in the Premiership gives us limited samples to test if a Pardew led Newcastle can over perform the usual relationship between points and goal difference.

Manchester United's 2012/13 title winning season was a similarly impressive outlier. A goal difference of +43 on average amasses just under 80 points, whereas Sir Alex signed off with 89, and a similar plot for their frequency of expected margin of victory or defeat graphically highlights where United claimed their "extra" batch of points.



Fewer draws and fewer large margin wins and losses saw the surplus goals tumble down into creating almost twice as many single goal victories as a side with such a goal difference would on average accumulate.

Unlike Newcastle, we do have a long history of stable management at United, although player churn does remain, so with this caveat in mind, we can see if there is a history of continual over achievement against goal difference at Old Trafford. That would be a sign that there may possibly be causative actions working alongside random variation.

In twelve of the 18 twenty team Premiership seasons, United have gained more points than you would expect from the linear relationship that appears to exist between goal difference and league points, including nine of the last eleven seasons. This makes United the most frequent over performing team of the last 18 seasons, lying around two standard deviations away from the average percentage of over performing seasons of all teams with at least 10 years worth of matches.

Newcastle were widely predicted to regress after their exceptional 2011/12 season, and under the added burden of Europa League football, they duly obliged, in terms of points gained, at least. When measured against goal difference, their 41 points from a record of 45 goals scored and 68 conceded was four more than an average side could expect. So far this year, they are three points ahead of their goal difference expectation.

Newcastle may have failed to consistently perform at the level they appeared to show in 2011/12, but they have still enjoyed the luck of the draw in terms of where and when goals have arrived in their matches.

In similar vein, United's luck has continually been predicted to turn, although their over-performance stretches back into the '90s and it may have taken the not inconsiderable change of manager and predictable injury concerns to key players to depress their overall performance. Even so, as with Newcastle, their points total so far is just ahead of par for a team of their goal difference.

In many sports, score differential mirrors league position based on the results of individual matches very closely. Over achievement, such as Newcastle and United have experienced over varying timescales, may well be largely down to the expected variation seen in a large enough sample of numerous teams. But in a contest where a side's performance can be a tactical combination of what they are capable of doing and what circumstances dictate they need to do in the latter stages of individual matches, to perhaps turn one point into three, there may be a small amount of wriggle room to hand some credit to the coach.