Monday 30 September 2013

Are Arsenal's Goal Scorers Good Enough For A Title Tilt?

It is nearly a decade since Arsenal managed to wrest the Premiership title away from the likes of the two Manchester clubs or their near rivals, Chelsea. So despite an inauspicious start, culminating in a home defeat to Aston Villa that led to knee-jerk calls for Wenger to go, they now find themselves handily placed atop the six game table. A run of five consecutive wins has sent Arsenal four points clear of Chelsea, five clear of Manchester City and eight points ahead of the most recent Manchester side to lift the title.

Goal scoring has been at the forefront of Arsenal's rise back to the top and although it is obviously early days, they are currently on pace to score 80+ goals. That's also the average seasonal goal scoring total achieved by the champions over the last decade. Sides finishing in fourth average 67 goals a season, rising to 70 for third and 76 for second. Simplistically, a side finishing fourth, Arsenal's most common finishing spot since their last title in 2003/04, would ideally be looking to improve their goal scoring by around 23% to achieve a value more typical of Champions.

Defensively, Arsenal has allowed just over a goal a game, on pace for 38+ goals over the season. Typically, fourth placed sides allow around a goal a game, or roughly 38 goals over the season, compared to 26 for the Champions. A fourth placed side would therefore need a defensive improvement of nearly 30% to attain the average record of the Champions.

So increasing goals scored may be the easier half of the deal for a top four side that aspires to rise to the top of the pile.

These crude, early season projections hint at an attack that may have reached a level worthy of a championship tilt, but 13 scoring events hardly inspire confidence in any prediction. However, shot totals do increase the body of evidence. This campaign, Arsenal has made 87 attempts on goal, 15 of which were headers and 72 shots, in scoring their 13 goals. 27 efforts have been blocked and 25 have been off target. So the raw numbers are impressive, if likely unsustainable.

Shot location models can be used to estimate an overall goal expectation for an average team presented with all of Arsenal's attempts so far this season and Arsenal's actual goal tally can then be used to see how effective the Gunners have been in their first six games of 2013/14. A record of 13 goals, when an average side would likely only score 6.5 goals is undoubtedly impressive, as are 35 shots on target against an average expectation of around 26.

However, such favourable comparisons are hardly unexpected for a regular top four side, such as Arsenal over the last decade and the yardstick of a fictitious average side also provides an opaque standard. Therefore, to create a more relevant conclusion from the shooting data, I took all of Arsenal's shots from a recent season, 2010/11, ran a regression using those actual outcomes to see how likely that earlier side were to score from any shooting location on the field and then inputted the shot co-ordinates so far for Arsenal from 2013/14.

By creating a baseline model based around Arsenal (2010/11), a side that scored 72 goals in finishing fourth, we can see how likely it is that a side of that quality might score the 13 goals and hit the target 35 times from the 87 opportunities created by the Arsenal 2013/14 vintage.

Likely Goal and Accuracy Outcomes for Arsenal's 87 Attempts in 2013/14.

The 2010/11 team of van Persie, Arshavin, Nasri, Chamakh, Fabregas and Walcott, had they been presented with the 87 2013/14 chances, would most likely score between 7 and 8 goals. The distribution from simulating thousands of 2013/14 seasons, using shot co-ordinates from the current campaign, but the conversion and accuracy expectation of the 2010/11 side, indicates that such a combination would score the present Arsenal's total of 13 goals around 2.8% of the time. Just over 5% of the time, 13 or more goals would be the result. Also, Arsenal's 2010/11 side would hit the target on exactly the 35 occasions achieved already this season around 2% of the time, equalling or exceeding this total in 5% of the trials.
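The simulation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the model used for the post: the per-shot scoring probabilities below are hypothetical placeholders (the real ones would come from the 2010/11 regression applied to each 2013/14 shot location), and the function names are my own. Each shot is treated as an independent trial.

```python
import random
from collections import Counter


def simulate_goal_totals(shot_probs, n_trials=10_000, seed=0):
    """Replay the same set of shots n_trials times.

    Each shot is an independent trial with its own scoring probability.
    Returns a Counter mapping goal total -> number of trials.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_trials):
        goals = sum(1 for p in shot_probs if rng.random() < p)
        totals[goals] += 1
    return totals


def share_at_least(totals, threshold):
    """Fraction of simulated trials with at least `threshold` goals."""
    n_trials = sum(totals.values())
    hits = sum(count for goals, count in totals.items() if goals >= threshold)
    return hits / n_trials


# HYPOTHETICAL per-shot probabilities for 87 attempts (they sum to 7.6 goals).
# They are NOT the fitted values from the 2010/11 model.
example_shot_probs = [0.30] * 5 + [0.10] * 40 + [0.05] * 42

if __name__ == "__main__":
    totals = simulate_goal_totals(example_shot_probs)
    print("most common goal total:", totals.most_common(1)[0][0])
    print("share of trials with 13+ goals:", share_at_least(totals, 13))
```

Swapping in on-target probabilities for each attempt would give the equivalent distribution for shots on target.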

Once again, this time when matched against an earlier incarnation of themselves, rather than an average baseline, the present Arsenal team appear to be at least worthy of their current position.

These simulations, which (imperfectly) compare the attacking component of the Arsenal side that finished 4th three seasons ago with 72 goals and 68 points, to the current achievements of Ozil, Giroud, Ramsey and Podolski, can be interpreted variously.

For instance, the pessimist may point out that there appears to be a 5% chance that the 4th placed also-rans from 2010/11 could have produced as good a record, if not a better one, than that posted in the six matches during August and September by Wenger's current team. So the first six games may just be a lucky, short term streak from a side with similar offensive capabilities to the one that fell short two completed seasons ago.

Or alternatively, the optimistic Arsenal supporter may consider the present record of 13 goals to be so far to the right of the likely range of outcomes that could have occurred if the opportunities had fallen to the 2010/11 side, that it is reasonable to conclude that Wenger has a more potent strike force at his disposal than he had when van Persie was the focus.

Shot models can create numerous scenarios, but the subsequent interpretation can be much more subjective.

Sunday 29 September 2013

Innovative or Just Very Lucky?

The role of random variation, sometimes labelled as luck, is increasingly being recognised in the interpretation of footballing stats. The record of a player's on-field actions over a single season will always be a combination of his true capabilities and a measure of randomness, that sometimes inflates his figures and at other times reduces them. The very good will often be not quite as good as one outstanding season among many merely good ones appears to indicate and a hugely disappointing year from a journeyman may prove to be partly down to bad luck and won't be precisely repeated if he is given the benefit of the doubt.

However, to presume that any deviation from the expected level of performance is down solely to random variation is to assume a uniformity of approach and application of the talent available to a coach that may not prevail throughout the league.

To celebrate London hosting another regular season NFL game this weekend, I'll draw an example from gridiron. Defended passes are more frequent occurrences than the more valuable full blooded interception, although both disciplines require a similar skill set and are therefore reasonably closely correlated. It is analogous to a situation in football, where the much more numerous final third touches in one season appear to better predict goals scored in a subsequent season.

Over the last ten completed years, an NFL side could have seen between 60 and 130 defended passes in a year and between 6 and 30 actual interceptions made by their defense. You can use the relationship between passes defended and interceptions to produce an expected number of interceptions in a single season. This derived interception total can then be used instead of a side's actual interception total in year N to predict interceptions made in year N+1. In around 65% of the cases the expected figure is a better predictor of future picks.

At the start of the 2012 season, passes defended from 2011 suggested that New England's defense would grab 17 regular season picks. They actually caught 20, so they exceeded the prediction and it is tempting to say they were slightly better than average (the average number of interceptions over the last decade is 16 per season) and lucky. In the previous season, the same thing happened: they beat the prediction from a model that, overall, improves on the reliability of simply using previous year totals across all 32 NFL teams. And the next....and the next....

How NWE Out Performed A Predictive Model for Defensive Interceptions.

Year   Actual Interceptions   Predicted Interceptions   Lucky?
2012   20                     17                        Yes
2011   23                     13                        Yes
2010   25                     17                        Yes
2009   18                     15                        Yes
2008   14                     13                        Yes
2007   19                     15                        Yes
2006   22                     15                        Yes
2005   10                     14                        No
2004   20                     15                        Yes
2003   29                     26                        Yes

Over the ten year period, NWE outperformed the model (essentially a predictive regression) in nine years. The most likely number of seasons a side would expect to out perform the prediction is, unsurprisingly, five.
If the regression models reality, a side would out perform it nine times out of ten by chance around once every 500 team seasons. We have looked at ten years for 32 teams, so to have found one team who went 9-1 against the model is certainly unusual.
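A rough way to gauge how unusual such a record is would be to treat each season as a coin flip. The sketch below assumes, purely for illustration, that seasons are independent and that a side beats the prediction with probability 0.5 each year; the function names are my own. The exact figure depends on framing (one direction or either, and how many ten season records are examined), so it need not reproduce the numbers quoted in the text.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_as_lopsided(k, n):
    """Chance of a record at least as lopsided as k-(n-k), in either
    direction, when every season is a fair coin flip (ASSUMPTION)."""
    k = max(k, n - k)
    if 2 * k == n:
        return 1.0  # a perfectly even split is the least lopsided record
    return 2 * prob_at_least(k, n, 0.5)


if __name__ == "__main__":
    for wins in (8, 9):
        print(f"{wins}+ out of 10, beating the model: {prob_at_least(wins, 10):.4f}")
        print(f"{wins}-{10 - wins} or more extreme, either way: {prob_as_lopsided(wins, 10):.4f}")
```

With 32 teams each producing a ten season record, the chance that at least one of them throws up an extreme split is naturally higher than the chance for any single named team.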

However, things get worse for the model, because Chicago beat predictions in eight of the ten seasons (about a 2% chance of happening by chance alone). So now we have two sides that recorded more interceptions than predicted by this model. 26 of the 32 teams over the ten seasons have over (or under) performing years that are within two seasons of the five season average. So the model works well for them.

But for NWE, Chicago, as well as Green Bay, Atlanta, Tennessee and (appropriately) Tampa, it consistently under estimates their intercepting prowess or, alternatively, we have to assume these six teams were good and lucky (very, very lucky as a group). As ever, everyone is able to set their own level of confidence in each and every possible scenario or explanation.

An alternative explanation is that, although it improves on a naive use of previous interception totals, this model still omits many possible causes of a side's intercepting abilities. Defensive scheme is one glaring omission (Tampa 2 type zone defenses invite interceptions to be thrown, whereas bounty hunting blitzes, recently favoured by New Orleans, prize the opponent above the ball). It also omits the coaching input from defensive gurus, such as Belichick at NWE (who wasn't afraid to use wide receivers on the defensive side of the ball and wasn't above secretly taping his opponents (aka cheating), although this usually referred to the offensive side of the Pats game).

In short, models don't always capture everything and the missing bits may be what sets some teams, coaches and players above the rest, or at least what marks out those that demonstrate a different tactical approach. A side's relationship to a measurement that describes the majority of the league well may see them branded lucky, when the real issue is the flaws of a deficient model. The urge to invoke luck hastily to cover unexpectedly different performance should be resisted, at least until we see if the "luck" is sustainable and therefore may have a causative agent.


Saturday 28 September 2013

Liverpool's Shooting So Far.

Following on from Friday's look at the likely outcomes of the goal attempts made by Robin van Persie had they been made by a league average player, here is a similar plot for all 63 goal attempts made by Liverpool.

Van Persie has so far out performed the model by scoring slightly more goals than you would expect from an average player. However, such is the limited sample size for the Dutchman, it isn't possible to use data from the four matches he has played so far to demonstrate without doubt that he is an above average striker.

An inferior striker (if we assume from extensive evidence over the longer period of van Persie's career that he is indeed above average) could quite easily have scored the three goal total already achieved by van Persie in 2013/14. So in the absence of a larger cv, an elevated strike rate compared to a generic shooting model shouldn't be regarded as evidence of better than average talent. Data evaluations of playing talent will always come with levels of uncertainty, rather than cast-iron conclusions.

Shot data accumulates more rapidly for teams than for individual players and following Liverpool's 1-0 home defeat to Southampton, the Merseysiders had executed 63 attempts in scoring 5 times. The overall, model predicted, goal expectancy from all 63 shots totals just under five goals. Therefore, it is no surprise to see that the most likely individual goals tally recorded by an average side if they had been given those 63 opportunities is also five.

There is a 20% chance of an average team scoring five goals and a shade of odds on that a side would score at least five goals given Liverpool's opportunities. So we can tentatively say that over the first 63 chances created by Liverpool, there has been little surprise in their conversion rate. Everything that has happened once the chances presented themselves could reasonably have been duplicated by an averagely competent converting side.

Liverpool's shooting accuracy, however, is more extreme. The model predicts an average of just under 20 of the 63 goal attempts to have been on target and Liverpool so far have hit well in excess of that prediction with 27, making them the joint second most accurate shooting side in the EPL in 2013/14.

An over-performance of 40% in hitting the target 27 times compared to an expected 19.6 does appear outstanding and can easily give the impression that we are looking at a real and possibly sustainable effect. The temptation is to look to causes and explanations. But first we perhaps should see how unusual such a rate is for our baseline, average side to have achieved in 63 trials.

In simulations assuming an average level of all round competence, 27 shots are seen to hit the target around 1.5% of the time and at least 27 shots were recorded about 5% of the time. Certainly unusual, but not within the bounds that could be considered significant. On the evidence of 63 shots, Liverpool may be more accurate than an average side, but by quoting the 40% improvement over average, (especially if sample size is omitted), an inflated expectation of their true ability is almost certainly being created.

So, in statistical terms, there is a justifiable reason to suppose that, despite an impressive accuracy rate, Liverpool may be little better than average in reality.

Previous seasons and repetition of this inflated accuracy by broadly similar Liverpool teams of the recent past, is one route to adding weight to any opinion regarding Liverpool's shooting accuracy. But intimate knowledge of the shooting model that has been used is another. The model I've used includes many of the readily collectible variables, such as shot location, shot type, but it doesn't include such things as shot power, which are both subjective and virtually impossible to collect in any great numbers.

In the limited data I have, the power of the shot impacts negatively upon the accuracy, and yet increased power doesn't appear to improve conversion rate to a statistically significant degree compared to normally struck efforts. (Placement is the obvious missing link). Therefore, (with the caveat that this is very limited data) you can construct a scenario where reducing the power of a shot doesn't reduce the conversion rate, but increases the accuracy, and an extra shot saved is an extra possibility of a further shot attempted from an additional rebound. The exact profile seen at Liverpool this year.

Models can tell us much about how teams perform, as long as we aren't too dogmatic about conclusions. Ultimately, they just provide information on how likely it is that a real, data based assessment is going to coincide with an unobtainable, all encompassing knowledge of a side's true ability. Random variation can turn world beaters, short term, into average, run of the mill sides and vice versa. But equally (as in the proposed effect of shot strength), seemingly unusual results can be an early indication of a model depleted of minor, yet important variables.

Friday 27 September 2013

Over Performing Strikers or a Lucky Streak?

Shooting models that aim to predict the number of goals or attempts that hit the target, that an average player might expect to achieve, once such variables as shot location, defensive pressure, shooting method and power of the attempt, are accounted for, are steadily becoming the mainstay of player analysis, particularly in the case of strikers and attacking midfielders.

The accumulation of data, much of it self collected, is increasingly making it possible to arrive at a baseline expectation for any shot or header that is attempted from a wide variety of locations on the field, making comparisons with individual teams or players possible. A player or team which out-performs the generic average model may be considered to be better than average, although there is the possibility that the model may lack the variables to fully describe the goal attempt process.

The process of shot modelling is therefore a continually evolving one and even with the earlier caveats, a player who is out performing the model is probably one to take note of. However, it is very easy to draw misleading conclusions when interpreting small sample sized trials, especially when the results are condensed into a summarized format.

It is still debatable as to how the credit for an elevated conversion rate should be divided. In the case of van Persie, he may score at a higher rate than is normal, once shot location is allowed for, but this may be due to his skilled finishing, the quality and ease of the chances that Manchester United create for him (and this may not be reflected in a general shot model) or more likely a combination of the two.

This season van Persie has executed 19 attempts on goal, scoring three times (once from the spot) and hitting the target five times. My generic shot model gives him an expectation of 2.4 goals and seven shots on target, once x,y shot locations are factored in. So, as a statement of "fact" (assuming the veracity of my particular model), van Persie has scored more goals than expected from a shooting model and has also been relatively inaccurate.

However, what level of confidence does out-performing a shot model in terms of actual goals scored compared to an expected average, allow us in describing van Persie's likely true ability?

Summarizing data based conclusions often leads to the omission of extremely pertinent pieces of information. If van Persie scores more goals than a shot model predicts should be scored by an average player from the location of his chances, we really need to know how likely it was that his over performance occurred by chance. And therefore we need to look at the range of scoring totals an average player might achieve given van Persie's opportunities and then see how van Persie's total compares to that of a "lucky" average marksman.

Above are the results of simulating van Persie's 19 shots so far in the Premiership, using the likely outcome of each shot as if it were being made by an average player. We've already seen that van Persie's three goals are above the 2.4 expected on average and his 5 on target efforts are below the expected average of almost exactly seven. Rather than summarizing the analysis and looking at general over or under performance, we are now revisiting each attempt.

An average marksman would be most likely to score 2 goals from these chances and that would happen around 30% of the time, but the next most likely outcome is the three goals actually scored by van Persie. That happens about 25% of the time. So if this is all the information we have on the Dutchman, even though he has exceeded the goal expectation for a shooting model, there is still a 25% likelihood that an average player would be able to replicate his feats so far.
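For a small number of attempts the full range of outcomes can also be computed exactly rather than simulated. The sketch below builds the distribution one shot at a time, assuming the shots are independent. The per-shot probabilities are hypothetical: a penalty rated at 0.78 and 18 other attempts at 0.09 each, chosen only so that the expected total is 2.4 goals from 19 attempts. They are not the values from my shot model.

```python
def goal_distribution(shot_probs):
    """Exact distribution of total goals from independent shots.

    Returns a list where entry k is the probability of exactly k goals.
    """
    dist = [1.0]  # before any shot is taken, zero goals is certain
    for p in shot_probs:
        updated = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            updated[goals] += prob * (1 - p)   # this shot is not scored
            updated[goals + 1] += prob * p     # this shot is scored
        dist = updated
    return dist


# HYPOTHETICAL inputs: one penalty plus 18 open play attempts, 2.4 goals expected.
example_probs = [0.78] + [0.09] * 18
dist = goal_distribution(example_probs)

if __name__ == "__main__":
    for goals in range(7):
        print(goals, round(dist[goals], 3))
```

With these made up inputs two goals is the single most likely total, followed by three, which is the general shape described above. The same function applied to on-target probabilities gives the range of likely shots on target.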

Similarly, with van Persie's shooting accuracy. Yes he is below the average expectation, hitting the goal just 5 times instead of 7, but random variation could inflict this level of inaccuracy on an average player around 13% of the time and such a player could record an accuracy that produced 5 or fewer shots on target 22% of the time.

So, tempting as it may be to describe players as above or below average over a limited run of games, their numbers are going to be at the whim of random variation to a large degree (we really need to always include shot numbers to at least give some context) and we need corroborative evidence, such as a sustained pattern of over or under performing, before we can begin to draw more definitive conclusions about a player's likely, true abilities.

In the next post, I'll take a look at teams.

Wednesday 18 September 2013

Mike Dean and Arsenal

Mike Dean takes charge of Saturday's match between Stoke and Arsenal at the Emirates. Dean is no stranger to officiating Arsenal matches and his appearance should give rise to a sense of foreboding for every fan of the Gunners, if a recently circulated statistic on Twitter (and retweeted) has any credence. In the last 16 Arsenal matches officiated by Dean, Arsenal has won just twice. (The stat actually claims one win in 15, but omits the late season win over Wigan that finally, after years of trying, saw Wigan relegated). Like referees, we all make mistakes.

At first glance the figures appear compelling. Over the same timescale, the rest of the top teams from Spurs upwards have each won at least half of their Dean controlled fixtures. But as is often the case with good Stoke City omens, the case for an assured victory, possibly courtesy of any anticipated generosity of spirit from Mr Dean, is an incredibly weak and contrived one.

The initial use of win percentage instead of the more convincing success rate, which includes and weights draws as being worth half a win, artificially extends the apparent gap between Arsenal's fate under Dean and that of their usual rivals. 7% of wins (actually 13% when the Wigan game is included) appears a very long way away from the 50% win figure of Spurs, the next most unfortunate top club to apparently suffer at the hands of Dean.

Mike Dean and Arsenal. 2009-2013.

Sides Reffed by Mr Dean   Wins   Draws   Losses   Win%   Success Rate
Arsenal                   2      5       9        13%    28%
Man Utd                   8      2       2        67%    75%
Man City                  10     4       2        63%    75%
Chelsea                   10     4       2        63%    75%
Spurs                     8      4       4        50%    63%

If we include draws, which Arsenal accumulated at the highest rate of the five teams, the resulting success rate slightly narrows the performance gap.
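The two measures in the table are simple to reproduce. A minimal sketch, using the win, draw and loss counts from the table and counting a draw as half a win for the success rate (the function and variable names are my own):

```python
def win_pct(wins, draws, losses):
    """Share of matches won."""
    return wins / (wins + draws + losses)


def success_rate(wins, draws, losses):
    """Share of available 'success', with a draw worth half a win."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


# Records under Mike Dean, 2009-2013, taken from the table: (wins, draws, losses)
records = {
    "Arsenal": (2, 5, 9),
    "Man Utd": (8, 2, 2),
    "Man City": (10, 4, 2),
    "Chelsea": (10, 4, 2),
    "Spurs": (8, 4, 4),
}

if __name__ == "__main__":
    for side, (w, d, l) in records.items():
        print(f"{side:<9} win% {win_pct(w, d, l):.1%}   "
              f"success rate {success_rate(w, d, l):.1%}")
```

Arsenal's gap to Spurs is 37.5 percentage points on win percentage but a little over 34 on success rate, which is the slight narrowing referred to above.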

Secondly, the 16 Arsenal matches are a highly correlated set of fixtures. 11 of the matches are against the other top four sides listed in the table. United finished above Arsenal in all four seasons listed, while United's near neighbours and Chelsea also gained more points than the Gunners in three of the four years. Only Spurs proved to be consistently inferior to Wenger's side. Therefore, if Arsenal's rivals perform well in this table, Arsenal, as the team the other sides frequently meet, are virtually guaranteed to do badly. In short, under the selective rules of these engagements, the Gunners are playing tough, superior sides and only wins count.

However, the biggest giveaway in this attempt to suggest a correlation between Mike Dean and Arsenal's poor results is the selective cut off point made at the start of the 2009 season. If we go back further and include two more games, Arsenal avoid defeat against both Spurs and Manchester United. Include a further game, away at Chelsea in November 2008, and Arsenal record a win, despite Dean's presence in the middle. Add one more and Blackburn are comfortably defeated, then Manchester United are held goalless.

If you include all of Dean's missing assignments involving Arsenal, 21 extra Premiership matches show up and Arsenal, apparently unaware that they are facing their future arch nemesis, lose just once and Dean appears to be more of a lucky mascot.

To finally nail this absurd fallacy that referees are able and motivated to determine the outcome of matches involving certain sides, here's a plot of the distribution of the likely success rates achieved by the Arsenal/Dean combination over Dean's Premiership career, once the quality of opponent is allowed for.

Arsenal would have most likely expected to return a 62% success rate over those matches and they actually returned a below average success rate of 57%. The chance of the pre game match up ratings being broadly correct and Arsenal recording a success rate of 57% or worse is a shade under 20%. So there is nothing remotely significant to see here.

Arsenal's below par return is down to random variation rather than a Machiavellian plot. But even if a strong, unlikely to appear by chance, correlation between Dean and Arsenal under-performance is proven, causation then has to be established as well. And other than anecdotal opinion, usually based around red cards, fishing trips of this type rarely attempt to substantiate the headline with such evidence. Can Dean really be held responsible for Koscielny's decision to rugby tackle Dzeko five yards from goal, rather than choosing to defend the ball in an ultimate, 10 man Arsenal defeat to Manchester City?

Even if we allow the selective start points to stand by assuming that Arsenal did something to really annoy Dean at the close of the 2008/09 season, the conspiracy still fails. Arsenal's 28%, opponent adjusted success rate has around a 1% chance of occurring through short term variation, but Arsenal weren't the only side playing under the gaze of a familiar ref in those four seasons.

The Premiership refereeing pool is relatively small; most of them can squeeze around a small breakfast table at the FA's St George's complex. So lots of sides regularly see the same officials over a fairly regular timeframe. Sooner or later, even with odds of 1% for particular combinations of ref/club, one such pairing is going to throw up a seemingly unlikely run of good or bad results, just by chance.
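The arithmetic behind that last point is short. A minimal sketch, assuming each ref/club pairing is independent and each has the same 1% chance of producing such a run (both are simplifying assumptions, and the pairing counts below are hypothetical round numbers, not a count of actual assignments):

```python
def prob_at_least_one(p_single, n_pairings):
    """Chance that at least one of n independent pairings shows a run
    that has probability p_single for any individual pairing."""
    return 1 - (1 - p_single) ** n_pairings


if __name__ == "__main__":
    # HYPOTHETICAL numbers of ref/club pairings with enough shared matches
    for n in (10, 50, 100, 300):
        print(n, round(prob_at_least_one(0.01, n), 3))
```

Once the number of pairings runs into the hundreds, seeing at least one "1% run" somewhere in the league becomes more likely than not.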

The good news for Arsenal is that Mike Dean hasn't held a grudge against them since 2009. The bad news for Stoke is that if they win on Saturday, it will come unassisted and on the back of their new found passing game, rather than as a gift from the officials. And I'm sure Mr Dean, should the wider press pick up on this absurd "factoid", is professional enough not to attempt to redress an imaginary imbalance in his dealings with Wenger's side.

Tuesday 17 September 2013

Where Have All The Strikers Gone.

Having written around 318 posts loosely based around the effects of random variation on sports data in general and football stats in particular, yesterday's post, which tentatively suggested that the lack of early season goals may have various causes in addition to the ubiquitous fluctuations of real time events, attracted the second most daily views in the life of this blog.

C'est la vie.

So as a rapid follow up, I will go through both the thought processes and methodology that went into posting yesterday's post.

Around November 2011, it suddenly dawned on everyone that the Premiership was seeing lots of goals. The big clubs were handing out the type of beatings to their own kind that they usually reserved for lesser teams. Manchester United put 8 past Arsenal, were then tonked 6-1 by City, who had already put 5 past Spurs, before Arsenal replied to Chelsea's three with five of their own. And so it went on.

By the time 100 matches had been played, the average number of goals per game was very nearly three. To satisfy myself that we were seeing mostly random variation I simulated the 2011/12 season's first 100 matches to see how often you should expect to see 295 goals if the goal expectancy for each individual match was similar to recent historical levels for the 20 EPL sides.

Unfortunately, Liverpool's Ian Graham beat me to it and posted this when he was still at Decision Tech. Ian's a nice guy, so I'm sure he won't mind me using his post as an example of what I also did back in 2011 and latterly in 2013.

Having been beaten to the post in 2011, I repeated the process using the first 40 matches of the 2013/14 season, fully expecting the results to show that 74 (now 78 goals) in 39 (now 40) matches was likely to arrive about 20% of the time in a simulation using normal, for the current time, goal expectancy. There was a similar 17% chance that a "normal" group of EPL sides had produced 295 goals up to November 2011 purely through random variation and I fully expected to finally get to post my 2011 effort, two years late.

Instead I got a figure of 0.5%.

You are of course free to set the bar of "proof" as high or as low as you wish, but the 17% chance of 295 goals appearing in 100 EPL matches if we assume scoring is around historical levels strongly implies short term fluctuation is the most likely, major cause. There is no need to speculate on why the art of defending is in decline or striking is in the ascendancy. The 0.5% figure for the first 40 matches of 2013/14 to produce 78 goals due to short term fluctuations of a typically higher recent goal expectation, however, carries a much larger requirement for blind faith.

An alternative viewpoint may be that for various reasons, the current EPL over the course of the first 40 games has been played out by teams that may only be capable of scoring (or intend to score) fewer goals. If you simulate a 40 game run of matches where the average goal expectancy for each match is just below 2.5 goals, the chances of a total of 78 goals appearing jumps from 0.5% (using the historically higher goal expectancy) to around 10%. This figure is a much easier "sell".
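The sensitivity of that probability to the assumed scoring rate can be illustrated with a simplified calculation. The sketch below is not the simulation used for the post, which worked from a goal expectancy for each individual match. It assumes instead that the total number of goals in 40 matches follows a single Poisson distribution with mean 40 times the goals per game figure, and reads "78 goals appearing" as a total of 78 or fewer. Both are my assumptions, so the outputs should not be expected to match the percentages quoted.

```python
from math import exp


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), by direct summation."""
    term = exp(-mu)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(X = i) from P(X = i - 1)
        total += term
    return total


def chance_of_low_total(goals, matches, goals_per_game):
    """Chance of `goals` or fewer in `matches` games (Poisson ASSUMPTION)."""
    return poisson_cdf(goals, matches * goals_per_game)


if __name__ == "__main__":
    for gpg in (2.3, 2.5, 2.64, 2.8):
        print(gpg, round(chance_of_low_total(78, 40, gpg), 4))
```

The direction of the effect is the point: a modest drop in the underlying goals per game figure makes a 78 goal run many times more likely.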

The last four EPL seasons have averaged around 2.8 goals per game; prior to that and further back into the 90's, figures in the region of 2.5 over a whole season aren't uncommon. The French premier league regularly throws up total goal averages over a season that are even lower at around 2.3 goals per game. The opening group matches for the last five European Championships (conveniently totaling 40 games) have produced 2.25 goals per game compared to 2.78 goals per game for the final group matches. So lower average goal totals aren't unusual in top flight football and France.

So we may be seeing random variation (we always will), but around a lower expected goal scoring average. And that is where observation becomes as important as statistical analysis based around assumptions of what has passed for recent normality in the EPL.

It has been suggested that nothing short of a mass suicide of strikers could account for the expected goals total falling to 1.95 a game. But it need not fall to that level, it just has to fall compared to previous highs and then experience short term variation to arrive at a figure that sits to the left of the true average, but happens with enough frequency (say around 10%) to provide a believable explanation for real events.

Minutes Played out of a Possible Maximum of 3240 By the Major Scorers for Eight EPL Sides in 2013/14.

Player(s)    Team              Goals Scored in 2012/13   Goals as a Percentage of Team Total   Minutes Played in 2013/14
Bale/Defoe                     32                        51%                                   37
Murray       Crystal Palace    30                        41%                                   0
Suarez                         23                        34%                                   0
Lukaku       WBA (Loan)        17                        34%                                   40
Ba                             13                        30%                                   65
Holt                           8                         20%                                   0
Tevez        Manchester City   11                        17%                                   0
Carroll      West Ham          7                         16%                                   0

All EPL goal scoring strikers are currently alive and well, but many are in new (non) Premiership surroundings or idle.

Bale is of course in Madrid and Defoe, with whom he combined to score over half of Spurs' goals in 2012/13, is an occasional late substitute. Palace's 30 goal striker, Glenn Murray, is still recuperating from an injury sustained in the playoff semi final. Holt is now in the Championship, Suarez is completing an anger management course, Lukaku is largely practicing penalties on Chelsea's training ground prior to his loan to Everton and Ba is similarly underused. Carroll is injured and Tevez has departed.

So we can choose between random, short term variation around a historically high goal expectation leading to a very rare run of scorelines or (and not necessarily the only alternative), short term random variation around a slightly lower goal expectation caused by the temporary or permanent unavailability of 141 goals worth of talent, but likely to occur much more frequently.

The general opinion is that levels at the end of the year will return to more normal totals, but that won't validate the opinion that everything was "normal" in August and September. The conditions of the trials (games) will inevitably be different also. Suarez can return in a week, Lukaku will be unleashed on defences by Everton as he was by WBA in 2012/13, Carroll and Murray will return from injury and Ba, another talented scorer with little appreciation at Chelsea, may even get to wear in anger that Stoke shirt he wore for his (failed) 2011 medical.

I'm a huge advocate of accounting for short term, random variation, but it shouldn't be used exclusively to solve every conundrum. 

Monday 16 September 2013

A Premiership Goal Drought.

Stoke fans will take their team's slight elevation in the Match of the Day running order as a sign that Lineker and Co. are finally acknowledging that the Potters can, if needed, play relatively attractive football. An alternative, and possibly more persuasive, argument would be that the Premiership as a whole has produced so few goals in the first 39 matches that even a relatively mediocre game can jump up the pecking order.

Seasonal scoring trends have been a feature of both the EPL and the Football League in the past. Goal scoring traditionally started slowly in August, quickly rose to a peak and then fell gradually over the winter months, before a May goals fest. However, in scoring just 74 goals in 39 matches so far, the EPL has set a low scoring precedent.

The average scoring rate in the EPL hovers around 2.5 goals a game; it had averaged 2.64 since 2002 prior to the start of this season. But bouts of lower scoring are also inevitable, especially over relatively small sequences of matches. If you split the EPL since 2002 into weekly blocks of 40 matches, around 1.5% of those 40 match runs produced an average of 2 or fewer goals per game. So on this basis the current run, at 1.9 goals per game, would appear to be certainly rare, but not entirely precluded.

Teams can, fairly predictably, produce low scoring matches over relatively short time frames. The opening group matches of a major tournament such as the World Cup tend to be less goal laden than the later group games, and hugely important one off matches such as FA Cup finals tend to be less goal laden than the earlier rounds of the same competition.

However, generally the gap in quality between each side is the major factor in determining how many goals will be scored in a match, especially over league scale time frames. The remaining week four match sees Liverpool travel to Swansea as narrow, four tenths of a goal favourites, so you would expect around 2.6 goals to be scored in this fixture, on average. More of a mismatch, along the lines of Palace's visit to Old Trafford, would see the total goals line creep just past 3.

It is therefore possible to estimate, based on historical general scoring rates as well as individual team tendencies, the average number of goals each of the previous 39 games would be expected to see. From this information, the range and frequency of total goals scored in those 39 games can be simulated. The most convenient way is to model the scores via a Poisson distribution.

Although it is tempting to lump the first 39 or 40 matches of each season together as a single repeatable trial, there are small differences caused by the slightly different team strengths and match ups thrown up by the fixture list compilers from one year to the next. The 2013/14 season has seen slightly more closely matched games, based on pre game estimates, than has been the case recently. Therefore, we should expect these more frequent, potentially closely fought matches to produce fewer goals per game. However, this effect is unlikely to wholly account for the figures that we have seen.

The relative rarity of so few goals being scored in actual batches of 40 matches is repeated in the simulation for the 2013/14 season to date. The simulations, using goal expectancy figures for all sides that would be consistent with figures seen over recent EPL history, generated a season with exactly 74 goals in the first 39 games once every 550 seasons, while a season that had 74 or fewer goals in the first 39 matches appeared once every 185 campaigns.
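As a rough cross-check on figures of this kind, the sketch below uses the fact that a sum of independent Poisson counts is itself Poisson, so the chance of 74 or fewer goals can be computed directly rather than simulated. It is only an illustration: it assumes every one of the 39 games carries the same 2.64 goal expectation, whereas the simulations described above used a separate estimate for each fixture, so the answer will not match the one in 185 figure exactly. The names `poisson_cdf` and `match_expectations` are made up for the example.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)        # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i          # P(X = i) obtained from P(X = i - 1)
        total += term
    return total

# Assumption: a flat 2.64 goals per game (the long-run average quoted above)
# for all 39 matches. The post used per-fixture expectations instead.
match_expectations = [2.64] * 39

# Independent Poisson totals add, so the 39-game total is Poisson as well.
lam_total = sum(match_expectations)

p_74_or_fewer = poisson_cdf(74, lam_total)
print(f"Expected goals in 39 games: {lam_total:.1f}")
print(f"P(74 or fewer goals): {p_74_or_fewer:.4f}")
print(f"Roughly one season in {1 / p_74_or_fewer:.0f}")
```

Replacing the flat list with lower, fixture-specific expectations (as a run of closely matched games would suggest) pulls the total expectation down and makes a 74 goal start less rare.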

So, although not precluded from occurring under the kind of Premiership football we have witnessed over recent seasons, the first 39 matches in 2013/14 are extreme outliers.

We've touched briefly on matches where sides capable of producing games with "normal" goal totals can tactically adapt, possibly through fear of losing, to produce much lower scoring games, such as the opening group matches of a tournament. So perhaps factors such as these, beyond the natural random variation within small sample sizes, are present in the 2013/14 Premiership.

Five clubs do have new managers and one, Mourinho at Chelsea, consistently presided over low scoring matches in his previous stint at the club, under a policy that coveted defensive strength. And as in most sports, defence is more readily organised and excelled at than is creative attacking play. Numerous other managers are also relative newcomers or are now managing in an elevated arena compared to the Championship.

Secondly, almost half of last season's top twenty Premiership scorers have barely kicked a ball in anger, so far. Some, such as Bale and Tevez are no longer Premiership players, while prolific scorers, such as Ba, Lukaku, Suarez and Defoe are yet to start a game. Also, depending on source, shots are currently hitting the target with nearly 40% less regularity, compared to historical levels.

Overall, every team, with the exception of the surprisingly good Arsenal and the predictably chaotic Sunderland, has been involved in matches that have seen their total match goals come in below the expectation predicted by an, until now, robust Poisson based modelling approach.

It is easy to weave narrative when presented with unusual outcomes. The goal glut at the start of 2011/12 didn't continue, although the chance of it appearing, with little change in scoring intent, was a much more substantial 18% compared to the present paltry 0.5% for our current drought. For once, sample size and random variation don't appear to Bale out those looking for a mundane explanation, although they are certainly a component of the sought after solution.

Time to test the theories!

Sunday 8 September 2013

NFL Teams In Different Game States.

An unedited reprint of an old post from an old blog, looking at the more obvious change in approach shown by NFL sides when they lead or trail. The draw or tie doesn't really feature in the NFL, so Game States are more clearly defined simply by the current scoreline compared to football (soccer). 

NFL sides, as well as passing more when behind and running more when ahead, also tailor their strategy to their strengths. Teams that run well run more frequently than the league average when they trail, and a side which passes well continues to pass more frequently than the league average when they lead.

This mixture of overall league tendency and specific team tendency seen in the NFL is present, but less obvious in football (soccer).

This was the most popular post on my NFL blog, amassing nearly 20 views.

How Teams Try To Win

One of the more obvious ultimate aims of an NFL team is to score enough points to try to guarantee victory over its opponents. However, it is equally apparent that at certain times during a game teams have other objectives that take precedence over maximizing the score. Running the ball to run out the clock when they already have a large lead, for example.

What follows tries to identify the different stages in a game and tries to pinpoint the tactics used by teams when they are actively trying to score points.

There's a multitude of factors that determine a team's approach during a game, but I'll concentrate on the ones I consider most influential.

Firstly, down and distance. These two factors can be reasonably broken down into predominantly passing or running plays. To try to eliminate any in-built play calling bias as a result of down and distance I decided to look exclusively at 1st and 10 plays. It's not an obvious running or passing down/distance and it also provides a hefty sample size for each team. Everyone gets a first and 10 sooner or later.

Next, the current score. It's well documented that teams favour the run when well ahead and the pass when well behind. So I further broke the first and 10 plays down by the current score. I looked at the ratio of runs to passes when teams trailed by 2 or more scores, trailed by 1 score, were tied, led by one score and finally when they led by 2 or more scores.

And lastly I decided to include a team's offensive strength. Even poor offensive teams are likely to be better at running the ball compared to passing it or vice versa. I was simply interested in which offensive skill a team did better at and by how much compared to their weaker discipline.

I firstly compiled a run attempt/pass attempt ratio for all 32 teams from the 2007 season, to confirm that teams favour the run when well ahead and the pass when well behind.

And they do.

On average teams throw around two passes for every one run when they trail by 2 or more scores on 1st and 10. When down by 1 score the ratio has moved closer to parity, but on average 1.2 throws are still made for every one run. Running is favoured when teams are tied, with 1.2 runs for every one pass. That increases to 1.5 runs to 1 pass if teams lead by a score. Lead by 2 or more scores and runs start to outweigh passes by almost 3:1.

This progression from throwing when behind to running when in front is mirrored by all 32 teams.
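The split described above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the original calculation: the play records are made up, and the definition of "one score" as a margin of 8 points or fewer is an assumption, since the post does not say where it drew the line. The function names `score_state` and `run_pass_ratio` are hypothetical.

```python
from collections import Counter

def score_state(margin, one_score=8):
    """Bucket the offence's score margin into the five states used above.
    Assumption: 'one score' means a margin of 8 points or fewer."""
    if margin == 0:
        return "tied"
    if margin > 0:
        return "lead by 1" if margin <= one_score else "lead by 2+"
    return "trail by 1" if -margin <= one_score else "trail by 2+"

def run_pass_ratio(plays):
    """plays: iterable of (margin, play_type) for 1st and 10 snaps only.
    Returns {state: runs per pass}; states with no passes are skipped."""
    counts = Counter((score_state(m), t) for m, t in plays)
    ratios = {}
    for state in {s for s, _ in counts}:
        passes = counts[(state, "pass")]
        if passes:
            ratios[state] = counts[(state, "run")] / passes
    return ratios

# Made-up plays, purely to exercise the functions.
sample = [(-14, "pass"), (-14, "pass"), (-14, "run"),
          (0, "run"), (0, "run"), (0, "pass"),
          (10, "run"), (10, "run"), (10, "run"), (10, "pass")]

for state, ratio in sorted(run_pass_ratio(sample).items()):
    print(f"{state}: {ratio:.2f} runs per pass")
```

Restricting the input to 1st and 10 plays, as the post does, has to happen before the records reach `run_pass_ratio`.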

However, this can't be the whole story. There must be periods of the game where teams are trying to maximize the points they score and they must be trying to do this by a combination of maximizing their yards per play and increasing their chances of continuing drives. It further seems reasonable that they attempt to do this by playing to their offensive strengths. Play calling when trailing or winning big seems to be dictated more by the state of the game than a team's offensive strength. So the next step was to see if a team's offensive strength dictated how a team played when the game was close, say within a score either way.

              From This...............To This


Initially, I chose two teams with widely differing offensive styles. In 2008 Minnesota ran the ball extremely well and passed it relatively poorly, while the reverse was true for Indianapolis.

If offensive strength did play a part in play calling as well as the state of the scoreboard, then it seemed likely that as these two teams went from trailing to winning, you would see Minnesota commit earlier to the run (their relative offensive strength), while Indy would stay with the pass (their strength) for longer.

And that's what happens.

Minnesota are already running more than they pass when they still trail by 1 score (the league as a whole are still passing more than they run) and Indy are still passing almost as often as they run even when they lead by 1 score (the league as a whole become more frequent runners around when the scores are tied).

Having seen that two teams with polar opposite approaches to offense tend to go to their strengths in close games, the last step is to see if there's a general league wide tendency for teams to rely on what they do best. To do this I calculated the strength of the correlation between what a team does best on offense and how often they attempt to do it, split by current score.

When the 32 teams trail by 2 or more scores there is no correlation between the two conditions. There appears to be no evidence that teams that run better than they pass run more often in these situations (correlation of 0.01). The same applies to better passing than running teams (correlation of -0.04). It appears that being 2 scores or more adrift strongly dictates play calling: everyone has to pass, whether it's their most potent attacking force or not, and it appears to be a haphazard process.

However, when down by just 1 score teams are able to start to go to their strengths. Teams that pass much better than they run tend to pass more often than other teams in this situation. When teams trail by a score the correlation between passing well and passing often is 0.35.

The correlation is similar when scores are tied and peaks at 0.47 when teams lead by a score. (Presumably they recognise that one score isn't a decisive lead, that they need to press home their advantage, and that the best way to achieve this is to do what they do best and do it more often than league average.)

Once teams lead by 2 or more scores the correlation vanishes again and play calling mirrors what happens when teams are trailing by 2 scores. Running becomes predominant and teams effectively forget where their strengths lie. Their game plan is no longer focused on increasing their score, it's more about shortening the game by keeping the clock running.

The situation for running the ball is identical. The better a team is at running the ball compared to passing it, the more they pound the ball when the scoreboard is within a score either way. Once the lead or deficit becomes larger, they apply the doctrine of pass if you're behind and run if you're ahead, and the reasonably strong correlation disappears.
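For anyone wanting to repeat this kind of check, here is a minimal sketch of the correlation step for a single score state. The post does not say exactly how "passing well" was measured, so the `pass_edge` figure here (a team's passing efficiency minus its rushing efficiency) is an assumed stand-in, and the six data points are invented purely to show the mechanics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented figures for six teams in one score state (say, leading by a score).
# pass_edge:  how much better a team passes than it runs (assumed measure).
# pass_share: share of its 1st and 10 plays in that state that were passes.
pass_edge = [2.1, 1.4, 0.3, -0.2, -0.9, -1.5]
pass_share = [0.52, 0.47, 0.44, 0.45, 0.38, 0.40]

r = pearson(pass_edge, pass_share)
print(f"correlation between passing well and passing often: {r:.2f}")
```

Running the same calculation separately for each of the five score states would give the pattern of correlations described in the text, one coefficient per state.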

Thursday 5 September 2013

Predicting Interceptions In The NFL.

Continuing the theme of using more numerous match day occurrences to predict rarer, but significant events in a sporting contest, I've used the start of the NFL season to show how team interceptions made by the defence can be modeled with more confidence.

Possession is king in the NFL and the number of possessions enjoyed by each side is invariably equal. Therefore, it is essential that a team maximizes the use they make from each possession.

Every yard on a gridiron has a points expectation associated with it, depending upon relative team ability and occasionally time remaining. A loss of possession costs one side the potential points expectation they had prior to the turnover and often hands their opponents a healthy points expectation where they take over ownership of the pigskin.

So a turnover, specifically an interception, is a significant hurdle for a side to overcome. If you lose the turnover battle, you often lose the game as well. In this guest post I look at how you can best predict the number of interceptions a side will make in the upcoming NFL season....and it's not by looking at the number of interceptions they made last season!

Tuesday 3 September 2013

How Game States Alter Chance Conversion Rates.

Ideally, if you are attempting to quantify an identifiable skill in a sport such as football, you would like both the conditions of the trial and the context within the game to be controlled. Penalty kicks fulfill many of these conditions. A free kick from 12 yards, taken at relative leisure without the intervention of defenders, where only the identity of a similarly skilled goalkeeper alters, is as good as it gets in football. Unfortunately, it is also a rare event and therefore as a way to differentiate a repeatable talent, it ultimately fails.

Shots from open play are much more common events, both from an individual and team perspective. However, the advantage of consistency between trials that was present in penalty kicks is largely lost. A two yard tap in or a thirty yard volley each appear as an indistinguishable "shot" when all attempts are simply lumped together.
On a team basis, the two polar extremes for goal attempts from recent seasons are Stoke, at their set piece dependent best (or worst) and optimistic, long range shooting QPR. Both sides struggled for goals, but measured by raw shots alone, Stoke appear the more efficient of the pair. In 2010/11 their conversion rate of 11% hovered around the league average over the last decade, while in comparison, QPR in their relegation season recorded conversion rates of barely half that.
However, the comparison is misleading. Stoke's average shooting distance was just past the penalty spot, while QPR's was very nearly at the edge of the box and also a couple of yards wider. Rangers can be faulted for shooting so regularly from distance compared to both Stoke and the rest of the EPL, but it is that misguided optimism that led to an apparently abysmal conversion rate. When shot position is accounted for, both QPR and Stoke were converting the chances they elected to take with similar levels of ability.
If QPR had elected to try to create chances closer to goal, their conversion rate on a shot by shot basis would likely improve, with no real change in shooting ability. Similarly if Stoke shot more from distance, their rate would likely fall, again with no requirement for an underlying change in talent. Tactical approach, rather than changing talent or masses of randomness can be a huge factor in fluctuating shot conversion rates. 
The disconnect between raw counting conversion rates and x,y based rates is obvious in the case of Stoke and QPR, but similar effects are present for all sides.

If we start by looking at the rate at which teams from the EPL have converted shots, regardless of any additional information such as shooting distance, there is a relationship of sorts between conversion rates in season one and those recorded in the subsequent season. The line of best fit appears to indicate that poor conversion rates in one season tend to be followed by poor, if generally slightly improved rates in the next season. At the opposite extreme, a side converting at a well above average 18% would on average fall to around 14% next term. 
So there is evidence of a difference in finishing ability between sides, but also a degree of regression towards the mean, implying an expected amount of randomness, also.
The case of Stoke and QPR's different shooting profiles illustrates that shot position is a major factor in determining a fair expected conversion rate for a side. Shot position is mostly a choice determined by the attacking side, but in some cases a side is also partly forced into shooting from greater distance as time expires from a disadvantageous scoreline position. In such situations, which arise most often but not exclusively when a side trails, their opponents often also present a more defensive shell.
More speculative shots against packed defenses are intuitively going to depress conversion rates. So again any side can find itself in a situation where the trials commonly used to calculate the strength of season on season correlations between conversion rates are being altered by circumstances that are partly out of the control of the attacking unit. In short, if a side's defense, through a combination of random chance or poor play, consistently puts it in poor game states, then its shooting conversion rate is likely to fall through poorer quality and better defended chances arising at the opposite end of the field.
We can see possible evidence for more frequent shooting going hand in hand with less efficient conversion rates by plotting Arsenal's total shot numbers and their seasonal conversion rate from 2002-2003 to the present. Random chance inevitably will play a part in the Gunners grabbing or conceding the opening goal, but how frequently they found themselves in either a good or bad game state will then alter the quality of the subsequent shooting trials. Around three quarters of the sides which have played for five or more seasons in the EPL since 2002-03, exhibit the same trait of decreased efficiency with increased shot frequency.
Arsenal, along with the other big four sides, tend to have the simplest game states. Leading is always good but, such is their quality, drawing and obviously losing are invariably bad. Therefore, the average game state they experience over a game or a whole season often corresponds closely to the amount of time they spend winning, drawing or losing. This allows us to express a good proxy for game state in a single number: the proportion of time spent leading plus half the proportion of time spent drawing, over the period of a single game or a whole season.
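The single number proxy described here is simple enough to write down directly. The sketch below is one reading of it, with a hypothetical function name and two invented matches for illustration; it is suited to sides like Arsenal for whom a draw is rarely a good state, and would need an adjustment for team quality before being applied to a side such as Stoke.

```python
def game_state_proxy(minutes_leading, minutes_drawing, minutes_trailing):
    """Proportion of time spent leading plus half the proportion spent level.
    Returns a value between 0 (always behind) and 1 (always ahead)."""
    total = minutes_leading + minutes_drawing + minutes_trailing
    if total <= 0:
        raise ValueError("no playing time supplied")
    return (minutes_leading + 0.5 * minutes_drawing) / total

# Two invented matches with the same final result but very different states.
late_comeback = game_state_proxy(minutes_leading=2, minutes_drawing=8,
                                 minutes_trailing=80)
early_control = game_state_proxy(minutes_leading=80, minutes_drawing=10,
                                 minutes_trailing=0)

print(f"late comeback win: {late_comeback:.2f}")
print(f"early control win: {early_control:.2f}")
```

A season long figure follows from the same function by passing in the minutes summed over all 38 games.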
And the same pattern is seen. The poorer the average season long game state experienced by Arsenal, the more shots they had. The same holds for Stoke, a side which has a more ambiguous relationship than Arsenal with a stalemate (sometimes, against weaker sides, it represents a poor game state; more often, against better teams, it is a good one). In all matches where they had a better than average game state, they took just 8 shots per game, compared to an average of 12 when it was below average.
So we have a connection between more, less efficient shots being taken in poorer game states and while the former may partly drive the latter, the changing game state also alters, for better or worse, the likely conditions of the shooting opportunities. In the long term, this either depresses an already (partly luck driven) poor efficiency or enhances an already impressive one.
In short, the context of game states is likely to have a significant effect on conversion rates and may even act as a decent proxy for shot distance and defensive pressure.
There are no short cuts to calculating game states. Final scorelines can mislead, a side can trail for 85 minutes and then grab two late goals, or score twice early and concede in second half injury time. Two 2-1 wins, but with vastly differing game states and in all likelihood, dissimilar goal attempt profiles. 
Time spent leading/drawing and losing are the building blocks, but then we have to decide how happy to defend or eager to attack each side will be in the commonly occurring stalemated scoreline. So we also require an estimate of team quality to further quantify game state in this all encompassing area of analysis.
To demonstrate how game state alters the conversion rates of a side from one season to the next, when squad turnover is likely to be light, above I've plotted paired conversion rates from consecutive seasons for Arsenal, again since 2002-13. For amalgamated data comprising 38 games in each point, the correlation is disappointingly poor. The temptation is to assign the lack of correlation entirely to random variation, and while that undoubtedly exists, we also have a naive model, lacking in detail. 
If a side has the good fortune to lead lots of games and if their style of play allows, they can sit deep, sit on their likely high conversion rate and attack their opponents on the counter, where chances may be fewer, but they will likely be of much better quality because their opponents are actively seeking to pull goals back. If during the next season, they fall behind more frequently (possibly because of a poorer defence and/or an unlucky attack), they could easily find themselves with a much reduced conversion rate, as they are forced to trial their shooting skills against more densely packed and better organised defences. 
The poor season on season correlation could be down to a combination of randomness and seasonal variation in game state.
If we wish to know how conversion rates correlate from one year to the next, looked at through the lens of total shots, we should at least try to accommodate important factors that appear to contribute, such as game state, both previously and in the season in question. So, instead of plotting paired conversion rates, I've taken the conversion rate in the previous season, along with the game state from that year and the game state experienced by the side in the subsequent year and projected a conversion rate for that subsequent campaign using these three factors. 
In short, if team A (appropriately Arsenal) convert at a certain rate under x average game state, what will they do under y average game state with mostly the same squad, based on previous patterns? I've plotted this projection against the actual conversion rates above and the r^2 jumps to nearly 70%.
Game states and previous conversion rates go a long way to explaining why a side records such apparently random conversion rates in consecutive years. Randomness exists, but other more concrete causes are equally important.