Monday 25 November 2013

Pulisball. What to Expect at Palace.

Crystal Palace's long search for a new manager finally ended on Friday night, when one of the early front runners, Tony Pulis, put pen to paper. Any lingering doubts about Pulis' at times ugly style of play were ultimately trumped by a managerial record that has never seen him relegated. Sacked yes, but not relegated.

Pulisball gained ridicule, contempt, grudging respect (and that was just from his own fans), but just enough points to guarantee perennial survival. With the unique ingredient provided by Delap's long throw it did prove an over achieving way of retaining Premiership status for a largely Championship quality team.

Long throws, long balls and set piece goals were the most visible face of a Pulis led Stoke, but they weren't the only requirement if 40 points were to be achieved year on year. It also needed a defence. Sides that rely on set pieces to provide a higher than average proportion of their goals are usually limited attacking sides. The correlation isn't overwhelming, but the trend exists for sides that score proportionally more goals from set plays to also score relatively fewer goals over a season compared to sides that have a more balanced attacking approach.

Unsurprisingly, an inability to score goals also goes hand in hand with lower end of season points totals and the correlation is very strong. The r^2 value for success rate (a proxy for points) against goals scored since the start of the 2008 season is 0.77, so 77% of the variance in success rate is accounted for by the variance in goals scored.
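For anyone wanting to reproduce an r^2 figure of this kind, here is a minimal sketch using only the standard library. The function name and the five data points are my own placeholders for illustration, not the real league data behind the 0.77 figure.

```python
from statistics import mean


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical placeholder values, NOT real league data.
goals_scored = [30, 40, 50, 60, 70]
success_rate = [0.35, 0.40, 0.52, 0.55, 0.70]

print(round(r_squared(goals_scored, success_rate), 2))
```

Fed with one row per team season (goals scored and success rate), the same function would return the share of variance in one that is accounted for by a straight line fit on the other.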

A slightly weaker correlation, but of a similar order of magnitude for r^2 also exists between goals allowed and seasonal success, which leads to the inevitable conclusion that goals scored and goals conceded by sides are themselves not independent. The negative co dependency between the number of goals a side scores and the number they concede is plotted below. The more goals a side scores in a season, the fewer they will tend to concede.

This correlation is probably driven, firstly, by a general difference in relative ability between sides. Better sides are simply more adept at playing all aspects of football. Such sides score more and concede less, but there may also be a tactical aspect. It is difficult to score yourself when your opponents have the ball in your final third. In some cases, attack may also be an effective form of defence.

So the way a side either chooses or is forced to try to accumulate goals in the EPL can have an effect on the number of goals it scores. Stoke's reliance on set play goals was likely to result in few goals being scored. The poor goal totals would in turn lead to unimpressive seasonal success rates and that in turn often led to relegation at worst and bottom half finishes as the norm.

However, Stoke ended four of the five seasons under Pulis with higher seasonal success rates compared to the line of best fit for goals scored, denoted by the red points in the second plot. A clue to why this might have occurred can be seen in the plot above. In all years under Pulis, they allowed fewer goals than expected for a side that scored at the rate they did. Again Stoke's five seasons under Pulis are represented by the five red points in the plot above. The brutal simplicity of creating a set piece chance just by winning a throw inside the opponent's half often meant that Stoke didn't have to commit many players forward to do so. The tactical aspect of Stoke's chance creation left plenty of defensive resources intact to keep the score down in games that mattered.

We can't dismiss the possibility that Stoke's relative defensive excellence was merely down to chance. If we look at enough sides, some are going to appear continually better than expected, but by pure chance. However, anyone who has watched Stoke, especially in the early EPL seasons, can't have failed to notice the overtly defensive stance they took both at home and away from the Potteries.

Although they were correctly classified as a side that relied on set pieces for the majority of their goals, it would be a mistake to look at this aspect of their play in isolation. A team is a sum of their parts and the general outlook for set play sides is rather bleak. They are more likely to be relegated than a side that can score proportionally more often from open play. They finish in the bottom half of the table with greater frequency and they are highly unlikely to be capable of challenging for even a Europa spot.

Pulis' Stoke ticked most of the statistical boxes for a limited, set piece reliant side. Palace already score nearly half of their goals from that source, although raw numbers are small, so percentages will fluctuate. But the real wrinkle that Pulis introduced that propelled an approach with little upside or room for error to a relatively comfortable way of retaining your top flight existence, was his insistence on a defensive team effort.

Palace are already a side that find scoring difficult, so their limitations already provide half of the ingredients for Pulisball to make a Premiership return. To complete the formula, Pulis' secret sauce will be liberally applied to the defence in an effort to maintain his proud record.... but this time he'll have to do it already 12 games into a season and without a defence enhancing wildcard such as Rory Delap.

Wednesday 20 November 2013

World Cup Qualification. The Talented and the Lucky.

The extended prequel to World Cup 2014, namely the qualification process, finally closed for UEFA following the second legs of the playoff matches on Tuesday night. Spaces are limited, so it is inevitable that there will be some notable absentees when the final draw takes place in Rio on December 6th. Apparently, parts of the media are inconsolable because the 20th World Cup will take place without Ibrahimovic following Portugal's elimination of Sweden.

Seeding for the qualification groups was decided by the FIFA rankings in July 2011 and the majority of the top seeds were recognizable as the current, leading national teams. France provided the biggest point of interest, by falling into pot 2, although they had been outside the top 9 positions that guaranteed a more favourable draw for over a year. While France failed to recover ground lost at a dismal WC 2010, sides such as Norway reeled off a consistent string of narrow wins in both friendlies and higher weighted competitive Euro 2012 qualifying matches to retain their place in pot 1.

As Simon Gleave points out here, tournament formats and the vagaries of chance can play a huge role in deciding major sporting events. And by falling into pot 2, France were more likely than not to draw a group containing a previous World Cup winner or the Netherlands. By being paired with Spain in the five team Group I, France weren't quite in the group of death, but they were odds on to need the playoffs to progress. So more the group of maximum inconvenience and, by taking the playoff route to the finals, they were unlikely to be blessed with a comfortable passage if they did make the finals in Brazil.

Five of the nine top seeds qualified as group winners, three made it through to the playoffs and only Norway belied their ranking by slipping out of the competition at the earliest possible stage. France, as expected, trailed in behind Spain, while Russia (from pot 2), along with a resurgent Switzerland and Belgium (pot 3), completed the list of nine group winners. So class, as measured by recent FIFA rankings, appeared to shine through with reasonable clarity.

If luck, in the purest sense, decides the make up of each qualifying group, where danger occasionally lurks in pots 2 or 3, as a new generation of starlets sweeps countries to levels beyond their recent station, it is small sample variation that can derail teams once the groups are fixed. Of the two heavyweights in Group I, Spain would be confident of confirming their supremacy over a France team that is ranked over a dozen places below them, but the eight match format afforded the current holders just two shots at their most likely challengers for automatic qualification. A strong team, possibly out of place at the wrong time, can cause unwelcome early challenges, even for the top seeded side.

The outcome of each of the nine UEFA qualifying groups can be viewed as a random, weighted draw comprised of the outcomes of each match played in the group once the relative abilities of each side are accounted for. Reducing World Cup qualification to a sterile number crunch lacks all of the tension of a wet night in Warsaw, but it does help to add context to the perceived achievements and failings that we have seen over the last year and a half. A side can perform well above expectations, but that may be partly due to improvement and partly to the randomness with which results cluster in small sample sizes. Malta do have a shot at defeating Italy, but it is a very long shot.

France could do nothing to influence their chances of avoiding Spain or a similarly talented seed, once their own performance/luck combination over 2010 had anchored them outside the best nine European sides, but raw ability and 94th minute equalizers, vied with random chance to decide the individual outcomes of the subsequent matches. The destination of the group honours was a combination of talent and luck, where the actual placings that transpired were just one of many possible combinations that could have occurred from that heady mix.

Below I've simulated the outcomes of thousands of iterations for all nine groups, using the bookmakers' odds as a proxy for team talent on the day and the constant repetition to play the role of randomness.
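A simulation of this general kind can be sketched in a few lines. Everything named here is an assumption for illustration: the helper names, the choice to turn decimal odds into probabilities by normalising their inverses, the random tie break at the top of the group, and the three team group with made up odds. It is not the exact procedure or data used for the figures in this post.

```python
import random
from statistics import median


def odds_to_probs(home, draw, away):
    """Turn decimal odds into probabilities by normalising their inverses
    (an assumed, simple way of removing the bookmaker's margin)."""
    inverse = [1 / home, 1 / draw, 1 / away]
    total = sum(inverse)
    return tuple(p / total for p in inverse)


def simulate_group(matches, iterations=5000, seed=0):
    """matches: list of (home, away, p_home, p_draw, p_away).
    Returns (chance of topping the group, median points) per team."""
    rng = random.Random(seed)
    teams = sorted({t for home, away, *_ in matches for t in (home, away)})
    topped = {t: 0 for t in teams}
    points_log = {t: [] for t in teams}
    for _ in range(iterations):
        pts = {t: 0 for t in teams}
        for home, away, p_home, p_draw, _p_away in matches:
            u = rng.random()
            if u < p_home:
                pts[home] += 3
            elif u < p_home + p_draw:
                pts[home] += 1
                pts[away] += 1
            else:
                pts[away] += 3
        best = max(pts.values())
        leaders = [t for t in teams if pts[t] == best]
        # Assumption: ties on points are split at random, not on goal difference.
        topped[rng.choice(leaders)] += 1
        for t in teams:
            points_log[t].append(pts[t])
    top_prob = {t: topped[t] / iterations for t in teams}
    median_points = {t: median(points_log[t]) for t in teams}
    return top_prob, median_points


# Hypothetical three team group with made up decimal odds, NOT real prices.
fixtures = [
    ("A", "B", 1.8, 3.5, 4.5), ("B", "A", 3.2, 3.3, 2.2),
    ("A", "C", 1.3, 5.0, 9.0), ("C", "A", 7.0, 4.5, 1.4),
    ("B", "C", 1.6, 3.8, 5.5), ("C", "B", 4.0, 3.5, 1.9),
]
matches = [(h, a, *odds_to_probs(oh, od, oa)) for h, a, oh, od, oa in fixtures]
top_prob, median_points = simulate_group(matches)
print(top_prob)
print(median_points)
```

Comparing a side's actual points with its simulated median is then a one line lookup in `median_points`.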

The progression of a young Belgium side, as measured by their gradual elevation in the eyes of the oddsmakers over the qualifying period, gave them around a 50% chance of topping the group in the simulations. Their actual points total of 26 was above their simulated median score of 20. The FIFA rankings took this over achievement at face value and propelled them up to third in UEFA from a starting placing of 22nd at the start of the qualifying process.


Denmark could consider themselves unlucky to miss out on a play off berth as the ninth best runner up. The tie breaker saw the lowest points scoring runner up eliminated once the record against the group's worst side was expunged to account for Group I only having 5 teams. However, the average points total achieved in all simulations of Group B by the runner up was 19.4 with a standard deviation of nearly 2 points and Denmark's actual tally of 16 was therefore, nearly 2 standard deviations below the average, making them comfortably the most under performing second placed team over all nine groups.
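The size of Denmark's shortfall can be checked with a quick standardised score. This is only a sketch of the arithmetic, and it rounds the "nearly 2 points" standard deviation up to exactly 2, which is my assumption.

```python
simulated_mean = 19.4   # average runner up points in the Group B simulations
simulated_sd = 2.0      # assumption: "nearly 2 points" rounded to 2
actual_points = 16

z_score = (actual_points - simulated_mean) / simulated_sd
print(round(z_score, 2))
```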


A comfortable qualification for Germany, where the group make up provided them with little danger from three inferior challengers who were likely to take points from each other. In gaining 28 and 20 points respectively, Germany and Sweden each gained a couple of points more than their median points totals across the simulations.


Romania's finishing position of 2nd appeared impressive because it got them into the playoffs. But it was only one spot above their most likely finishing spot of 3rd and they gained just two points more than their median in all simulations. Without an injury time equaliser in Budapest, Hungary could have swapped places with them at the death. Three teams were close together trailing runaway winners the Netherlands and Romania finished qualifying in a still relatively lowly 19th place in FIFA.

A case of Iceland performing well above their initial standing as a team drawn from the lowest pot or a rare occasion of a lowly rated side collecting a fortuitous sequence of unlikely results to propel them to unsustainably heady heights in the short term? They were rated by the oddsmakers throughout the campaign as the second worst side in the group, with a most likely finishing spot of 5th and around a 10% chance of snatching second place in simulations.

A very tight group on the field, with 18 of the 30 matches being either drawn or won by a single goal. Both winners Switzerland and 2nd placed Iceland outperformed their odds based median points total by 5 points, indicating a miscalculation by the bookmakers or a fortuitous, short term set of results that saw, in the case of Iceland, a less fancied 10/1 shot beat two more deserving talents in Slovenia and Norway?

Russia topping the group should hardly be considered a surprise as they did so in over 30% of the simulations. Portugal's median points total was 24 (which they would have got had they not allowed Israel a late equaliser in Lisbon) and Russia's was 22, which was their actual total. The perception of Portugal shouldn't change because they required the playoffs to progress to Brazil.

England's qualification campaign has already been covered in detail here.
France were unlikely to topple Spain as winners of the group, although they did so in nearly 20% of the simulations. They ultimately took their most likely road to Brazil, as runners up and then by way of the playoffs, although being France, the latter was not achieved without considerable drama and uncertainty.

With a couple of exceptions, the 17 UEFA representatives could broadly be predicted before a qualifying ball was kicked. The seeding process, coupled with the large talent gap between the best and the worst of European national football, invariably gives half of the sides in each group very little chance of scooping the top slot. But the truncated nature of the qualifying process does give middle ranking teams the opportunity to jump a place or two above their natural long term station. Rather like merely good sides occupying the elevated Champions League placings in the EPL after a dozen matches.

Sides that produce a couple of atypically good or bad results, especially in highly weighted matches, such as WC and Euro qualifiers can fall prey to FIFA's blunt rating system, where results are understandably considered interchangeable with true ability, with no room for random chance. 

Belgium's relatively poor WC and Euro results prior to July 2011 led to their placing in pot 3 for the 2014 draw, but their (possibly) small sample sized over performance in group A should see them set fair for upcoming draws, barring a major meltdown in Brazil.

Neither a placing in pot 3 in July 2011 nor a top three rating now truly reflects Belgium's actual ability, any more than France possibly deserve to be struggling to clamber back into the elite on a diet of lowly rated friendlies and two fewer qualifying matches. But the process has done a decent job of producing the cream of European team talent for Brazil 2014 (even if some individual players will miss out).

At the very least a ratings system based on ranking points exchange, with too few matches for lucky streaks to be fully eradicated from the system before they are used to shape major competitions, merely adds to the uncertainty and excitement, both for the team that is out of place and the sides that have the unwelcome task of taking them on early in major tournaments.   

Sunday 17 November 2013

Old School Meets New School.

Evaluating the abilities of a football team can take many forms. From a purely numerical analysis, where their strengths and weaknesses are expressed as lines on a spreadsheet and future performance is cited in probability, to gut instinct formed through actually watching a side interact and play the sport in the flesh, complemented with a knowledge of their recent player acquisitions and departures.

Neither preference is guaranteed to capture the complete nuance of each side's true worth. So can an integrated approach that borrows from both sides of the divide provide a worthwhile collaboration?

In this guest post I try to combine the strengths of both approaches.

Friday 8 November 2013

Conversion Percentage Inside the Box. Mind The Gap.

So I used 2011/12 seasonal shot data to look at all goal attempts to see if the rate of scoring was merely the result of random variation around a common mean or if there was likely to be a genuine difference between the best and the worst sides in terms of conversion efficiency.

I split the sample between shots inside the box and shots from everywhere else (including shots and goals scored from your own box.....Tim Howard, take a bow), in an attempt to maintain a decent sample size, but to also smooth out any positional shooting preferences among the teams.

The method sees if the distribution of goals scored by each side, given their relative shot totals, is substantially different from the range you may expect if every side is equally talented at converting similar types of chances.

I deliberately left in penalty kicks for shots from inside the box, because I wanted to see how the conclusions changed as certain types of readily identifiable and unevenly distributed shots were removed from the sample.

So the first run included every goal attempt from inside the box. The distribution of goals scored by the twenty sides did appear to differ markedly from the spread you might expect to see if Manchester City had had 450+ attempts and Stoke had 230+ with every other team contained somewhere between those shooting extremes, but all sides had striking talent that was equally adept at converting the chances that fell to them.

Next, I took out penalties, which tend over time to be given to those that do the most attacking and present a significantly higher chance of scoring than other, open play opportunities from inside the box.

Virtually the same result.

Compared to the sample with penalties, we do edge very slightly closer to a distribution of actual goals in 2011/12 that better resembles a random draw from an equally talented 20 team strike force being presented with varying numbers of opportunities. But we still can very safely say that our actual spread of goals from 2011/12 doesn't resemble a lucky dip with a universal strike rate. About 2% of teams managed at least 60 goals from the distribution of shots actually attempted by teams during 2011/12 in simulations using a universal, average conversion rate. In reality during 2011/12, three teams out of 20 managed to surpass this target.
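The "lucky dip" comparison can be sketched as a simple Monte Carlo run, where every shot by every side is converted with the same probability. The function name, the 0.14 conversion rate and the two team shot totals (loosely based on the 450+ and 230+ attempts mentioned earlier) are placeholders of mine; the real exercise would use all twenty sides' shot counts and the league average rate for the sample in question.

```python
import random


def share_reaching(shots_by_team, rate, threshold, iterations=500, seed=0):
    """Fraction of simulated team seasons with at least `threshold` goals
    when every shot is converted with the same probability `rate`."""
    rng = random.Random(seed)
    hits = 0
    trials = 0
    for _ in range(iterations):
        for shots in shots_by_team.values():
            goals = sum(1 for _shot in range(shots) if rng.random() < rate)
            hits += goals >= threshold
            trials += 1
    return hits / trials


# Hypothetical inputs for illustration only, NOT the 2011/12 data set.
shots = {"Manchester City": 450, "Stoke": 230}
share = share_reaching(shots, rate=0.14, threshold=60)
print(share)
```

The returned share is the simulated counterpart of the "about 2% of teams" figure, to be set against the proportion of real sides that reached the same goal total.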

So I then took out headers.

Overall, headers present a poorer likelihood of success compared to shots and in 2011/12 headers comprised a hefty chunk of the total goalmouth attempts for some teams (no prizes for guessing Stoke).

With headers culled from the data, the actual distribution and the range you might expect from a group of equally lethal strikers moved to within touching distance of each other.

It is just one season, but once you take out penalties and headers, then the number of goals scored by all other means inside the box still differs from what you might expect to occur by random chance where there is no difference in the finishing talents of each forward line, but the gap is small....Very small.

Here are the regressed conversion rates for shots (with the feet) inside the box for sides from 2011/12 suggested by the above analysis.

EPL Side from 2011/12. Regressed Conversion Rate for Foot Shots Inside the Box %.
Newcastle. 14.9
Arsenal. 14.8
Chelsea. 14.8
Manchester United. 14.7
Norwich. 14.6
Tottenham. 14.4
Wolves. 14.3
Manchester City. 14.2
Aston Villa. 14.1
QPR. 14.1
Stoke. 14.0
Sunderland. 14.0
Bolton. 14.0
Everton. 13.9
Blackburn. 13.9
Swansea. 13.9
Fulham. 13.8
WBA. 13.8
Wigan. 13.2
Liverpool. 13.1

To put these figures into perspective, the difference in conversion rates between top and bottom, given an average number of shots from inside the box (240) accounts for 4 extra goals and that represents about three league points.
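The four goal figure follows directly from the table. A minimal check of the arithmetic, using the top and bottom regressed rates and the 240 shot average quoted above (the variable names are mine):

```python
top_rate = 14.9       # Newcastle, regressed conversion rate in %
bottom_rate = 13.1    # Liverpool, regressed conversion rate in %
average_shots = 240   # average number of shots from inside the box

extra_goals = (top_rate - bottom_rate) / 100 * average_shots
print(round(extra_goals, 2))
```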

If we ignore Newcastle at the top and Liverpool at the bottom, both of whom broke most statistical models during 2011/12, the actual top five from 2011/12 are to be found in the top seven for converting shots inside the box. And relegated Bolton and Blackburn are at least in the bottom half. So the ranking is fairly consistent with league position in May.

By attempting to produce a reasonably sized, homogeneous sample, we can see real life conversion rates fall, at first slightly and then precipitously, towards a random draw. There's still evidence for a talent divide at the very top, but it is narrowing, throwing the importance of shot volume into the spotlight.

Ten years' worth of shot data would be nice to see which side of the line shot conversion rates finally settle on!

Thursday 7 November 2013

Begovic Scores! We Draw!

All of the best photographers appear to have a sixth sense about when a worthwhile picture opportunity is about to arise. It was therefore no surprise that my camera had just been returned to my rucksack when Asmir Begovic (a goalkeeper) scored for Stoke (after 13 seconds) against Southampton at the Britannia Stadium on Saturday.

There were clues available that would have indicated that Mark Hughes had a plan. All teams have a preferred end to attack in the second half and there appears to be a tacit agreement between captains that if the visitor wins the coin toss, he will take the kick off, rather than earn the early game wrath of the home supporters by turning the teams around. Therefore, the undercurrent of discontent that accompanied the sides swapping ends after the coin toss on Saturday, turned to slight bemusement as Southampton lined up to take the kickoff.

Stoke had turned themselves around.

The geography of the Britannia Stadium makes it an ideal site for endurance training. The prevailing wind regularly blows in from Trentham, funnels itself through the two open corners at the south end of the ground and then struggles to exit at the single open corner to the left of the Boothen End, tipping a hat to the statue depicting the three ages of Sir Stan as it carries on towards the city.

Stoke now have a chequered history with near gale force winds. In the distant past one removed the roof of the Butler Street Stand (along with our best striker to pay for the uninsured infrastructure), but the worst the wind has managed at the Britannia was the late postponement of a game against WBA. On Saturday it provided Stoke with the opening goal by way of partial payback.

In the subsequent press conference, Mark Hughes acknowledged the deliberate decision to play with the wind during the first half, citing the importance of scoring first, which is encouraging. However, it is (hopefully) unlikely that his keeper was considered the most likely scorer. Hughes' apparent encouragement for his sides to shoot from distance, so vividly demonstrated at QPR, must have some bounds.

Following the kick off, Southampton chose to attack Stoke with a series of intricate passes. This quickly broke down, the ball was rolled back to Begovic, who launched a wind assisted punt goalwards. Both Southampton defenders chose to ignore the golden rule of defending, namely "never, ever let the ball bounce" and Boruc was left embarrassed by a slick bounce on the wet surface.

Begovic is the fifth keeper to score in the Premiership and invariably such goals require additional help or unusual circumstances. Tim Howard's effort against Bolton was a replica of Begovic's goal, but Peter Schmeichel, when playing for Villa, opened the goalkeeping Premiership goal tally from the more advanced position of the opposing penalty area. So modelling the likelihood of a keeper netting is going to be hugely situational.

We may have more luck trying to quantify the quickfire timing of the goal.

Stoke v Southampton, two minutes 13 secs from history being made (not shown).
The chance of a goal being scored increases slowly, but inexorably, as time elapses and caution and fitness give way to adventure and fatigue. Your chances of seeing a goal during the sixty seconds that comprise the 8th minute are only 80% of your chances of seeing one in the 80th.

However, three "sixty second" intervals are completely atypical compared to this gradual cranking up of goal expectation. The 45th is the second most goal laden "minute" followed by the 90th and the reasons are clear. Injury time extends both minutes well beyond sixty seconds leading to two big spikes. Therefore these increased rates are purely artificially created by traditional timing considerations. The second half starts with the first second of the 46th minute, even if the first half stretched well beyond 45 minutes of actual time.

The barren blip that comes with the first minute, by contrast, is entirely real. If you choose any sixty second period in the first ten minutes, you are likely to see around 0.7 to 0.8 percent of the total goals scored during the match, on average. But if you plump for the first 60 seconds of a match, you will be lucky to see much more than half of the typical early minutes goal percentage.

Again, the reasons are fairly plain to see. Every game starts with a kickoff. The ball is about as far from either goal as it can possibly be and all eleven players are positioned between the ball and their goal and this formation of maximum protection for each goalmouth is guaranteed to occur during the first minute in every match. Hence the scoring is not only at its lowest because of the usual ebb of intent and desire to score, it is atypically lower because of the requirement to start the game with a kickoff.

So, pulling all the information together, around 0.5% of goals come in the first minute, and the first 13 seconds are likely to see less than a pro rata share of this goal expectation because of the guaranteed safe starting position for the ball. A back of the envelope calculation using these figures and the average scoring expectation over the history of the Premiership gives an average goal expectation for the first 13 seconds of a Premiership match of around 0.0015 of a goal. Scoring twice or more in 13 seconds is impossible, therefore a goal in the opening 13 seconds should, under these informed guesstimations, happen around once every 666 matches.
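The step from a goal expectation to a "one in so many matches" figure can be sketched as below. The first line is the direct reciprocal used above; the Poisson version is an extra check of my own, on the assumption that goal arrivals in such a short window are well described by a Poisson process, and it gives practically the same answer because the expectation is so small.

```python
import math

expected_goals = 0.0015   # goal expectation for the first 13 seconds

# Direct reciprocal: one such goal every this many matches.
matches_per_goal = 1 / expected_goals

# Assumed Poisson model: chance of at least one goal in the window.
p_at_least_one = 1 - math.exp(-expected_goals)

print(int(matches_per_goal))
print(round(1 / p_at_least_one))
```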

So, unlikely as Begovic's goal was purely from a timing perspective on that particular Saturday afternoon, we should expect to have seen around a dozen such goals scored before the 14th second has elapsed over the 8,326 game history of the Premiership.

Begovic's strike was the sixth such effort, so maybe the primeval order at kickoff takes slightly longer to descend into chaotic normality than I accounted for or Premiership audiences have just been slightly unlucky.
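One way to gauge how unlucky six from an expected dozen really is would be a Poisson tail probability. This is a sketch under my own assumption that the number of sub 14 second goals over 8,326 matches is roughly Poisson, with the mean implied by the 0.0015 per match estimate; the helper name is hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


matches_played = 8326
goals_per_match = 0.0015
expected_early_goals = matches_played * goals_per_match

# Chance of seeing six or fewer such goals if the estimate were right.
p_six_or_fewer = poisson_cdf(6, expected_early_goals)

print(round(expected_early_goals, 1))
print(p_six_or_fewer)
```

A small value here would lean towards the per match estimate being a touch generous, a larger one towards plain bad luck.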

If you have to miss a minute of a match and you want to reduce your chances of missing a goal, choose the first minute (although you may be really unlucky and miss club history in the making). At least on Saturday any tardy spectator got to see a match featuring two keepers who had both scored a career goal (although Boruc's strike came from the altogether more likely source of the penalty spot).