
Sunday, 26 February 2012

Defending a 2-0 lead in the EPL.

One of the many hoary old football myths is that the hardest lead to defend is 2-0. The leading team feels the win is already in the bag and the opposition, with little to lose, play with much more freedom, take outrageous chances and soon the score is back to all square. Of course, the reason we think a team is very vulnerable when leading by two goals is that we tend to remember the times when the lead collapses and forget the more numerous occasions when it is converted into a win.

2-0 isn't a dangerous place to find yourself; it's actually a very comfortable spot.

Below I've listed the chances of various teams converting just such a lead into three points, and then I've charted the real outcomes of three such games from the weekend round of Premiership matches. No prizes for guessing which game will live long in the memory and which one will be forgotten by next week... although watching a pressing, long ball side (Stoke) totally frustrate and disrupt a short, possession based side (Swansea) did have its attractions: 26% of the possession equalling 100% of the points.

Scenarios for Teams who Lead 2-0.

Scenario                                Chance of Winning from 2-0 after 40 mins    Chance of Winning from 2-0 after 60 mins
Average Team at Home to Average Team    93.5%                                       95.8%
Average Team Away to Average Team       86.0%                                       91.2%
Top Team at Home to Average Team        98.9%                                       99.2%
Average Team at Home to Top Team        74.7%                                       83.7%
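
Figures of this kind can be roughly sanity checked with a simple Poisson model. The sketch below is not the model behind the table; it assumes goals arrive independently at a constant rate for each side, ignores stoppage time, and uses illustrative scoring rates of 1.5 and 1.1 goals per 90 minutes for an average home and away team. The function names and those rates are my assumptions, not figures from the post.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def win_prob_from_two_up(lead_rate, trail_rate, minutes_played, max_goals=15):
    """Chance that a side leading 2-0 still wins the match.

    lead_rate / trail_rate are assumed goals per 90 minutes for the
    leading and trailing teams. Remaining goals for each side are
    treated as independent Poisson counts.
    """
    fraction_left = (90 - minutes_played) / 90
    lead_mean = lead_rate * fraction_left
    trail_mean = trail_rate * fraction_left
    p_win = 0.0
    for lead_goals in range(max_goals + 1):
        for trail_goals in range(max_goals + 1):
            # The leader wins if the two goal cushion is not wiped out.
            if 2 + lead_goals > trail_goals:
                p_win += (poisson_pmf(lead_goals, lead_mean)
                          * poisson_pmf(trail_goals, trail_mean))
    return p_win


if __name__ == "__main__":
    # Assumed rates: average home side 1.5 goals per game, away side 1.1.
    for minute in (40, 60):
        print(minute, round(win_prob_from_two_up(1.5, 1.1, minute), 3))
```

Raising the trailing team's rate (a top side chasing the game) or lowering the leader's rate pulls the win chance down, which is the pattern the table shows.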

Arsenal 5 Tottenham 2.

Scorers.
0-1,Saha,4'
0-2,Adebayor(pen),34'
1-2,Sagna,41'
2-2,van Persie,43'
3-2,Rosicky,51'
4-2,Walcott,65'
5-2,Walcott,68'
5-2,Red Card,Parker,(Spurs),87'.

Newcastle 2 Wolves 2.

Scorers.
1-0,Cisse,6'
2-0,Gutierrez,18'
2-1,Jarvis,51'
2-2,Doyle,66'

Stoke 2 Swansea 0.

Scorers.
1-0,Upson,25'
2-0,Crouch,39'.

Friday, 24 February 2012

Is the MLS spending its Money Correctly?

With the recent perfect storm of the release of the Moneyball movie, featuring Oakland A's GM Billy Beane, and the publication of the salary data for last season's MLS, it was inevitable that a slew of articles would appear documenting the cost and true value of the talent in American soccer. The data can be downloaded here and the site also gives information from the 2007 season onwards. Helpfully, each player is labelled as either a striker, defender, keeper or midfielder, although there is some blurring of the lines, especially between the midfield and the strikers and defenders.

Unsurprisingly, it is the strikers who on average were the highest wage earners last season, followed at a respectful distance by the midfielders, who in turn are trailed by the defenders and goalkeepers. However, before we can say which group is either under or over valued, we need a mechanism whereby we can value a player's contribution to the eventual seasonal record of the team.

Player ratings have now been around for a few years, having evolved from the perennially popular fantasy football, but all but the best are partly unsatisfactory for a variety of reasons. The measuring and counting of players' on field actions has become both more sophisticated and more reliable in recent years, but agreement as to which statistics actively cause, rather than merely correlate with, winning is far from universal. Many of the readily available statistics come without even basic context, such as the current score, and player ratings presumably derived from proprietary data that isn't scoreline neutral are understandably "black box" in nature. Finally, condensing a player's worth to a single number is either a massive achievement or a major oversimplification. For these reasons I've decided to rate the contribution of each facet of the team in terms of goals scored or conceded, and a large portion of this post will consist of subjective, but hopefully informed, opinion.

We've established here that a team's goal difference is a very good indicator of its end of season success rate, and if we plot both goals scored and goals conceded against success rate we find that an extra goal scored increases a team's success rate by virtually the same amount as an extra goal prevented. Therefore we can assign a value to a player's worth by reference to the number of goals he helps to score and the number he helps to prevent, and these numbers correlate very well with a team's end of year record. We are initially interested in the broader picture, namely, which area of a team is under or overrated in relation to the rewards it receives? Individual good or bad buys will exist in any team, but the purpose here is to see if MLS teams as a whole make good use of the money they spend.

Strikers are primarily paid to score goals and as a group they accounted for over 50% of goals in 2011; 35% originated from players described as midfielders and defenders supplied the final 10%. We could leave those proportions alone to represent the contribution to goals scored made by each area of the side, but it's likely that more of the goals were created by the midfield, slightly fewer by the strikers and far fewer by the defence. So I've tweaked the strikers' contribution down a notch or two, along with that of the defence, and added the surplus to the midfield.

The splits I've chosen for each unit's contribution to goal prevention partly reflect the distribution of labour at the front end of the pitch. The defence is primarily, but not exclusively, responsible for stopping goals. The midfield are the first line of defence and it's rare for someone who isn't an out and out striker to be totally excused tracking back duties, while strikers can ease the defensive burden merely by retaining the ball high up the pitch.

We know that a goal scored is roughly equal in value to one prevented, so the final step is simply to average the individual defensive and attacking contributions to get an overall value for strikers, midfielders and defenders. The process may appear haphazard, but the offensive calculation is underpinned by actual goals scored. Furthermore, the combined commercially produced defensive ratings of individual players correlate strongly with their team's concession rate, so until we become better able to separate correlated from causative on field data, I feel this approach is as valid as any other. Feel free to tweak the numbers I've used before we move on to the conclusions.

Aspects of Team Contribution and Pay for Various Different Components of MLS sides,2011.

                                                   Strikers    Defenders + Keeper    Midfielders
% Contribution to Goals Scored                     53%         7%                    40%
% Contribution to Preventing Goals                 5%          60%                   35%
Average Wage Bill per Player ($)                   200,000     110,000               140,000
Total % Contribution to Team's Goal Diff           29%         34%                   38%
Total Cost of 4-4-2 Formation ($)                  400,000     550,000               560,000
4-4-2 Formation Cost as % of Starting Team Cost    26%         36%                   37%
4-4-2 Formation Cost as % of Total Roster Cost     31%         31%                   38%

A virtual absence from the defensive duties of their side sees the strikers contributing around 30% towards an MLS side's goal difference under my slightly harsh regime. The defence's overall shout amounts to just over a third and the midfield works hardest, claiming almost 40% of the effort. Strikers receive on average $200,000, almost twice that paid to defenders and keepers and comfortably ahead of midfielders. I'm aware that others have also noted similar levels of excess reward for less output... but there are a couple of wrinkles.

Teams predominantly play a version of 4-4-2, therefore the 29% output from the strikers is almost always the product of just two on field representatives. Defenders contribute more, but usually require five bodies (the back four plus the keeper) to achieve their production, and the midfield achieve more still, but this time with four players. If we "price up" the 11 players who actually take to the pitch, we find that the $400,000 worth of striking talent represents just 26% of the seasonal cost of the starting team, but repays the team with 29% of the contribution to that team's year end goal difference, which in turn correlates with, and almost certainly causes, the team's final success rate.
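
For anyone who wants to tweak the splits, the arithmetic behind the table can be sketched in a few lines. The percentages and average wages are those quoted in the table above; the number of starters per unit assumes a 4-4-2 with the keeper counted alongside the back four, and the names `UNITS` and `summarise` are purely illustrative.

```python
# Inputs taken from the table above. "starters" assumes a 4-4-2 with the
# keeper grouped with the four defenders.
UNITS = {
    "strikers": {"scored": 53, "prevented": 5, "wage": 200_000, "starters": 2},
    "defenders + keeper": {"scored": 7, "prevented": 60, "wage": 110_000, "starters": 5},
    "midfielders": {"scored": 40, "prevented": 35, "wage": 140_000, "starters": 4},
}


def summarise(units):
    """Return the contribution and starting team cost share for each unit."""
    total_cost = sum(u["wage"] * u["starters"] for u in units.values())
    summary = {}
    for name, u in units.items():
        cost = u["wage"] * u["starters"]
        summary[name] = {
            # A goal scored is treated as equal in value to a goal prevented,
            # so the overall contribution is the plain average of the two.
            "contribution_pct": (u["scored"] + u["prevented"]) / 2,
            "cost": cost,
            "cost_pct": 100 * cost / total_cost,
        }
    return summary


if __name__ == "__main__":
    for name, row in summarise(UNITS).items():
        print(f"{name}: {row['contribution_pct']:.1f}% of goal difference, "
              f"${row['cost']:,} ({row['cost_pct']:.1f}% of starting team cost)")
```

Changing any of the percentage splits in `UNITS` and rerunning shows how sensitive the cost versus contribution comparison is to those subjective choices.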

Defenders as a group,by my ballpark figures,contribute more,but need more on field representation to achieve those numbers and actually are slightly in debt to the club when we tally the figures,while the midfield's output virtually matches their cost to the club.

If we look at the roster as a whole, the cost and return figures also gravitate towards each other, indicating that MLS teams actually pay their players in a fairly efficient manner. Strikers are paid more, but fewer are required on the field and on the roster (an MLS team has around 6 strikers for every 12 defensive players and every 12 midfielders).

In short, strikers account for 31% of an MLS team's roster outlay and contribute 29% of the team's final day success rate, and that money is then shared out between the members of the striking group. The same goes for defenders and midfielders. But because those groups require a disproportionately larger number of members to achieve just a slightly higher productive output, each individual receives less than the smaller group of similarly productive attacking players.

You could even argue that the best strikers are worth more than the price they are currently trading at. Where constraints exist on the number of people you can employ, areas where the constraints are greatest become more valuable. You cannot improve the side by replacing one good striker with three average (and cheaper) ones. Even if the talent of his three replacements was additive and exceeded his usual individual output, you have taken up two extra spots in the team. The only way to dramatically improve your attack is a one for one swap with a better player, whose improved talent will only be averaged down by one fellow striker.

A speculative piece on a subject that has far to go,but one with enough moderately hard facts to suggest that the MLS purse strings are being loosened mostly in the right direction.

Thursday, 23 February 2012

Manchester United's Chances of Being Relegated.

In the short history of the Premiership there have been just four different Champions. Manchester United top the list with 12 titles, followed rather distantly by Arsenal and Chelsea with three apiece, and the group is completed by Blackburn. That makes three perennial contenders and a Blackburn team built on a cash influx from a committed owner, who was lucky enough to secure the services of Alan Shearer for the then not inconsiderable sum of £3.5 million. The presence of Blackburn's name on the list, especially in view of their recent and present struggles, may surprise some, but Jack Walker's millions merely confirm that winners of the Premiership title are always sourced from the best and wealthiest teams of that particular time.

At the other end of the table the end of year booby prizes have been spread around much more liberally,not least because the teams who make the roll call of failures aren't allowed to make a reappearance for at least one season.Around a third of the current members of the Premier League  have at one time or another been relegated from the exclusive club and leading lights from the past such as Leeds,Nottingham Forest,Sheffield Wednesday and even Wimbledon have slipped through the trapdoor.

Football fans as a group tend to be a fairly realistic bunch, especially when focussing on the likely prospects of their teams in a forthcoming season. Sights may be momentarily raised or lowered to suit the mood, but most mid table supporters will hope for Premiership safety first, then dream about a possible attempt to reach the foothills of a minor Europa League campaign, while also harbouring fears about being dragged into a possible relegation scrap.

Perennial strugglers and new arrivals will hope to have still retained their star defender and the same manager by Christmas and their Premiership status come May.Any excursion into mid table will usually be seen as temporary respite,rather than a great leap forward.

Fans of the big four or five naturally set the bar highest. Championship ambitions usually depend on their recent spot in the top pecking order and any backsliding into the lower reaches of the stratosphere is viewed as an almost unthinkable catastrophe.

Naturally these views will be formulated from hours of watching or reading about our teams from previous seasons and assessing any team changes that may have occurred during the summer windows. Usually fans and pundits are reasonably accurate in these informal appraisals. You don't even have to be a particularly avid follower of the game to be able to name the likely top four or bottom half dozen teams. However, occasionally there are surprises when a team busts clear of the limits of their usual station or falls below the levels predicted by even their worst pessimists.

Often these bursts of unpredictable results are achieved by teams consisting of the same bunch of players who trundled unremarkably through the previous year.So what has changed? Often the answer is absolutely nothing and the team is merely experiencing the kind of randomness that exists when a process is repeated enough times even when the underlying event probabilities are similar to values seen previously.

Toss a coin enough times and sooner or later you'll see ten consecutive heads; it's a natural sequence that will be thrown up by a fair coin. But if you're a supporter of "tails" and you see that sequence, and in your horror you momentarily forget the "good times" that had gone previously, it's easy to think that your favourite coin has gone bad.
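
How many tosses count as "enough" can be worked out exactly. The sketch below tracks the current streak of heads toss by toss and accumulates the chance that a run of the required length has appeared at least once; the function name and the choice of toss counts in the example are illustrative assumptions.

```python
def prob_run_of_heads(n_tosses, run_length, p_heads=0.5):
    """Chance of at least one run of `run_length` heads in `n_tosses` tosses."""
    # state[i] = probability that the current streak of heads is i long
    # and no full run has been completed so far.
    state = [0.0] * run_length
    state[0] = 1.0
    completed = 0.0
    for _ in range(n_tosses):
        new_state = [0.0] * run_length
        for streak, prob in enumerate(state):
            # A tail resets the streak to zero.
            new_state[0] += prob * (1 - p_heads)
            if streak + 1 == run_length:
                # This head completes the run; the mass leaves the chain.
                completed += prob * p_heads
            else:
                new_state[streak + 1] += prob * p_heads
        state = new_state
    return completed


if __name__ == "__main__":
    for n in (10, 100, 1000, 5000):
        print(n, round(prob_run_of_heads(n, 10), 4))
```

With exactly ten tosses the chance is 1 in 1,024, and it climbs steadily as the number of tosses grows, which is the sense in which a long enough sequence makes the streak close to inevitable.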

The Premiership Trophy....look,but Don't Touch for all but the best.

If small sample size fluctuations in coin tosses lead to apparently abnormal results even when the underlying chances have remained the same, we should see teams sometimes experiencing something similar over the short 38 game Premiership season. If we simulate and repeat enough seasons, then just as a coin can churn out head after head through random bunching, so can a mid table outfit gain enough wins to climb higher in the natural order, or suffer enough defeats to possibly drop out of the Premiership altogether. George Burley's Ipswich are the best Premiership example of the former, as we can see here.

To see how likely it was that a team could rise above or fall well below their true talent, I used a simple goals based rating system to produce the kind of win, draw and loss probabilities a side would expect to encounter over an EPL season. I did this for a top four side expected to average around 84 points in a season, a slightly above average mid table team who would amass 52 points and a struggling team who would fail to break the 40 point barrier.

I then ran 1,000 simulations for each generic team for a 38 game season, using the game probabilities that teams of such quality would encounter, and for each "season" I totalled the number of points each team would have ended the season with. From each season to the next, each team had the same chance of winning or drawing each individual game as they had had in the previous "season", and the only variation in points total arose from the random bunching or non-bunching of results picked by a randomly generated number. The better teams of course were more favoured to win their individual games.
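
A stripped down version of this kind of simulation is sketched below. It is a simplification rather than the method used for the figures that follow: instead of fixture by fixture probabilities from a goals based rating, each team gets a single assumed win and draw probability for every game, chosen so that the expected points come out at roughly 84, 52 and 38. Those probabilities, the seed and the names are all assumptions.

```python
import random


def simulate_seasons(p_win, p_draw, n_seasons=1000, n_games=38, seed=2012):
    """Return the points total from each simulated season."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_seasons):
        points = 0
        for _ in range(n_games):
            r = rng.random()
            if r < p_win:
                points += 3      # win
            elif r < p_win + p_draw:
                points += 1      # draw
            # otherwise a defeat, no points
        totals.append(points)
    return totals


# Assumed per game (win, draw) probabilities for each generic team.
PROFILES = {
    "top four side": (0.66, 0.23),        # about 84 expected points
    "good mid table side": (0.36, 0.29),  # about 52 expected points
    "relegation candidate": (0.24, 0.28), # about 38 expected points
}

if __name__ == "__main__":
    for name, (p_win, p_draw) in PROFILES.items():
        totals = simulate_seasons(p_win, p_draw)
        mean = sum(totals) / len(totals)
        print(f"{name}: mean {mean:.1f}, low {min(totals)}, high {max(totals)}")
```

Counting titles and relegations properly would need the other 19 teams simulated alongside, so this sketch only shows how widely a team's points total can swing when nothing about its underlying quality changes.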

Outcome per 1,000 EPL Simulations for Teams of Varying True Quality.

                           One of the Best Top Four Teams    A Good Mid Table Team    A Relegation Candidate
Number of Championships    404                               1?                       0
Number of Relegations      0                                 66                       587
Highest Points Total       106                               82                       67
Lowest Points Total        60                                28                       19

The results of this simplified exercise appear to mirror the kind of subjective opinions you would hear if you talked to fans of these types of sides and good and bad news is equally shared between the three groups.

If you support the best of the best, as a team averaging 84 points would be, then you have every right to think that the league title is a likely outcome. A total of 100+ points is unlikely, occurring 5 times in every 1,000 simulations, but definitely attainable. More realistically, teams who are performing well enough to even approach 100 points will almost certainly wrap the title up long before the end of season games take place, so desire, or a lack of it, may be why this simple model perhaps overestimates the likelihood of 100+ point years. The lowest points total such teams achieved was 60, dropping them into the ignominy of Europa League participation without the benefit of a pre Christmas Champions League campaign. However, points deductions aside, a team of this true talent will never have an "unlucky" enough year to fear relegation. So to answer the original question, the current Manchester United side would almost certainly not be relegated this side of the Norman Conquest.

Things can look less rosy for our other two teams. Good mid table teams like Everton or Villa from the recent past may sneak one Championship once every 1,000 or so seasons, although their maximum haul of 82 points would hardly indicate a vintage year. Much more of a worry for fans and managers, and a conundrum for chairmen, are the 66 occasions when a true middle ranker gets randomly relegated. It's probably much better to dream about the 17 or so times when fortune smiles and 70 or more points are banked.

The pessimism of fans of lowly teams is well justified with relegation an odds on proposition.They can expect half a dozen forays into the 60's,but a shade over 100 seasons every 1,000 when they don't reach into the 30's. They can certainly forget any chances that they might just win the Premiership Trophy by accident.

Sunday, 19 February 2012

Win Expectancy Graphs for Stoke's last two Red Cards.

Stoke grabbed an unwanted February double with two straight red cards a fortnight apart, the first at home to Sunderland just before the interval, the second less than 20 minutes into their cup tie at Crawley. Both were controversial and both, despite being separated by over twenty minutes of playing time in their respective games, left the Potters in virtually identical situations.

Premiership rivals Sunderland had only 45 minutes to press home their numerical advantage,but the importance of scoring first in any game was highlighted by McClean's 61' cross shot.

League Two Crawley had a potential 70 minutes of 11 versus 10 to go with their home advantage, which gave them about the same chance of winning as Sunderland had had two weeks earlier at a snow swept Britannia, but a soft penalty award scuppered their high hopes.

Stoke v Sunderland in the EPL.

Scorers.
0-0,Red Card,Huth,44'
0-1,McClean,61'

Crawley v Stoke, FA Cup 5th Round.

Scorers.
0-0,Red Card,Delap,17'
0-1,Walters (pen),42'
0-2,Crouch,52'

Avoiding Costly Errors in the EPL?

Juande Ramos' Tottenham started their 2008/09 campaign hoping to build on their mid table finish from the previous year and they showed their intent early by completing the signing of Luka Modric. However, the ongoing saga of whether or not Berbatov would even consent to pull on a Spurs shirt would drag on almost to the end of the summer transfer window and the Bulgarian eventually followed Robbie Keane out of White Hart Lane. As £50 million of attacking talent departed, it was left to Darren Bent, who had been less than prolific during the previous year, to carry the forward line, and his first half injury time, point claiming strike at Stamford Bridge was the highlight of Tottenham's early season.

The low points,though were many and varied.Defeats to North East giants Sunderland and Middlesbrough preceded the point at Chelsea.A solitary point from the visits of Villa and Wigan merely heralded further defeats at the hands of Portsmouth,Hull and yet another in disgrace at Stoke where Spurs left the field with just nine men.Inevitably,Ramos was gone,the Potteries chants of "sacked in the morning" proving premature by only five days and the Redknapp era began with a 2-0 win at home to Bolton.

By May Spurs were placed comfortably in the top half of the table with a tally of 51 points,Bent had forced his way into the England squad and Harry's revolution that would take Spurs to the Champions League was gaining momentum.They finished the season with a goal difference of precisely zero and their actual points haul was within a point of the average expectation for a team who had scored as many goals as they had conceded.Of more interest was Spurs' unwanted record of conceding 14 goals directly as a result of individual errors.This compared to the league average of  just six goals allowed in such a way and in an environment where small margins matter it was an obvious source for improvement.If Spurs had been merely average they would have saved themselves around 8 goals,improved their goal difference and likely picked up an extra seven or eight league points.Enough to see them rise from the basement in September to Europe come May.

The 2009/10 season was Redknapp's first full season in charge and, with his squad and hierarchy now in place, the England manager in waiting broke the Champions League monopoly by qualifying for the playoff stages of the tournament with a top four finish. Scoring goals played a big part in Spurs' achievement, but the previous year's habit of handing the opposition cheap goals through costly mistakes didn't go unnoticed. A repeat of the 14 such goals conceded in 2008/09 would have cost Spurs their lucrative route into Europe, but their carelessness wasn't repeated and instead they cantered home with a better than average tally of only three such error strewn concessions.

A great story of attention to statistical detail,corrective action and a positive outcome.........except the plausibly seductive narrative is almost certainly a gross distortion of what actually occurred.

Errors that lead directly to goals are relatively rare events; from 2008/09 to 2010/11 the average number conceded per season by Premiership sides was a shade over six. Tottenham's 14 from 2008/09 was the highest seasonal total recorded, although Arsenal could be on course to top that total this year. Teams such as Wigan and Hull also recorded numbers similar to those of their higher ranked compatriots, but the likes of Wigan again and West Ham also impressed with very low totals over the last four years. So these are a mixed bunch of results, with very little obvious pattern, and viewed as a group the figures tend to resemble the seasonal distribution of draws. We saw here that draws are predominantly random events where team quality has little bearing on the number of draws a team gains over the year and season to season correlation is extremely weak.

To see if this was the case with errors leading to the concession of goals, I plotted the totals in one season for all EPL teams and paired them with their records from the next season.


The scatter plot is very random,a high number of such goals in one season does not automatically lead to a similarly high number being given up during the next one.The near horizontal line of best fit that runs close to the seasonal average indicates that no matter how high or low the goals tally was in a previous season,the best guess for the next would be to regress towards the mean.We can further confirm this by taking a basket of teams who were particularly good and particularly bad at avoiding the concession of goals from errors in one season and averaging their combined totals in the next.In both cases the outliers from the previous season are dragged much closer to the average for the league in the subsequent season.

How the Best and Worst Error Strewn Teams Perform in Subsequent Years.

                    Average Goals Allowed from Errors in Season N    Same Teams in Season N+1
Best Performers     2.4                                              6.4
Worst Performers    10.7                                             5.6
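
The same pattern appears if the numbers are generated by chance alone. The sketch below uses synthetic data, not the real EPL figures: it assumes every team's error led goals in a season are an independent Poisson draw with a mean of six, picks out the three best and three worst teams in season N, and averages what those same teams do in season N+1. The group size of three, the number of repeats and the function names are illustrative assumptions.

```python
import random
from math import exp


def poisson_draw(rng, mean):
    """Draw one Poisson distributed count using Knuth's multiplication method."""
    limit = exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def regression_demo(n_teams=20, n_pairs=200, mean_errors=6.0, group=3, seed=1):
    """Average season N and N+1 totals for the best and worst groups."""
    rng = random.Random(seed)
    best_n, best_n1, worst_n, worst_n1 = [], [], [], []
    for _ in range(n_pairs):
        # Two independent seasons: no team carries any "skill" forward.
        season_n = [poisson_draw(rng, mean_errors) for _ in range(n_teams)]
        season_n1 = [poisson_draw(rng, mean_errors) for _ in range(n_teams)]
        order = sorted(range(n_teams), key=lambda i: season_n[i])
        for i in order[:group]:
            best_n.append(season_n[i])
            best_n1.append(season_n1[i])
        for i in order[-group:]:
            worst_n.append(season_n[i])
            worst_n1.append(season_n1[i])

    def avg(values):
        return sum(values) / len(values)

    return {"best_n": avg(best_n), "best_n1": avg(best_n1),
            "worst_n": avg(worst_n), "worst_n1": avg(worst_n1)}


if __name__ == "__main__":
    for key, value in regression_demo().items():
        print(key, round(value, 1))
```

Because the two seasons are independent by construction, both groups land back near the league mean in season N+1, so a table like the one above is what pure randomness would be expected to produce.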

Given these initial results, the avoidance of goal yielding errors would not appear to be a largely repeatable skill. Teams who have the "skill" one year often lose it the next and become merely average. Also, more talented teams appear to be no better at displaying the trait than poorer ones, because the correlation between seasonal success rate and goals allowed through errors is also virtually non existent.

One of the most dangerous assumptions to make in analysis of football is that what happens on the field is almost entirely down to the talent of the players on the pitch.Many outcomes during the game are predominately talent based,but all are determined in varying degrees by a randomness that is beyond the control of the on field participants and goals scored from errors would seem to be a prime candidate for a statistic that is primarily driven by chance.These goals affect and partly explain a team's final league standing,but are poor indicators of how teams will subsequently perform in this aspect of the game in the future.

In short,teams cannot escape the random nature of mistakes,nor can they control the outcomes of those mistakes.One season almost every slip up might lead to a goal,while other seasons may simply see most of these chances being ballooned over the bar by the opposition.All that EPL teams can expect,judged on the distribution of the data is that over time they will each concede an average of around six goals in such circumstances.

The manipulation of the new raft of data is fraught with difficulty and just as players find it difficult to accept that their skill levels aren't the final arbiters of the ultimate game outcome,a new breed of number crunchers may have difficulty in accepting that a future event cannot be predicted with any degree of accuracy.Sometimes,as would appear the case here,the best guess is merely an average from all teams over previous seasons.

To return to Tottenham's horrendous "luck" in 2008/09, when errors so often led to goals: every player bar one whose mistakes contributed to those 14 goals returned as a Spurs player in 2009/10, when just three mistakes led to goals. It's easy to envisage extensive pre season sessions aimed at "cutting out the errors", and when that was precisely what occurred it's easy to think that the therapy worked and that players can be taught to be less careless. However, the most likely explanation is that Spurs would have seen fewer mistakes leading to goals in 2009/10 anyway, through natural regression towards the mean, and that the talent that serves players well in making them professional footballers equally serves them in avoiding costly errors... whatever the short term numbers say.

Thursday, 16 February 2012

Record Signings in the EPL. Knowing When To Let Go.

Choosing which team to follow can be an ingrained rite of passage, passed down like small children at a pre War sellout, or simply the result of a haphazard whim, but once joined the relationship is usually for life. No matter how inept or incompetent the team, or how disloyal some of its players, there's always a strong enough bond that says "stick with it" and the good times will return. Managers, though, have to be more hard nosed about when to let go of an under performing asset who promised much, cost more, but failed to deliver.

Unlike some sports where a General Manager supplies the talent and the Head Coach then gets the best out of that talent on the pitch, football allows the manager a huge say in assembling his squad. Therefore any purchase, but especially an expensive one, will create a strong bond between buyer and purchased. Neither party wants a deal to be a failure, but there exists a stronger and partly irrational desire on the part of the buyer to make the deal a success even if on field performance is telling everyone otherwise. Managers sometimes make purchases that are mistakes. For any number of reasons a player may not perform to anywhere near the level that justifies his price tag, but all too frequently the mistake is compounded by a stubborn refusal to acknowledge it, cut the losses and move on. The irony of this loyalty to inferior, yet expensive purchases is that a failure to act decisively often leads to the rose tinted buyer paying the price with his job.

Stoke's promotion season in 2007/08 came as a pleasant, but unexpected surprise to its largely long suffering fans. Little had changed squadwise from the previous year to make anything more than a typically bungled foray into the playoff lottery the height of expectation. Instead a 17 game unbeaten run from late November onwards gave the team a big enough cushion to allow them to follow WBA into the EPL automatically in the runners up spot. The final ten games had seen the goals dry up just at the wrong time and a scoring rate below one a match would have caused irreparable damage had the defence not proved to be similarly miserly. A nervy 0-0 on the last Sunday of the season against a Leicester side fighting to stay in the division was an encapsulation of their last quarter strengths and failings.

A Tom Jones impersonator and a man dressed as a dog celebrate Stoke's elevation to the top division.

Aware that goal scoring had been a problem in the final stages of their run to the Premiership,Stoke's manager,Tony Pulis almost at once splashed a then club record £5.5 million on Reading's Dave Kitson.But the deal went sour almost from the outset and in the 12 games that Kitson started that season he failed to score and mustered less than a shot a game of which just two required the keeper to intervene.Stoke picked up just nine points from a possible 36 in that 12 game stretch,just the kind of return you would expect from a team who were also finding Premiership strikers much more of a handful and were allowing almost two goals a game.

A Kitson injury early in a customary Stoke win over WBA made dropping their record signing easy and in the next 11 games they managed to average just over a point a game.Tellingly the improvement came from the defensive side of the ball and the goals against column averaged just over 1 goal per game instead of being just shy of two.Goals scored per game were the same as they had been with Kitson in the team,so his contribution was being repeated by the supposedly inferior strikers who had gained Stoke promotion previously.

By January 2009 he was permanently replaced for the season by transfer window signing James Beattie. The new man produced three times as many shots, just under half of which were on target, and seven goals from 15 starts, resulting in 24 points at over 1.5 points per game and securing safety for the Potters with a game or two to spare.

Part of the problem. Part of the solution.
The interesting aspect of this slice of Stoke City history is not contained in the dry narrative of goals scored,games won or lost,but in the thought processes behind the managerial decisions and the twist of luck and chance that combined to make Beattie a Stoke hero and Kitson not.Was Pulis able to distance himself from the buyer's pride that often exists when large purchases go wrong and was he planning to replace his big money signing even before injury intervened prior to the transfer window ? Or would he have been heartened enough by a couple of late December goals to have kept faith with the price tag rather than the on field performance.

A clue to the answer may be found in the subsequent season.Both Kitson and Beattie started 11 games,although they only played together once in a 2-1 defeat to Chelsea.Kitson doubled his shots per game ratio,but Beattie halved his from the previous campaign and they each managed to find the net just three times.Last year's under achiever rose to the heights of mediocrity and last season's hero drifted sadly down to meet him.By the time of Stoke's Christmas party,neither player had a future at the club.

Small sample size almost certainly exaggerated the real difference in ability of both strikers.But if Pulis was able to quickly evaluate each player's likely contribution to Stoke's survival season and act decisively to maximize the return and then divorce himself from the emotional investment he had in each player to move them on a year down the line,then he possesses a valuable managerial trait.

This type of detached decision making process may also partly explain the "new manager" bounce that sometimes comes with a change at the top. An appalling run of results is often followed by a less appalling set of results whether the manager stays or goes. But a new coach is not tied to seeing through a faltering project to the same degree as the present boss, who presumably helped to create the squad to start with. James McClean at Sunderland was a long term project who never played under Steve Bruce, but has featured in 7 wins, one draw and 2 defeats under new boss Martin O'Neill.

In short managers might have to be ruthless in cutting their losses and making changes if they want to stay on the managerial roundabout.

Wednesday, 15 February 2012

Correlation and Causation in Football.

When Albert Einstein started work in the Swiss Patent Office his daily mantra was " believe everything is wrong" and that rigorous approach quickly saw him rise to the heady heights of Technical Expert,second class before he took his talents to more demanding fields.A healthy dose of scepticism is a handy asset when you are trying to make sense of the ever increasing raft of statistics that are now available.Not only as a way of challenging pre conceived notions that have gained credence through being repeated often,but rarely validated,but also to guard against new myths merely replacing hoary old ones.One set of numbers can enlighten and inform,but alone will rarely completely describe a system as complex and dynamic as a football team.

Correlation does not imply causation should be the watchword of anyone interested in dipping a toe into the fascinating world of football analysis.As informed football opinion moves from the purely anecdotal to a situation more grounded in measurement and counting,it has to be wary not to fall for the spurious or inverted correlations that have cropped up in other sports and will invariably appear in football.

The number of times an NFL team "takes a knee" has a very strong, positive correlation with the number of games they win. More knees, more wins in short. So does this mean that a coach should send his team out to maximise the number of times his quarterback takes the snap, cradles the ball, sticks a knee on the turf and declines to attempt a play? Intuitively we all know that the answer is that he shouldn't, and a passing familiarity with the rules shows us why. Teams invariably take a knee when they're winning in the 4th quarter and inside the two minute warning. The "play" runs out the clock without the necessity of a full on collision between a team whose only interest is to maintain possession and another whose only chance is to rip the ball from the ball carrier. It prevents an already violent sport turning even more so. Rugby union appears to have developed its own version, with a series of half hearted pick and drives from the scrum in the waning minutes of a match. The important point is that it's the winning situation that is causing the kneels and not the kneels that are giving rise to the wins.

That's a fairly obvious case where the direction of causation goes from the winning situation to the on field action, but other examples are more persuasive and seductive for the unwary. If an NFL team runs the ball more often than its opponent, it is more often the winner, and this led to the seemingly reasonable assumption that a team that runs more, wins more. Throwing the ball in the NFL has more associated risk and potentially more reward than running the ball, but the strong correlation between the number of running plays and wins appears to make the ground route the sensible and profitable way to go. The reality is that the current NFL is a pass orientated league and running the ball often will not produce the expected riches in terms of wins, because what the correlation is picking up is simply an extension of the "take a knee" effect described previously.

To explain further, teams on average build up a lead by passing the ball, but then protect that lead and run down the game clock by running the ball. By contrast the losing team has to make ground up quickly when they have the ball, so they pass more and run less. The excessive run differential of the winning team tends to appear after they've built up a lead; it doesn't in general cause the lead in the first place. These two examples should therefore make us cautious when we start to deconstruct the building blocks of football. Moving the ball around an American gridiron isn't quite as straightforward as it first appears. So let's move on to football.
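
Before leaving the gridiron, the run/win effect can be reproduced with a toy simulation. This is only a sketch under made-up assumptions: the match result is decided first (here by a random margin standing in for passing quality), and only then does the winning side add clock killing runs. The figures of 25 base runs and 10 extra runs are hypothetical, chosen purely for illustration.

```python
import random

def simulate_season(games=1000, seed=0):
    """Toy model: the result comes first, the rushing total follows from it."""
    rng = random.Random(seed)
    winners, losers = [], []
    for _ in range(games):
        # Hypothetical: the final margin is set by passing quality alone.
        margin = rng.gauss(0, 1)
        won = margin > 0
        # Every team runs about 25 times; a team protecting a lead adds
        # roughly 10 clock-killing runs late on (both numbers are invented).
        runs = 25 + (10 if won else 0) + rng.randint(-5, 5)
        (winners if won else losers).append(runs)
    return sum(winners) / len(winners), sum(losers) / len(losers)

winner_runs, loser_runs = simulate_season()
print(f"winners average {winner_runs:.1f} runs, losers {loser_runs:.1f}")
```

Running never influences the result anywhere in this model, yet the winners still end up with clearly more rushing attempts, which is exactly the pattern that tempts the unwary into "run more, win more".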

Most of the passing analysis in football revolves around passing in certain areas of the field.It's considerably easier to pass the ball along the back four when you are only being closed down by the forward who is unfortunate enough to be playing nearest to the dugout compared to passing in the tighter confines of the final third.Getting bodies behind the ball wasn't a Welsh import to the Potteries circa 2008,it's been a universal tactic in modern times and more "defenders" and less space inevitably makes passing more onerous.Unfortunately from a coaching or team selection viewpoint you can't buy or select  20 successful passes in the final third of the pitch,you can only acquire players.....forwards,midfielders or defenders.So an analysis of passing success by field position can ignore the real life constraints placed on team selection.

To approach the problem of how passing impacts on game outcome from a slightly different perspective,I've split passing performance by position and tried to assess how important it is for a team to have defenders who can pass well compared to say forwards who can.

I've taken data from the current season and calculated successful passes made by designated defenders who have played for each of the 20 EPL clubs as a proportion of all successful passes made by defenders in the season to date. I've repeated this for all designated forwards and converted the proportions to standard scores to allow easier comparison between the two groups.
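
For anyone wanting to repeat the standardisation step, a minimal sketch follows. The pass shares are invented for five imaginary clubs (the real exercise uses all 20), and the choice of the population standard deviation is my assumption rather than anything dictated by the data.

```python
from statistics import mean, pstdev

def standard_scores(values):
    """Convert raw values to standard (z) scores: (value - mean) / std dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]

# Hypothetical shares of all successful passes made by each club's
# defenders and forwards; these are NOT the real 2011/12 figures.
defender_share = [0.062, 0.055, 0.048, 0.041, 0.044]
forward_share = [0.071, 0.050, 0.052, 0.039, 0.038]

defender_z = standard_scores(defender_share)
forward_z = standard_scores(forward_share)

for d, f in zip(defender_z, forward_z):
    print(f"defenders {d:+.2f}   forwards {f:+.2f}")
```

Once both groups are on this common scale, a club sitting one standard score above the league in defensive passing can be compared directly with one sitting one standard score above the league in forward passing, even though forwards complete far fewer passes.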

The correlation for successful passes by defenders is reasonably strong considering we have just looked at this term's EPL matches and the relationship logically sees more successful passes correlating to more team success in terms of wins or draws.

By contrast with defenders, the correlation between team success and successful passes made by a team's forwards is much weaker. Surprisingly, it would seem that if an average team could choose to improve the passing of either its defenders or its forwards by the same amount relative to the league, it would be better off in win terms to plump for the defenders. Intuitively, this seems wrong. As a fan watching a close game, I'm much happier seeing an opponent's defender passing the ball, as it's usually in a relatively non threatening area, than I am if a nippy forward is trying his luck.

If these kinds of correlations are repeated in larger samples, I think we may be seeing an example of the current score driving the passing statistics rather than vice versa. Teams who are behind will increasingly push forward, losing shape in the process, and while the defenders of the leading side will have more out and out defending to do, it is also relatively easy for them to pick out a pass into the midfield, where the opposition are probably more concerned with creating their own chances than disrupting yours. Also, leading teams are more prepared to play the possession friendly passes along the defensive line when scoring is less of a priority. Every "ole" counts in a comfortable 3-0 canter. Defenders in trailing teams will see the reverse. They will be increasingly asked to get the ball forward quicker, so no cheap keep ball for them, and passes will be longer and more speculative as time expires.

The net result will be a situational bias similar to the NFL pass/run frequency, where defenders from teams ahead in the game, who are more likely to ultimately win or at least draw, will see their successful passes inflate in comparison to their more likely to be beaten counterparts in the trailing side. The game situation seems to be driving the defensive passing stats, to some degree at least.

Forwards would appear to be much more about one or two killer, goal creating passes rather than a steady accumulation of safer options, and the much more smeared out correlation plot possibly reflects this. The difficulty of completing passes higher up the field is accounted for by the standardisation of the data, but what about the larger win boost a team appears to get by improving the passing abilities of its defenders compared to improving the forwards by the same amount? We may speculate that the comparatively smaller benefit in increased team success seen for more successful passing by the forwards could be down to those forwards receiving less team mate support in a still well populated final third when their team is ahead. This sees a decline in their successful completions.

In short, defenders' passing numbers may be boosted by the winning game position, but that winning position may have been caused by the quality, if not the quantity, of the passes made earlier by the forwards. If those forwards then find passing more difficult because colleagues are more concerned with defence, the forwards may see their pass success rates fall. The danger is then that the flawed connection between having better passing defenders and winning may be made and we start to weave the erroneous, but plausible, idea that successful teams build from the back.

That recently signed,cultured centre back who passes well most probably won't increase your number of wins by anywhere near the amount you expect (because the correlation is false) and his completion rate will probably drop as well because you bought from a more successful team than your own.

This post is a cautionary tale.The data is limited and the conclusions may not persist to the same extent in larger samples,but data crunchers should not expect teams to have the same objectives throughout a game and qualities that cause a lead often become lower priorities when defending one.The real question is how do teams take the lead and then how do they keep it and that requires two different datasets.

Sunday, 12 February 2012

Sunderland 1 Arsenal 2.Henry Proves His Worth.

If yesterday's cameo performance from Thierry Henry does indeed prove to be his final appearance for the Gunners, he will have departed on the kind of triumphal note that ran through his earlier, more prolonged time in North London. It will also vindicate the decision by Wenger to recall the ageing superstar and, more importantly, validate the conclusion in this earlier post. The title of our earlier post was "Thierry Henry.A Deal Well Done?" and it speculated that the decline in scoring prowess that happens in all prolific scorers had occurred later than normal in the case of Henry, making him potentially a very good short term acquisition. Two 90th minute EPL goals in barely a game's worth of playing time, along with an FA Cup winner against Leeds, mean that we are fully justified in removing the "?" from the original post. His two goals in four appearances were pretty close to our projected rate of two goals from every five games...but claiming too much credit for the prediction would break this site's Prime Directive, namely that results from small sample sizes are largely down to luck. At least we were on the correct side of the debate.

It's much more interesting to speculate on whether Arsenal delighted their accountants by making a profit on the deal, and to do this we'll try to put a price tag on each of Henry's EPL strikes. The first one came against Blackburn in the 90th minute of a 7-1 romp and added precisely nothing to Arsenal's win probability, but it's his goal at the Stadium of Light that has the potential to make big returns.

Sunderland 1 Arsenal 2 Win Probability Graph.

Scorers.
1-0,McClean,70'
1-1,Ramsey,75'
1-2,Henry,90+1'

A fairly drab looking graph suddenly came to life in the 70th minute when McClean put his side in front for the second successive Saturday.Ramsey's leveller five minutes later restored the near parity of the previous three quarters of the match,but a late tie breaker always causes massive swings in Expected Points and Henry provided one a minute into stoppage time.

If we look closer at the actual figures,just prior to Henry's late winner Arsenal would expect to average just under 1.13 Expected Points from such a match.Come the goal and that jumped to 2.85 EP.So Henry's goal was worth about 1.72 EPs.A more useful way of looking at this situation is to add the Expected Points pre and post the Henry goal to the points Arsenal had already gained this season and divide by their 25 games played to see what their average points per game total would look like in both situations.Before Henry's goal Arsenal were averaging 1.645 points per game,after the goal that figure jumps to 1.709.
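
The arithmetic in that paragraph can be laid out as a short sketch. The two Expected Points figures are the ones quoted above; the 40 points from the earlier games is my assumption, back-solved from the 1.645 points per game figure rather than taken from the league table.

```python
def points_per_game(points_before, expected_points, games_played):
    """Season points per game once this match's Expected Points are added."""
    return (points_before + expected_points) / games_played

ep_before_goal = 1.13   # Arsenal's Expected Points just before Henry's goal
ep_after_goal = 2.85    # Expected Points just after it
goal_value = ep_after_goal - ep_before_goal

prior_points = 40       # ASSUMPTION: implied by the 1.645 figure, not sourced
games_played = 25

ppg_before = points_per_game(prior_points, ep_before_goal, games_played)
ppg_after = points_per_game(prior_points, ep_after_goal, games_played)

print(f"goal worth {goal_value:.2f} EP")
print(f"points per game: {ppg_before:.3f} before, {ppg_after:.3f} after")
```

Under that assumption the "before" figure reproduces the 1.645 quoted above and the "after" figure comes out at roughly 1.71, so the goal shifts Arsenal's season average by a little under 0.07 points per game.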

Our next step is to estimate the EPL finishing position for teams with these points per game totals. Arsenal won't realistically finish outside the top ten, so I've plotted the points per game totals and the finishing positions of the EPL teams from 2000 onwards for the top half of the table.

Finishing Positions and Points per Game for the EPL 2000-2010.



 The line of best fit for Arsenal pre the 91st minute goal predicts an average finishing position of 5.4.After Henry's leaving present Arsenal find that they have become a likely 1.71 points per game side and their predicted finishing position rises to 4.8.On 2010/11 numbers the prize money for fourth,fifth and sixth ranged from £12.85 million to £11.34 million and if we perform a crude division of funds we can calculate the end of season winnings.A team with an average finishing position of 5.4th would bag £11.8 million in prize funds compared to £12.25 million with an average finishing place of 4.8th.
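
The "crude division of funds" is just a straight line interpolation between league places, sketched below. The fourth and sixth place figures are the ones quoted above; the fifth place figure is assumed to sit exactly halfway between them, which is my simplification.

```python
def interpolate_prize(position, prize_by_place):
    """Linearly interpolate prize money (in £ million) for a fractional place."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return prize_by_place[lower]
    upper = lower + 1
    step = prize_by_place[upper] - prize_by_place[lower]
    return prize_by_place[lower] + fraction * step

# 2010/11 prize money in £ million for 4th and 6th, as quoted in the post.
prize_by_place = {4: 12.85, 6: 11.34}
# ASSUMPTION: 5th place sits midway between 4th and 6th.
prize_by_place[5] = (prize_by_place[4] + prize_by_place[6]) / 2

before = interpolate_prize(5.4, prize_by_place)  # pre Henry's goal
after = interpolate_prize(4.8, prize_by_place)   # post Henry's goal
gain = after - before

print(f"£{before:.2f}m -> £{after:.2f}m, a gain of £{gain:.3f}m")
```

This gives about £11.8 million before the goal and £12.25 million after it, a difference of roughly £0.45 million.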

So if we want to try to quantify the unquantifiable,namely an injury time winner where fan delight rockets off the scale,we can say that it was potentially worth almost half a million pounds.Whether he stays or goes,Henry has certainly proved that he still has Va Va Voom.....and he threw in a cup run for free.

Saturday, 11 February 2012

The EPL's Best Keeper.

*A season ending version of these numbers can be found here

The most exposed playing position on the football pitch is that of goalkeeper.One minute you're flying through the air at Wembley to turn a goalbound Balotelli effort past the post with your "wrong hand" and the next you're allowing a speculative thirty yarder to squirm over the line at a windswept Britannia Stadium.A specialist in a team of comparative all rounders,any mistake or act of brilliance made by the last line of defence is often vital to the outcome of the game,rarely forgotten, and one of the first statistics to be paraded at game's end.Just as strikers tend to live or die by their goals scoring records,keepers are judged almost entirely on the saves to goals ratio.A keeper is the footballing equivalent of an NFL kicker,an undervalued specialist asset who is responsible for half the scoring and isn't allowed to let his position defining statistics drop too much before termination of contract becomes a distinct possibility.

Thomas Sorensen prepares to be fooled by randomness.
So how fair is it to judge a keeper on the only set of statistics he can be expected to produce? As with most tentative steps into player evaluation, we must accept and acknowledge that the data we have is inevitably going to be of limited sample size and will therefore contain a component that is dependent upon the player's actual talent and a second contribution that is going to be down to random chance. The smaller the sample size, the larger the contribution from randomness. To illustrate the point, consider an untested keeper who plays the last ten minutes of a game because of injury and during that time faces one reasonably difficult shot. There's around a 70% chance that he will make the save (70% is around the average save rate for EPL keepers over recent seasons). As he stretches to make the stop, one of two things can happen. He can make the save and for the time being become the keeper with the highest save to shot ratio in the league, or he can concede the goal and fall to rock bottom.

If we are using save to shot ratios to rank keepers then neither evaluation of our one shot wonder is likely to be accurate. One is overly generous, the second unfairly dismissive. Lack of opportunity is the main culprit for our indecision and the usual solution is to exclude players who have failed to reach a minimum number of shots faced. However, even a seemingly large number of shots faced is going to contain varying degrees of random chance, so rather than discriminate against those small sample sized wannabes, let's try to use every piece of goalkeeping data at our disposal.

If our hero makes the save, it shows a level of competence and a level of luck. He's lucky he didn't slip as he prepared to dive, that he wasn't unsighted, that the ball wasn't wickedly deflected or that the shot wasn't so good it was unsaveable. If we had no prior knowledge of his ability before this save, the best guess we could make would have been that he was most likely to be an average EPL stopper. Add in the single save and we can elevate him slightly above average. As our keeper faces more shots on target his opportunity to impress or depress increases and we can start to develop an opinion that becomes increasingly defined by his skill rather than his good or bad fortune.

The save to shot ratio of individual EPL keepers rarely drops below 60% or breaches 80% and more commonly is around 70%. We can use a keeper's successful shot saving efforts and his unsuccessful goal allowing failures to determine how likely he is to be a member of each particular shot saving group of keepers. Is he really a 60% stopper whose present 70% saving rate owes a lot to randomness and limited sample size, or is he an 80% stopper whose current statistics are being dragged down by bad luck? The less information we have, the more we have to drag those raw percentages back towards the average of all keepers in the group. In short, our 100% success or failure, one shot guy gets a lot of benefit of the doubt if he allows the goal and a healthy dose of "wait and see" scepticism if he pulls off the stop. And what goes for him also applies in steadily declining degrees to his more tested colleagues.
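
One way to sketch this kind of adjustment is to treat each keeper as belonging to one of a handful of "true" save rate groups and to update the chance of membership with every save or goal. The groups and the starting weights below are hypothetical choices of mine, picked only so that the starting average is 70%; the adjusted figures later in this post come from the actual data and will not match this toy exactly.

```python
def adjusted_save_rate(saves, shots,
                       rates=(0.60, 0.65, 0.70, 0.75, 0.80),
                       prior=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Posterior mean save rate after seeing `saves` from `shots` on target.

    `rates` are candidate true save rates and `prior` is how common each is
    assumed to be among EPL keepers (both are illustrative assumptions).
    """
    goals = shots - saves
    # Likelihood of this exact record for a keeper from each group.
    likelihoods = [r ** saves * (1 - r) ** goals for r in rates]
    weights = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    return sum(r * p for r, p in zip(rates, posterior))

print(adjusted_save_rate(1, 1))     # one shot, one save
print(adjusted_save_rate(0, 1))     # one shot, one goal
print(adjusted_save_rate(8, 10))    # 80% from ten shots
print(adjusted_save_rate(80, 100))  # 80% from a hundred shots
```

The one shot keeper moves only a whisker away from 70% in either direction, while the same raw 80% is believed far more readily from a hundred shots than from ten.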

Below I've taken virtually every keeper who has pulled on a glove in anger this season and corrected his raw save to shots ratio to account for the amount of action he has seen. The fewer shots faced, the more his numbers have been dragged either upwards or downwards towards the average for all keepers over the last couple of completed seasons. I've also added a final column to show which keepers have been subjected to the biggest correction by virtue of their varying shot stopping sample sizes compared to their raw numbers.

Chelsea's Hilario is near to the top of most conventional lists because he's saved 80% of shots on target. However, he has only faced 10 such efforts, so rather than exclude him from the list we have taken those 10 shots and credited him with a spot just above the league's average. Only time and opportunity will move him and improve our estimation of his true talent. For now he's above Petr Cech, but not as overwhelmingly so as he would have been if we had looked solely at each keeper's current save to shot ratio.

Shot Adjusted Save Percentage for 2011/12 to the end of January.

Keeper. Team. Shots Faced. Save Percentage. Adjusted Save %. Ranking Change.
Hart. Man City. 80 77.5 74.6 +2
Mignolet. Sunderland. 57 77.2 73.6 +2
Friedel. Spurs. 101 76.2 73.5 +2
de Gea. ManU. 73 76.7 73.4 0
Lindergaard. ManU. 22 81.8 73.4 -2
Hennessey. Wolves. 151 73.5 72.4 +5
Stockdale. Fulham. 53 75.4 72.3 0
Vorm. Swansea. 107 74.8 72.0 +1
Schwartzer. Fulham. 71 74.7 71.9 +1
Cerny. QPR. 28 75.0 71.5 -2
Hilario. Chelsea. 10 80.0 71.4 -9
Reina. Liverpool. 77 72.7 71.0 0
Sorensen. Stoke. 51 72.6 70.7 0
Guzan. AVilla. 28 71.4 70.2 0
Given. AVilla. 71 70.4 69.9 -1
Ruddy. Norwich. 120 70.0 69.9 +1
Foster. WBA. 96 67.7 68.8 0
Westwood. Sunderland. 33 66.7 68.5 0
Bunn. Blackburn. 14 64.3 68.4 +3
Howard. Everton. 76 65.8 67.9 0
Krull. Newcastle. 88 65.9 67.7 -2
Cech. Chelsea. 66 65.2 67.6 -1
Begovic. Stoke. 53 64.2 67.1 0
Bogdan. Bolton. 24 58.2 65.9 +3
al Habsi. Wigan. 123 63.4 65.4 -1
Jaaskelainen. Bolton. 100 63.0 65.3 -1
Szczesny. Arsenal. 84 60.7 63.6 -1
Kenny. QPR. 71 57.8 63.2 0
Robinson. Blackburn. 88 54.6 60.1 0

Joe Hart,comfortingly for England fans grabs the top spot with an adjusted figure that eclipses both Hilario and Lindergaard,who scored better raw save to shot ratios,but were out tested by the Manchester City stopper by factors of 8 and 4 to 1 respectively.

Wolves' Wayne Hennessey has faced 150+ shots on target, the highest number so far this season, but that is still not sufficient to take his ratio completely at face value and so he can't escape the gravitational pull of the league average. Only Norwich's Ruddy barely moves, but that's simply because he is so close to the mean that he has nowhere really to move to. Blackburn would appear to have little to lose by replacing Paul Robinson, as his corrected figures barely creep above 60%, a line below which few keepers should expect to survive for very long.

A couple of team mates keep each other company in the revised standings,implying that there is also a team component to the quality of shots a keeper faces.But there is also enough variation between stoppers from the same club,notably in the case of Blackburn,QPR,Sunderland and Stoke to equally imply that goal keeping is a skill that varies between team mates.

So this is a start to tease out the randomness that inhabits all situations and is a bane to all keepers.They more than any position are hostages to short sequences of luck based results where the faulty application of causality brands them either world beaters or dross,when the truth is almost always somewhere in between.

Wednesday, 8 February 2012

How You Know Who's "Winning" Before a Goal's Scored.

Whoever devised the scoring system for football was an undoubted genius. It's magnificently simple: kick or head the ball between the posts, underneath the crossbar and over the line in a legitimate fashion and your goal tally advances by exactly one score. There aren't differing degrees of attacking success, such as three points for a field goal, six, seven or eight for a touchdown and two for a safety as in American football, or five points (or maybe four depending on the era) for a try, three for a penalty and two for a conversion as in rugby. So unlike its near cousins, the ultimate objective in football is always the same, namely to try to score a goal. Corners or rattled crossbars don't count on the final score card.

The benefits of football's goalcentric approach are evident in other aspects of the game. Mathematically modelling a soccer game is easier, and therefore more likely to reflect the actual course of games over time, than modelling either the NFL or rugby, where the convoluted scoring regime often means you have to fall back on the unsatisfactory ploy of averaging historical, non team specific play by play data.

Secondly, in football the "in game" situation is obvious to even the casual observer. A goal down and you're one score from being level or two consecutive scores from being in the lead, whereas a gridiron side down by nine can be anything from two scores to five scores away from edging ahead.

A third feature of scoring in football is that it is a comparatively rare event when viewed beside the other sports mentioned. Around two and a half goals per game is the average benchmark figure for most of the major European football leagues and that's about a quarter of the scoring events enjoyed by followers of the various oval ball games. This scarcity of scoring certainly adds to the tension of the sport, and scoring events are very likely to cause large swings in in game win probabilities for both sides. The downside from an analytical point of view is that until goals are actually scored we have to rely on the pre game estimation of each team's strengths to gain an insight into how the balance of play lies. These types of estimation are remarkably accurate over the long term, but understandably slightly challenging to compute on the fly whilst you are actually at the ground watching the match. Therefore, I've tried to come up with a less taxing and visually obvious way to predict the outcome of a game in real time.

Everyone's only too familiar with the incisive comment and analysis originating from the colour man in the commentary box during a live game.But how significant is it for example if one team is putting in lots of tackles,winning the majority of the aerial duels or enjoying lots of possession.The slight overload of statistics is a welcome addition to football coverage,but which set of "in game" numbers are the really important ones that correlate well with winning.We saw in this post that there is a reasonable positive correlation between the percentage of ground duels a team wins over the course of a season and their subsequent success rate for that season.So I decided to see if that relationship held for individual games and therefore might be useful as a valid indicator of likely success in a game before the scoring had started.

Ground duels are defined as 50-50 contests between opposing players in which one player emerges successful. For every winner of such a contest there is a corresponding loser, and potentially these duels could result in the defeated player's team being forced to commit extra players to cover the victorious opponent, leading to a loss of team shape, especially if the duel takes place in their half of the field. In short, duels are the kind of precursor you would expect prior to an attacking threat being launched. I looked again at data from the 2008/09 season and duels on average occurred just over once a minute for the duration of the game. The Sunderland Stoke game and the West Brom Stoke contest tied for the lowest total duels (70) and the Tottenham Arsenal game had the most at 174, so they're about four times more common than shots on goal and 40 times more common than goals.

Keeping track of the Ground Duels can tell you which team has the upper hand.
Using the number of ground duels won and lost by the home teams from all 380 games from the EPL 2008/09 season as the independent variables and whether or not the game ended in a home win as the dependent variable I then ran a regression and found that there was a strong,statistically significant correlation between the variables.The bigger the positive differential between duels won and duels lost by the home side,the more likely they were to have secured a home win.To avoid the problems associated with plotting graphs with multiple variables and dichotomous outcomes,I've plotted the line of best fit that relates the duel differential to the chances of the game ending in a home win.

 Line of best fit for relationship between home wins and ground duel superiority or deficiency of home teams.

The plot appears to be sensible in that a home team which improves its ground duel differential also improves its chances of winning. A home team who outfought its visiting opponents by 30 successful challenges went on to win the game over 65% of the time, based on the 2008/09 data. Of course we need to point out that we are dealing with long term outcomes here: in roughly 35% of such games the home side would have this level of duel superiority and still fail to win. So I've also run similar regressions with "lose/no lose" and "draw/no draw" as the dichotomous outcomes, so we can gauge the rate at which home teams lose given certain levels of duel differential.
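
For readers who want to play with a curve of this general shape, here is a minimal sketch. It assumes the fitted line is logistic, which is the usual choice for a win/no win outcome but is my assumption here. The coefficients are not the fitted ones: they are chosen so that the curve passes through two rough figures quoted in this post, a 65% home win chance at a +30 differential and an even chance at +12.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability."""
    return log(p / (1 - p))

# Illustrative anchor points (duel differential, home win probability),
# taken loosely from figures quoted in the post; NOT the fitted model.
ANCHORS = ((12, 0.50), (30, 0.65))
(d1, p1), (d2, p2) = ANCHORS
SLOPE = (logit(p2) - logit(p1)) / (d2 - d1)
INTERCEPT = logit(p1) - SLOPE * d1

def home_win_probability(duel_differential):
    """Home win chance for a given (duels won - duels lost) figure."""
    return 1 / (1 + exp(-(INTERCEPT + SLOPE * duel_differential)))

for diff in (-30, 0, 12, 30):
    print(f"{diff:+d} duels: {home_win_probability(diff):.2f}")
```

The S shape matters at the extremes: each extra duel won adds less and less to the win chance once a side is already dominating, and the probability can never stray outside 0 to 1 as it could with a straight line.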

 Line of best fit for relationship between home defeats and ground duel superiority or deficiency of home teams.

Again I must emphasise that I am merely plotting the line of best fit from the regression generated equation, but once again the relationship between losing a game and how the home team fared in 50/50 ground duels in that game is strong, statistically significant and in the expected direction of correlation. Poorer duel performance goes with more home defeats. So finally we plot for draws.

Line of best fit for relationship between home draws and ground duel superiority or deficiency of home teams.

It's common knowledge that draws are more random in outcome than wins or losses. Very good teams consistently produce lots of wins and this correlates from season to season, but not so draws. Good teams can draw lots of games or they can draw very few, likewise poor or mediocre outfits. Therefore it's unsurprising that the level of statistical significance isn't as high as was the case for wins or losses. The plot is slightly more interesting because a maximum is reached around where the duel outcome is fairly evenly split, and again this is to be expected because mismatches at the extremes tend to produce more goals, hence fewer draws.

We can hopefully see how duelling players can help us predict the course a game may take. Tackles won by teams don't correlate well with winning, although this may be a problem with how tackles are recorded or how some teams choose to persuade their opponents to hand over possession. Shots do correlate with winning but are a lot less common than duels, and goals take on average 30+ minutes for the first one to arrive.

But if you have a feel for which team is winning its duels in, say, the opening quarter, you can extrapolate towards a final game figure and compare that with the typical win probability for the league. A plus 2 differential after 15 minutes will not always shake out as a +12 differential come full time and a slightly greater than 50% chance of winning the match, because games naturally ebb and flow. But at the very least you will be taking notice of the more important "in game" events. It should also enable you to be reasonably accurate in your assessment of whether or not a team has conceded or scored against the run of play.
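
The extrapolation itself is nothing more than scaling the running tally up to a full match, as in this small sketch. It assumes duels keep arriving at the same rate and in the same proportions for the rest of the game, which is precisely the assumption that the ebb and flow of a real match will break.

```python
def project_full_time_differential(differential_so_far, minutes_played,
                                   match_length=90):
    """Scale a running duel differential up to a full-match figure.

    Assumes a constant duel rate and split for the remainder of the game.
    """
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return differential_so_far * match_length / minutes_played

# The example from the text: two duels to the good after 15 minutes.
print(project_full_time_differential(2, 15))
```

The earlier in the match the projection is made, the more a single duel swings the full time figure, so an estimate taken after five minutes deserves far less trust than one taken at half time.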

These kinds of results can ultimately open the door for more informative player ratings. If we know which statistics correlate to winning we can start to properly weight each component that goes towards the overall player rankings that are starting to appear. Many of them are understandably "black box" figures. A player's "win contribution" to his team also starts to become a realistic proposition: by replacing a particular player's win correlating statistics within the team with an average value derived from all the players playing his position, we should be able to isolate who contributes what to team wins.

Sunday, 5 February 2012

Chelsea v Manchester United.

Chelsea 3 Manchester United 3.


On a weekend where some of the biggest matchday drama came during the journey back from the game(a 20 minute return trip from Stoke took 4 hours because of the snow),title contenders and title has-beens Manchester United and Chelsea respectively served up a cracking six goal thriller.

 Scorers:
1-0,Evans(og),36'
2-0,Mata,46'
3-0,Luiz,51'
3-1,Rooney(pen),58'
3-2,Rooney(pen),69'
3-3,Hernandez,84'

The stalemate in an evenly balanced contest was broken when Evans made a hash of clearing Sturridge's dangerous cross ball. United's chances looked remote when first Mata, with a stunning volley, and then Luiz, with a deflected header, gave the Blues a seemingly unassailable 3-0 lead with under half the game left. The fightback started half an hour from time when Rooney converted a soft penalty and followed up with another just over 10 minutes later. Hernandez's 84th minute levelling header put the game back up for grabs as Chelsea saw a large chunk of their mid game Expected Points superiority drain away. Ultimately 40 minutes of dramatic twists and turns dumped the game back to pretty much where it had started and a match between two evenly matched sides ended all square.

From a statistical viewpoint you could choose a host of angles to model. How likely were United to be 3-0 down after an hour (1 in 55), or how likely were Chelsea to give up a 3-0 lead with half an hour left (1 in 105), to name but two. However, we'll concentrate on United's two successfully converted penalties. Coupled with the brace of spot kicks they scored in midweek at home to Stoke, Man U had been awarded and scored from four consecutive penalties. Hernandez and Berbatov scored at Old Trafford on Tuesday and Rooney stepped up to the plate both times at Stamford Bridge.

Berbatov, one of United's 3 penalty scorers in the last 6 days.


So how likely was the Red Devils' 100% conversion rate, if we assume that their diverse cast of spot kickers have the typical 75% success rate from 12 yards? There are 16 possible combinations of successful or unsuccessful kicks spread over four attempts. Obviously only one of those combinations will see all four kicks result in goals. There are four combinations that could have given rise to one miss and three goals; likewise four sequences can result in one goal and three misses. Six combinations of four kicks can result in a 50% conversion rate, and the one remaining combination is four misses. If we now incorporate the different generic probabilities for each success or failure, we can calculate the probability of each of the five possible outcomes of the sequence of the four Manchester United penalties.

The Probability of United scoring or missing some of their last four spot kicks.

Outcomes of Four Penalties.      Probability.
Four Goals.                      0.316
Three Goals + One Miss.          0.422
Two Goals + Two Misses.          0.211
One Goal + Three Misses.         0.047
Four Misses.                     0.004


Four successes was the second most likely outcome from a run of four spot kicks; it would occur about once every three sequences. Missing all four kicks, which would have been nice for both Chelsea and Stoke, would happen about once every 250 four penalty runs.
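The figures above follow from the binomial distribution, and a short sketch reproduces them. It assumes, as the post does, a flat 75% success rate for every taker and independent kicks; the function name is mine.

```python
from math import comb


def penalty_outcomes(n_kicks=4, p_score=0.75):
    """Return {goals: (number_of_sequences, probability)} for n_kicks penalties.

    Assumes every kick is independent and scored with probability p_score.
    """
    outcomes = {}
    for goals in range(n_kicks, -1, -1):
        sequences = comb(n_kicks, goals)  # orderings giving this many goals
        prob = sequences * p_score**goals * (1 - p_score) ** (n_kicks - goals)
        outcomes[goals] = (sequences, prob)
    return outcomes


for goals, (sequences, prob) in penalty_outcomes().items():
    print(f"{goals} goals: {sequences} sequence(s), probability {prob:.3f}")
```

The sequence counts come out as 1, 4, 6, 4 and 1, which sum to the 16 combinations, and the five probabilities sum to 1.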

Peter Crouch. A Career Projection.

The continued ability of the English Premier League to attract talent has proved to be a double edged sword for some of the second tier members of England's top tier. Whilst on the one hand any precocious talent who bursts onto the world stage at a top international tournament can be expected to land in England and torment defences sooner rather than later, there is also the opportunity for lesser teams to purchase talented but ageing stars who suddenly find themselves surplus to requirements. The likes of Fowler, Wright and Owen turning out for Manchester City, West Ham United and Newcastle respectively was just a precursor for the likes of Stoke, Wigan and Birmingham to be able to attract proven goalscorers to their less fashionable homes.

Predicting an ageing pattern for sportsmen as a group is probably one of the most problematic conundrums possible. What seems like a deceptively easy task is actually one that is riddled with pitfalls and selection bias. For example, if you want to see how many games on average a 29 year old plays over a season, you cannot simply take the record of all players of that age and average the total. The group of 29 year olds is actually a selectively biased sample of players who actually played games at that age. Some 28 year olds may not have played at all during their 29th year, either because they weren't good enough to command a start or perhaps because they were long term injured. So if you are a manager eager to snap up a bargain older player, simply looking at how 29 year olds performed on average will almost always give you an inflated and over optimistic view of how your purchase might perform.

It also seems likely that playing position can influence a player's expected decline. Goalkeepers especially manage to sustain acceptable levels of performance much later in life than outfielders in their more physical roles.

Of particular interest to Stoke City fans is the question of how the record signing of Peter Crouch will pan out over the anticipated four year duration of his contract. Crouch is a member of a select band of strikers who have scored 100 EPL goals, a testament to both longevity and scoring prowess. So I propose to see how the career profile taken by this select group of players has ebbed and flowed, paying particular attention to the likely number of games these players can play as they age, how their goals per game ratio changes and how Crouch's career numbers fit into these projections.

Peter Crouch, a member of the 100 EPL goals club.

Around 20 players have reached the 100 EPL goals milestone and around half of these players have effectively retired from top flight football. If we begin by looking at just these retired players, we only have to account for players who are absent from the sample due to dropping out of the top flight and not those who are absent from the older age bands because they are too young. Also, as most of the players in this group played the bulk of their games in the early years of the Premiership, it might allow us to compare the state of the EPL now to how it was then, for top notch strikers at least.

Let's start with appearances. Our sample makeup is biased because we are looking at how players who we know were destined to score at least 100 EPL goals fared in their early years. We hope to mitigate that sample bias by looking at a very narrow band of conclusions, namely how 100 goal players age. We are simply trying to second guess the viability of purchasing proven goalscorers who are in all probability past their best. Therefore, in this initial group of retired players we simply have to account for the players who weren't present in the early years and those who drop out of the sample in the later ones. I'll do that by assuming that all players always remain in the group, but if they are not playing in the EPL then they contribute zero games for each age at which they are absent. This process is artificial in the early years because from the limited data we don't know which players will achieve great scoring feats.

These problems also exist in the group of players who are still active in the EPL, but additionally younger players, such as Rooney, are not present in the older age groups for the obvious reason. All I propose to do at the moment is to exclude the younger players from the sample beyond their current age and hope that the current batch of older players are reasonably typical of those who will follow... I did say ageing studies were problematical.

The two subgroups of goalscorers contain around 10 players each, so for an age related season where each striker could play a maximum of 38 games there are 380 potential games available for that age band in each generation. If the cumulative total of games played by all the eligible players was, say, 280, then I've plotted that age band as taking part in 74% of the total available games.
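A minimal sketch of that calculation, including the rule that an absent player still counts towards the games available but contributes zero games played. The player names and appearance figures below are invented purely to reproduce the 280 from 380 example; they are not the real data behind the plots.

```python
def playing_time_share(games_by_player, age, season_length=38):
    """Share of available league games played by the whole group at one age.

    games_by_player maps a player name to {age: games played}. A player with
    no entry for that age (out of the top flight) contributes zero games but
    still adds season_length to the games available.
    """
    available = season_length * len(games_by_player)
    played = sum(seasons.get(age, 0) for seasons in games_by_player.values())
    return played / available


# Hypothetical ten player group at age 30: nine active, one absent.
appearances = [38, 35, 34, 33, 32, 31, 30, 27, 20]  # sums to 280
group = {f"Player {i}": {30: g} for i, g in enumerate(appearances, start=1)}
group["Player 10"] = {}  # no longer playing in the EPL at 30

print(f"{playing_time_share(group, 30):.0%}")  # 74%
```

Excluding younger active players beyond their current age, as described above, would simply mean leaving them out of the dictionary for those ages rather than giving them a zero.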

Percentage of Playing Time Seen by Players with 100 or more EPL Goals.





By plotting the average percentage of total available playing time actually played by each group, we can see that current strikers were either asked to or were able to play a higher proportion of league games compared to their earlier counterparts. The sample sizes are small due to the restrictive parameters we've used, and the methodology becomes more ad hoc as we see the likes of Rooney dropping out of the batch of current players. But there seems a definite trend for the current top class striker to play more often at a younger age, and this generational gap then starts to narrow at around 28 years of age. Whether current scorers' longevity will then dip below the trendline for the previous generation can only be speculated upon, but the current player may well be seen more often earlier and slightly less so later.

Today's 100 goal players, it could be argued, are more multi dimensional than previous out and out goalscorers, but a manager seeking to grab an expensive addition to his strike force today will still be expecting goals to be part of the package. We've seen in previous posts that scoring rates can be deceptive: 20 goals from 30 games is the same scoring rate as 2 goals from 3 games, but as a seasonal output the former is much preferable to the latter. So when we move on to goal output I've calculated a seasonal rate instead of a rate tied just to actual games played. Potentially players can play 38 EPL league games, so I've used that figure whether they played all 38 or not. When deciding to purchase a player, one of the overriding concerns should be how many goals he will add to my team's seasonal total, and this approach measures how the pool of players from which we are making our purchase have performed in this respect over time.
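To see the difference between the two rates, here is a small sketch comparing goals per game actually played with goals per available game. The function names are my own, and the season length is left as a parameter on the assumption that not every season has 38 games.

```python
def per_game_rate(goals, games_played):
    """Goals per game actually played."""
    return goals / games_played


def seasonal_rate(goals, season_length=38):
    """Goals per available league game, whether the player appeared or not."""
    return goals / season_length


for goals, games in [(20, 30), (2, 3)]:
    print(
        f"{goals} goals in {games} games: "
        f"{per_game_rate(goals, games):.3f} per game played, "
        f"{seasonal_rate(goals):.3f} per available game"
    )
```

Both players score at 0.667 per game played, but the seasonal rate separates them at 0.526 against 0.053.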

Seasonal Goal Rates for EPL Players with 100 or more Goals.


We saw in the appearances graph that the later generation of EPL scorers were seen more frequently, earlier in their careers, than had been the case for the earlier generation. So it's not surprising to see the current generation outscoring their elders at younger ages, albeit with the gap closing. The current crop of strikers do appear to then be outscored on a seasonal basis by the old guard; however, the group does not include any numbers that Rooney may put up after his 27th birthday. So while the trend should be noted, the conclusions are far from being set in stone. Wayne will probably push the green line much closer to the red one once his more mature years are added to the mix.

So how do these graphs relate to what we know about Peter Crouch?

In terms of playing time he appears to be a slight throwback to an earlier generation. Mainly through lack of opportunity, he was only playing around 40% of the available top flight games he could have played in his early twenties. So when the currently playing crop of goalscorers are starting to see their playing time decrease at around 26, Crouch was actually still on an upward curve and he only started to see opportunities decline as he passed 28. Encouragingly, he was still capable of playing almost 80% of the available EPL league games when he reached 30. The benchmark figure for top strikers in their 30s is around 75% of league appearances, so Crouch is slightly ahead of the ageing curve but not freakishly so. By 34, when Crouch's current contract expires, a typical prolific and long lasting goalscorer was on average still appearing in just over half of his team's league games and, barring catastrophic injury, Crouch would expect to do at least as well as this. So 30 plus games in the first year of his contract, dropping to the low 20s by the end, seems a realistic estimate for Crouch to achieve, but we really want to know how many goals he'll contribute to Stoke if they continue to invest throughout the side and stay a Premiership side.


This final graph is Crouch's seasonal totals expressed in terms of goals per maximum possible number of league games that he could have actually played (this convoluted approach is necessary to account for the occasional 42 game season). A single player sample is going to be noisier than a larger sample, but Crouch's seasonal rate has been between 0.2 and 0.25 goals per available game for all of his later years. These numbers make him a legitimate member of either the Shearer led older group or the Wayneless present collective.

The best case scenario for Stoke fans is that their £11 million record signing has, in his dotage, the resilience and scoring rate of the earlier group, coupled with the added extras that enable the more recent group to keep being selected despite an apparent decline in scoring ability. In raw goalscoring terms he could reasonably expect to be scoring 12 goals in his first season, falling to around 8 in his final year. Perfectly acceptable returns for a team whose top scorer rarely breaches double figures.