Correlation and Causation in Football.

When Albert Einstein started work in the Swiss Patent Office his daily mantra was "believe everything is wrong", and that rigorous approach quickly saw him rise to the heady heights of Technical Expert, second class, before he took his talents to more demanding fields. A healthy dose of scepticism is a handy asset when you are trying to make sense of the ever increasing raft of football statistics now available. It helps to challenge preconceived notions that have gained credence through being repeated often but rarely validated, and it also guards against new myths merely replacing hoary old ones. One set of numbers can enlighten and inform, but alone it will rarely completely describe a system as complex and dynamic as a football team.

"Correlation does not imply causation" should be the watchword of anyone interested in dipping a toe into the fascinating world of football analysis. As informed football opinion moves from the purely anecdotal to something more grounded in measurement and counting, it has to be wary of falling for the spurious or inverted correlations that have cropped up in other sports and will inevitably appear in football.

The number of times an NFL team "takes a knee" has a very strong, positive correlation with the number of games they win. More knees, more wins, in short. So does this mean that a coach should send his team out to maximise the number of times his quarterback takes the snap, cradles the ball, sticks a knee on the turf and declines to attempt a play? Intuitively we all know that he shouldn't, and a passing familiarity with the rules shows us why. Teams invariably take a knee when they're winning in the 4th quarter and inside the two minute warning. The "play" runs out the clock without the need for a full on collision between a team whose only interest is to maintain possession and another whose only chance is to rip the ball from the ball carrier. It prevents an already violent sport turning even more so. Rugby union appears to have developed its own version, with a series of half hearted pick and drives from the scrum in the waning minutes of a match. The important point is that it's the winning situation that causes the kneels, and not the kneels that give rise to the wins.

That's a fairly obvious case where the direction of causation goes from the winning situation to the on field action, but other examples are more persuasive and seductive for the unwary. If an NFL team runs the ball more often than its opponent, it is more often the winner, and this led to the seemingly reasonable assumption that a team that runs more, wins more. Throwing the ball in the NFL carries more risk and potentially more reward than running it, but the strong correlation between the number of running plays and wins appears to make the ground route the sensible and profitable way to go. The reality is that the current NFL is a pass orientated league, and running the ball often will not produce the expected riches in terms of wins, because what the correlation is picking up is simply an extension of the "take a knee" effect described previously.

To explain further, teams on average build up a lead by passing the ball, but then protect that lead and run down the game clock by running the ball. By contrast, the losing team has to make up ground quickly when they have the ball, so they pass more and run less. The excessive run differential of the winning team tends to appear after they've built up a lead; it doesn't in general cause the lead in the first place. These two examples should therefore make us cautious when we start to deconstruct the building blocks of football. Moving the ball around an American gridiron isn't quite as straightforward as it first appears. So let's move on to football.
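The run/win trap can be reproduced with a toy simulation. The sketch below is purely illustrative and every number in it is an assumption (the drive count, plays per drive, scoring probability and play-calling probabilities are all invented). The chance of scoring on a drive is fixed and does not depend on whether a team runs or passes; the only thing the score changes is the play calling, with leading teams running more and trailing teams running less.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_game(rng, drives=20, plays_per_drive=6, score_prob=0.35):
    """Toy game: return (run differential, final margin) for team 0.

    All parameters are invented for illustration only.
    """
    score = [0, 0]
    runs = [0, 0]
    for drive in range(drives):
        team = drive % 2  # teams alternate possession
        margin = score[team] - score[1 - team]
        # Play calling depends only on the current score.
        if margin > 0:
            run_prob = 0.7  # protect the lead, burn the clock
        elif margin < 0:
            run_prob = 0.3  # chase the game through the air
        else:
            run_prob = 0.5
        runs[team] += sum(rng.random() < run_prob for _ in range(plays_per_drive))
        # Scoring chance is the same whatever mix of plays was called.
        if rng.random() < score_prob:
            score[team] += 7
    return runs[0] - runs[1], score[0] - score[1]


def run_win_correlation(games=2000, seed=1):
    """Correlation between run differential and final margin over many games."""
    rng = random.Random(seed)
    results = [simulate_game(rng) for _ in range(games)]
    run_diffs = [r for r, _ in results]
    margins = [m for _, m in results]
    return pearson(run_diffs, margins)


if __name__ == "__main__":
    print(round(run_win_correlation(), 2))
```

Running plays have no effect on scoring in this model, yet the run differential still correlates positively with the final margin, because the score is driving the play calling.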

Most of the passing analysis in football revolves around passing in certain areas of the field. It's considerably easier to pass the ball along the back four, when you are only being closed down by the forward who is unfortunate enough to be playing nearest to the dugout, than it is to pass in the tighter confines of the final third. Getting bodies behind the ball wasn't a Welsh import to the Potteries circa 2008; it's been a universal tactic in modern times, and more "defenders" and less space inevitably make passing more onerous. Unfortunately, from a coaching or team selection viewpoint you can't buy or select 20 successful passes in the final third of the pitch. You can only acquire players: forwards, midfielders or defenders. So an analysis of passing success by field position can ignore the real life constraints placed on team selection.

To approach the problem of how passing affects game outcome from a slightly different perspective, I've split passing performance by position and tried to assess how important it is for a team to have defenders who can pass well compared to, say, forwards who can.

I've taken data from the current season and, for each of the 20 EPL clubs, calculated the successful passes made by its designated defenders as a proportion of all successful passes made by defenders in the league in the season to date. I've repeated this for all designated forwards and converted the proportions to standard scores to allow easier comparison between the two groups.
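The two steps described above, league share followed by standard score, can be sketched as follows. This is a minimal illustration rather than the exact calculation used for the post: the club names and pass counts are invented, the function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def league_shares(counts):
    """Each club's successful passes as a proportion of the league total."""
    total = sum(counts.values())
    return {club: n / total for club, n in counts.items()}


def standard_scores(values):
    """Convert values to standard (z) scores: (value - mean) / std dev.

    Assumption: population standard deviation is used here.
    """
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)
    return {club: (v - mu) / sigma for club, v in values.items()}


# Invented counts for four hypothetical clubs, for illustration only.
defender_passes = {"Club A": 5200, "Club B": 4100, "Club C": 3600, "Club D": 3100}
forward_passes = {"Club A": 1500, "Club B": 1450, "Club C": 900, "Club D": 1150}

defender_z = standard_scores(league_shares(defender_passes))
forward_z = standard_scores(league_shares(forward_passes))

if __name__ == "__main__":
    for club in defender_passes:
        print(club, round(defender_z[club], 2), round(forward_z[club], 2))
```

Because both groups end up on the same scale (mean 0, standard deviation 1), a one unit change means "one standard deviation better than the league" for defenders and forwards alike, even though defenders complete far more passes in raw terms.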

The correlation for successful passes by defenders is reasonably strong, considering we have only looked at this term's EPL matches, and the relationship runs in the direction you would expect: more successful passes go with more team success in terms of wins or draws.

By contrast with defenders, the correlation between team success and successful passes made by a team's forwards is much weaker. Surprisingly, it would seem that if an average team could choose to improve the passing of either its defenders or its forwards by the same amount relative to the league, it would be better off in win terms to plump for the defenders. Intuitively, this seems wrong. As a fan watching a close game, I'm much happier seeing an opponent's defender passing the ball, as it's usually in a relatively non threatening area, than I am if a nippy forward is trying his luck.
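The "same amount relative to the league" comparison amounts to comparing fitted slopes: with passing expressed as standard scores, the least squares slope is the change in team success that goes with a one standard deviation improvement. The sketch below shows the calculation only. The numbers are made up to exercise the function and are not the post's data, and points per game is an assumed stand-in for "team success".

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up standard scores and points per game for five hypothetical clubs.
passing_z = [-1.0, -0.5, 0.0, 0.5, 1.0]
ppg_vs_defender_passing = [0.9, 1.1, 1.4, 1.6, 2.0]
ppg_vs_forward_passing = [1.3, 1.2, 1.5, 1.4, 1.6]

defender_slope = least_squares_slope(passing_z, ppg_vs_defender_passing)
forward_slope = least_squares_slope(passing_z, ppg_vs_forward_passing)

if __name__ == "__main__":
    print("defenders:", round(defender_slope, 2))
    print("forwards:", round(forward_slope, 2))
```

A steeper slope for defenders only says that defensive passing and results move together more strongly. It says nothing about which one is driving the other.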

If this kind of correlation is repeated in larger samples, I think we may be seeing an example of the current score driving the passing statistics rather than vice versa. Teams who are behind will increasingly push forward, losing shape in the process. While the defenders of the leading side will have more out and out defending to do, it is also relatively easy for them to pick out a pass into midfield, where the opposition are probably more concerned with creating their own chances than disrupting yours. Leading teams are also more prepared to play possession friendly passes along the defensive line when scoring is less of a priority. Every "olé" counts in a comfortable 3-0 canter. Defenders in trailing teams will see the reverse. They will increasingly be asked to get the ball forward quicker, so there is no cheap keep ball for them, and their passes will become longer and more speculative as time expires.

The net result will be a situational bias similar to the NFL pass/run frequency. Defenders from teams ahead in the game, who are more likely ultimately to win or at least draw, will see their successful passes inflate in comparison with their counterparts in the trailing side, who are more likely to be beaten. The game situation seems to be driving the defensive passing stats, to some degree at least.
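One way to check for this kind of situational bias is to tally passes separately by the score at the moment each pass was played. The sketch below assumes a hypothetical record format, a (game state, completed) pair per pass with the state given from the passing team's point of view. Real event data will be structured differently, and the sample records are invented.

```python
from collections import defaultdict


def passes_by_game_state(passes):
    """Tally (successful, attempted) passes for each game state.

    `passes` is an iterable of (state, completed) pairs, where state is
    "leading", "level" or "trailing" for the passing team at that moment.
    This record format is an assumption made for illustration.
    """
    made = defaultdict(int)
    tried = defaultdict(int)
    for state, completed in passes:
        tried[state] += 1
        made[state] += int(completed)
    return {state: (made[state], tried[state]) for state in tried}


# Invented records for one hypothetical defender.
sample = [
    ("leading", True), ("leading", True), ("leading", False),
    ("trailing", True), ("trailing", False),
    ("level", True),
]

if __name__ == "__main__":
    for state, (ok, total) in passes_by_game_state(sample).items():
        print(state, ok, "of", total)
```

If a defender's successful passes pile up mostly while his side is ahead, his season total is telling you at least as much about the scoreboard as about his passing.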

Forwards would appear to be much more about one or two killer, goal creating passes than a steady accumulation of safer options, and the much more smeared out correlation plot possibly reflects this. The difficulty of completing passes higher up the field is accounted for by the standardisation of the data, but what about the larger win boost a team appears to get by improving the passing abilities of its defenders compared to improving its forwards by the same amount? We may speculate that the comparatively smaller benefit seen for more successful passing by forwards could be down to those forwards receiving less team mate support in a still well populated final third when their team is ahead. This sees a decline in their successful completions.

In short, defenders' passing numbers may be boosted by the winning game position, but that winning position may have been caused by the quality, if not the quantity, of the passes made earlier by the forwards. If those forwards then find passing more difficult because colleagues are more concerned with defence, the forwards may see their pass success rates fall. The danger is then that a flawed connection is made between having better passing defenders and winning, and we start to weave the erroneous but plausible idea that successful teams build from the back.

That recently signed, cultured centre back who passes well most probably won't increase your number of wins by anywhere near the amount you expect (because the correlation isn't a causal one), and his completion rate will probably drop as well, because you bought him from a more successful team than your own.

This post is a cautionary tale. The data is limited and the conclusions may not persist to the same extent in larger samples, but data crunchers should not expect teams to have the same objectives throughout a game, and the qualities that cause a lead often become lower priorities when defending one. The real question is how teams take the lead and then how they keep it, and that requires two different datasets.

