Friday, 5 October 2012

The Case For Data Analysis In Football.

One persistent criticism that has been aimed at football analytics is that it hasn't overturned any existing notions that have formed around the modern game in the same way that the sabermetrics movement challenged the status quo within baseball.

I do not agree with this assertion.

Before we can address this important point, it would be helpful to give a quick (and almost certainly flawed) overview of the statistical revolution that occurred in baseball. Much has been written concerning the major differences between baseball and football. The former has discrete, well-defined events, whereas football is a true "team on team" event where interactions are both complex and numerous. So the challenges in football are different to those found in baseball.

Secondly, the timescales and resources available to the two sports have been vastly different. Advanced baseball analysis was probably kickstarted in the late 1970s with the formal self-publication of the thoughts of Bill James, who sought to reinterpret statistical measures that had been around since what is now the last century. Ideas were then developed with the extensive self-collection of data, a massive project that required huge amounts of cooperation from enthusiasts on a scale that is all but impossible to replicate within football. The gap between the birth of this fledgling movement and any acknowledged impact on the sport itself was then upwards of twenty years, when "Brad Pitt" introduced these "new" ideas to MLB.

By contrast, football analytics has had neither the luxury of accumulating large amounts of data, which makes the MCFC and Opta initiative so welcome, nor data-laden decades in which to mature, nor so many ancient and flawed targets to demolish.

Many will have their own idea of when football analytics began to evolve, but much of the context-setting work started to appear in print and on the internet in the late 1990s, mainly based simply on goals scored and allowed and congregating around the many fledgling gambling sites that were and still are so prevalent. Everything from team-specific win expectancy and likely final scores to the expected time for a first goal to be scored was modelled using two meagre team statistics. The importance of different goal scoring environments was recognised and acted upon.
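As a rough illustration of how far two team statistics can be stretched, here is a minimal sketch that assumes each side's goals follow independent Poisson distributions. That is a common textbook simplification, not necessarily what any particular site used. The function names, the way the two rates are scaled by a league average and the cap of ten goals per team are all assumptions made for the example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the average is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def expected_goals(scored_per_game, opp_conceded_per_game, league_avg):
    """Hypothetical scaling: a team's scoring rate adjusted for the
    opponent's defence, relative to the league average goals per team."""
    return scored_per_game * opp_conceded_per_game / league_avg


def match_model(lam_home, lam_away, max_goals=10):
    """Win/draw/loss chances, most likely score and first-goal timing,
    assuming independent Poisson scoring at a constant rate."""
    home = draw = away = 0.0
    best_score, best_p = (0, 0), 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
            if p > best_p:
                best_score, best_p = (h, a), p
    total = lam_home + lam_away
    return {
        "home_win": home,
        "draw": draw,
        "away_win": away,
        "most_likely_score": best_score,
        # Idealised waiting time for the first goal; it ignores the fact
        # that the match stops after 90 minutes.
        "mean_minutes_to_first_goal": 90.0 / total,
        "prob_goalless": exp(-total),
    }


# Made-up rates purely for illustration.
result = match_model(lam_home=1.5, lam_away=1.2)
for name, value in result.items():
    print(name, value)
```

The same two inputs also capture the goal scoring environment: raising or lowering both rates together changes how likely a draw or a goalless game becomes, even when the gap between the teams stays the same.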

So just as baseball had used models and fundamental data to describe the run and win expectancy of any game in any game state, football amateurs have already done the same for their sport, albeit on sites that are an internet backwater to many.

Football has also used this route to overturn many (journalistic) cliches that persist around the sport. The cup isn't a great leveller, it's the preserve of the Premiership, particularly the Big Four or Five. It isn't harder to play against ten men, it's considerably easier. 2-0 isn't the most dangerous lead in football, it's preferable to 1-0 but not as good as 3-0. A team shooting first in a penalty shootout doesn't automatically inherit a 60% chance of winning. And more recently, raw possession isn't as important as what you actually do with it. Swansea aren't Barcelona, Britton isn't Xavi and, only this week, West Ham aren't Real Madrid.

To overturn a nonsense, you first need that nonsense to exist.

Progress then stalled through lack of meaningful data until the very recent introduction of various pay sites, resulting in a rapid familiarity with such areas of the field as "the final third". If the MCFC data dump, which in its advanced form comprises less than 0.3% of one season's worth of games and therefore contains an even smaller amount of one year's total data, has merely confirmed perceived wisdom as of 2012, isn't that something to celebrate rather than lament?

Sabermetrics, in the view of its supporters, overturned perceived wisdom because the old time scouts got it wrong. It is hugely encouraging, but not totally unexpected, to realise that the present day "traditional" football analysts, armed with superior tools and a generation or three removed from analysts in another sport, have largely interpreted on field events correctly. And if number crunching can add value and quantify those conclusions, then that's surely even better for everyone involved or even mildly interested in the subject. Collaboration is always preferable to wars and perhaps there isn't a baseball-like war to be fought in football. (Rather appropriately, baseball is currently embroiled in an argument over which flavour of WAR to use.)

The fledgling analytics movement within the NFL is probably a much more appropriate field with which to compare football's attempted leap forward. Less developed than baseball, it still has advantages over football (soccer) in terms of simplicity of on field events and access to copious amounts of data. But its success stories are largely the same as those enjoyed by soccer. NFL number crunchers have helped to sort out the correlation and causation conflict between running the football and winning, they exposed tactical inefficiencies in fourth down decision making and they cleaned up their own self-inflicted nonsenses such as the "curse" of running back overuse. They've suggested ways to project college quarterback statistics into the NFL and quantified on field events that are predictive of future wins. It apparently helps if you have a quarterback who can throw the ball... but how much does it help?

In terms of analytical progress, the two sports are neck and neck (although soccer is better placed because of its global appeal) and, with much more data input from interested parties and more than a few false starts, both should progress rapidly in the future. Although those caveats shouldn't really need to be constantly repeated.

Football analytics is in a great place at the moment.


  1. So how do you get into data analytics in football and develop a career in the field?

  2. Hi Anon
    I can only point you in the direction of Rob Carroll's at
    particularly this page

  3. A good piece and something everybody that claims to bet on football should at least give some consideration to

  4. I've been playing, watching, living, breathing football, since I could walk and think, and now work in corporate research and analytics...

    Collectively, football is more developed in terms of intricacies, and more about situational context than any other mainstream sport I can think of...

    I'm far from convinced that data analysis can add anything meaningful, usable to current higher order understanding of the sport...

  5. Hi anon,

    totally agree with everything you say, except for the last line. Soccer/football is data poor compared to other sports, but with initiatives such as the Opta supported Manchester City data release, that situation is changing.

    The challenge is to add context to the raft of new information that is becoming available. Defining the game situation for each team is currently more important than counting the tackles, shots or passes and once we've done the former we can perhaps make more sense out of the latter.

    Football also has its share of set piece plays which can be analysed in isolation, starting with penalty kicks and moving back through to corners and free kicks from other areas of the pitch.

    Identifying the most dangerous area for corner deliveries and where to stand to be more likely to pick up the second ball is hardly ground breaking, but its implementation has had an impact at both ends of the EPL over the last couple of seasons.