Saturday, 8 December 2012

The Trouble With Pythagoras.

In this recent post I looked at how the Pythagorean approach of converting runs scored and conceded in baseball, or points totals in American football, can be used to give a more representative win and loss record over a season-long timescale. A large number of narrow victories can inflate a side's final league record, but much of this success may be down to randomly fluctuating fortunes, and there is no guarantee that it will be repeated in the future. A team's scoring record can partly capture such bouts of good or bad fortune, and the luck-driven contribution to league points can be identified using the Pythagorean method, once draws and scoring environments have been accounted for.
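As a reminder of where the method comes from, the classic baseball version estimates a winning percentage directly from runs scored and conceded. A minimal sketch, using Bill James's original exponent of 2 and an invented runs record:

```python
def pythag_win_pct(scored, conceded, exponent=2.0):
    # Bill James's original Pythagorean expectation; later refinements
    # (e.g. Pythagenpat) tie the exponent to the scoring environment.
    return scored ** exponent / (scored ** exponent + conceded ** exponent)

# A hypothetical team scoring 800 runs and conceding 700 over a 162 game season:
print(round(162 * pythag_win_pct(800, 700), 1))  # about 91.8 expected wins
```

The expected win total can then be compared with the actual one to gauge how much of a team's record looks like luck.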

Simply eyeballing a side's goal difference can also achieve the same aim, and Newcastle were the poster side for over-achievement last season when they claimed 5th spot with a goal difference of +5. The competitive balance within the current Premiership is relatively fixed from year to year, and Newcastle's goal difference would usually have been good enough only for seventh spot, if not lower. They were out of place, probably by a couple of spots. The case of Newcastle has been extensively covered: eight wins by a one-goal margin, coupled with three reasonably heavy defeats, were the main factors behind their depressed goal difference and elevated finishing position.

Pythagorean expectation captured Newcastle's atypical season, but so did anyone who took a passing interest in the table or results. So how can this crossover from Sabermetrics begin to be used beyond spotting transparent outliers?

Much of the effort in transferring Pythag to football has revolved around reducing the error between predicted final points totals and actual totals in the same campaign. A Premiership season of 38 games is usually sufficient for skill to begin to overwhelm randomness, and the best teams invariably rise to the top. Therefore trying to match your improved model of reality with actual reality is a reasonable aim. A team's true worth is often hidden, but the distortion is reduced in sports such as football where skill is a considerable factor. However, care should be taken not to overfit a Pythag model of reality to the random elements that occur in matches over the season.

Taking data such as goals scored and conceded over a season to create a model, and then fitting that model to those very same matches, runs the very real risk of forcing your creation to conform to random noise as well as signal. Once let loose on new data, any predictive qualities may well be compromised as apparently solid patterns reveal themselves as little more than randomness. Extensive out-of-sample testing is the way to go in attempting to validate a model-based conclusion.
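The in-sample versus out-of-sample distinction can be made concrete with entirely synthetic data: grid-search an exponent to fit one invented season, then re-check the error on a second. The records, noise level, draw rate and exponent range below are all assumptions for illustration only, not fitted values:

```python
import random

random.seed(42)

def pythag_points(gf, ga, games=38, exponent=1.3, draw_rate=0.26):
    # Season-level Pythagorean points with a crude flat draw adjustment.
    share = gf ** exponent / (gf ** exponent + ga ** exponent)
    return games * (3 * (1 - draw_rate) * share + draw_rate)

def fake_season(n_teams=20):
    # Invented (goals for, goals against, actual points) records:
    # "actual" points are the Pythag value plus random noise.
    season = []
    for _ in range(n_teams):
        gf, ga = random.randint(30, 90), random.randint(30, 90)
        points = pythag_points(gf, ga) + random.gauss(0, 6)
        season.append((gf, ga, points))
    return season

def squared_error(season, exponent):
    return sum((pythag_points(gf, ga, exponent=exponent) - pts) ** 2
               for gf, ga, pts in season)

train, test = fake_season(), fake_season()
# Pick the exponent that best fits the training season...
best = min((e / 100 for e in range(80, 201)),
           key=lambda e: squared_error(train, e))
# ...then judge it on a season it has never seen.
print(round(best, 2), round(squared_error(test, best), 1))
```

The telling comparison is between the training and test errors: an exponent tuned too finely to one season's noise will typically look worse on the next.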

A second stumbling block is the aggregation of data. A glut of narrow wins or defeats may show up in a 38-game season's worth of scoring events. But a hefty, often red-card-assisted defeat can hang heavy over a side's goal difference as a result of the low scoring in football.

Manchester United 8 Arsenal 2 and United 1 City 6, both accompanied by red cards, had the capacity to play havoc with a carefully tended Pythag brought up in the USA, where individuals are ejected but teams are often allowed to remain at full strength. As luck would have it, both of United's results eventually cancelled each other out, although the Arsenal result hung heavy on each side's goal difference early in the season. Data aggregation has its benefits, but one unusually high-scoring game can also be smeared over a whole group of games, resulting in a distorted representation of what actually occurred.

There was a time in the top flight when 1-0 wins on the back of an impressive defensive display were widely admired, and some present-day sides still possess the quality of defenders and tactical nous to engineer such results as part of their normal matchday experience. Five 1-0 victories coupled with a 6-1 defeat accrue 15 actual points, a goals for and against tally of 6-6 and a reputation for being fortunate over-achievers. Six 1-1 draws get a team just 6 points, the same goal tally and an "unlucky" tag. But Pythagoras treats both teams the same, giving each a "true" expected points total of around 8 for those six games.
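With goals for and against level, the Pythagorean win share is 0.5 whatever exponent is used, so the two hypothetical six-game records above come out identically. A sketch, assuming an illustrative exponent of 1.3 and a flat league-average draw rate of 26% (both assumptions, not fitted values):

```python
def pythag_expected_points(gf, ga, games, exponent=1.3, draw_rate=0.26):
    # Pythagorean win share, with the non-draw probability split by it.
    win_share = gf ** exponent / (gf ** exponent + ga ** exponent)
    return games * (3 * (1 - draw_rate) * win_share + draw_rate)

# Five 1-0 wins plus a 6-1 defeat, or six 1-1 draws: both are 6 for, 6 against,
# so the season-level formula cannot tell them apart.
print(round(pythag_expected_points(6, 6, 6), 1))  # about 8.2 points either way
```

The 15-point team and the 6-point team both land on the same "true" total, which is precisely the detail-draining problem described below.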

Real-life examples will rarely be as extreme, but if we know the actual, individual results, we should try to use that information. One way around this Pythagorean "draining the detail from the data" problem is to treat each match individually and then aggregate the expected points. Thus, a team which managed to run up the score in a single match wouldn't be credited with the ability to be equally threatening in front of goal under more competitive conditions.

A 7-0 win would tend towards three expected Pythag points for that match, while a narrow 1-0 win would lead to a Pythagorean contribution nearer to two league points, acknowledging the range of outcomes that may occur when defending a narrow lead. The attractive concept of downgrading teams that succeed on the back of winning a lot of close matches would be retained, without the season-wide points inflation for a side enjoying "one of those days" and winning a game or two by a wide margin.
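One crude way to sketch a per-match version is to smooth the scoreline before applying the formula, so that a shut-out does not collapse to a certainty. The exponent, draw rate and add-one smoothing below are all illustrative assumptions rather than anything fitted:

```python
def match_pythag_points(gf, ga, exponent=1.3, draw_rate=0.26, smooth=1.0):
    # Add `smooth` to both scores so 1-0 and 7-0 are not both treated
    # as certain wins by the ratio.
    num = (gf + smooth) ** exponent
    win_share = num / (num + (ga + smooth) ** exponent)
    return 3 * (1 - draw_rate) * win_share + draw_rate

print(round(match_pythag_points(7, 0), 2))  # emphatic win earns more...
print(round(match_pythag_points(1, 0), 2))  # ...than a narrow one, under 2 points
```

Under these assumptions the emphatic win is credited with clearly more than the narrow one, while the narrow win stays below the three points it actually banked.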

If the Pythagorean method is to have any use over and above the many similar techniques that already exist for football, it has to be prepared to look at matches on a game-by-game basis to maximize its unique selling point, namely the ability to begin to identify some of the randomness that is incorporated into a team's actual record. In my previous post I looked at the predictive power of Pythagorean league points totals from one season to the next using aggregated scoring data. Repeating the exercise on an individual match-by-match basis, and then summing the expected league points, leads to an improved correlation between "true" Pythagorean points totals in season N-1 and a team's actual points haul in season N.
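The difference the per-match route makes can be sketched with the earlier hypothetical record of five 1-0 wins and a 6-1 defeat, using an illustrative exponent of 1.3, a flat 26% draw rate and add-one smoothing of each scoreline (all assumptions, not fitted values):

```python
def match_pythag_points(gf, ga, exponent=1.3, draw_rate=0.26, smooth=1.0):
    # Smoothed per-match Pythagorean points; all parameters illustrative.
    num = (gf + smooth) ** exponent
    win_share = num / (num + (ga + smooth) ** exponent)
    return 3 * (1 - draw_rate) * win_share + draw_rate

results = [(1, 0)] * 5 + [(1, 6)]  # five 1-0 wins and a 6-1 defeat: 15 real points

# Route 1: rate each match, then sum.
per_match_total = sum(match_pythag_points(gf, ga) for gf, ga in results)

# Route 2: aggregate the goals first (6 for, 6 against), then apply Pythag once.
gf, ga = sum(r[0] for r in results), sum(r[1] for r in results)
share = gf ** 1.3 / (gf ** 1.3 + ga ** 1.3)
aggregate_total = 6 * (3 * (1 - 0.26) * share + 0.26)

print(round(per_match_total, 1), round(aggregate_total, 1))
```

Under these assumptions the per-match sum comes out around 9.8 points against roughly 8.2 for the season aggregate: the heavy defeat now costs the side only one match's worth of expected points instead of dragging down the whole season's estimate, pulling the figure closer to the 15 points actually won.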

Identifying attributes that contribute to a side's success is an important aim for analysis, and one way to test whether a model has achieved this aim is to see if it has predictive qualities. Pythag appears to be reasonably predictive of future performance, and applying the method to individual matches also opens the way for a predictive Pythag for single, yet-to-be-played matches rather than merely confining its use to seasonal points totals.

However, it is competing in a crowded, well-tested market, where tools already exist to duplicate its output. There is scope, and a requirement, for much further development.

Check out Martin Eastwood's blog for an excellent Pythag primer.


  1. Hi, how about adding clean sheet as an environmental factor to trim the bias due to narrow wins? I haven't read anything on that, nor tried that myself though.


  2. The more goals a team scores in a match, the higher its subsequent scoring rate in that match, so a freak 8-2 result isn't so much down to any inherent superiority in the high-scoring team as to in-match momentum.

    I think it might be worth weighting goals so that any goal after the first one has a reduced value for future predictive purposes.

    For example, if you assign a value of 1 to the first goal you might assign, say, a value of 0.9 to the second, 0.81 to the third, 0.729 to the fourth and so on. Under this, the 8-2 becomes a more reasonable 5.7-1.9.
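    A minimal sketch of this geometric weighting (the 0.9 decay factor is the commenter's own example value):

```python
def weighted_goals(goals, decay=0.9):
    # First goal counts 1, each later goal is discounted by a further `decay`.
    return sum(decay ** k for k in range(goals))

print(round(weighted_goals(8), 1), round(weighted_goals(2), 1))  # 5.7 1.9
```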

  3. Correcting the note on US red cards: players are ejected and the team loses a player just like everywhere else. I have seen this myth of replacement players in several places and have no idea where it comes from.

  4. Hi Bj.
    When talking about "ejections" where players are replaced, I'm referring to (American) football rather than (American) soccer.

    cheers Mark.