Wednesday, 8 February 2012

How You Know Who's "Winning" Before a Goal's Scored.

Whoever devised the scoring system for football was an undoubted genius. It's magnificently simple: kick or head the ball between the posts, underneath the crossbar and over the line in a legitimate fashion and your goal tally advances by exactly one. There aren't differing degrees of attacking success, such as three points for a field goal, six, seven or eight for a touchdown and two for a safety as in American football, or five points (or maybe four, depending on the era) for a try, three for a penalty and two for a conversion as in rugby. So unlike its near cousins, the ultimate objective in football is always the same, namely to try to score a goal. Corners or rattled crossbars don't count on the final scorecard.

The benefits of football's goal-centric approach are evident in other aspects of the game. Firstly, mathematically modelling a soccer game is easier, and therefore more likely to reflect the actual course of games over time, than modelling either the NFL or rugby, where the convoluted scoring regime often means you have to fall back on the unsatisfactory ploy of averaging historical, non-team-specific play-by-play data.

Secondly, in football the "in game" situation is obvious to even the casual observer. A goal down and you're one score from being level or two consecutive scores from being in the lead, whereas a gridiron side down by nine can be anything from two scores to five scores away from edging ahead.
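To make that arithmetic concrete, here is a small sketch. It assumes the only scoring values in play are the ones listed earlier (2, 3, 6, 7 and 8 points) and that the trailing side scores unanswered; the function name is just for illustration.

```python
# Scoring values as listed in the post: safety 2, field goal 3,
# touchdown 6, 7 or 8 depending on what follows it.
GRIDIRON_SCORES = (2, 3, 6, 7, 8)


def scores_needed_to_lead(deficit, values=GRIDIRON_SCORES):
    """Return (fewest, most) unanswered scores needed to edge ahead."""
    target = deficit + 1  # points required to move into the lead
    # Ceiling division: biggest score every time gives the fewest scores,
    # smallest score every time gives the most.
    fewest = -(-target // max(values))
    most = -(-target // min(values))
    return fewest, most


print(scores_needed_to_lead(9))                # gridiron side down by nine -> (2, 5)
print(scores_needed_to_lead(1, values=(1,)))   # football side a goal down -> (2, 2)
```

The football case collapses to a single number because every score is worth the same, which is exactly why the game situation is so easy to read.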

A third feature of scoring in football is that it is a comparatively rare event when viewed beside the other sports mentioned. Around two and a half goals per game is the average benchmark figure for most of the major European football leagues, and that's about a quarter of the scoring events enjoyed by followers of the various oval ball games. This scarcity of scoring certainly adds to the tension of the sport, and scoring events are very likely to cause large swings in the in-game win probabilities for both sides. The downside from an analytical point of view is that until goals are actually scored we have to rely on the pre-game estimation of each team's strengths to gain an insight into how the balance of play lies. These types of estimation are remarkably accurate over the long term, but understandably slightly challenging to compute on the fly whilst you are actually at the ground watching the match. Therefore, I've tried to come up with a less taxing and visually obvious way to predict the outcome of a game in real time.

Everyone's only too familiar with the incisive comment and analysis originating from the colour man in the commentary box during a live game. But how significant is it, for example, if one team is putting in lots of tackles, winning the majority of the aerial duels or enjoying lots of possession? The slight overload of statistics is a welcome addition to football coverage, but which "in game" numbers are the really important ones that correlate well with winning? We saw in an earlier post that there is a reasonable positive correlation between the percentage of ground duels a team wins over the course of a season and its success rate for that season. So I decided to see if that relationship held for individual games and might therefore be useful as a valid indicator of likely success in a game before the scoring had started.

Ground duels are defined as 50-50 contests between opposing players in which one player emerges successful. For every winner of such a contest there is a corresponding loser, and potentially these duels could result in the defeated player's team being forced to commit extra players to cover the victorious opponent, leading to a loss of team shape, especially if the duel takes place in their half of the field. In short, duels are the kind of precursor you would expect prior to an attacking threat being launched. I looked again at data from the 2008/09 season, and duels on average occurred just over once a minute for the duration of the game. The Sunderland v Stoke game and the West Brom v Stoke contest tied for the lowest total duels (70) and the Tottenham v Arsenal game had the most at 174. So duels are about four times more common than shots on goal and 40 times more common than goals.

Keeping track of the Ground Duels can tell you which team has the upper hand.
Using the number of ground duels won and lost by the home teams in all 380 games from the EPL 2008/09 season as the independent variables, and whether or not the game ended in a home win as the dependent variable, I then ran a regression and found that there was a strong, statistically significant correlation between the variables. The bigger the positive differential between duels won and duels lost by the home side, the more likely they were to have secured a home win. To avoid the problems associated with plotting graphs with multiple variables and dichotomous outcomes, I've plotted the line of best fit that relates the duel differential to the chances of the game ending in a home win.

 Line of best fit for relationship between home wins and ground duel superiority or deficiency of home teams.
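For readers who want to try something similar, here is a minimal sketch. It assumes a logistic regression (a common choice for a win/no win outcome, though not necessarily the exact method behind the plot) with the duel differential as the only predictor. The data are randomly generated stand-ins, not the 2008/09 figures, and the names `fit_logistic`, `TRUE_A` and `TRUE_B` are purely illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = sigmoid(a + b * x) by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the negated Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        # Newton step: solve the 2x2 system H * step = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Synthetic stand-in for 380 games. TRUE_A and TRUE_B are made-up values,
# NOT coefficients estimated from real match data.
random.seed(0)
TRUE_A, TRUE_B = -0.3, 0.03
diffs = [random.randint(-40, 40) for _ in range(380)]
home_win = [1 if random.random() < sigmoid(TRUE_A + TRUE_B * d) else 0
            for d in diffs]

a, b = fit_logistic(diffs, home_win)
for d in (-30, 0, 30):
    print(f"duel differential {d:+d}: fitted home win chance {sigmoid(a + b * d):.2f}")
```

Swapping `home_win` for a "lose/no lose" or "draw/no draw" flag gives the equivalent fit for the other outcomes, although a single straight-line term like this cannot produce a curve that peaks in the middle.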

The plot appears to be sensible in that a home team which improves its ground duel differential also improves its chances of winning. A home team which outfought its visiting opponents by 30 successful challenges went on to win the game over 65% of the time, based on the 2008/09 data. Of course, we need to point out that we are dealing with long-term outcomes here: over time, in around 35% of games where the home side enjoys this level of duel superiority, it would still fail to win. So I've also run similar regressions with "lose/no lose" and "draw/no draw" as the dichotomous outcomes, so we can gauge the rate at which home teams lose given certain levels of duel differential.

 Line of best fit for relationship between home defeats and ground duel superiority or deficiency of home teams.

Again I must emphasise that I am merely plotting the line of best fit from the regression-generated equation, but once again the relationship between losing a game and how the home team fared in 50/50 ground duels in that game is strong, statistically significant and in the expected direction. Poorer duel performance goes hand in hand with more home defeats. So finally, we plot for draws.

Line of best fit for relationship between home draws and ground duel superiority or deficiency of home teams.

It's common knowledge that draws are more random in outcome than wins or losses. Very good teams consistently produce lots of wins and this correlates from season to season, but not so draws. Good teams can draw lots of games or they can draw very few, and likewise poor or mediocre outfits. Therefore it's unsurprising that the level of statistical significance isn't as high as was the case for wins or losses. The plot is slightly more interesting because a maximum is reached around where the duel outcome is fairly evenly split, and again this is to be expected because mismatches at the extremes tend to produce more goals, hence fewer draws.

We can hopefully see how duelling players can help us predict the course a game may take. Tackles won by teams don't correlate well with winning, although this may be a problem with how tackles are recorded or how some teams choose to persuade their opponents to hand over possession. Shots do correlate with winning but are a lot less common than duels, and goals take on average 30+ minutes for the first one to arrive.

But if you have a feel for which team is winning its duels in, say, the opening quarter, you can extrapolate towards a final game figure and compare that with the typical win probability for the league. A +2 differential after 15 minutes will not always shake out as a +12 differential come full time, with its slightly greater than 50% chance of winning the match, because games naturally ebb and flow. But at the very least you will be taking notice of the more important "in game" events. It should also enable you to be reasonably accurate in your assessment of whether or not a team has conceded or scored against the run of play.
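The extrapolation itself is simple pro-rata scaling. A minimal sketch, assuming a 90 minute match with no allowance for stoppage time and a constant duel rate (which, as noted, real games won't respect); the function name is illustrative only.

```python
def project_full_time_differential(diff_so_far, minutes_played, match_minutes=90):
    """Scale the duel differential seen so far up to a full-match figure."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return diff_so_far * match_minutes / minutes_played


print(project_full_time_differential(2, 15))   # +2 after 15 minutes -> 12.0
```

The projected figure can then be read against a fitted curve for the league to get a rough in-running win chance.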

These kinds of results can ultimately open the door to more informative player ratings. If we know which statistics correlate with winning, we can start to properly weight each component that goes towards the overall player rankings that are starting to appear. Many of them are understandably "black box" figures. Also, a player's "win contribution" to his team starts to become a realistic proposition: by replacing a particular player's win-correlating statistics within the team with an average value derived from all the players playing his position, we should be able to isolate who contributes what to team wins.
