Friday 30 March 2012

Predicting the Game's First Goal Scorer.

The importance and value of scoring the game's first goal have been well documented on this blog and elsewhere. Omar has written about the subject here and I have discussed the subject here, using illustrations from previous seasons to highlight how a team's final results are affected by the frequency with which they score first. If a team finds the net before its opponent then they can begin to dictate the immediate course of the game.

In matches where the gap in quality between opponents is small, the first strike can greatly shift the balance of expectations. If we consider the case of two evenly matched sides, then an opening goal from the home team after half an hour's play will shift the Expected Points total for that side from around 1.6 points to 2.3, a jump of 0.7 of a point. At the first whistle, the home side would typically have had about a 44% chance of ending the day with win bonuses all round, but by keeping out the visitors for 30 minutes before scoring themselves, they have increased the chance of a positive outcome to over 70%. Their visitors could have expected to average a tick over 1 goal per such encounter. But having failed to take their chances so far, by the 30 minute mark they can expect to average only 0.8 of a goal during the remainder of the game, and they trail by a goal as well. The afternoon is turning into one of the more positive outcomes for the home side from the myriad of possible game scenarios. If we reverse the scoring and allow the visitors first blood, the importance remains. Their win probability rises from a pre game 30% to 60% and their Expected Points rise from 1.15 to 2.05 points.
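The arithmetic behind those figures can be sketched in a few lines. Expected Points is simply 3 times the win probability plus the draw probability, and the pre game draw chance of 26% is inferred here as what is left over from the 44% and 30% quoted above. The in play part is only a rough illustration: it assumes each side's remaining goals follow independent Poisson distributions, and the home side's remaining goal expectancy of 1.1 is a hypothetical figure chosen for the example, not one taken from the post. The function names are my own.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """Probability of exactly k goals when mu goals are expected."""
    return exp(-mu) * mu**k / factorial(k)


def expected_points(p_win, p_draw):
    """Three points for a win, one for a draw."""
    return 3 * p_win + p_draw


def outcome_probs(lead, mu_for, mu_against, max_goals=15):
    """Win, draw and loss probabilities for a side currently `lead` goals up.

    mu_for and mu_against are the goal expectancies for the REMAINDER of
    the game. Independent Poisson scoring is an assumption of this sketch.
    """
    win = draw = loss = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, mu_for) * poisson_pmf(conceded, mu_against)
            margin = lead + scored - conceded
            if margin > 0:
                win += p
            elif margin == 0:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Pre game: 44% home win, 30% away win, so 26% draw (inferred).
pre_game = expected_points(0.44, 0.26)

# Home side 1-0 up after 30 minutes. The visitors' 0.8 comes from the text;
# the home side's 1.1 is a hypothetical value for illustration only.
win, draw, loss = outcome_probs(lead=1, mu_for=1.1, mu_against=0.8)
in_play = expected_points(win, draw)

print(f"pre game Expected Points: {pre_game:.2f}")
print(f"in play win probability:  {win:.2f}")
print(f"in play Expected Points:  {in_play:.2f}")
```

Changing `lead` to -1 and swapping the two expectancies gives the visitors' view of the same situation.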

Even if we model the most extreme of mismatches that occur in the current EPL we see similar results. A relegation candidate taking the lead after half an hour on a title contender's home turf will see their win probability rise from 4% to 26%, meaning the most likely result would still be a home win, but it's a massively improved situation compared to pre game. And the inferior team are now more likely than not to return with at least a point.

Scoring the first goal is therefore a good indicator of success on the day, and if you look at the teams who have scored the highest proportion of opening goals so far they include most of this year's best teams. Both Manchester sides top the list, followed by Spurs, Chelsea, Newcastle and Arsenal. League stragglers such as Wigan, Wolves, Bolton, QPR and Fulham prop up the first scorer table. Therefore, it seems reasonable to assume that the typical rate at which teams register the first goal is related to their overall scoring record. You can demonstrate this relationship by plotting the proportion of match goals a team has scored over the season against the proportion of first goals each team has scored.

Seasonal Relationship Between Goals Scored and 1st Goals for EPL Teams 2009-11.

A less general method, working on a match by match basis, where the respective pre match goal expectancies are compared to the identity of the game's first goal scorer, also yields a similar correlation. The line of best fit for two completed seasons' worth of EPL games also strongly indicates the near one to one relationship between the two.

Line of Best Fit for Proportion of Goals and First Goals Based on Match by Match Samples.
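One way to see why the relationship should be close to one to one is a simple simulation. If we assume, purely for illustration, that each side scores at a constant rate throughout the match (real scoring rates drift during a game, so this is a simplification and not the method used for the plots above), then the chance of a team scoring first in a game that has a goal equals its share of the combined goal expectancy. The function names and the example expectancies of 1.5 and 1.0 are hypothetical.

```python
import random


def first_goal_share(mu_team, mu_opp):
    """Share of the combined goal expectancy belonging to the team."""
    return mu_team / (mu_team + mu_opp)


def simulate_first_goal_share(mu_team, mu_opp, n_matches=100_000, seed=1):
    """Proportion of non-goalless matches in which the team scores first.

    Goal times are drawn from exponential distributions, which is what a
    constant scoring rate implies. Time is measured in whole matches, so
    any goal arriving after 1.0 falls outside the game.
    """
    rng = random.Random(seed)
    team_first = 0
    matches_with_goal = 0
    for _ in range(n_matches):
        t_team = rng.expovariate(mu_team)  # time of the team's first goal
        t_opp = rng.expovariate(mu_opp)    # time of the opponent's first goal
        if min(t_team, t_opp) > 1.0:
            continue  # goalless game, nobody scores first
        matches_with_goal += 1
        if t_team < t_opp:
            team_first += 1
    return team_first / matches_with_goal


# Hypothetical expectancies: 1.5 goals for the team, 1.0 for the opponent.
print(first_goal_share(1.5, 1.0))
print(simulate_first_goal_share(1.5, 1.0))
```

The simulated proportion should sit very close to the simple ratio, which is the straight line relationship in its most basic form.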

From a team perspective we are now in a position to predict the likely number of first goals a team of a given scoring and conceding ratio will record. We know that scoring first gives that team a much higher likelihood of winning the game than was the case at the start of the contest. In some cases a team will substantially exceed or dip below those opening goal expectations, and unless we can find compelling reasons for these first goal deviations, the most likely explanation is that the pattern has arisen because of small sample random chance.

So far this season Newcastle are the biggest overachievers in this category. They have scored the first goal in almost 70% of matches, yet they have scored barely over 50% of their games' goals. Their current position is a magnificent achievement, but some of it has probably been achieved through a fortuitous run of opening goals. At the other end of the table, Wolves, with only 21% of opening goals despite scoring 32% of their games' goals, can count themselves slightly unfortunate at the depth of their current plight.
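A rough way to gauge how unusual a run like Newcastle's is would be a binomial calculation: treat each match as an independent trial in which the team scores first with probability equal to its share of match goals. The 30 matches used below is a hypothetical round number, not a count taken from this post, and goalless games are ignored for simplicity.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_matches = 30   # hypothetical number of matches played
k_first = 21     # 70% of 30 matches
p_first = 0.5    # share of match goals, used as the chance of scoring first

prob = binomial_tail(n_matches, k_first, p_first)
print(f"chance of {k_first} or more opening goals in {n_matches} games: {prob:.3f}")
```

Bear in mind that we are picking out the most extreme of 20 teams, so a figure that looks small for one named side is much less surprising as the largest deviation in the whole division.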

We can move this kind of analysis a stage further by looking at the level of individual players. Again the league's leading scorers tend to also top the first goal lists, indicating that a similar relationship exists at the individual level, namely the larger the proportion of goals you score, the more likely you are to notch an opening strike. There are a couple of tweaks needed before the player based data can be used. Firstly, players, unlike teams, don't play every game, so they shouldn't be penalised for occasions when they are absent, and similarly they may enter the match after the first goal has been scored. For example, RvP has featured in every EPL Arsenal game, but despite scoring a brace against Stoke he didn't leave the bench until the hour mark, by which time the score was already 1-1. So that game has to be discarded from the sample.

Secondly, position affords strikers more of an opportunity to open the scoring than, say, defenders. The average time of the first goal in the EPL is just after the 30th minute, and this early in a contest defenders will still be prioritising defending. In a quick analysis consisting of the leading scorers from the last couple of years, leading strikers scored around 19% of their teams' match goals, but accounted for over 21% of opening goals in those games. A line of best fit can therefore be plotted and used to predict the number of opening goals predominantly attack minded players could be expected to score if we know or can estimate their likely scoring rate.

The Rate at which Attacking Players Opened the Scoring in 2011-12 in the EPL.

Player         Goals Scored   Goals Allowed   Goals Scored   Number of Games   Actual Number   Predicted Number
               by Team        by Team         by Player      by Player         of 1st Goals    of 1st Goals
R van Persie   56             38              24             27                8               7.5
W Rooney       60             20              21             24                6               6.8
E Dzeko        35             12              12             13                4               3.6
S Aguero       63             25              15             19                6               3.8
C Dempsey      37             58              12             29                4               4.5
D Ba           33             35              16             23                5               5.9
Yakubu         32             40              13             19                4               3.9
G Bale         50             31              10             26                5               3.9
F Lampard      33             22              10             21                3               4.0
P Crouch       27             36              8              24                2               3.6
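To give a feel for how a prediction of this kind can be built, here is a rough sketch using the columns in the table above. It is not the fitted line used to produce the predicted column. It simply multiplies the player's games by the team's share of match goals and by the player's share of team goals, then applies a flat uplift of 21/19 taken from the striker figures quoted earlier. That flat uplift and the function name are my own assumptions, and goalless games are ignored.

```python
# Leading strikers scored about 19% of match goals but 21% of opening goals.
# Using that ratio as a flat multiplier is an assumption of this sketch.
STRIKER_UPLIFT = 21 / 19


def predicted_first_goals(team_for, team_against, player_goals, games,
                          uplift=STRIKER_UPLIFT):
    """Rough estimate of a player's opening goals over `games` appearances."""
    team_first_share = team_for / (team_for + team_against)  # team scores first
    player_share = player_goals / team_for                   # player gets the goal
    return games * team_first_share * player_share * uplift


# (team goals scored, team goals allowed, player goals, player games)
players = {
    "R van Persie": (56, 38, 24, 27),
    "W Rooney": (60, 20, 21, 24),
    "D Ba": (33, 35, 16, 23),
    "P Crouch": (27, 36, 8, 24),
}

for name, row in players.items():
    print(f"{name:<14}{predicted_first_goals(*row):.1f}")
```

The estimates land in roughly the same region as the predicted column without matching it exactly, which is what you would expect from a flat multiplier standing in for a fitted line.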

For the majority of the goal scorers listed, their actual rate of opening goals is very close to their predicted levels. Aguero's tally of 6 is around 30% higher than you would expect from his overall scoring record and the record of Man City when he starts, while Rooney, Ba and Crouch each underperform compared to average expectations. However, we are dealing with very small totals here, and any under or over performing trend should be deemed much more likely to be descriptive than predictive.
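The small totals point can be made concrete with a quick calculation. If we assume, as a simplification, that a player's opening goals follow a Poisson distribution around the predicted figure, we can ask how often a tally at least as extreme as the actual one would turn up by chance. The actual and predicted values below are taken from the table; the Poisson assumption and the function names are mine.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """Probability of exactly k opening goals when mu are expected."""
    return exp(-mu) * mu**k / factorial(k)


def prob_at_least(k, mu):
    """Chance of k or more opening goals."""
    return 1.0 - sum(poisson_pmf(i, mu) for i in range(k))


def prob_at_most(k, mu):
    """Chance of k or fewer opening goals."""
    return sum(poisson_pmf(i, mu) for i in range(k + 1))


# Aguero: 6 actual against 3.8 predicted.
p_aguero = prob_at_least(6, 3.8)

# Crouch: 2 actual against 3.6 predicted.
p_crouch = prob_at_most(2, 3.6)

print(f"Aguero, 6 or more from 3.8 expected: {p_aguero:.2f}")
print(f"Crouch, 2 or fewer from 3.6 expected: {p_crouch:.2f}")
```

Neither figure is rare enough to suggest anything beyond ordinary variation, especially when we are choosing the largest gaps from a list of ten players.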

Placing too much credence upon single season numbers may lead to teams and individuals being either over or under rated. A run of randomly superior rates of 1st goal scoring may falsely inflate a team's points total and may not be predictive or indicative of future levels of performance. This may have occurred in the case of Newcastle this year and of Ipswich here. Similarly, opening goals, as we have seen, are potentially valuable, and players who manage a small sample sized glut of such strikes may appear to possess a valuable ability that may or may not exist. Players may be able to control to some degree the number of goals they score in a season; it certainly is a reasonably repeatable skill across seasons. But the order in which these goals are scored during a game, and thus their importance, may be much more of a hostage to chance.

1 comment:

  1. Why do the two graphs have different line shapes?

    The first seems mildly exponential (with no explanation as to why it isn't a straight line) and the second appears sigmoidal.

    Surely they are governed by the same fundamentals but are simply different time frames so they should have the same line shapes. I'd suggest that the first one should be sigmoidal but the noise in the system would only allow a broad fit.