Thursday, 31 October 2013

Finishing and Hitting the Target in the MLS.

The first attempt I made at looking beyond the commonly available football stats of the day, namely goals, used shot and save data from the MLS. The quality of play may not have quite matched that seen in the Premiership of the day, but the amount of data available far outstripped what was commonly available to the UK newsgroups that preceded blogging.

Attempting to tease the luck from the talent in the shot saving percentages seen in the likes of Kevin Hartman, Tony Meola, Joe Cannon and Tim Howard was a lot easier than trying to sensibly argue who England's current stopper should be. So a belated h/t to Big Soccer, where around half a dozen stat enthusiasts hung out in the dim distant past.

A recent tweet from the influential Steve Fenn, a must-follow at @SoccerStatHunt, reminded me of the excellent work being done by the guys at American Soccer Analysis, notably Harrison Crow (@Harrison_Crow). They are collecting and also sharing shot data from the current MLS. So a major h/t to them: the first attribute is fairly common, but the second is extremely rare and most welcome!

The availability of data is the major bottleneck in blog-based analysis. Methodologies are fairly standard, but weight and credence for any conclusions only come with increased sample size. It is fairly easy to develop a novel methodology, but limited data can still make you look dumb.

Back in the day, shot attempts and outcomes were the limit of the data, but the volume of that data, stretching over seasons and, in the case of keepers, over their long careers, still made analysis possible, if with a slightly wider error bar attached. Increased shot volume, it was hoped, would even out issues of shot and chance quality, issues that do not exist to such a degree in either the controlled pitcher/batter contest of baseball or the more restricted playing area of hockey.

I don't have an MLS photo. Instead, here's Clint Dempsey celebrating Sounders' interest (and a goal against Stoke).
Nowadays, the still-flawed gold standard for blog-based shot analysis is data with x,y coordinates, but it is often devoid of even the tiniest hint of defensive pressure, except from the most dedicated of collectors. That is why the MLS data dump at American Soccer Analysis is so welcome. It improves greatly on the shot data of the past by subdividing shots into zones, which partly bridges the gap to professionally collected and protected data. Usually, slicing and dicing the data leads to noise and overfitting; ASA's venture may sacrifice sample size, but it greatly increases the uniformity of events within those smaller samples.

Applying one of my shooting analysis methods to ASA's improved data was therefore both sensible and a nostalgic treat. Broadly, this method assumes that shot outcome rates are common to every MLS team and centered on the league average. Any apparent deviation in shot accuracy or conversion percentage (and there is bound to be some) will then be down to random variation, a talent gap between sides in performing these tasks, or both. Quality of opportunity is hopefully controlled for by ASA's use of shooting zones. So if we see a wider range of outcomes in the attempts each side made than a random draw using league averages would produce, we can tentatively conclude that random variation isn't the only factor deciding the shooting pecking order.
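A minimal sketch of that random-draw comparison for a single sector, in Python. It assumes the data arrive as hypothetical (goals, shots) pairs per team; the team names and numbers are made up for illustration and are not ASA's figures, and the exact test used for this post may differ. Every side keeps its real shot count but converts at the pooled league rate, and the script reports how often a purely random league ends up at least as spread out as the real one.

```python
import random
import statistics

# Illustrative, made-up (goals, shots) per team for ONE sector; not ASA data.
TEAM_COUNTS = {
    "Team A": (14, 40),
    "Team B": (6, 38),
    "Team C": (11, 35),
    "Team D": (3, 30),
    "Team E": (12, 33),
}


def spread(rates):
    """Population standard deviation of a list of team rates."""
    return statistics.pstdev(rates)


def null_spread_test(counts, n_sims=10_000, seed=1):
    """Return (observed spread, share of luck-only leagues at least as spread out)."""
    total_goals = sum(g for g, _ in counts.values())
    total_shots = sum(n for _, n in counts.values())
    league_rate = total_goals / total_shots  # pooled league average for the sector

    observed = spread([g / n for g, n in counts.values()])

    rng = random.Random(seed)  # seeded so runs are repeatable
    as_wide = 0
    for _ in range(n_sims):
        # Each side keeps its real shot count but converts at the league rate.
        sim_rates = []
        for _, n in counts.values():
            goals = sum(rng.random() < league_rate for _ in range(n))
            sim_rates.append(goals / n)
        if spread(sim_rates) >= observed:
            as_wide += 1
    return observed, as_wide / n_sims


if __name__ == "__main__":
    obs, share = null_spread_test(TEAM_COUNTS)
    print(f"observed spread {obs:.3f}; random leagues as wide: {share:.3f}")
```

A small share suggests the real spread is wider than luck alone tends to produce; the same function works for accuracy or blocked-shot rates by swapping in those counts.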

The sectors used, along with the data, are all available at ASA's site, so I urge everyone to seek them out there. For partial clarity: sectors 1, 2, 4 and 5 are central to the goal and progressively more distant as the number increases, sector 3 is wide within the area, and sector 6 is wide out on the flanks.

I have taken shooting data from the site for every game played by every side in 2013 and looked at three outcomes recorded for each side: accuracy (shots that require a save), conversion rate (goals scored) and the undesirable tendency to have shots blocked. For each, I compared the spread across teams with the spread expected from those shot numbers if team talent were the same in each sector and variation in outcome were purely luck-driven.
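Before any comparison, the raw shots need tallying into those three rates for each side in each sector. The sketch below assumes a hypothetical `Shot` record with team, sector and outcome fields and assumed outcome labels ("goal", "saved", "blocked", "missed"); ASA's actual column names and labels may differ. It also assumes that a shot "requiring a save" means one on target, i.e. saved or scored.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical shot record; ASA's real fields and labels may differ.
@dataclass
class Shot:
    team: str
    sector: int   # ASA zone, 1 to 6
    outcome: str  # assumed labels: "goal", "saved", "blocked", "missed"


def new_tally():
    return {"shots": 0, "goal": 0, "saved": 0, "blocked": 0, "missed": 0}


def sector_rates(shots):
    """Per (team, sector): shot count plus accuracy, conversion and blocked rates."""
    tallies = defaultdict(new_tally)
    for s in shots:
        t = tallies[(s.team, s.sector)]
        t["shots"] += 1
        t[s.outcome] += 1  # an unknown outcome label raises KeyError
    rates = {}
    for key, t in tallies.items():
        n = t["shots"]
        rates[key] = {
            "shots": n,
            # Assumption: "requires a save" = on target = saved + goals.
            "accuracy": (t["saved"] + t["goal"]) / n,
            "conversion": t["goal"] / n,
            "blocked": t["blocked"] / n,
        }
    return rates
```

Each (team, sector) entry then supplies the counts for a luck-only comparison within that sector.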

Do Sector Outcomes Suggest That Factors Other Than Random Variation Are at Play in the MLS?

Sector (ASA zone) | Does accuracy deviate from random? | Does conversion rate deviate from random? | Does avoiding blocked shots deviate from random?
1 | Yes | Yes | Barely
2 | Strongly | Very strongly | Random
3 | Yes | Yes | Random
4 | Yes | Random | Yes
5 | Random | Random | Random
6 | Yes | Random | Random

The results are tabulated above. Using shot data from 2013, there does appear to be some evidence that team conversion rates show a talent differential when strikers are closest to goal. As attempts move further from goal (zones 4 and 5) and much wider out to the flanks (zone 6), that differential appears to disappear and outcomes become consistent with the overall average conversion rate for the MLS. In short, skill may exist inside the box, but outside it you're hoping to get lucky, in the MLS at least.

A talent for greater (or lesser) shooting accuracy, as measured by an attempt requiring a save, appears to survive to greater distances and angles. Alternatively, it may reflect a tactical approach whereby a side is required to "make the keeper work" in expectation of a follow-up rebound. Or everything may be the result of insufficient detail in the current, admirable data.

I know very little about the specifics of the current MLS, other than that Dallas produce technically adept players and Seattle have the coolest kit, but others may make sense of Philly being the best opportunity-corrected finishers in sector 1 (closest to the goal) and Portland the most efficient in sector 2.
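One reading of "opportunity corrected" is goals compared with what a league-average finisher would score from the same number of shots in the same sector. A minimal sketch of that reading, using hypothetical (goals, shots) pairs per team; the z-score divides the excess by the binomial standard deviation expected under pure luck. This is an illustration, not necessarily how the rankings above were produced.

```python
import math


def finishing_vs_league(counts):
    """Rank teams in one sector by goals above a league-average expectation.

    counts: {team: (goals, shots)} for a single sector (hypothetical format).
    Returns a list of (team, goals_above_expected, z_score), best first.
    """
    total_goals = sum(g for g, _ in counts.values())
    total_shots = sum(n for _, n in counts.values())
    p = total_goals / total_shots  # league-average conversion in this sector
    rows = []
    for team, (goals, shots) in counts.items():
        expected = shots * p
        excess = goals - expected
        sd = math.sqrt(shots * p * (1 - p))  # binomial SD if only luck operates
        z = excess / sd if sd > 0 else 0.0   # guard: p of 0 or 1 gives sd = 0
        rows.append((team, excess, z))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Because the expectation uses the pooled league rate, the excess goals across all sides in a sector sum to zero; the z-score shows whether a side's lead is large relative to the luck expected from its shot volume.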

Random variation is ever present in the data, but using it as a catch-all whenever a side over- or under-performs against the league norm may be less (or more) than fair to players and coaches alike, especially in the absence of any evidence that the talent gap at the very top level has disappeared completely.
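One common sabermetric way to avoid treating luck as a catch-all is to split the observed variance in team rates into the part expected from binomial luck and a remainder attributed to talent. This is a swapped-in technique offered as a sketch, not necessarily the method behind the table; it uses the same hypothetical (goals, shots) per team input for one sector.

```python
import statistics


def luck_talent_split(counts):
    """Split the spread of team rates in one sector into luck and talent.

    counts: {team: (goals, shots)} (hypothetical format).
    Returns (observed variance, luck variance, talent variance, keep share).
    """
    rates = [g / n for g, n in counts.values()]
    total_goals = sum(g for g, _ in counts.values())
    total_shots = sum(n for _, n in counts.values())
    p = total_goals / total_shots  # pooled league rate

    var_obs = statistics.pvariance(rates)
    # Average binomial variance of a team's rate if every side shared rate p.
    var_luck = statistics.fmean([p * (1 - p) / n for _, n in counts.values()])
    var_talent = max(0.0, var_obs - var_luck)  # never below zero
    # Share of a side's deviation from p that looks like talent rather than luck.
    keep = var_talent / var_obs if var_obs > 0 else 0.0
    return var_obs, var_luck, var_talent, keep
```

The `keep` figure is the share of a side's deviation from the league rate worth carrying into a projection, so a hypothetical forecast would be `p + keep * (team_rate - p)`; a value near zero says the spread looks like luck.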

To reiterate, here's the link to American Soccer Analysis.

1 comment:

  1. Thank you for piecing this together! I had things like this in mind when we started gathering data, but I haven't had time to look.

    Some thoughts:
    "Or everything may be the result of insufficient detail contained in the current, admirable data."

    Absolutely. Despite the breakdown by zones, shot origins could still be more specific. I have been hoping for a means of gathering the angle and distance of each shot, so that shot difficulty could be put on a more quantitative scale. Additionally, some categorical measure of defensive pressure/presence would be nice, too. Baby steps :-)

    I assume that within each zone, for each of the three stats in your table, you checked for statistically significant differences between teams' rates. Were these rates also compared across zones for shots AGAINST? My theory from the beginning was that teams have far more control over the outcomes of their own shots than over the outcomes of their opponents' (in most cases).

    Thank you again for working through the data. I think this is highly important information for deciding how much we should regress finishing rates (and other rates) in each zone for prediction purposes. Your work is appreciated!