Sunday 8 July 2012

...and Statistics. What Park Ji-Sung's 15-Yard Shot Conversion Rate Really Tells Us.

The world of football was stunned yesterday when word leaked out of QPR's audacious swoop for Manchester United's Park Ji-Sung. OK, I'm exaggerating somewhat, but it was certainly a newsworthy event and the transfer briefly trended on Twitter. One tweet caught my eye: ESPN declared that "the biggest drop off was his finishing" and backed up the claim with his stats, in graphical form, from 2010/11 and 2011/12. Five goals from seven non-blocked shots in the former season, compared to one goal from five in the latter. So, compelling evidence that QPR had bought a player whose finishing had well and truly "dropped off" over the previous two years?


The counter-argument should naturally start with the fact that headline conclusions are being drawn from very small samples: seven attempts in 2010/11 and five in the following season. Samples of that size will see conversion rates bounce around considerably through random chance alone. When writing my previous posts on randomness in footballing rate statistics, I found that the proportions of chance and skill in an individual's conversion rate converged after around 30 shots. So a player could post single-figure numbers like Park's through a lot of luck, with no drop-off at all in true talent or shot-converting ability between the two seasons. His high strike rate in 2010/11, on such a small sample, is almost certain to regress towards the mean in future years.
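To put a rough number on that randomness, here is a minimal sketch of Fisher's exact test applied to the two ESPN figures (5 goals from 7 shots, then 1 from 5). The test and the function name `fisher_exact_two_sided` are my own illustrative choices, not the method used in the earlier posts. It assumes every shot is an independent trial with the same underlying conversion rate, which ignores differences in shot quality, and asks how likely a split at least this uneven would be if Park's true finishing had not changed at all.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Two-sided Fisher's exact test for a 2x2 table of shot outcomes.

    Rows are seasons, columns are (goals, non-goals):
        season 1: a goals, b misses
        season 2: c goals, d misses
    If both seasons share one conversion rate, the number of goals that land
    in season 1 follows a hypergeometric distribution. The p-value sums the
    probabilities of every table (with the same margins) that is no more
    likely than the observed one.
    """
    shots1 = a + b
    goals = a + c
    total = a + b + c + d
    ways = comb(total, shots1)

    def prob(x: int) -> Fraction:
        # Exact probability that season 1 holds x of the goals.
        return Fraction(comb(goals, x) * comb(total - goals, shots1 - x), ways)

    observed = prob(a)
    lowest = max(0, shots1 - (total - goals))
    highest = min(shots1, goals)
    return sum(
        (prob(x) for x in range(lowest, highest + 1) if prob(x) <= observed),
        Fraction(0),
    )


# 2010/11: 5 goals from 7 shots; 2011/12: 1 goal from 5 shots.
p_value = fisher_exact_two_sided(5, 2, 1, 4)
print(p_value, round(float(p_value), 3))  # 8/33 0.242
```

On the ESPN figures it returns 8/33, about 0.24: under these assumptions, roughly a one-in-four chance of a split at least this lopsided even if nothing about his finishing had changed, which is far weaker evidence than a confident "drop off" implies.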

However, there's also a statistical finesse that appears to have been used here. Park's conversion rate has been restricted to shots from within 15 yards and excludes efforts that were blocked. The latter proviso appears to be down to a particular data supplier's policy, so we'll ignore it for the moment. But the choice of a 15-yard cut-off point for shots is curious.

Why not 18 yards, or 12? There's an 18-yard line and a 12-yard penalty spot, after all. Or why not use all of Park's goal attempts: 17 in 2010/11 and 12 in 2011/12?

Park's Shot Record 2010/11, Sorted by Distance (Red: shots inside 15 yards; Old Gold: shots beyond 15 yards).

Opponent    Shot Outcome  P(On Target)  P(Blocked)  P(Goal)
Wolves      On Target     0.48          0.12        0.15
Blackpool   Goal          0.51          0.16        0.29
Blackburn   Goal          0.50          0.18        0.30
Arsenal     Goal          0.47          0.20        0.24
Blackburn   Blocked       0.46          0.19        0.22
Wolves      Blocked       0.43          0.21        0.16
Wolves      Goal          0.43          0.22        0.18
Wolves      Goal          0.41          0.21        0.13
Tottenham   Off Target    0.35          0.18        0.06
Fulham      Blocked       0.38          0.21        0.10
Fulham      On Target     0.41          0.25        0.17
Arsenal     Blocked       0.32          0.22        0.05
Wolves      Off Target    0.27          0.20        0.02
Chelsea     On Target     0.22          0.21        0.01
Man City    Blocked       0.26          0.26        0.03
Tottenham   Off Target    0.26          0.31        0.04
Wolves      Blocked       0.21          0.28        0.01

Cumulative probability    6.6
Actual count              6             6           5

Above are all of Park's 2010/11 shots, including his blocked efforts, which should be ignored because they don't seem to have formed part of the ESPN tweet. They are sorted by distance, with the closest shots listed first. The shots against teams marked in red were taken from within 15 yards and appear to be the ones described by ESPN. So in the 2010/11 data, the selected end point of 15 yards comes almost immediately after his furthest goal.

The shots marked in old gold were taken from further than 15 yards out. The first non-blocked one, on target against Fulham, drew a blank and didn't make it into the set that "demonstrates" Park's potency, because it was taken a couple of inches outside the chosen cut-off point. Including it would have lowered his rate from 5 in 7 (71%) to 5 in 8 (62.5%). By ending the data at 15 yards, unproductive efforts that would have reduced his conversion rate are omitted, making his 2010/11 season appear more impressive in its raw, unregressed form.

This kind of slice-and-dice approach to data not only reduces the sample size, it can also lead to some very misleading conclusions. For example, if I move Park's end point even closer to goal, to the six-yard line, his conversion rate becomes 0 from 1, or zero percent!
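The effect of moving the end point can be sketched by sweeping every possible cut-off through the 2010/11 table. The table gives the order of the shots but not their exact distances, so this sketch (the names `SHOTS_2010_11` and `conversion_by_cutoff` are hypothetical) cuts at positions in the distance-sorted list rather than at yard lines, and, like the ESPN figures, ignores blocked shots.

```python
# 2010/11 shots in the order of the table above (closest first):
# (opponent, outcome). Exact distances are not available, only the order.
SHOTS_2010_11 = [
    ("Wolves", "On Target"), ("Blackpool", "Goal"), ("Blackburn", "Goal"),
    ("Arsenal", "Goal"), ("Blackburn", "Blocked"), ("Wolves", "Blocked"),
    ("Wolves", "Goal"), ("Wolves", "Goal"), ("Tottenham", "Off Target"),
    ("Fulham", "Blocked"), ("Fulham", "On Target"), ("Arsenal", "Blocked"),
    ("Wolves", "Off Target"), ("Chelsea", "On Target"), ("Man City", "Blocked"),
    ("Tottenham", "Off Target"), ("Wolves", "Blocked"),
]


def conversion_by_cutoff(shots):
    """Return (last opponent included, goals, attempts) for every cut-off.

    Blocked shots are skipped, mirroring the data used in the ESPN tweet,
    so a cut-off falling just after a blocked shot adds no new row.
    """
    rows = []
    goals = attempts = 0
    for opponent, outcome in shots:
        if outcome == "Blocked":
            continue
        attempts += 1
        if outcome == "Goal":
            goals += 1
        rows.append((opponent, goals, attempts))
    return rows


for opponent, goals, attempts in conversion_by_cutoff(SHOTS_2010_11):
    print(f"cut after {opponent:<10} {goals}/{attempts} = {goals / attempts:.0%}")
```

Depending purely on where the line is drawn, the same season reads anywhere from 0 from 1 (0%) to 5 from 6 (83%). The 15-yard cut gives 5 from 7, one more shot gives 5 from 8, and counting every non-blocked shot gives 5 from 11 (45%).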

Park's Shot Record 2011/12, Sorted by Distance (Red: shots inside 15 yards; Old Gold: shots beyond 15 yards).

Opponent    Shot Outcome  P(On Target)  P(Blocked)  P(Goal)
Everton     On Target     0.49          0.15        0.22
Wigan       On Target     0.46          0.20        0.22
Wigan       Goal          0.46          0.20        0.21
Man City    Off Target    0.37          0.18        0.07
Arsenal     Off Target    0.40          0.22        0.12
Norwich     Blocked       0.32          0.19        0.03
Wigan       Off Target    0.40          0.25        0.16
Arsenal     Goal          0.34          0.22        0.06
Liverpool   Off Target    0.31          0.24        0.05
Tottenham   On Target     0.35          0.28        0.10
Blackburn   Blocked       0.25          0.30        0.03
Norwich     Blocked       0.28          0.32        0.05

Cumulative probability    4.4           2.8         1.3
Actual count              5             3           2
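The cumulative probability row is simply the sum of each column: if each shot is treated as an independent yes/no event, the expected number of shots on target, blocks or goals is the sum of the per-shot probabilities. This sketch (names hypothetical) rebuilds the 2011/12 totals from the table and, as an addition of my own, also computes the standard deviation of each count, the square root of the sum of p(1 - p), to show how much room chance leaves around those expectations. It assumes shot outcomes are independent and counts goals as shots on target, which is what makes the table's five actual shots on target add up.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rows copied from the 2011/12 table:
# (opponent, outcome, P(on target), P(blocked), P(goal)).
# Decimal strings keep the sums exact, so rounding matches the table.
SHOTS_2011_12 = [
    ("Everton", "On Target", "0.49", "0.15", "0.22"),
    ("Wigan", "On Target", "0.46", "0.20", "0.22"),
    ("Wigan", "Goal", "0.46", "0.20", "0.21"),
    ("Man City", "Off Target", "0.37", "0.18", "0.07"),
    ("Arsenal", "Off Target", "0.40", "0.22", "0.12"),
    ("Norwich", "Blocked", "0.32", "0.19", "0.03"),
    ("Wigan", "Off Target", "0.40", "0.25", "0.16"),
    ("Arsenal", "Goal", "0.34", "0.22", "0.06"),
    ("Liverpool", "Off Target", "0.31", "0.24", "0.05"),
    ("Tottenham", "On Target", "0.35", "0.28", "0.10"),
    ("Blackburn", "Blocked", "0.25", "0.30", "0.03"),
    ("Norwich", "Blocked", "0.28", "0.32", "0.05"),
]

# For each probability column: (column index, outcomes that count as a hit).
# Goals are counted as shots on target (3 saved efforts + 2 goals = 5).
COLUMNS = {
    "on target": (2, {"On Target", "Goal"}),
    "blocked": (3, {"Blocked"}),
    "goal": (4, {"Goal"}),
}


def expected_vs_actual(shots):
    """Return {column: (expected count, standard deviation, actual count)}."""
    summary = {}
    for name, (col, hits) in COLUMNS.items():
        probs = [Decimal(row[col]) for row in shots]
        expected = sum(probs, Decimal(0))
        # Variance of a sum of independent yes/no events: sum of p * (1 - p).
        variance = sum((p * (1 - p) for p in probs), Decimal(0))
        actual = sum(1 for row in shots if row[1] in hits)
        summary[name] = (expected, variance.sqrt(), actual)
    return summary


for name, (expected, sd, actual) in expected_vs_actual(SHOTS_2011_12).items():
    rounded = expected.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    print(f"{name:<10} expected {rounded} (sd {sd:.2f}) vs {actual} actual")
```

It reproduces the table's 4.4, 2.8 and 1.3, and each actual count (5, 3 and 2) sits within roughly one standard deviation of its expectation, so a dozen shots leave very little to separate skill from luck.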

The ability to demonstrate whatever you choose by pre-selecting a convenient end point is further highlighted when we look at Park's "drop off" season in 2011/12. If the same 15-yard line is chosen then, happily for the preferred narrative, the line falls a couple of feet in front of his goal against Arsenal, eliminating it from the 2011/12 sample. Therefore, ESPN can report his 2011/12 conversion rate from within 15 yards as 1 from 5 (20%), instead of the 2 from 6 (33%) they would have got by going the extra two-and-a-bit feet needed to include his small part in United's 8-2 rout of the Gunners.

I've no problem with data like this being used as part of a "this is what happened" narrative. But samples as small as these can tell us very little about a player's decline or improvement from one season to the next, and once convenient end points start being drawn to exaggerate an already unconfirmed difference between one season's performance and the next... well.

1 comment:

  1. Enjoyed the post.
    One of my biggest gripes is small sample size conclusions and, as you say, "finessing" the data to emphasise a difference that, if it does exist, is much smaller than the article implies. It doesn't help the stats revolution.

    The curse of 370 in the NFL was based on a similar "use" of cutoff points.