Basketball: Myths and Puzzles

08 November 2017

Coaches, managers, players, and fans all have good reasons to care about statistics. Gathering the right kind of information about when and how often something happens (a team winning, a player achieving a record, or a group of players adopting a new technique) promises insight into why it happens and what its downstream effects might be. For a fan, this kind of understanding is valuable in itself; for someone with skin in the game, it can provide the control necessary for building a winning team. But moving from correlation to causation is famously fraught.

In a type of statistical puzzle known as Simpson’s Paradox, the correlations themselves look contradictory. The popular mathematician Martin Gardner describes it this way: “the data will confirm each of two hypotheses, but disconfirm the two taken together.” Basketball examples provide a way to make this abstract idea more concrete.

Take field goals, which can be divided into two-point field goals and three-point field goals. To gauge a player’s scoring talent, we might look at the percentage of their field goal attempts that succeed. According to the WNBA statistics for the regular 2017 season, Sue Bird outperformed Sancho Lyttle in three-point field goals, succeeding at 39.3% of her attempts compared to Lyttle’s 25.0%. We can calculate that Bird also outperformed Lyttle in two-point field goals, 46.8% to 44.4%. But Lyttle outperformed Bird in field goals overall, 43.5% to 42.7%.

In other words, Lyttle appears to be worse than Bird at scoring two-point field goals, and worse than Bird at scoring three-point field goals, but somehow better at scoring field goals overall. What should we make of this confusing data?

The solution is that three-point field goals are harder to make than two-point field goals, and Bird attempted a greater percentage of her shots from behind the three-point line. If you had the two players shoot from the same place on the court, you would expect Bird to succeed more often. But since they don’t shoot from the same place on the court, and Lyttle takes easier shots a greater percentage of the time, she ends up succeeding at more of her attempts.
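The size of the difference in shot selection can be backed out from the percentages quoted above. A player's overall field goal percentage is a weighted average of her two-point and three-point percentages, weighted by the share of attempts taken from each range. The Python sketch below solves for that share; the function name is made up for illustration, and because the inputs are rounded percentages, the results are only approximate.

```python
def implied_three_point_share(two_pct, three_pct, overall_pct):
    """Solve overall = w * three + (1 - w) * two for w, the share of
    field goal attempts taken from three-point range."""
    return (two_pct - overall_pct) / (two_pct - three_pct)

# Percentages quoted in the article (2017 WNBA regular season).
bird = implied_three_point_share(two_pct=46.8, three_pct=39.3, overall_pct=42.7)
lyttle = implied_three_point_share(two_pct=44.4, three_pct=25.0, overall_pct=43.5)

print(f"Bird:   about {bird:.0%} of attempts were three-pointers")
print(f"Lyttle: about {lyttle:.0%} of attempts were three-pointers")
```

On these figures, roughly 55% of Bird's attempts were three-pointers, against roughly 5% of Lyttle's. That gap in shot difficulty is large enough to reverse the overall comparison.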

Another example of Simpson’s Paradox involves the “hot hand” phenomenon. Fans have long held that players experience runs of success or failure. If a player succeeds at sinking a free throw, the theory goes, they’re having a successful streak, which makes them more likely to sink the next free throw. In 1985, the psychologists Gilovich, Vallone, and Tversky published a statistical argument that the hot hand is a myth. They considered free throw data for nine major players for the Philadelphia 76ers in the 1980-81 season, and showed that eight of them were slightly less likely to succeed after a run of successes than after a run of failures.

Statistician Robert Wardrop suggested that Simpson’s Paradox might explain why the “hot hand” phenomenon looks real, even if it’s not. Players vary in their shooting ability. If a player sinks a free throw, that player is more likely to have the skill to sink their next free throw; if a player misses, they are more likely to be a less skilled player who misses their next shot. But holding the player fixed, a successful free throw is no more likely given a previous success than it is given a previous failure.
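A toy calculation makes this pooling effect concrete. The Python sketch below uses two hypothetical players whose success rates are invented for illustration and not drawn from any real data. Each takes a pair of free throws, and for each player the second shot is independent of the first.

```python
from fractions import Fraction

# Hypothetical free throw success rates, not taken from any real data.
players = {"strong shooter": Fraction(9, 10), "weak shooter": Fraction(1, 2)}

def pooled_second_shot_rate(rates, first_made):
    """P(second shot made | first shot outcome), pooling players who each
    take one pair of independent free throws."""
    # Probability of the given first-shot outcome, summed over players.
    first = sum(p if first_made else 1 - p for p in rates)
    # Probability of that first outcome followed by a made second shot.
    both = sum((p if first_made else 1 - p) * p for p in rates)
    return both / first

rates = list(players.values())
after_hit = pooled_second_shot_rate(rates, first_made=True)
after_miss = pooled_second_shot_rate(rates, first_made=False)

print(f"pooled, after a hit:  {float(after_hit):.3f}")
print(f"pooled, after a miss: {float(after_miss):.3f}")
```

In the pooled data the second shot goes in about 76% of the time after a hit and about 57% of the time after a miss, even though neither player's second shot depends on their first. The apparent streakiness comes entirely from mixing a strong shooter with a weak one.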

(In a more recent development in the “hot hand” story, economists Miller and Sanjurjo argue that there is a flaw in Gilovich et al.’s 1985 study: even if there is no relationship between success on one shot and success on the next, it’s not a good idea to single out free throws that happen after a run of successes. Doing so will bias you toward looking at free throws that miss. But that problem seems to be independent of Simpson’s Paradox.)
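The selection effect can be seen in a very small case. The Python sketch below is an illustration under simplifying assumptions, not a reconstruction of Miller and Sanjurjo's analysis. It takes a shooter who makes each shot with probability one half, independently, and treats a single hit as a "run". It enumerates every sequence of three shots, computes the fraction of hits among shots that follow a hit within each sequence, and averages those fractions. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean_rate_after_hit(n_shots):
    """Average, over all equally likely hit/miss sequences of length n_shots,
    of the fraction of shots following a hit that are also hits.
    Sequences with no shot following a hit are skipped."""
    rates = []
    for seq in product([0, 1], repeat=n_shots):
        # Outcomes of the shots that come immediately after a hit (1 = hit).
        followers = [seq[i + 1] for i in range(n_shots - 1) if seq[i] == 1]
        if followers:
            rates.append(Fraction(sum(followers), len(followers)))
    return sum(rates) / len(rates)

bias_example = mean_rate_after_hit(3)
print(bias_example)  # 5/12
```

The average comes out to 5/12 rather than 1/2, so a shooter with no streakiness at all looks slightly worse after a hit when the data are sliced this way.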

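Is the moral that correlation is not causation? I think there’s more to it than that. Observing correlations between variables can tell us about causation, but the causal story may not be simple. Our WNBA example shows that a player’s success at field goals depends on at least two factors: where on the court she shoots from, and her aim. Computer scientists have developed a more general theory about how to uncover causal information based on correlations, which handles cases like our basketball examples. Maybe all that data can tell us something useful, not just about who wins basketball games, but about why.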

(Thanks to Reuben Stern for sports discussion, and to Jeremy Lizakowski for teaching me how to use Ruby to search the WNBA data tables.)