The Why Axis: Searching beneath the streetlight

A drunk is carefully searching the ground beneath a streetlamp at night. A passerby asks what they’re looking for, and the drunk says they’re looking for their keys. The passerby asks if the drunk lost their keys there, and the drunk says, “No, but this is where the light is.”

The streetlight effect (or streetlight problem, or drunkard’s search) is a common problem when working with data. What we really want to know hasn’t already been measured, or is very difficult to measure, or can’t be measured at all, so we substitute an easily accessible dataset to try to achieve the same goals. Is our answer in that dataset? No, but that’s what’s on GitHub.

I’m wrapping up my read of Picturing the Uncertain World with an exploration of the streetlight effect. The examples I’m pulling from the book are based on semi-serious asides, not genuine policy proposals, but I think they illustrate the streetlight effect and some associated dangers.

The passerby in the story has one big question for the drunk: are they looking in the right place for their keys? However, I think there are three questions here for a data analyst to consider: are we looking in the right place for the keys, are the keys what we should be looking for, and what happens when we find them?



The Why Axis: small sample sizes and too many slopes

I’m working my way through Picturing the Uncertain World by Howard Wainer, a collection of articles about dealing with uncertainty in statistical thinking and visualization. I could (and might!) write an essay about every article in the book. In this first post, I want to pull out two points that might be useful when analyzing or writing about data.

Picturing the Uncertain World is a series of real-world case studies, often with deeply felt consequences. I’m collapsing a few articles together here, so to illustrate these points I’m going to use a fictional example with absolutely no consequences: swords in the made-up country of Knightlandia. Look for the bolded text if you’re only interested in the bottom line.
