A drunk is carefully searching the ground beneath a streetlamp at night. A passerby asks what they’re looking for, and the drunk says, their keys. The passerby asks if the drunk lost their keys there, and the drunk says, “No, but this is where the light is.”
The streetlight effect (or streetlight problem, or drunkard’s search) is a common problem when working with data. What we really want to know hasn’t already been measured, or is very difficult to measure, or can’t be measured at all, so we substitute an easily accessible dataset to try to achieve the same goals. Is our answer in that dataset? No, but that’s what’s on Github.
I’m wrapping up my read of Picturing the Uncertain World with an exploration of the streetlight effect. The examples I’m pulling from the book are based on semi-serious asides, not genuine policy proposals, but I think they illustrate the streetlight effect and some associated dangers.
The passerby in the story has one big question for the drunk: is he looking in the right place for his keys? However, I think there are three questions here for a data analyst to consider: are we looking in the right place for the keys, are the keys what we should be looking for, and what happens when we find them?