A drunk is carefully searching the ground beneath a streetlamp at night. A passerby asks what they’re looking for, and the drunk says, their keys. The passerby asks if the drunk lost their keys there, and the drunk says, “No, but this is where the light is.”
The streetlight effect (or streetlight problem, or drunkard’s search) is a common problem when working with data. What we really want to know hasn’t already been measured, or is very difficult to measure, or can’t be measured at all, so we substitute an easily accessible dataset to try to achieve the same goals. Is our answer in that dataset? No, but that’s what’s on GitHub.
I’m wrapping up my read of Picturing the Uncertain World with an exploration of the streetlight effect. The examples I’m pulling from the book are based on semi-serious asides, not genuine policy proposals, but I think they illustrate the streetlight effect and some associated dangers.
The passerby in the story has one big question for the drunk: are they looking in the right place for their keys? However, I think there are three questions here for a data analyst to consider: are we looking in the right place for the keys, are the keys what we should be looking for, and what happens when we find them?
1. Where are we looking?
In Chapter 2 of Picturing the Uncertain World, Wainer explores how made-up data are sometimes less strange and extreme than real numbers. He connects fabricating data based on eyeballing and intuition to curbstoning in the U.S. Census (when a census enumerator sits on the curb outside a house and guesses how many people live there, instead of going in to ask the residents).
His example is a table of states that lists IQ, income, and candidate selected in the 2000 election, originally posted on the Mensa website in 2002. As Wainer writes, “the obvious intent of this data display is to provide another characterization of the states that supported George Bush (poorer, dumber) in the 2000 election versus those who supported Al Gore (smarter, richer). Even a quick look at the table brings us up short…. Obviously these data could not be real.”
He then explores how close the fake data are to reality by comparing the table to more reputable figures. First, he compares the table’s household income by state with median household income by state from the 2002 U.S. Census. However, reliable and comprehensive state-by-state IQ data do not exist. He substitutes scores from the National Assessment of Educational Progress instead. The NAEP is a standardized test of math and reading performance among fourth and eighth graders, given to a randomized representative sample of students across the country.
We are firmly beneath the streetlight now. Standardized test scores among fourth and eighth graders are not a good proxy for the intelligence of voting adults. Fourth and eighth graders are not casting votes in the election. There’s no guarantee that everyone who voted in the state received their education there. Most importantly, standardized math and reading scores are not a good measure of overall intelligence, and are certainly not independent of the wealth of each state. Perhaps all that can be said for them in this context is that they are easy to find.
Wainer concludes that the disparities in intelligence and income were actually understated in the original table. I grant that household income was even higher in Gore-voting states and even lower in Bush-voting states than the table originally stated, but I don’t think we can reach any conclusions about voter intelligence based on standardized test scores. I’m also not convinced that we should be trying to reach conclusions about voter intelligence.
2. What are we looking for?
Is the drunk looking for their keys when they actually lost their wallet? We’re looking for information about the intelligence of voters in each state. Why?
If the original table came out of a genuine desire to understand what happened and why each state voted the way it did, voter intelligence is not really a useful metric. What about access to social mobility, attitudes about abortion, concerns about the environment, opinions about race, religious affiliation, concerns about taxes, or political spending in each state? None of these are under the streetlight, of course, but all of them are more useful than IQ scores. Overly simplified approaches to thorny questions will always result in overly simplified answers.
However, if the original table was created in bad faith to sneer at voters perceived as poor and stupid (as I suspect it was), then perhaps we ought to return to one of the core concepts of data visualization and analysis: the right tools for the job depend on the goal. If the goal is to soothe egos about an election and use intelligence as a cudgel, the right tool for the job might be a few drinks, an embarrassing conversation that stays at the bar, and a resolution to get out the vote next time. If the goal is to portray votes from people with low IQ as somehow less valid than votes from Mensa members, the right tool might be a good hard look at the limits of IQ as a measure and its history with eugenics and xenophobia in this country.
Wainer, of course, did not create the original table, and we seem to have the same view of its purpose. He explicitly connects the original table to the practice of guessing at IQ on sight in order to turn immigrants away at Ellis Island in the early 20th century. He condemns trying to guess at IQ, but I think it’s worth asking why we are considering IQ (or substitute measures) at all. Any analysis project should include a consideration of why we ask questions, not just how we try to answer them. Are we looking for keys when we should be looking for a wallet, and when is finding lost items worse than leaving them where they fell?
3. What happens when we find it?
Perhaps it is the drunk’s lucky day: the passerby has a flashlight, is willing to go for a walk in the dark with a stranger, and helps the drunk find their keys. What now? Does the drunk call a cab, now that they can unlock their front door? Or does the drunk say thank you, get in the car parked nearby, and try to drive home?
I’m spending a lot of words on a table that matters very little in the scheme of things. However, the original focus of the chapter (distortions in the U.S. Census because of curbstoning) may have serious consequences.
Wainer himself mentions this in the chapter: a former (white) Census enumerator considered curbstoning because he had difficulty connecting with Chinese-Americans. Overall estimates of curbstoning were low enough that Wainer wasn’t concerned about the accuracy of population counts. However, the census is used to determine federal funding and legislative representation. Undercounting (or overcounting) impacts the amount of money allocated for infrastructure and community services, the number of Congressional seats per state, and the definitions of legislative districts. If Census enumerators tend to guess at household sizes in neighborhoods where they feel uncomfortable due to race or class, federal funding and representation will be skewed along lines of race and class as well.
The issue of census accuracy has come up in the news recently with the addition of a citizenship question on the 2020 Census survey. If non-citizens don’t return the Census survey card because they worry about how their answers will be used, immigrant populations will be undercounted and underserved on an institutional level. An inaccurate Census isn’t as immediately destructive as a drunk behind the wheel of a car, but it sets the stage for years of disenfranchisement and insufficient funding.
In sum: I continue to be no fun at all (and overly concerned about hypothetical drunks). May we all drop our keys in well-lit places, have easily accessible data that directly answers our questions, and have the wisdom to know when to find a flashlight and when to get new keys cut in the morning.
This wraps up my readthrough of Picturing the Uncertain World. Next week, I’m turning to Tamara Munzner’s Visualization Analysis and Design. Stay tuned!