I’m too new at data visualization to create a best-of-2018 list, and my only work resolution is to keep going. Hi! I’m here! I like graphs! So instead, I’m starting the year by reflecting on why I do this work. Why on earth do graphs matter so much to me?
I’m working on a beeswarm plot about student loans, and I ran into a bit of trouble with my zero-debt group. If you aren’t familiar with beeswarm plots, take a look at this excellent example of gender ratios in newsrooms from Google Trends:
Each newsroom is a dot. Left-to-right position indicates the gender balance of the newsroom, and bigger newsrooms have bigger dots. Up-and-down position doesn’t officially indicate anything, but because the dots don’t overlap, the width of the “beeswarm” is an informal indicator of how many newsrooms have a particular gender balance.
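To make the "dots don't overlap" idea concrete, here is a minimal sketch of one way a beeswarm layout could be computed. It is not how Google Trends or any particular plotting library does it; the function name `beeswarm_layout` and the greedy strategy are assumptions made for illustration. Each dot keeps its left-to-right position and is pushed up or down only as far as needed to avoid the dots already placed.

```python
import math

def beeswarm_layout(points):
    """Greedy beeswarm sketch (illustrative, not a library algorithm).

    points: list of (x, radius) tuples.
    Returns a list of (x, y, radius) tuples in the same order as the input.
    """
    placed = []                       # circles that already have a position
    result = [None] * len(points)
    # Work left to right so each dot only has to dodge earlier ones.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    for i in order:
        x, r = points[i]
        # Candidate heights: the axis itself, plus the heights at which
        # this dot would exactly touch a nearby placed dot.
        candidates = [0.0]
        for px, py, pr in placed:
            dx = x - px
            min_dist = r + pr
            if abs(dx) < min_dist:
                dy = math.sqrt(min_dist ** 2 - dx ** 2)
                candidates.extend([py + dy, py - dy])
        # Prefer the candidate closest to the axis. The highest candidate
        # never overlaps anything, so this loop always finds a free spot.
        candidates.sort(key=abs)
        for y in candidates:
            if all(math.hypot(x - px, y - py) >= r + pr - 1e-9
                   for px, py, pr in placed):
                break
        placed.append((x, y, r))
        result[i] = (x, y, r)
    return result

# Three equal dots at the same x value stack vertically around the axis.
layout = beeswarm_layout([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)])
```

Because dots with similar x values have to stack, the swarm grows taller wherever many dots share a value, which is why its width works as an informal count.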
In my case, I’m looking at student debt-to-earnings ratios for graduates of career-training programs. That is, how much of a person’s income goes towards paying their student loans every year? Unlike the example above, I have many small beeswarm plots, since I’m splitting the data by occupation.
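As a rough sketch of the ratio itself (in Python rather than the R I used for the plot), assuming it is simply annual loan payments divided by annual earnings: the function name, field names, and all three programs below are invented for illustration and are not from my dataset.

```python
def debt_to_earnings(annual_loan_payment, annual_earnings):
    """Share of yearly earnings that goes to student loan payments."""
    if annual_earnings <= 0:
        return None  # the ratio is undefined without earnings
    return annual_loan_payment / annual_earnings

# Hypothetical programs, invented for illustration only.
programs = [
    {"name": "Program A", "payment": 0, "earnings": 24_000},
    {"name": "Program B", "payment": 1_200, "earnings": 24_000},
    {"name": "Program C", "payment": 3_000, "earnings": 30_000},
]

ratios = {p["name"]: debt_to_earnings(p["payment"], p["earnings"])
          for p in programs}

# Programs whose graduates carry no debt all land on exactly zero.
zero_debt = [name for name, ratio in ratios.items() if ratio == 0]
```

Any program with no debt gets a ratio of exactly zero, so in a beeswarm all of those dots compete for the same spot on the axis.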
Here’s how the plot came out of R:
Note the 91-program pileup at zero. So, what to do?
There are lots of ways to lie. Fortunately for visualization designers, there are also many ways to tell the truth. Different types of visualizations encourage different comparisons, reveal different patterns, and present different messages. However, highlighting one aspect of the data means obscuring another. There is no one true visualization: every chart is a trade-off.
For example: On August 14, the CDC released provisional counts of drug overdose deaths in 2017. The story was picked up by several news outlets, which created their own visualizations alongside the CDC’s interactive ones. Every chart and map tells a part of the story.
The CDC provided three interactive visualizations: a line chart examining total overdose-related deaths from January 2014 to January 2018, a similar line chart that split overdose deaths by type of drug, and a choropleth map showing the percent change in overdose deaths in each state.
(Two quick notes about the data: 1. Each data point includes the previous 12 months of deaths. That is, the figure for March 2015 includes overdose deaths from April 1, 2014, through March 31, 2015. 2. “Predicted deaths” are an estimate that corrects for underreporting of overdose deaths, not a prediction made in the past about future deaths.)
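The first note describes a trailing 12-month total. Here is a minimal sketch of that arithmetic, assuming monthly counts arrive in chronological order. The function name and the counts are made up for illustration; they are not CDC figures or CDC code.

```python
def trailing_12_month_totals(monthly_counts):
    """Sum each month with the 11 months before it.

    monthly_counts: list of (label, count) pairs in chronological order.
    Returns (label, total) pairs, starting with the first month that
    has a full 12 months of data behind it.
    """
    window = 12
    totals = []
    running = 0
    for i, (label, count) in enumerate(monthly_counts):
        running += count
        if i >= window:
            # Drop the month that just fell out of the 12-month window.
            running -= monthly_counts[i - window][1]
        if i >= window - 1:
            totals.append((label, running))
    return totals

# Made-up monthly counts from April 2014 through April 2015.
labels = ["2014-04", "2014-05", "2014-06", "2014-07", "2014-08", "2014-09",
          "2014-10", "2014-11", "2014-12", "2015-01", "2015-02", "2015-03",
          "2015-04"]
counts = list(range(1, 14))  # 1, 2, ..., 13 (illustrative values only)
totals = trailing_12_month_totals(list(zip(labels, counts)))
```

In this toy series, the first total is labeled March 2015 and covers April 2014 through March 2015, matching the note above. One consequence is that a single bad month keeps influencing the line for a full year.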
At The New York Times, The Upshot’s Margot Sanger-Katz recreated the CDC’s visuals with additional interpretive text and one small but important change: removing the line that showed aggregated opioid deaths and only showing the subclasses of opioids. Removing the aggregate data makes a lot of sense, given that the authors wanted to focus on the dramatic increase in synthetic opioid overdoses.
Christopher Ingraham at The Washington Post’s Wonkblog created a similar graphic, but included overdose deaths all the way back to 1999. The increase in synthetic opioid deaths looks even more dramatic when preceded by ten years of slower, steadier increases. Total opioid overdose deaths are also shown in this long view, which illustrates the high proportion of overdose deaths related to opioids.
After zooming way out, Ingraham closes the piece by zooming way in, pointing out a recent plateau in opioid deaths. This threw me for a loop (the previous chart in the WaPo piece showed no sign of a plateau) until I realized that this chart began in 2015, rather than 1999. From a more recent perspective, the plateau is a remarkable change, but it doesn’t even register on the 1999-2018 charts that show such an alarming spike in opioid overdose deaths. The longer time frame emphasizes the magnitude of the opioid crisis, but obscures recent developments. The shorter time frame presents a less dramatic picture, but enables a more detailed look at the recent past.
Ingraham also redesigned the choropleth map, showing the overall death rate per state rather than the change in death rate. Putting the NYT chart next to the WaPo chart tells a different story about states like Vermont, which have particularly high rates of overdose deaths but have shown recent improvement. (Ingraham does note this in the text of the article.)
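A small sketch shows how the two map encodings can disagree. The two states and every number below are made up for illustration; they are not the CDC's figures, and the function names are my own.

```python
def death_rate_per_100k(deaths, population):
    """Deaths per 100,000 residents."""
    return deaths / population * 100_000

def percent_change(previous, current):
    """Change from the previous period to the current one, in percent."""
    return (current - previous) / previous * 100

# Hypothetical states, invented for illustration only.
states = {
    "State A": {"population": 600_000, "deaths_prev": 150, "deaths_now": 135},
    "State B": {"population": 6_000_000, "deaths_prev": 300, "deaths_now": 360},
}

rates = {name: death_rate_per_100k(s["deaths_now"], s["population"])
         for name, s in states.items()}
changes = {name: percent_change(s["deaths_prev"], s["deaths_now"])
           for name, s in states.items()}

# The darkest state on each map is the one with the highest value.
worst_by_rate = max(rates, key=rates.get)
worst_by_change = max(changes, key=changes.get)
```

In this toy example, State A has the higher death rate even though its deaths are falling, while State B has the lower rate but the bigger increase. Each state would be the darkest on one map and the lightest on the other.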
The Houston Chronicle took a more local view with a chart of all overdose deaths specifically in Texas. This chart tells a much simpler story than the others, but that focus drives home the gut-punch reality of 2,995 local deaths in one year.
The CDC, Washington Post, and New York Times reported on the same figures at the same nationwide level. However, even the relatively minor changes in the visualization of nationwide data changed the focus and narrative of the articles, while looking at one’s home state changed the immediacy of the story.