Too many bees, not enough swarm

I’m working on a beeswarm plot about student loans, and I ran into a bit of trouble with my zero-debt group. If you aren’t familiar with beeswarm plots, take a look at this excellent example of gender ratios in newsrooms from Google Trends:


Each newsroom is a dot. Left-to-right position indicates the gender balance of the newsroom, and bigger newsrooms have bigger dots. Up-and-down position doesn’t officially indicate anything, but because the dots don’t overlap, the width of the “beeswarm” is an informal indicator of how many newsrooms have a particular gender balance.
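To make the "width as an informal count" idea concrete, here is a minimal sketch of one way a beeswarm layout could be computed. It is not the method used by the Google Trends graphic or by any particular R package; the function name `beeswarm_offsets`, the fixed dot radius, and the greedy placement rule are all assumptions made for illustration.

```python
import math


def beeswarm_offsets(xs, radius=0.5):
    """Greedy beeswarm sketch (hypothetical helper, equal-sized dots).

    Each dot keeps its x value. Its vertical offset is the candidate
    closest to the center line that does not overlap any dot already placed.
    """
    min_dist_sq = (2 * radius) ** 2
    placed = []   # (x, y) centers of dots placed so far
    offsets = []
    for x in xs:
        # Candidates: the center line, plus positions that just touch
        # each already-placed dot that is horizontally close enough to collide.
        candidates = [0.0]
        for px, py in placed:
            dx = abs(x - px)
            if dx < 2 * radius:
                dy = math.sqrt(min_dist_sq - dx ** 2)
                candidates.extend([py + dy, py - dy])
        candidates.sort(key=abs)
        for y in candidates:
            # Small tolerance so dots that exactly touch are not rejected.
            if all((x - px) ** 2 + (y - py) ** 2 >= min_dist_sq - 1e-9
                   for px, py in placed):
                placed.append((x, y))
                offsets.append(y)
                break
    return offsets


pile = beeswarm_offsets([0, 0, 0, 0, 0])     # five dots with the same value
spread = beeswarm_offsets([0, 2, 4, 6, 8])   # five dots with distinct values
print(max(pile) - min(pile), max(spread) - min(spread))
```

Five dots with the same value are pushed apart vertically, while five well-separated dots all stay on the center line, so the swarm is widest where the values pile up.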

In my case, I’m looking at student debt-to-earnings ratios for graduates of career-training programs. That is, how much of a person’s income goes towards paying their student loans every year? Unlike the example above, I have many small beeswarm plots, since I’m splitting the data by occupation.
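As a rough sketch of that ratio, assuming it is simply annual loan payments divided by annual earnings (the function name and the dollar figures below are made up for illustration and do not come from the dataset):

```python
def debt_to_earnings(annual_loan_payment, annual_earnings):
    """Share of yearly income that goes to student loan payments.

    Hypothetical helper: returns None when earnings are zero or negative,
    since the ratio is undefined in that case.
    """
    if annual_earnings <= 0:
        return None
    return annual_loan_payment / annual_earnings


# Illustrative numbers only, not real program data.
ratio = debt_to_earnings(1200, 24000)
print(f"{ratio:.1%}")  # prints "5.0%"

# A program whose graduates carry no debt has a ratio of exactly zero.
print(debt_to_earnings(0, 24000))  # prints "0.0"
```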

Here’s how the plot came out of R:


Note the 91-program pileup at zero. So, what to do?



Sword Graphs Part I: Self-Encoding

For your consideration, swords:


The sword graph was a bit of self-indulgent fun, but it did give me an opportunity to reflect on graph humor and the appeal of self-encoding. I coined the term “self-encoding” to describe charts where the object being described represents (or encodes) itself, rather than being translated into a more abstract mark like a bar or a dot. Self-encoding preserves important quantitative information (such as the length of a hilt) while also presenting additional qualitative information (the presence or absence of a pommel, the shape of the crossguard).

Sometimes self-encoding is just for fun. Consider these two classics of graph humor:


Interaction Without Interactivity

Last week I sat in on a guest lecture by Xaquín G.V., a visual editor at the New York Times. He showed a variety of interactive projects rich in hooks. One article from his time at the Guardian asked readers to create a stable coalition government by dragging and dropping political parties. Another interactive was a surprise at the end of an article about the gender pay gap, showing how much more money a man would have made than a woman in the time since the page was opened.


Hook is an accurate term: as a reader, I immediately wanted to play with these visualizations. As a designer, I immediately wanted to make interactives like them. Unfortunately, I haven’t learned how to build interactive visualizations yet. So I started to wonder: how can I achieve a similar effect in static visualizations?


Time and Space

I’m up to my ears in student loan data at the moment—not my own this time, thank God—and trying my hand at the peculiar alchemy of data visualization. A group of loans becomes a list of numbers, becomes an aggregation, becomes an angle or a color or a position in space. Encoding turns a thousand bills at a thousand kitchen tables into a digestible summary.

This week, I’ve also been thinking about how we encode attention: trading in space and time to communicate when an audience should stop and think. Take this graphic, part of a New York Times feature on the survivors of the Las Vegas massacre:


Here, space breaks one number down into its components: not to transform them in some way, or to compare them, but to convey that there are individuals within an aggregation. The print version of the story traded inches of column space for an individual figure for each victim:


The graphic doesn’t communicate any information beyond the labels: 456 injured, 413 shot, 58 killed. Instead it creates a space for reflection on the individuals within those numbers. The graphic isn’t space-efficient, because efficiency isn’t the point.


Blown out of the blanket fort: beginning Tamara Munzner’s Visualization Analysis and Design

Confirmation bias is pernicious, but so is confirmation pleasure: the comfortable settling-in while reading one’s third or fourth introductory text on a subject, instead of reaching for something a little more challenging. It’s like reading a retelling of a fairy tale, or watching the hundredth episode of a procedural. When I open an intro-level book on data visualization, I know we’re going to talk about chartjunk, and axes that stretch to zero, and the concept of statistical uncertainty. I can snuggle into these subjects like a blanket.

The first chapters of Tamara Munzner’s Visualization Analysis and Design blew that smugness right out of me. Reading this book isn’t like wrapping myself in a blanket. It’s like climbing a rope ladder in a windstorm. I know how I got here, I can see where I’m going, and if I stretch, I can just reach the next rung. But I’ve been blown far from my comfort zone, and suddenly I can see that the horizon stretches way further than I expected.

All I can say is: good, and, more of this please.

Munzner’s book isn’t a how-to guide. It’s a framework for thinking about information visualization: all information visualization. I’m accustomed to thinking of vis as a means of communication. Munzner identifies communication as one of many possible goals, and develops a vocabulary that reaches across disciplines. When is a field not a field? When we have to consider continuous spatial data, that’s when.

While I’m busy rebuilding my fundamental understanding of information visualization, here are a few points that stuck out to me:

  • Visualization is an extension of human memory and information processing, but it’s also an extension of computer capacity. Graphics translate back and forth between a computer’s raw processing power and a human’s ability to pick out patterns that matter to other humans. And, like any translator, the grammar that a visualization uses when presenting information changes the way that information is understood. (It is a little bit refreshing to think of humans as an asset instead of a ball-and-chain for algorithms!)
  • The goal of visualization is not to optimize; it is to satisfy. When looking at design problems, don’t look at a narrow range of options and obsess over finding the very best one. Keep your eyes open to a wide variety of options and choose one of the many good ones. This runs counter to all of my instincts: I’m accustomed to thinking that if I spend just a few more minutes tinkering with formatting and color and font, I can beam my intentions directly into the brain of my audience. That’s a great way to fall down a rabbit hole of minutiae, and a poor way to keep my eyes open to the full scope of methods available to me.

The Why Axis: Searching beneath the streetlight

A drunk is carefully searching the ground beneath a streetlamp at night. A passerby asks what they’re looking for, and the drunk says, their keys. The passerby asks if the drunk lost their keys there, and the drunk says, “No, but this is where the light is.”

The streetlight effect (or streetlight problem, or drunkard’s search) is a common problem when working with data. What we really want to know hasn’t already been measured, or is very difficult to measure, or can’t be measured at all, so we substitute an easily accessible dataset to try to achieve the same goals. Is our answer in that dataset? No, but that’s what’s on GitHub.

I’m wrapping up my read of Picturing the Uncertain World with an exploration of the streetlight effect. The examples I’m pulling from the book are based on semi-serious asides, not genuine policy proposals, but I think they illustrate the streetlight effect and some associated dangers.

The passerby in the story has one big question for the drunk: are they looking in the right place for their keys? However, I think there are three questions here for a data analyst to consider: are we looking in the right place for the keys, are the keys what we should be looking for, and what happens when we find them?


An Accounting

Picturing the Uncertain World closes with visualizations created by the Jewish residents of the Kovno Ghetto. The visualizations were created as part of a community effort to record the Holocaust as it happened. There are bar and line and isotype charts of population changes, tables that summarize overcrowding, diagrams that show common illnesses and injuries in the ghetto. One line chart caught and held me: a display showing the population of the ghetto in September 1941 and again in November of the same year.

kovno chart

Scanned from Picturing the Uncertain World. More images here from the United States Holocaust Memorial Museum, and a full electronic exhibit on the Kovno Ghetto here.

The red upper line represents the population in September 1941, split into age groups across the bottom axis. The black lower line represents the population in November of the same year. The shaded region in between represents the people who died that autumn.

In the face of deprivation and death, the residents of the Kovno Ghetto did what all humans do: record. When moments are too big for our capacity to feel and understand, we spill them out into diaries, letters, conversations, and art. The Statistics Office in the Kovno Ghetto recorded their community in its entirety: how many people once lived there, and how many still survived.
