I’m working on a beeswarm plot about student loans, and I ran into a bit of trouble with my zero-debt group. If you aren’t familiar with beeswarm plots, take a look at this excellent example of gender ratios in newsrooms from Google Trends:
Each newsroom is a dot. Left-to-right position indicates the gender balance of the newsroom, and bigger newsrooms have bigger dots. Up-and-down position doesn’t officially indicate anything, but because the dots don’t overlap, the width of the “beeswarm” is an informal indicator of how many newsrooms have a particular gender balance.
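The rules above (fixed horizontal position, free vertical position, no overlaps) can be turned into a simple greedy layout. The sketch below is written in Python, although my charts come out of R, and it is not the algorithm Google Trends or any particular R package uses. It places each dot at its fixed x, then tries vertical offsets 0, +step, -step, +2·step, … until the dot clears every dot already placed. The function names and the `step` parameter are hypothetical.

```python
def candidate_offsets(step):
    """Yield vertical offsets 0, +step, -step, +2*step, -2*step, ..."""
    yield 0.0
    n = 1
    while True:
        yield n * step
        yield -n * step
        n += 1


def beeswarm(points, step=0.5):
    """Greedy beeswarm layout (a hypothetical sketch).

    points: list of (x, r) pairs, where x is the fixed horizontal position and
    r is the dot radius. Returns (x, y, r) triples in the same order, with y
    chosen so that no two circles overlap. x is never changed.
    """
    placed = []
    result = [None] * len(points)
    # Place the biggest dots first so they claim the spots nearest the axis.
    for i in sorted(range(len(points)), key=lambda j: -points[j][1]):
        x, r = points[i]
        for y in candidate_offsets(step):
            # Two circles are clear of each other when the distance between
            # their centers is at least the sum of their radii (compared squared).
            if all((x - px) ** 2 + (y - py) ** 2 >= (r + pr) ** 2
                   for px, py, pr in placed):
                break
        placed.append((x, y, r))
        result[i] = (x, y, r)
    return result
```

For example, three equal dots at the same x end up stacked at y = 0, +2 and -2, which is exactly the "width of the swarm" effect described above. The check is quadratic in the number of dots, which is fine for a few hundred programs per panel.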
In my case, I’m looking at student debt-to-earnings ratios for graduates of career-training programs. That is, how much of a person’s income goes towards paying their student loans every year? Unlike the example above, I have many small beeswarm plots, since I’m splitting the data by occupation.
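As a rough illustration of the ratio itself: it is the annual loan payment divided by annual earnings. This post doesn't say how the annual payment is derived from total debt, so the sketch below assumes a standard fixed-rate loan amortized in equal annual payments. The 5% rate and 10-year term are hypothetical defaults, not the figures behind my chart.

```python
def annual_payment(principal, annual_rate, years):
    """Equal annual payment that pays off `principal` over `years` payments.

    Assumes a fixed-rate, annually compounded amortized loan (an assumption
    made for illustration only).
    """
    if principal == 0:
        return 0.0
    if annual_rate == 0:
        return principal / years
    r = annual_rate
    return principal * r / (1 - (1 + r) ** -years)


def debt_to_earnings(debt, annual_earnings, annual_rate=0.05, years=10):
    """Percent of annual earnings spent on the annual loan payment."""
    return 100 * annual_payment(debt, annual_rate, years) / annual_earnings
```

Under these assumptions, $10,000 of debt at 0% over 10 years is $1,000 a year, so a graduate earning $20,000 has a ratio of 5%. A program whose graduates carry no debt lands at exactly 0, which is where the trouble below starts.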
Here’s how the plot came out of R:
Note the 91-program pileup at zero. So, what to do?
Here are my dilemmas:
- Left-to-right positioning has a precise meaning, so I don’t have any wiggle room there.
- Leaving the zero-debt dots in one long line is not an option: there are so many zero-debt dots for criminal justice and licensed nursing (#16 and #15) that they spill over and make it look like business and cinematography (#14 and #13) have many zero-debt programs. (They do not.)
- Making the dots transparent and shoving them on top of each other won’t work: there are simply too many dots for the number of zero-debt programs to be legible, and program type (indicated by color) would be lost in the muddle.
However, it occurred to me that zero isn’t just a number in this case. It’s also a category. Programs can be split into two groups: programs where students graduate with debt and programs where students graduate without debt. So rather than trying to cram everything in at the zero point, I bucketed the zero-debt programs into a gray box. Here’s a section of the cleaned-up chart:
Graduates of the programs in the gray box do not have debt. Graduates of programs outside the box have debt, and distance from the box indicates how much of their income is spent on that debt.
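The "zero is a category" idea boils down to a split before plotting: zero-debt programs go into a per-occupation bucket, and only programs with debt go through the beeswarm layout. Here is a minimal Python sketch of that step (my chart was made in R). The record keys `occupation`, `program_type` and `ratio`, and the simple grid used to pack dots into the box, are hypothetical choices, not necessarily how the final chart arranges them.

```python
from collections import defaultdict


def bucket_by_debt(programs):
    """Treat zero as a category: split programs into zero-debt and with-debt.

    `programs` is a list of dicts with hypothetical keys "occupation",
    "program_type" and "ratio" (debt-to-earnings). Returns
    (zero_debt, with_debt), where zero_debt maps each occupation to its
    zero-debt programs and with_debt keeps the rest for the beeswarm.
    """
    zero_debt = defaultdict(list)
    with_debt = []
    for p in programs:
        if p["ratio"] == 0:
            zero_debt[p["occupation"]].append(p)
        else:
            with_debt.append(p)
    return dict(zero_debt), with_debt


def grid_layout(n, cols, spacing):
    """(x, y) offsets that pack n equal dots into a box `cols` dots wide."""
    return [((i % cols) * spacing, (i // cols) * spacing) for i in range(n)]
```

Because each zero-debt program keeps its own record, its dot can still be colored by program type inside the box, and the box's size shows how many zero-debt programs an occupation has without spilling into the neighboring rows.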
This wouldn’t have worked if there were 91 programs with a debt-to-earnings ratio of, say, 5.8: the box would break the axis and distort the scale. But in this case, I’m satisfied with bucketing. It’s clear which occupations have zero-debt programs and what types of programs tend to be zero-debt… or at least it will be, once I get the annotations on!