Paint drip charts (or, what IS that?)

At the end of my summer fellowship, I tweeted:

Tweet from @alyssafowers that reads: "It's POSTER TIME at my summer fellowship, which means that I hear 'what IS that' every time someone walks behind my desk."

I can’t blame my coworkers for being confused when I was working on… this:

[Image: paint drip chart of match scores for each job, grouped by number of skills requested]

(A PDF version of the chart is here.)

I called it a paint drip chart. Or an “everything sure is a mess, isn’t it” chart. Or a “distributions in three dimensions” chart.

I created the paint drip chart because I wanted to show proportions within individual cases. This chart is wild, absolute nonsense, and I’m very proud of it. However, it is also wild and absolute nonsense. So why on earth did I use it?

Making wacky graphs is fun. Using them to communicate is a risk. They take more attention to read, and there’s always a chance that the reader will give up entirely or misunderstand what the chart is trying to show.

I decided to use the paint drip chart for four reasons:

  1. The chart was a good match for the venue. I designed this graphic for a poster session and a presentation. In both scenarios, I was on hand to talk through the visualization and answer any questions. I usually go for clarity over attention-seeking, but…
  2. The chart is really good at pulling people in. At the poster session, the paint drip chart drew people across the room to ask me what it meant. During the presentation, audience members who had listened with polite interest started leaning forward, asking questions, and discussing policy implications.
  3. I was sharing the chart with data geeks. We presented at the National Center for Science and Engineering Statistics, and at an event called the Data Science for the Public Good symposium. I knew that my audience was used to interpreting data visualizations, so I gambled that they would stick with me through an explanation of the chart. They did.
  4. There wasn’t a simpler way to do it. I weighed the options and decided that showing matches within individual jobs was worth the demands of a complex chart.

Read on for more about the meaning of this paint drip chart, other versions that never saw the light of day, and instructions on how to make your own!

A little bit of background first. Over the summer, I studied supply and demand in the skilled technical workforce with a research team at the University of Virginia. We looked at the gaps between skills requested in job ads and skills listed on candidates’ resumes. I developed a match score to measure fit between each job ad and each candidate. Match scores reflect the proportion of skills in the ad that were also on the candidate’s resume. A candidate that had one of five requested skills would have a match score of 0.2, a candidate with three of four requested skills would have a match score of 0.75, and so on.
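As a minimal sketch of the match score, assume each ad and each resume has already been reduced to a list of skill names that can be compared by exact string match (real skill extraction is messier than this). The score is then just the share of the ad's skills that appear on the resume. The function name match_score and the skill names below are hypothetical.

```r
# Hypothetical helper: share of an ad's requested skills that also appear
# on a candidate's resume. Assumes skills are pre-cleaned strings that can
# be compared by exact match.
match_score <- function(ad_skills, resume_skills) {
  ad_skills <- unique(ad_skills)
  mean(ad_skills %in% resume_skills)  # mean of TRUE/FALSE = proportion
}

# One of five requested skills -> 0.2
match_score(c("ACLS", "BLS", "PALS", "telemetry", "EHR"), c("BLS", "CPR"))
# Three of four requested skills -> 0.75
match_score(c("ACLS", "BLS", "PALS", "EHR"), c("ACLS", "BLS", "EHR", "CPR"))
```

Skills on the resume that the ad never asked for (CPR above) do not change the score, which is what makes it a measure of fit to the ad rather than of the candidate overall.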

The result: a 1,609 x 3,150 matrix, which captured a great deal of detail about match and mismatch in a specific job market. (We looked at critical care nurses in Richmond, Virginia as a case study.)
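Scoring every job against every candidate gives the full matrix; for the real data that is 1,609 x 3,150 = 5,068,350 scores. The sketch below uses toy skill lists and assumes jobs as rows and candidates as columns (the job, candidate, and skill names are hypothetical). It uses plain nested sapply calls to keep the logic visible; at five million cells, a vectorized or sparse-matrix approach may be faster.

```r
match_score <- function(ad_skills, resume_skills) {
  mean(unique(ad_skills) %in% resume_skills)
}

# Toy inputs (hypothetical): skill lists for job ads and for resumes
jobs <- list(
  job1 = c("ACLS", "BLS"),
  job2 = c("ACLS", "BLS", "telemetry", "EHR")
)
candidates <- list(
  cand1 = c("BLS"),
  cand2 = c("ACLS", "BLS", "EHR"),
  cand3 = c("phlebotomy")
)

# Inner sapply scores one job against every candidate;
# t() puts jobs in rows and candidates in columns
scores <- t(sapply(jobs, function(ad) {
  sapply(candidates, function(resume) match_score(ad, resume))
}))

scores
```

In this toy example, job2 (four skills) scores 0.25, 0.75 and 0 against the three candidates, while job1 (two skills) scores 0.5, 1 and 0.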

From this matrix, we learned:

  1. Different critical care nursing positions had wildly different requirements, so the numbers of strong and weak matches varied widely from one job to the next. However:
    1. almost every job had a strong match to at least one candidate, and
    2. almost every job had a zero match to at least one candidate.
  2. Job ads listed between one and sixteen skills. The number of skills required by a job ad was important for patterns of match/mismatch with candidates.

On the one hand: great! On the other hand, how was I supposed to show this?

I looked at density curves first, but had difficulty finding an appropriate summary statistic. If I plotted the maximum scores, it looked like every job had great matches (ignoring all the poor matches). If I plotted the mean or median, it looked like every job was a poor match for the candidate pool (ignoring the strong matches). Plotting all the match scores showed a lot of zero-to-poor matches, but it was impossible to tell how those were distributed across jobs.
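Here is a small sketch of that problem, using a made-up score matrix with jobs as rows and candidates as columns. Each summary statistic collapses a job to one number: the maximum makes every job look well matched, the median makes every job look poorly matched, and neither shows the spread. The share of zero scores is included to show what the maximum hides.

```r
# Made-up score matrix: rows are jobs, columns are candidates
scores <- rbind(
  job1 = c(1.00, 0.50, 0.00, 0.00, 0.00),
  job2 = c(0.75, 0.25, 0.25, 0.00, 0.00),
  job3 = c(1.00, 1.00, 0.00, 0.00, 0.00)
)

# One number per job for each candidate summary statistic
per_job <- data.frame(
  max        = apply(scores, 1, max),     # every job looks well matched
  median     = apply(scores, 1, median),  # every job looks poorly matched
  mean       = rowMeans(scores),
  zero_share = rowMeans(scores == 0)      # what the maximum hides
)
per_job
```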

Furthermore, because an ad that lists n skills can only produce the match scores 0/n, 1/n, and so on up to n/n, plotting all match scores created peaks based on the number of skills commonly requested in ads:

[Image: annotated histogram of all match scores]
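The peaks follow from the definition of the score. The sketch below pools the values that are reachable for ads with one to sixteen skills (the range we saw) and counts how many ad lengths can produce each value. Scores of 0 and 1 are reachable for every length, and 0.5 for every even length, so those values tend to pile up. This only shows which values are possible; the actual heights of the peaks depend on how many ads of each length there were and on the candidates. The function name possible_scores is hypothetical.

```r
# An ad with n requested skills can only score 0/n, 1/n, ..., n/n
possible_scores <- function(n) (0:n) / n

# Pool the reachable values for ad lengths 1 through 16
reachable <- unlist(lapply(1:16, possible_scores))

# Round so that values such as 1/3 and 2/6 are counted as the same score
counts <- table(round(reachable, 4))
counts[c("0", "0.5", "1")]  # reachable from 16, 8, and 16 ad lengths
```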

I didn’t just want to show a distribution of a summary statistic. I wanted to show distributions within distributions, for more than five million observations. Oof.

I started grouping the results by job. If you have trouble with the next couple charts, don’t worry: I left them on the cutting room floor for a reason.

First I made a heat map, with job ads running horizontally through different match strengths:

[Image: heat map of match strengths across job ads]

Then I tried a connected dot plot:

[Image: connected dot plot of match scores by job]

Finally, my incredible colleague Samantha Cohen suggested I try something like a stacked bar chart, with a vertical line for each bar. The project lead, Vicki Lancaster, suggested that I group the jobs by number of skills requested.

Here’s that final version again:

[Image: paint drip chart of match scores for each job, grouped by number of skills requested]

(A PDF version of the chart is here.)

Did this answer all our questions? No. Did it point us towards valuable questions? Absolutely. (For instance, what were the skills in one-skill job ads that so few candidates seemed to have?) Did it convey the wide variation in match scores between candidates and jobs? I think so.

So if you want to draw people in, need to show proportions within cases, and will be on hand to explain your graph, a paint drip chart might be for you! In my paint drip chart, the observations were match scores and the cases were jobs, but this chart could also be used for student test scores within classrooms, income within households, and so on.

How to make a paint drip chart

The following ggplot code will make a basic paint drip chart:

library(ggplot2)

# One thin bar per job (x = order), stacked by match level
ggplot(data, aes(x = order, y = proportion,
                 color = match_level, fill = match_level)) +
  geom_bar(stat = "identity")

As long as your data is melted (long format, with one row per job and match level), like this:

category_name   proportion   match_level   match_denom   order
no_match        0.33         zero          2             1
strong_match    0.66         strong        2             1
no_match        0.75         zero          2             2
strong_match    0.25         strong        2             2

This is a sample of two jobs with two skills each (match_denom is the number of skills in the ad, which is the denominator of the match score). The first job has zero match to 33% of candidates and a strong match to 66% of candidates. The second job has zero match to 75% of candidates and a strong match to 25% of candidates. The order column reflects the sorting in my paint drip chart: by number of skills, then by median match score. However, you could order by anything, as long as all the rows for a given job share the same order number.
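If you are starting from raw scores rather than a ready-made table, here is one possible way to get to the melted layout in base R. It is a sketch under several assumptions: the input has one row per job-candidate pair, the cutpoints that bin scores into zero, partial, and strong levels are placeholders to adjust for your data, and jobs are sorted by number of skills and then by descending median score (drop the minus sign for ascending). The job names and the partial level are hypothetical.

```r
# Hypothetical raw input: one row per job-candidate pair.
# match_denom is the number of skills in the ad.
scores_long <- data.frame(
  job         = c(rep("job_a", 3), rep("job_b", 4)),
  match_denom = c(rep(2, 3), rep(2, 4)),
  score       = c(0, 1, 1, 0, 0, 0, 1)
)

# Placeholder cutpoints: 0 -> zero, (0, 0.5] -> partial, above 0.5 -> strong
scores_long$match_level <- cut(scores_long$score,
                               breaks = c(-Inf, 0, 0.5, Inf),
                               labels = c("zero", "partial", "strong"))

# Count candidates per job and level, then convert counts to proportions
counts <- as.data.frame(table(job = scores_long$job,
                              match_level = scores_long$match_level))
counts$job <- as.character(counts$job)
counts$proportion <- counts$Freq / ave(counts$Freq, counts$job, FUN = sum)

# Order jobs by number of skills, then by descending median score
job_info <- aggregate(score ~ job + match_denom, data = scores_long,
                      FUN = median)
job_info <- job_info[order(job_info$match_denom, -job_info$score), ]
job_info$order <- seq_len(nrow(job_info))

# Attach match_denom and order to every row of each job's counts
melted <- merge(counts, job_info[, c("job", "match_denom", "order")],
                by = "job")
melted[, c("job", "proportion", "match_level", "match_denom", "order")]
```

In this toy data, job_a comes out at one third zero and two thirds strong with order 1, and job_b at 75% zero and 25% strong with order 2, mirroring the sample table. The resulting data frame has the columns the ggplot call above expects.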

Please reach out if you have any questions or run into any trouble! I’m alyssafowers on Twitter and first name dot last name at gmail dot com.

Sam and I are rooting for you!

