Profiling protest data (or, what I did on my summer vacation)

This summer, I joined UVA’s Data Science for the Public Good program as a graduate fellow. I learned a ton, and I can’t speak highly enough of my experience there. One of the first lessons: ten weeks of data science sounds glamorous, but it’s really four weeks of data profiling, four weeks of data wrangling, and two weeks of data analysis.

There’s nothing new I can say about data wrangling. However, I want to take a moment to sing the praises of data profiling. Data profiling is a systematic way to dig into your data and evaluate fitness-for-use, beyond measures of central tendency and the first/last 5 rows. The data profiling method I learned at UVA has three pillars: completeness, uniqueness, and validity. I added a fourth step: auditing for accuracy.
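To make the three pillars concrete, here is a minimal sketch in plain Python. It is not the R script from this post, and the definitions are assumptions for illustration: completeness as the share of non-missing values, uniqueness as the count of distinct values, and validity as the share of values that pass a rule you choose. The records and the helper names (`is_missing`, `completeness`, `uniqueness`, `validity`, `duplicate_rows`, `is_count`) are all made up.

```python
from collections import Counter

# Hypothetical records for illustration only; not taken from any real dataset.
records = [
    {"date": "2018-06-30", "state": "VA", "attendees": "200"},
    {"date": "2018-06-30", "state": "VA", "attendees": "200"},  # exact duplicate
    {"date": "2018-07-04", "state": "", "attendees": "-5"},     # blank state, impossible count
    {"date": "2018-07-15", "state": "FL", "attendees": None},   # missing count
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def completeness(rows, column):
    """Fraction of rows with a non-missing value in `column`."""
    present = sum(not is_missing(row.get(column)) for row in rows)
    return present / len(rows)


def uniqueness(rows, column):
    """Number of distinct non-missing values in `column`."""
    return len({row[column] for row in rows if not is_missing(row.get(column))})


def validity(rows, column, is_valid):
    """Fraction of non-missing values in `column` that pass `is_valid`."""
    values = [row[column] for row in rows if not is_missing(row.get(column))]
    if not values:
        return None
    return sum(is_valid(v) for v in values) / len(values)


def duplicate_rows(rows):
    """Count rows that repeat an earlier row exactly."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


def is_count(value):
    """Assumed validity rule: a count is a string of digits (so no minus sign)."""
    return value.isdigit()


print("state completeness:", completeness(records, "state"))
print("distinct states:", uniqueness(records, "state"))
print("attendee validity:", validity(records, "attendees", is_count))
print("duplicate rows:", duplicate_rows(records))
```

The point isn't the numbers themselves. Each check surfaces a specific row that looks weird (the blank state, the negative count, the repeated event), and those rows are where the digging starts.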

When you profile your data, you’re going to find stuff that looks weird! By investigating the weird stuff, you’ll get a real feel for your dataset’s texture and quirks, and a sense of what you can expect when working in it later.

I’ll show what can be learned from completeness, uniqueness, validity, and accuracy, with examples from Count Love’s protest data. R code snippets are included where appropriate, and my full profiling script is available on GitHub.

Continue reading “Profiling protest data (or, what I did on my summer vacation)”

The making of a making of

I recently had the pleasure of working with Gilda Santana, the head of the University of Miami’s architecture library. She needed a poster about her research on the educational genealogy of the architecture department.

Here’s a ten-thousand-foot view of the final product:

[Image: archGenealogy proof]

If that’s hard to read, it’s because the print version is huge. Really huge. Visualization designer included for scale:

[Image: big chart energy]

The full version can be downloaded from my portfolio here.

As Alli Torban recently pointed out on Data Viz Today, narrative charts (which depict individuals as lines moving through different stages or points in time) are great for spotting large-scale patterns while keeping an eye on the individual. They also take a lot of careful, painstaking work.

Here are my biggest takeaways:

  1. Design for the story you’re trying to tell. I made the diagram twice because the first time, I got caught up in my own interpretation instead of in my client’s research.
  2. Talk to your clients a lot. Like, a lot. I met with Gilda several times during this project to make sure I was on track, and I wasn’t! Those midway checkpoints helped me course-correct and get her what she needed.
  3. Take whatever help you can get from your tools. There isn’t a chart-building tool alive that can automatically generate a narrative chart, but starting with an alluvial diagram in RAWGraphs made the final version possible. (Do the RAWGraphs folks like pie-as-food? I want to send them a pie.)
  4. Leave the little details for last. Holding off on the color scheme and background highlighting until the end saved me from having to recolor the graphic a dozen times.

For the full process (and pictures!), read on.

Continue reading “The making of a making of”