Profiling protest data (or, what I did on my summer vacation)

This summer, I joined UVA’s Data Science for the Public Good program as a graduate fellow. I learned a ton, and I can’t speak highly enough of my experience there. One of the first lessons: ten weeks of data science sounds glamorous, but it’s four weeks of data profiling for every four weeks of data wrangling for every two weeks of data analysis.

There’s nothing new I can say about data wrangling. However, I want to take a moment to sing the praises of data profiling. Data profiling is a systematic way to dig into your data and evaluate fitness-for-use, beyond measures of central tendency and the first/last 5 rows. The data profiling method I learned at UVA has three pillars: completeness, uniqueness, and validity. I added an additional step: auditing for accuracy.
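To make the three pillars concrete, here is a minimal Python sketch of what they might measure for a single column. The definitions are my assumptions, not necessarily the ones taught at UVA: completeness as the share of non-missing values, uniqueness as the count of distinct values, and validity as the share of present values that pass a rule you supply. The function name `profile_column` and the attendance numbers are made up for illustration.

```python
def profile_column(values, is_valid):
    """Summarize one column by completeness, uniqueness, and validity.

    Assumed definitions (illustrative only):
      completeness  = share of entries that are not missing
      unique_values = number of distinct non-missing entries
      validity      = share of non-missing entries that pass is_valid
    """
    missing = (None, "")  # assumption: None and empty strings count as missing
    present = [v for v in values if v not in missing]
    total = len(values)
    valid_count = sum(1 for v in present if is_valid(v))
    return {
        "completeness": len(present) / total if total else 0.0,
        "unique_values": len(set(present)),
        "validity": valid_count / len(present) if present else 0.0,
    }


# Made-up attendance counts, including two gaps and one impossible value.
attendees = [120, None, 45, 45, -3, "", 2000]
report = profile_column(attendees, lambda v: isinstance(v, int) and v >= 0)
print(report)
```

The negative count is exactly the kind of weird stuff a validity check surfaces: it is present and distinct, but it fails the rule.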

When you profile your data, you’re going to find stuff that looks weird! By investigating the weird stuff, you’ll get a real feel for your dataset’s texture and quirks, and a sense of what you can expect when working in it later.

I’ll show what can be learned from completeness, uniqueness, validity, and accuracy, with examples from Count Love’s protest data. R code snippets are included where appropriate, and my full profiling script is available on GitHub.

Continue reading “Profiling protest data (or, what I did on my summer vacation)”

Coping with Minard’s non-Euclidean cartography

Charles-Joseph Minard (1781-1870) was a French civil engineer, visualization designer, and all-purpose nerd. He’s best remembered for the invention of flow maps, which show the quantities of materials, people, or traffic moving from one place to another. He also didn’t think much of North Africa, South America, or the existence of Ireland:


(Ireland added for emphasis.)

As Sandra Rendgen wrote in The Minard System,

“Minard quite deliberately and continually transgressed every idea of cartographic precision… his ‘non-Euclidean cartography’ is not the result of coincidence, incompetence, or mere negligence. On the contrary, we must consider it a clear decision on Minard’s part to treat cartography as an ‘auxiliary canvas’ on which his main story (i.e., the drama of the statistical numbers) unfolds.”

It’s easy to be appalled by Minard’s fast-and-loose approach to world geography. See this map of English coal exports in 1860:

Continue reading “Coping with Minard’s non-Euclidean cartography”

Communicating with chaos

In the wake of last week’s shootings in Dayton and El Paso, I saw the following exchange on Twitter:

The Financial Times bubble chart trades clarity for expressiveness; the Economist bar chart trades expressiveness for precision. That bubble chart is particularly interesting in light of recent conversations about covering mass shootings:

Continue reading “Communicating with chaos”

The making of a making of

I recently had the pleasure of working with Gilda Santana, the head of the University of Miami’s architecture library. She needed a poster about her research on the educational genealogy of the architecture department.

Here’s a ten-thousand-foot view of the final product:

[Image: archGenealogy proof]

If that’s hard to read, it’s because the print version is huge. Really huge. Visualization designer included for scale:

[Image: big chart energy]

The full version can be downloaded from my portfolio here.

As Alli Torban recently pointed out on Data Viz Today, narrative charts (which depict individuals as lines moving through different stages or points in time) are great for spotting large-scale patterns while keeping an eye on the individual. They also take a lot of careful, painstaking work.

Here are my biggest takeaways:

  1. Design for the story you’re trying to tell. I made the diagram twice because the first time, I got caught up in my own interpretation instead of in my client’s research.
  2. Talk to your clients a lot. Like, a lot. I met with Gilda several times during this project to make sure I was on track–and I wasn’t! Those midway checkpoints helped me course-correct and get her what she needed.
  3. Take whatever help you can get from your tools. There isn’t a chart-building tool alive that can automatically generate a narrative chart, but starting with an alluvial diagram in RAWGraphs made the final version possible. (Do the RAWGraphs folks like pie-as-food? I want to send them a pie.)
  4. Leave the little details for last. Saving the color scheme and background highlighting for the end spared me from recoloring the graphic a dozen times.

For the full process (and pictures!), read on.

Continue reading “The making of a making of”

Working for the marrow: a review of Info We Trust

Adapted from correspondence with the author.

Grad school has made me into a mercenary reader. My habit is to tear the meat off of a book, throw the bones back, and move on to the next assignment. This is not a satisfying way to engage with RJ Andrews’s book, Info We Trust. The book isn’t meat–it’s marrow. It’s rich, it’s rewarding, and it requires a lot more work from me to be nourishing.

Many data visualization books read like textbooks. Funny, personable textbooks, but textbooks all the same. Info We Trust is more of a meditation. Gentle explanations of chart types meander through a speculative history of the human perception of up and down. Best practices for table design emerge from a section about the history of bureaucracy. There is no neat delineation between background, body, and exercises: it’s all of a piece. I suspect that is the point.

Info We Trust tells a grand story about civilization. The first three chapters are a human history of information, connecting data work today to the world before electronic record-keeping. I’m not immune to poetry, and I have a long-standing (if often neglected) love affair with history. In my experience, history is spiky with context, competing interests, and strange accidents. The narrative presented by Info We Trust is so smooth and straightforward that I find it suspect. As a heroic epic, it works. As a history, I’m not quite willing to take it on faith.

However, that’s also part of what I enjoyed about the book. In the chapter on storytelling, Andrews wrote, “Great stories are rich with opportunities for the listener to make connections on their own. These self-made connections help the story leap off the page and into the reader’s imaginative reality. The more the story becomes alive in the reader’s head, the more meaningful the story becomes.” If it isn’t obvious that I struggled with this book: I struggled with this book! But that struggle brought it to life. I came face-to-face with what I thought I knew, where I was willing to listen, and my own biases.

The second half of the book is rich with opportunities for positive connections, particularly in the chapters on museum design, storytelling, engineering, and advertising. Andrews opens doors to unexpected worlds, allows me to make my own connections, and lets me find value in my own way. In the cathedral case study, I got to see him draw those connections, too. The “we” in the title is not just a figure of speech. I felt like I was sitting in conversation with Andrews throughout the book. The extensive marginalia presented alternate-universe versions of that conversation, where we split away from the main narrative to wander down a different rabbit hole.

Info We Trust is a generous and deeply human reflection on data. There is plenty of concrete advice about visualization, but it is woven into the narrative, not plucked out, polished, and ready for use. Nor should it be. The field has plenty of technical manuals. It doesn’t have anything quite like this.

My habit as a reader is to ask, what is this book trying to do? What is it going to teach me? Andrews flips those questions back around: what am I going to do with the book? How am I going to learn from it? Info We Trust is not a list of best practices, an in-depth history, or an immediate return on investment. However, it is a refresh on the craft, a feast for the eyes, and an opportunity to think deeply by drawing connections. I’m grateful for the chance to wrestle with this text, and I expect I’ll return to the mat soon.

Left (or right) this way: pinpointing change with hedgehog maps

A rare and endangered group of maps emerges the morning after an election, only to disappear as soon as the news cycle moves on. You know, these guys:


Left to right, top to bottom: New York Times 2018, The Guardian 2018, New York Times 2016, Bloomberg 2018.

Wait! Come back! I miss you!

These maps punch way above their weight. I’d like to take a closer look at what they do and why they work. In the absence of another name, I am calling them hedgehog maps.

Hedgehog maps use arrows to indicate change, with the base of the arrow showing the location of the change, the length indicating the magnitude of the change, and the angle indicating the direction. They’re distinct from flow maps, which use arrows to link two locations, and arrow plots, which use the base of the arrow to indicate a starting value and the arrowhead to indicate a finishing value.
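The encoding can be sketched in a few lines of Python. This is only an illustration of the base/length/angle idea, not how any of the newsrooms above built their maps: the function name `hedgehog_arrow`, the 45-degree tilt, the `scale` factor, and the convention that positive change leans right and negative change leans left are all assumptions of mine.

```python
import math


def hedgehog_arrow(x, y, change, scale=1.0, tilt_degrees=45.0):
    """Return (base, tip) coordinates for one hedgehog-map arrow.

    base   = the location where the change happened
    length = scale * |change| (magnitude of the change)
    angle  = leans right for positive change, left for negative
             (an assumed convention, chosen for illustration)
    """
    length = scale * abs(change)
    degrees = tilt_degrees if change >= 0 else 180.0 - tilt_degrees
    angle = math.radians(degrees)
    tip = (x + length * math.cos(angle), y + length * math.sin(angle))
    return (x, y), tip


# A hypothetical county at (10, 5) whose vote share shifted by -8 points.
base, tip = hedgehog_arrow(x=10.0, y=5.0, change=-8.0, scale=0.5)
print(base, tip)
```

Because the base stays pinned to the location, only the tip moves with the data, which is what separates this from a flow map's two-location arrows.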

The result is striking, as intuitive as it is unquantifiable. Look at The Guardian’s map of changes in vote share during the 2018 midterm elections:

Continue reading “Left (or right) this way: pinpointing change with hedgehog maps”

Spinning the Concrete Dial with Hypothetical Outcome Plots

Blanket disclaimer: This is a post about animated visualizations, illustrated mostly by static screenshots. Please consider clicking through to see the actual visualizations! My screengrabs don’t do them justice.

I’ve been chewing on uncertainty visualizations since Matthew Kay’s excellent talk at Tapestry 2018. The recent release of the R package gganimate has also brought a number of animated visualizations across my feed, so let’s talk about an animated uncertainty visualization: hypothetical outcome plots (HOPs). What are they for, besides inspiring truly terrible puns?

One of the core functions of statistics is making inferences about a population based on limited information. Sometimes that means estimating a value (average sword price, for instance); sometimes that means modeling to describe the relationships between variables and to predict what might happen in the future. Those estimates and predictions look very precise when depicted as single points or figures. However, there is always uncertainty involved: other estimates that we could have gotten if we repeated the study, or a range of possible outcomes from the model. HOPs depict uncertainty by animating a sequence of outcomes that could occur, rather than showing a single number. (If you’d like to know more, I really can’t do better than Jessica Hullman’s original Medium post about HOPs.)
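As a rough sketch of the idea, the Python below produces the sequence of values a HOP would animate. It uses bootstrap resampling of a sample mean as one simple way to generate outcomes we could have gotten if we repeated the study; that choice is my assumption, not a requirement of HOPs, and the sword prices and the function name `hypothetical_outcomes` are made up.

```python
import random
import statistics


def hypothetical_outcomes(sample, n_frames=20, seed=0):
    """Return one estimated mean per animation frame.

    Each frame resamples the data with replacement (a bootstrap draw)
    and recomputes the mean, so the frames vary the way repeated
    studies might.
    """
    rng = random.Random(seed)  # seeded so the sequence is reproducible
    frames = []
    for _ in range(n_frames):
        resample = rng.choices(sample, k=len(sample))
        frames.append(statistics.mean(resample))
    return frames


# Made-up sword prices, for illustration only.
sword_prices = [310, 275, 420, 390, 298, 505, 350, 330]
frames = hypothetical_outcomes(sword_prices, n_frames=10)
for i, estimate in enumerate(frames, start=1):
    print(f"frame {i}: estimated mean = {estimate:.1f}")
```

A static chart would show only the single mean of `sword_prices`; an animation would step through `frames` one at a time, so the viewer sees the estimate wobble.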

HOPs get a lot of press for making viewers encounter uncertainty, but that’s far from their only application. I think about HOPs as a kind of concreteness dial. They make estimations less concrete by forcing an audience to experience uncertainty, and they make processes more concrete by showing the different ways a model might play out.

Continue reading “Spinning the Concrete Dial with Hypothetical Outcome Plots”

Mercy on our minds: lightening cognitive load with the known-new contract

Lawrence Evalyn and I have an interdisciplinary friendship. I study research methods and data visualization; he studies eighteenth-century literature and the digital humanities. I taught him about pivot tables; he taught me about sentence stress. I still think I got the better half of that exchange.

Sentence stress is my favorite tool for writing about complicated topics. Communicating complexity is also my goal in data visualization, so sentence stress is a natural complement to a conversation about data storytelling.

According to the concept of sentence stress, every sentence has two parts: the topic position and the stress position. A sentence’s stress position establishes its main idea. It always comes just before a full stop. For example, “When the pirates came over, we played board games” emphasizes the board games. “At our board game night, we played with pirates” emphasizes the pirates.

Continue reading “Mercy on our minds: lightening cognitive load with the known-new contract”

Framing questions and crochet hooks

Interlibrary loan has reclaimed my copy of Visualization Analysis and Design, so I’m on to the next book on my shelf: Information Visualization: Perception for Design by Colin Ware.

I stand behind Ware’s position that data visualization is a tool for cognitive work, an external aid that shores up memory and pattern perception. Our brains need tools to think through complicated information, the same way our hands need tools to weave cloth. I can see the numbers in a spreadsheet, but interpreting them is like trying to turn a pile of yarn into fabric with nothing but my fingers. A simple tool like a crochet hook radically extends what I can do with raw materials.

I do, however, struggle with the profit model introduced in the first chapter. Ware writes that learning to interpret new graphic symbols comes with a cost, and that novel designs should be used only when their benefits outweigh the cost of learning to use them.

Continue reading “Framing questions and crochet hooks”