Glass benches: genre and minimalism in data visualization

(Part one of two)

I went to the campus art museum this weekend and found an unexpected puzzle. Which of these objects am I allowed to touch?



I go to museums a lot. In my mental model of a museum, anything with a little rope around it or a dark outline on the floor is an Exhibit and Not To Be Touched. By those rules, the bench is forbidden, but the plinth is fair game.

To the security guard’s amusement/alarm, I guessed wrong: I avoided the bench, and tried to put my sticky fingers on the shiny surface of the plinth. I was misled by genre.


Thoughts on the semester project: why protests?

On January 21, 2017, I put on my second-warmest hat and took to the streets of D.C., along with 200,000 of my closest friends. There was an extraordinary sense of connection, not just to the people around me, but to my family around the country who sent me pictures from their marches, and to the 424 other Women’s Marches happening at the same time.

I felt that sense of connection over and over again while protesting. And this year, I found a data set from Count Love that catalogued protests in America from January 2017 through the present. I’m spending an entire semester analyzing and visualizing that data.

Why protests? As I mentioned, I have some personal experience with the matter. Before going to grad school, I did my fair share of shouting and sign-waving in D.C. And the more protests I attend, the more I realize how little I know. The Women’s March was very different from counter-protesting pro-lifers at a women’s health clinic, which was very different from standing outside the Capitol building late into the night, dreading the vote on repealing Obamacare.

All those protests happened in the same place, about a consistent set of liberal positions. If there is that much variation in my not-very-varied experience, I can’t imagine how different protests are across the full landscape and ideological spectrum of this country.

I want to find out.

People protest because they care. I want to know what drove people to the streets. I want to know about nationwide movements, and about moments in local politics that never spread beyond one town or city.

I have four goals in this piece:

  1. Look at trends across the entire country
  2. Examine a few examples of local protests
  3. Invite readers to explore protests in and around their homes
  4. Draw on that sense of connection to build an aesthetic for the piece

Aside from lurking technical problems, I suspect my biggest challenge will be keeping the data art from unduly influencing the data analysis, and keeping the analysis from draining all the expressiveness from the art. But the only way to find out is to move forward with the project, so: off I go!

Profiling protest data (or, what I did on my summer vacation)

This summer, I joined UVA’s Data Science for the Public Good program as a graduate fellow. I learned a ton, and I can’t speak highly enough of my experience there. One of the first lessons: ten weeks of data science sounds glamorous, but in practice it breaks down into four weeks of data profiling, four weeks of data wrangling, and two weeks of data analysis.

There’s nothing new I can say about data wrangling. However, I want to take a moment to sing the praises of data profiling. Data profiling is a systematic way to dig into your data and evaluate fitness-for-use, beyond measures of central tendency and the first/last 5 rows. The data profiling method I learned at UVA has three pillars: completeness, uniqueness, and validity. I added a fourth step: auditing for accuracy.
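To make the three pillars concrete, here is a minimal sketch in Python (it is not the R profiling script mentioned in this post). The toy records, the column names `date`, `city`, and `attendees`, and every helper function are invented for illustration; they are assumptions, not Count Love's actual schema or an official definition of the UVA method. Completeness is taken here as the share of non-missing values, uniqueness as the count of distinct values, and validity as the share of non-missing values that pass a rule.

```python
from datetime import datetime

# Invented toy records. The column names are hypothetical and do not
# reflect Count Love's real schema.
records = [
    {"date": "2017-01-21", "city": "Washington", "attendees": "200000"},
    {"date": "2017-01-21", "city": "Washington", "attendees": ""},
    {"date": "2017-13-40", "city": "", "attendees": "-5"},
    {"date": "2018-03-24", "city": "Miami", "attendees": "150"},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def completeness(rows, column):
    """Share of rows that have a non-missing value in `column`."""
    present = [r for r in rows if not is_missing(r.get(column))]
    return len(present) / len(rows)


def uniqueness(rows, column):
    """Number of distinct non-missing values in `column`."""
    return len({r[column] for r in rows if not is_missing(r.get(column))})


def validity(rows, column, rule):
    """Share of non-missing values in `column` that pass `rule`."""
    values = [r[column] for r in rows if not is_missing(r.get(column))]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


def is_iso_date(text):
    """Validity rule: the text parses as a real YYYY-MM-DD date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_nonnegative_int(text):
    """Validity rule: the text contains only digits (so no minus sign)."""
    return text.isdigit()


for column in ("date", "city", "attendees"):
    print(column, completeness(records, column), uniqueness(records, column))

print("valid dates:", validity(records, "date", is_iso_date))
print("valid attendees:", validity(records, "attendees", is_nonnegative_int))
```

The impossible date and the negative head count are exactly the kind of "weird stuff" a validity check is meant to surface. Auditing for accuracy is left out of the sketch because it means comparing records against an outside source, which a self-contained example cannot do.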

When you profile your data, you’re going to find stuff that looks weird! By investigating the weird stuff, you’ll get a real feel for your dataset’s texture and quirks, and a sense of what you can expect when working in it later.

I’ll show what can be learned from completeness, uniqueness, validity, and accuracy, with examples from Count Love’s protest data. R code snippets are included where appropriate, and my full profiling script is available on GitHub.


Coping with Minard’s non-Euclidean cartography

Charles-Joseph Minard (1781-1870) was a French civil engineer, visualization designer, and all-purpose nerd. He’s best remembered for the invention of flow maps, which show the quantities of materials, people, or traffic moving from one place to another. He also didn’t think much of North Africa, South America, or the existence of Ireland:


(Ireland added for emphasis.)

As Sandra Rendgen wrote in The Minard System,

“Minard quite deliberately and continually transgressed every idea of cartographic precision… his ‘non-Euclidean cartography’ is not the result of coincidence, incompetence, or mere negligence. On the contrary, we must consider it a clear decision on Minard’s part to treat cartography as an ‘auxiliary canvas’ on which his main story (i.e., the drama of the statistical numbers) unfolds.”

It’s easy to be appalled by Minard’s fast-and-loose approach to world geography. See this map of English coal exports in 1860:


Communicating with chaos

In the wake of last week’s shootings in Dayton and El Paso, I saw the following exchange on Twitter:

The Financial Times bubble chart trades clarity for expressiveness; the Economist bar chart trades expressiveness for precision. That bubble chart is particularly interesting in light of recent conversations about the coverage of mass shootings:


The making of a making of

I recently had the pleasure of working with Gilda Santana, the head of the University of Miami’s architecture library. She needed a poster about her research on the educational genealogy of the architecture department.

Here’s a ten-thousand-foot view of the final product:

[Image: archGenealogy proof]

If that’s hard to read, it’s because the print version is huge. Really huge. Visualization designer included for scale:

[Image: big chart energy]

The full version can be downloaded from my portfolio here.

As Alli Torban recently pointed out on Data Viz Today, narrative charts (which depict individuals as lines moving through different stages or points in time) are great for spotting large-scale patterns while keeping an eye on the individual. They also take a lot of careful, painstaking work.

Here are my biggest takeaways:

  1. Design for the story you’re trying to tell. I made the diagram twice because the first time, I got caught up in my own interpretation instead of in my client’s research.
  2. Talk to your clients a lot. Like, a lot. I met with Gilda several times during this project to make sure I was on track–and I wasn’t! Those midway checkpoints helped me course-correct and get her what she needed.
  3. Take whatever help you can get from your tools. There isn’t a chart-building tool alive that can automatically generate a narrative chart, but starting with an alluvial diagram in RAWGraphs made the final version possible. (Do the RAWGraphs folks like pie-as-food? I want to send them a pie.)
  4. Leave the little details for last. Saving the color scheme and background highlighting for the end spared me from recoloring the graphic a dozen times.

For the full process (and pictures!), read on.


Working for the marrow: a review of Info We Trust

Adapted from correspondence with the author.

Grad school has made me into a mercenary reader. My habit is to tear the meat off of a book, throw the bones back, and move on to the next assignment. This is not a satisfying way to engage with RJ Andrews’s book, Info We Trust. The book isn’t meat–it’s marrow. It’s rich, it’s rewarding, and it requires a lot more work from me to be nourishing.

Many data visualization books read like textbooks. Funny, personable textbooks, but textbooks all the same. Info We Trust is more of a meditation. Gentle explanations of chart types meander through a speculative history of the human perception of up and down. Best practices for table design emerge from a section about the history of bureaucracy. There is no neat delineation between background, body, and exercises: it’s all of a piece. I suspect that is the point.

Info We Trust tells a grand story about civilization. The first three chapters are a human history of information, connecting data work today to the world before electronic record-keeping. I’m not immune to poetry, and I have a long-standing (if often neglected) love affair with history. In my experience, history is spiky with context, competing interests, and strange accidents. The narrative presented by Info We Trust is so smooth and straightforward that I find it suspect. As a heroic epic, it works. As a history, I’m not quite willing to take it on faith.

However, that’s also part of what I enjoyed about the book. In the chapter on storytelling, Andrews wrote, “Great stories are rich with opportunities for the listener to make connections on their own. These self-made connections help the story leap off the page and into the reader’s imaginative reality. The more the story becomes alive in the reader’s head, the more meaningful the story becomes.” If it isn’t obvious that I struggled with this book: I struggled with this book! But that struggle brought it to life. I came face-to-face with what I thought I knew, where I was willing to listen, and my own biases.

The second half of the book is rich with opportunities for positive connections, particularly in the chapters on museum design, storytelling, engineering, and advertising. Andrews opens doors to unexpected worlds, allows me to make my own connections, and lets me find value in my own way. In the cathedral case study, I got to see him draw those connections, too. The “we” in the title is not just a figure of speech. I felt like I was sitting in conversation with Andrews throughout the book. The extensive marginalia presented alternate-universe versions of that conversation, where we split away from the main narrative to wander down a different rabbit hole.

Info We Trust is a generous and deeply human reflection on data. There is plenty of concrete advice about visualization, but it is woven into the narrative, not plucked out, polished, and ready for use. Nor should it be. The field has plenty of technical manuals. It doesn’t have anything quite like this.

My habit as a reader is to ask, what is this book trying to do? What is it going to teach me? Andrews flips those questions back around: what am I going to do with the book? How am I going to learn from it? Info We Trust is not a list of best practices, an in-depth history, or an immediate return on investment. However, it is a refresh on the craft, a feast for the eyes, and an opportunity to think deeply by drawing connections. I’m grateful for the chance to wrestle with this text, and I expect I’ll return to the mat soon.