Framing questions and crochet hooks

Interlibrary loan has reclaimed my copy of Visualization Analysis and Design, so I’m on to the next book on my shelf: Information Visualization: Perception for Design by Colin Ware.

I stand behind Ware’s position that data visualization is a tool for cognitive work, an external aid that shores up memory and pattern perception. Our brains need tools to think through complicated information, the same way our hands need tools to weave cloth. I can see the numbers in a spreadsheet, but interpreting them is like trying to turn a pile of yarn into fabric with nothing but my fingers. A simple tool like a crochet hook radically extends what I can do with raw materials.

I do, however, struggle with the profit model introduced in the first chapter. Ware writes that learning to interpret new graphic symbols comes with a cost, and that novel designs should be used only when their benefits outweigh the cost of learning to use them.



Sword Graphs Part II: Abstraction in Self-Encoding

In Sword Graphs Part I, I introduced the concept of self-encoding with this chart:


The graphic is self-encoded because the images themselves represent a value, rather than that value being translated into a mark like a bar or dot. Information about the length of the blade is represented by the length of the blade: the sword encodes itself.

But why not go a step further and show actual photographs of the swords, or a step back and use the same generic outline for all of them? The choice of images in self-encoding depends on specificity and processing speed.


Sword Graphs Part I: Self-Encoding

For your consideration, swords:


The sword graph was a bit of self-indulgent fun, but it did give me an opportunity to reflect on graph humor and the appeal of self-encoding. I created the term “self-encoding” to describe charts where the object being described represents (or encodes) itself, rather than being translated into a more abstract image like a bar or a dot. Self-encoding preserves important quantitative information (such as the length of a hilt) while also presenting additional qualitative information (the presence or absence of a pommel, the shape of the crossguard).

Sometimes self-encoding is just for fun. Consider these two classics of the genre:



(Original sources lost to the mists of the Internet, but found here and here.)

There is a certain “can’t-argue-with-that” charm to self-encoding. These charts read like visual tautologies: the part of the chart that looks like a pyramid is encoded by the part of the chart that looks like a pyramid, the remaining pie is encoded by the remaining pie, the length of the blade is encoded by the length of the blade.

I will freely admit that my graph was a stab at comedy, rather than an attempt to communicate information about the British Museum’s collection of eighteenth-century swords. But self-encoding can be useful beyond the humor of unexpected juxtapositions. Take the self-encoded graphic of the lifecycle of a Japanese beetle:


(Originally printed in Man and Insects by L. Hugh Newman, scanned from The Visual Display of Quantitative Information by Edward Tufte.)

The beetle’s position underground throughout the year is represented as, well, the beetle’s position underground. However, by portraying the beetle itself rather than a more abstract dot, line, or bar, the graphic communicates the creature’s size, positioning, and development throughout the year.

Self-encoded charts are closely related to diagrams: both communicate qualitative details while illustrating an organism or item. However, self-encoding goes a step further by arranging images in a way that facilitates data visualization tasks like comparison or the detection of patterns and outliers.

Self-encoded charts also have a surface resemblance to pictographs, but they take matters a step further. Take the following (entirely fictional) pictograph:


The point of the pictograph is not that each potentially-sworded person has two arms and two legs and a mysterious floating head. These details make the icons instantly recognizable as people, but they’re superfluous to the quantity being shown. In a self-encoded chart, by contrast, the details are the information being shown: the images communicate some quality beyond quantity. One of the upsides of self-encoding is the ability to examine details that haven’t been directly measured. For instance, check out this 1864 diagram of river length, encoded by the actual rivers:


(Originally printed in Johnson’s New Illustrated Family Atlas with Physical Geography by Joseph Hutchins Colton)

The St. Lawrence River (the one with all the lakes) and the Niger River (directly to the right of the one with all the lakes) are very similar in length, but could not be more different in terms of intersection with other bodies of water. The main piece of information communicated by this chart is river length, but self-encoding also reveals river shape, tributaries, and settlements along the way.

Self-encoding for humor is inherently limited: it works when classical graphical elements are repurposed by encoding an image’s area, length, or position as area, length, or position. However, some of the benefits of self-encoding, such as quick recognition and intuitive understanding, can be recreated in surprising and serious contexts. In this diagram of increasing political polarization, the ideological distance between American political parties is shown as actual distance:


(From “The Rise of Partisanship and Super-Cooperators in the U.S. House of Representatives” by Andris et al.)

This isn’t the plain-spoken humor of the sword graph: partisanship is a complex measure, subject to all kinds of transformation between observation and visualization. But the graphic makes instant intuitive sense by linking an abstract measure of distance with literal distance on the page, and showing the transition from muddy-colored cooperation into pure hues as the parties retreated further into ideological purity.

Self-encoding can dramatically increase a graphic’s information density in cases where one mark represents one sword (or one stage of a beetle’s life cycle, or one river, or one segment of actual pie). However, self-encoding also enforces a sort of information un-density. The technique is useful because it adds qualitative details. On the other hand, it is only useful when details are visible and recognizable, and therefore not suitable for trying to show a large quantity of data points.

Even when it is usable, self-encoding isn’t always appropriate as a tool. In the river example above, the kinks and turns of the rivers obscure their true lengths: someone who wanted to know precisely how long the Niger River is would have to turn elsewhere. The sword graph also only works because I could pick and choose between eighteenth-century blades. If I needed to include this curved sword, for instance, I couldn’t compare blade lengths by slapping a picture of it next to the straight-bladed swords. Like all visualization techniques, self-encoding is useful for specific tasks with specific audiences.

Self-encoding also requires some careful choices around imagery: what is communicated when an image is simplified down to its most iconic form, versus when it is shown in photorealistic detail? Sword Graphs Part II will explore those choices by taking a dive into visual perception and comic books.

Having acknowledged the weaknesses of self-encoding, I can now acknowledge that I am completely charmed by it. Watch this space for more illustrations in unexpected places. And if you’re interested in the British Museum’s eighteenth-century swords (they’re all real, even the wiggly one!), you can find them here.

Interaction Without Interactivity

Last week I sat in on a guest lecture by Xaquín G.V., a visual editor at the New York Times. He showed a variety of interactive projects rich in hooks. One article from his time at the Guardian asked readers to create a stable coalition government by dragging and dropping political parties. Another interactive was a surprise at the end of an article about the gender pay gap, showing how much more money a man would have made than a woman in the time since the page was opened.


Hook is an accurate term: as a reader, I immediately wanted to play with these visualizations. As a designer, I immediately wanted to make interactives like them. Unfortunately, I haven’t learned how to build interactive visualizations yet. So I started to wonder: how can I achieve a similar effect in static visualizations?


Time and Space

I’m up to my ears in student loan data at the moment—not my own this time, thank God—and trying my hand at the peculiar alchemy of data visualization. A group of loans becomes a list of numbers, becomes an aggregation, becomes an angle or a color or a position in space. Encoding turns a thousand bills at a thousand kitchen tables into a digestible summary.

This week, I’ve also been thinking about how we encode attention: trading in space and time to communicate when an audience should stop and think. Take this graphic, part of a New York Times feature on the survivors of the Las Vegas massacre:


Here, space breaks one number down to its components: not to transform them in some way, or to compare between them, but to convey that there are individuals within an aggregation. The print version of the story traded inches of column space for an individual figure representing each victim:


The graphic doesn’t communicate any information beyond the labels: 456 injured, 413 shot, 58 killed. Instead it creates a space for reflection on the individuals within those numbers. The graphic isn’t space-efficient, because efficiency isn’t the point.

The digital version of the story offers a similar experience through a different medium. Halfway down the page, the article gives way to individual sentences on a white background:


Readers have to scroll through counts of victims in order to continue reading the story. It’s an enforced meditation on violence, in a medium where the limiting factor is not space but time and attention.

Tamara Munzner writes about designing data visualizations for tasks: consuming or producing information through discovery, presentation, comparison, summary, and so on. Perhaps reflection belongs on that list as well. If the task of a visualization is conveying a human toll, maybe the right method of encoding is making space to think. Sometimes the mic drop of a story isn’t found at the end of a long analysis, but in what the audience knew without knowing.

Richard Johnson took this idea to a massive scale with a depiction of lives lost in the Syrian civil war. This image of a tattered but flying Syrian flag is made up of 220,000 individual dots, one for each civilian death in the war.


Hue is often used to communicate categories. In this visualization, though, it creates a unified image to communicate shared identity and loss.

The full version of the graphic on the Washington Post’s website has a scroll-over feature, which zooms in on the dots in different parts of the graphic. The zoom is an arresting feature: I know exactly what I’m going to see when I zoom in, but I look anyways. It’s a digital piece—no one painted 220,000 individual dots—but there is a sense that the effort would have been worthwhile.

Sometimes space isn’t used to communicate one piece of information or to break up a unified figure, but to communicate an absence. The New York Times published two examples recently. The first was part of a special segment on the 10th anniversary of the 2008 recession:


After the Parkland shooting, the New York Times created a calendar view of the days since Sandy Hook, with annotations about the mass shootings in between. A legend at the top of the page indicates that new gun control legislation will be marked in red. There is a single red square in March 2018, where the visualization notes, “As part of omnibus spending package, Trump signs a bill to improve record reporting for the existing background check system.” It’s a bleak record of inaction communicated in a sea of gray.


A lot of information could have been conveyed on that page, or in place of those empty calendar months. But a blank graphic isn’t empty: it’s a refusal to let nothingness be neutral. It’s a pointed question about why and how an absence came to be, and a demand that the audience grapple with what could have been.

Visualization design is obsessed with efficiency, perpetually asking how to communicate more with less. These visualizations deliberately eschew efficiency. Instead, they spend their ink in consideration of individuals and absence. They ask (and sometimes demand) that we absorb instead of analyze. Position, area, angle, and saturation can encode numerical values. Time and space encode values of a different kind.

The Why Axis: Searching beneath the streetlight

A drunk is carefully searching the ground beneath a streetlamp at night. A passerby asks what they’re looking for, and the drunk says, their keys. The passerby asks if the drunk lost their keys there, and the drunk says, “No, but this is where the light is.”

The streetlight effect (or streetlight problem, or drunkard’s search) is a common problem when working with data. What we really want to know hasn’t already been measured, or is very difficult to measure, or can’t be measured at all, so we substitute an easily accessible dataset to try to achieve the same goals. Is our answer in that dataset? No, but that’s what’s on GitHub.

I’m wrapping up my read of Picturing the Uncertain World with an exploration of the streetlight effect. The examples I’m pulling from the book are based on semi-serious asides, not genuine policy proposals, but I think they illustrate the streetlight effect and some associated dangers.

The passerby in the story has one big question for the drunk: are they looking in the right place for their keys? However, I think there are three questions here for a data analyst to consider: are we looking in the right place for the keys, are the keys what we should be looking for, and what happens when we find them?


The Why Axis: small sample sizes and too many slopes

I’m working my way through Picturing the Uncertain World by Howard Wainer, a collection of articles about dealing with uncertainty in statistical thinking and visualization. I could (and might!) write an essay about every article in the book. In this first post, I want to pull out two points that might be useful when analyzing or writing about data.

Picturing the Uncertain World is a series of real-world case studies, often with deeply-felt consequences. I’m collapsing a few articles together here, so to illustrate these points I’m going to use a fictional example with absolutely no consequences: swords in the made-up country of Knightlandia. Look for the bolded text if you’re only interested in the bottom line.
