If there is one feature that immediately distinguishes the digital humanities (dh) from the ‘other’ humanities, data visualization has to be it. Histograms, scatterplots, time series, diagrams, networks . . . ten, fifteen years ago, studies of film, music, literature or art didn’t use any of these. Now they do, and here we examine some premises (unspoken, and often probably unconscious) of this field-defining practice. Field-defining, because visualization is never just visualization: it involves the formation of corpora, the definition of data, their elaboration, and often some sort of preliminary interpretation as well. Whence the idea of this article: to gather sixty-odd studies that have had a significant impact on dh, and analyse how they visually present their data.[1] What interests us is visualization as a practice, in the conviction that practices (what we learn to do by doing, by professional habit, without being fully aware of what we are doing) often have larger theoretical implications than theoretical statements themselves. Whether this has indeed been the case for dh is for readers to decide.[2]

We begin with the article that announced the creation of Google Ngrams, thus catapulting the digital-quantitative approach into the open, well beyond the boundaries of a small academic niche: ‘Quantitative Analysis of Culture Using Millions of Digitized Books’, published in Science in January 2011. Figure 1 is the first image one encounters in the article, and it sets the tone for all that follows: the horizontal axis measures the passage of time; the vertical one, the frequency of the word ‘slavery’. A time series, as this type of chart is usually called: the years pass, and the frequency of ‘slavery’ changes; it doubles around the Civil War, it slowly declines to its initial frequency, it rises again, more modestly, at the time of the civil rights movement, and so on. ‘Quantitative Analysis of Culture’ includes 33 charts, and 27 of them (roughly 80 per cent) are of this kind.
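As a rough illustration of what the vertical axis of such a time series measures, here is a minimal Python sketch that computes, for each year, the share of all tokens accounted for by a single word. The function name `yearly_frequency`, the naive whitespace tokenizer and the three-sentence toy corpus are all assumptions made for the example; they are not taken from the Science article or from Google Ngrams.

```python
from collections import Counter


def yearly_frequency(docs, word):
    """Return {year: share of that year's tokens equal to `word`}.

    `docs` is an iterable of (year, text) pairs. Tokenization is a naive
    lowercase whitespace split; a real corpus would need something better.
    """
    hits = Counter()    # occurrences of `word` per year
    totals = Counter()  # all tokens per year
    for year, text in docs:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    # Skip years with no tokens to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Invented toy corpus, for illustration only.
toy_corpus = [
    (1850, "the question of slavery divides the union"),
    (1860, "slavery and war slavery and freedom"),
    (1900, "the new century opens with great hope"),
]
series = yearly_frequency(toy_corpus, "slavery")
```

Plotting the keys of `series` on the horizontal axis and its values on the vertical one gives a chart of the kind described here: time below, relative frequency above.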

Figure 1: Line chart from ‘Quantitative Analysis of Culture Using Millions of Digitized Books’, showing the frequency of the word ‘slavery’ (vertical axis) from 1800 to 2000, with an upward spike between 1855 and 1875.

‘It was not until the middle of the eighteenth century that a common visual vocabulary for time maps caught on. But the new linear formats of the eighteenth century were so quickly accepted that, within decades, it was hard to remember a time when they were not already in use. The key problem in chronographies, it turned out, was . . . how to create a visual scheme to clearly communicate the uniformity, directionality, and irreversibility of historical time.’ (Daniel Rosenberg and Anthony Grafton, Cartographies of Time)

Though 80 per cent is high for our corpus, time series are unquestionably very common in dh work, and have thus become its visual ‘signature’.[3] Simplicity, as Rosenberg and Grafton suggest, has certainly helped. Just two elements: history and semantics. One word (Figure 1), two (Figure 2), four (Figure 3), or hundreds of them, as in the ‘semantic fields’ and ‘topics’ of Figures 4 and 5. The numbers change, and so do the objects under investigation (books, newspaper articles, World Bank reports, novels, scholarly studies); what doesn’t change is the focus on content. ‘Topic modelling’; ‘content analysis’; ‘text mining’: meaning is like a raw material, unaffected by textual organization. Corpora are ‘bags of words’, as the saying has it; meaning must be extracted (text mining), and that’s it; once out, it’s perfectly explicit: ‘Changes in discourse reveal broader historical and sociocultural changes . . .’; ‘The models . . . reveal a strong decline of positive emotionality through time . . .’; ‘This approach reveals important but hitherto unarticulated trends’. Language reveals; it never hides, or lies, or complicates matters. It’s an idea of culture as the triumph of the explicit.
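What it means for meaning to be ‘unaffected by textual organization’ can be shown with a small Python sketch of the bag-of-words representation. The helper `bag_of_words` and the two sentences are invented for the example, and the tokenizer is deliberately naive; the point is only that two sentences saying opposite things become indistinguishable once word order is discarded.

```python
from collections import Counter


def bag_of_words(text):
    """Count tokens and discard their order (naive lowercase whitespace split)."""
    return Counter(text.lower().split())


# Two invented sentences: same vocabulary, opposite meanings.
a = "the master fears the slave"
b = "the slave fears the master"

# True: once the text is a bag of words, the two are identical.
same_bag = bag_of_words(a) == bag_of_words(b)
```

Any method built only on such counts, however sophisticated, starts from this flattening of the text.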

Figure 2: Line chart from ‘Content Analysis of 150 Years of British Periodicals’, showing the frequency of the terms ‘revolt’ and ‘unrest’ from 1800 to 1950. ‘Revolt’ is in use throughout; ‘unrest’ emerges around 1875 and increases thereafter. Both terms spike between roughly 1905 and 1925.

More on this later. Now, shifting from the vertical to the horizontal axis, it’s striking how often these time series extend over a historical span of exactly a century. The novels and scholarly articles of Figures 4 and 5, the lexicon of property in parliamentary debates (Figure 6), bestsellers written by women (Figure 7), shot length in film (Figure 8), the expression of emotions in fiction (Figure 9), contractions in American novels (Figure 10), repetitions in the canon and the archive (Figure 11), reviews of poetry collections (Figure 12) . . . Topic after topic, the century has emerged as the typical yardstick of quantitative cultural history.

Figure 9: Two scatter plots from ‘Birth of the Cool: A Two-Centuries Decline in Emotional Expression in Anglophone Fiction’. The first shows the percentage of negative emotion-related terms from 1900 to 2000, the second the percentage of positive emotion-related terms over the same period. The negative series stays high and rises slightly; the positive series declines swiftly.

Figure 10: Two scatter plots from ‘The Making of Middle American Style’, one American and one British, comparing standard contractions in dialogue and in narration from 1820 to 1910. In both, the median line for narration stays low and flat while the one for dialogue rises, more in the American chart than in the British one.

Figure 11: Scatter plot from ‘Canon/Archive: Large-Scale Dynamics in the Literary Field’, measuring term repetition from 1795 to 1905.

Figure 12: Scatter plot from ‘The Longue Durée of Literary Prestige’, showing the predicted probability of a poetry review coming from a reviewed set between 1812 and 1925. A median line runs through the middle, with a general upward trend.

In a few cases, it’s a matter of external constraints: film has existed for about a hundred years, and there is nothing one can do about that; books published between 1800 and 1900 allow good optical recognition (unlike earlier ones), while being free from copyright restrictions (unlike later ones)—whence our typical over-production of nineteenth-century studies. But the deeper reason for this predominance of the century has probably to do with dh’s claim to be ‘a way of discovering and interpreting patterns on a different historical scale’ (‘The Quiet Transformations of Literary Studies’). Different from previous research, often limited to a narrow historical span. But different how, exactly?

In searching for an answer, the century offered itself as an option so obvious, it went almost without saying. Intuitively, a century is long; it’s not anthropomorphic (as we seldom live that long), and is therefore extraneous to the old focus on individual authors; it is the tempo of the world, not of life. Romance languages use it for informal periodization (El Siglo de Oro, une dix-neuvièmiste, il Quattrocento); American universities, for many of their hires. The notion was there, in the existing doxa; a nice round number, long enough to suggest a new dimension, but not so long as to be unmanageable. True, it wasn’t really a concept; it had no place, for instance, in the tripartition of historical time—longue durée, cycle, event—elaborated by the Annales school; and it certainly wasn’t adopted as a result of a theoretical decision. But practice trumped theory, once again: the century offered an intuitive frame for the new scale we were after, and we turned it into the pedestal—the horizontal axis—for our historical findings.

‘A different historical scale’: specifically, one in which trends become visible. Up to a few years ago, no one spoke of trends in the humanities; you couldn’t, as long as you studied only a few texts, spanning a handful of years. With centuries, you can. New fields need keywords, and ‘trend’, with its mix of direction and measurement, was perfect for dh; not by accident, it showed up right away, in the abstract of that 2011 article in Science (‘Analysis of this corpus enables us to investigate cultural trends . . .’), and returned as a sort of opening chord on the first page of studies on film shots, jazz evolution, literary scholarship, British periodicals, poetry reviews, legal records, and expressions of emotion; in a single article (‘Quantitative Literary History of 2,958 Novels’), it occurs sixty-eight times. And it’s not just a word: in Figures 4, 8, 9, 10, 11, 12 and 13 trends are physically present in the form of regression lines and analogous elaborations of the data. We haven’t just talked of trends; we have made them visible, and given them pride of place.
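For readers unfamiliar with how a regression line turns scattered yearly values into a ‘trend’, here is a minimal Python sketch of an ordinary least-squares fit. The function `fit_trend` and the five data points are invented for the example; the studies discussed here may well have used other, more elaborate smoothing or modelling techniques.

```python
def fit_trend(years, values):
    """Fit an ordinary least-squares line through (year, value) points.

    Returns (slope, intercept) such that value is approximated by
    slope * year + intercept.
    """
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching (year, value) points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Spread of the years, and how the values move together with them.
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical; no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values: a percentage that falls by one point per decade.
years = [1900, 1910, 1920, 1930, 1940]
values = [10.0, 9.0, 8.0, 7.0, 6.0]
slope, intercept = fit_trend(years, values)
```

The sign of `slope` is the direction of the trend and its size the rate of change per year, which is exactly the ‘mix of direction and measurement’ that the line on the chart makes visible.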