If there is one feature that immediately distinguishes the digital humanities (dh) from the ‘other’ humanities, data visualization has to be it. Histograms, scatterplots, time series, diagrams, networks . . . ten, fifteen years ago, studies of film, music, literature or art didn’t use any of these. Now they do, and here we examine some premises (unspoken, and often probably unconscious) of this field-defining practice. Field-defining, because visualization is never just visualization: it involves the formation of corpora, the definition of data, their elaboration, and often some sort of preliminary interpretation as well. Whence the idea of this article: to gather sixty-odd studies that have had a significant impact on dh, and analyse how they visually present their data.[1] What interests us is visualization as a practice, in the conviction that practices (what we learn to do by doing, by professional habit, without being fully aware of what we are doing) often have larger theoretical implications than theoretical statements themselves. Whether this has indeed been the case for dh is for readers to decide.[2]
We begin with the article that announced the creation of Google Ngrams, thus catapulting the digital-quantitative approach into the open, well beyond the boundaries of a small academic niche: ‘Quantitative Analysis of Culture Using Millions of Digitized Books’, published in Science in January 2011. Figure 1 is the first image one encounters in the article, and it sets the tone for all that follows: the horizontal axis measures the passage of time; the vertical one, the frequency of the word ‘slavery’. A time series, as this type of chart is usually called: the years pass, and the frequency of ‘slavery’ changes; it doubles around the Civil War, it slowly declines to its initial frequency, it rises again, more modestly, at the time of the civil rights movement, and so on. ‘Quantitative Analysis of Culture’ includes 33 charts, and 27 of them (roughly 80 per cent) are of this kind.
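For readers who want to see what stands behind the vertical axis of such a chart, the following is a minimal sketch in Python, not the procedure of the Science article itself. It assumes that frequency means the occurrences of a word in a given year divided by all the words counted for that year; the function name `yearly_frequency` and the toy corpus are invented for illustration.

```python
import re
from collections import Counter


def yearly_frequency(corpus, word):
    """Return {year: occurrences of `word` / all tokens dated to that year}.

    `corpus` is an iterable of (year, text) pairs. Tokens are counted
    regardless of their position in the text.
    """
    hits = Counter()    # occurrences of the target word, per year
    totals = Counter()  # all tokens, per year
    target = word.lower()
    for year, text in corpus:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Years in chronological order: the horizontal axis of the chart.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical toy corpus; the sentences are invented, not real data.
toy_corpus = [
    (1850, "the question of slavery divides the union"),
    (1850, "a quiet year in the village"),
    (1861, "slavery and war slavery and union"),
    (1900, "the new century begins"),
]

series = yearly_frequency(toy_corpus, "slavery")
for year, freq in series.items():
    print(year, round(freq, 3))
```

Plotting the years of `series` against its values would give a time series of the kind just described. Note that the only things the function ever sees are a date and a count.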
Though 80 per cent is high for our corpus, time series are unquestionably very common in dh work, and have thus become its visual ‘signature’.[3] Simplicity, as Rosenberg and Grafton suggest, has certainly helped. Just two elements: history and semantics. One word (Figure 1), two (Figure 2), four (Figure 3), or hundreds of them, as in the ‘semantic fields’ and ‘topics’ of Figures 4 and 5. The numbers change, and so do the objects under investigation (books, newspaper articles, World Bank reports, novels, scholarly studies); what doesn’t change is the focus on content. ‘Topic modelling’; ‘content analysis’; ‘text mining’: meaning is like a raw material, unaffected by textual organization. Corpora are ‘bags of words’, as the saying has it; meaning must be extracted (text mining) and that’s it: once out, it’s perfectly explicit: ‘Changes in discourse reveal broader historical and sociocultural changes . . .’; ‘The models . . . reveal a strong decline of positive emotionality through time . . .’; ‘This approach reveals important but hitherto unarticulated trends’. Language reveals; it never hides, or lies, or complicates matters. It’s an idea of culture, as the triumph of the explicit.
More on this later. Now, shifting from the vertical to the horizontal axis, it’s striking how often these time series extend over a historical span of exactly a century. The novels and scholarly articles of Figures 4 and 5, the lexicon of property in parliamentary debates (Figure 6), bestsellers written by women (Figure 7), shot length in film (Figure 8), the expression of emotions in fiction (Figure 9), contractions in American novels (Figure 10), repetitions in the canon and the archive (Figure 11), reviews of poetry collections (Figure 12) . . . Topic after topic, the century has emerged as the typical yardstick of quantitative cultural history.
In a few cases, it’s a matter of external constraints: film has existed for about a hundred years, and there is nothing one can do about that; books published between 1800 and 1900 allow good optical recognition (unlike earlier ones), while being free from copyright restrictions (unlike later ones)—whence our typical over-production of nineteenth-century studies. But the deeper reason for this predominance of the century has probably to do with dh’s claim to be ‘a way of discovering and interpreting patterns on a different historical scale’ (‘The Quiet Transformations of Literary Studies’). Different from previous research, often limited to a narrow historical span. But different how, exactly?