Hidden In Plain Sight

Franco Moretti; Oleg Sobchuk

If there is one feature that immediately distinguishes the digital humanities (dh) from the ‘other’ humanities, data visualization has to be it. Histograms, scatterplots, time series, diagrams, networks . . . ten, fifteen years ago, studies of film, music, literature or art didn’t use any of these. Now they do, and here we examine some premises (unspoken, and often probably unconscious) of this field-defining practice. Field-defining, because visualization is never just visualization: it involves the formation of corpora, the definition of data, their elaboration, and often some sort of preliminary interpretation as well. Whence the idea of this article: to gather sixty-odd studies that have had a significant impact on dh, and analyse how they visually present their data.¹ What interests us is visualization as a practice, in the conviction that practices—what we learn to do by doing, by professional habit, without being fully aware of what we are doing—often have larger theoretical implications than theoretical statements themselves. Whether this has indeed been the case for dh, is for readers to decide.²

We begin with the article that announced the creation of Google Ngrams, thus catapulting the digital-quantitative approach into the open, well beyond the boundaries of a small academic niche: ‘Quantitative Analysis of Culture Using Millions of Digitized Books’, published in Science in January 2011. Figure 1, opposite, is the first image one encounters in the article, and it sets the tone for all that follows: the horizontal axis measures the passage of time; the vertical one, the frequency of the word ‘slavery’. A time series, as this type of chart is usually called: the years pass, and the frequency of ‘slavery’ changes; it doubles around the Civil War, it slowly declines to its initial frequency, it rises again, more modestly, at the time of the civil rights movement, and so on. ‘Quantitative Analysis of Culture’ includes 33 charts, and 27 of them—80 per cent—are of this kind.

Though 80 per cent is high for our corpus, time series are unquestionably very common in dh work, and have thus become its visual ‘signature’.³ Simplicity, as Rosenberg and Grafton suggest, has certainly helped. Just two elements: history and semantics. One word (Figure 1), two (Figure 2), four (Figure 3), or hundreds of them, as in the ‘semantic fields’ and ‘topics’ of Figures 4 and 5. The numbers change, and so do the objects under investigation (books, newspaper articles, World Bank reports, novels, scholarly studies); what doesn’t change is the focus on content. ‘Topic modelling’; ‘content analysis’; ‘text mining’: meaning is like a raw material, unaffected by textual organization. Corpora are ‘bags of words’, as the saying has it; meaning must be extracted—text mining—and that’s it: once out, it’s perfectly explicit: ‘Changes in discourse reveal broader historical and sociocultural changes . . .’; ‘The models . . . reveal a strong decline of positive emotionality through time . . .’; ‘This approach reveals important but hitherto unarticulated trends’. Language reveals; it never hides, or lies, or complicates matters. It’s an idea of culture, as the triumph of the explicit.

More on this later. Now, shifting from the vertical to the horizontal axis, it’s striking how often these time series extend over a historical span of exactly a century. The novels and scholarly articles of Figures 4 and 5, the lexicon of property in parliamentary debates (Figure 6), bestsellers written by women (Figure 7), shot length in film (Figure 8), the expression of emotions in fiction (Figure 9), contractions in American novels (Figure 10), repetitions in the canon and the archive (Figure 11), reviews of poetry collections (Figure 12) . . . Topic after topic, the century has emerged as the typical yardstick of quantitative cultural history.

In a few cases, it’s a matter of external constraints: film has existed for about a hundred years, and there is nothing one can do about that; books published between 1800 and 1900 allow good optical recognition (unlike earlier ones), while being free from copyright restrictions (unlike later ones)—whence our typical over-production of nineteenth-century studies. But the deeper reason for this predominance of the century has probably to do with dh’s claim to be ‘a way of discovering and interpreting patterns on a different historical scale’ (‘The Quiet Transformations of Literary Studies’). Different from previous research, often limited to a narrow historical span. But different how, exactly?

In searching for an answer, the century offered itself as an option so obvious, it went almost without saying. Intuitively, a century is long; it’s not anthropomorphic (as we seldom live that long), and is therefore extraneous to the old focus on individual authors; it is the tempo of the world, not of life. Romance languages use it for informal periodization (El Siglo de Oro, une dix-neuvièmiste, il Quattrocento); American universities, for many of their hires. The notion was there, in the existing doxa; a nice round number, long enough to suggest a new dimension, but not so long as to be unmanageable. True, it wasn’t really a concept; it had no place, for instance, in the tripartition of historical time—longue durée, cycle, event—elaborated by the Annales school; and it certainly wasn’t adopted as a result of a theoretical decision. But practice trumped theory, once again: the century offered an intuitive frame for the new scale we were after, and we turned it into the pedestal—the horizontal axis—for our historical findings.

‘A different historical scale’: specifically, one in which trends become visible. Up to a few years ago, no one spoke of trends in the humanities; you couldn’t, as long as you studied only a few texts, spanning a handful of years. With centuries, you can. New fields need keywords, and ‘trend’—with its mix of direction and measurement—was perfect for dh; not by accident, it showed up right away, in the abstract of that 2011 article in Science (‘Analysis of this corpus enables us to investigate cultural trends . . .’), and returned as a sort of opening chord in the first page of studies on film shots, jazz evolution, literary scholarship, British periodicals, poetry reviews, legal records, and expressions of emotion; in a single article (‘Quantitative Literary History of 2,958 Novels’), it occurs sixty-eight times. And it’s not just a word: in Figures 4, 8, 9, 10, 11, 12 and 13 trends are physically present in the form of regression lines and analogous elaborations of the data. We haven’t just talked of trends; we have made them visible, and given them pride of place.
