A while ago, the science-fiction writer Ted Chiang described Chatgpt’s text output as a ‘blurry jpeg of all the text on the web’—or: as a semantic ‘poor image’.footnote1 But the blurry output generated by machine-learning networks has an additional historical dimension: statistics. Visuals created by ml tools are statistical renderings, rather than images of actually existing objects. They shift the focus from photographic indexicality to stochastic discrimination. They no longer refer to facticity, let alone truth, but to probability. The shock of sudden photographic illumination is replaced by the drag of bell curves, loss functions and long tails, spun up by a relentless bureaucracy.

These renderings represent averaged versions of mass online booty, hijacked by dragnets, in the style of Francis Galton’s blurred eugenicist composites, 8k, Unreal engine. As data visualizations, they do not require any indexical reference to their object. They are not dependent on the actual impact of photons on a sensor, or on emulsion. They converge around the average, the median; hallucinated mediocrity. They represent the norm by signalling the mean. They replace likenesses with likelinesses. They may be ‘poor images’ in terms of resolution, but in style and substance they are: mean images.
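The arithmetic behind such a composite is not mysterious. The sketch below, in Python, averages a stack of roughly aligned portraits pixel by pixel, in the manner of Galton’s composite photography; the folder name and file format are placeholders rather than a reference to any particular dataset, and machine-learning generators do not literally compute this average—the sketch only makes the statistical intuition concrete.

```python
# A minimal sketch of a Galton-style composite: pixel-wise averaging of
# roughly aligned portraits. The folder 'portraits/' and the .jpg glob are
# hypothetical; any stack of same-sized images would do.
from pathlib import Path

import numpy as np
from PIL import Image

SIZE = (256, 256)  # resize everything to a common grid before averaging

def composite(folder: str) -> Image.Image:
    """Return the pixel-wise mean of all images in `folder`."""
    stack = []
    for path in sorted(Path(folder).glob("*.jpg")):
        img = Image.open(path).convert("RGB").resize(SIZE)
        stack.append(np.asarray(img, dtype=np.float64))
    mean = np.mean(stack, axis=0)  # the 'mean image': one averaged face
    return Image.fromarray(mean.astype(np.uint8))

if __name__ == "__main__":
    composite("portraits/").save("mean_image.jpg")
```

The blur is the point: whatever is not shared across the stack cancels out, leaving only the statistical residue.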

An example of how a set of more traditional photographs is converted into a statistical render: the search engine ‘Have I Been Trained?’—a very helpful tool developed by the artists Mat Dryhurst and Holly Herndon—allows the user to browse the massive laion-5b dataset used to train Stable Diffusion, one of the most popular deep-learning text-to-image generators. These pictures of mine (Figure 1) show up inside this training data. What does Stable Diffusion make of them? Ask the model to render ‘an image of hito steyerl’, and this (Figure 2) is the result.

Figure 1: Assemblage of photos of Hito Steyerl’s face with text and computer-generated contexts.
Figure 2: Image of an elderly Asian-looking woman’s face.
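The essay does not say how the render was produced—most likely through a web interface—but such a query could be reproduced locally with the Hugging Face diffusers library, as in the following sketch; the checkpoint name is one public Stable Diffusion release among several, not necessarily the version indexed by ‘Have I Been Trained?’.

```python
# An illustrative sketch of a text-to-image query against a public
# Stable Diffusion checkpoint, not a record of the author's procedure.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # one public checkpoint among several
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU; CPU works but is slow

# The prompt quoted in the text; the output is a statistical render,
# not a photograph of anyone.
image = pipe("an image of hito steyerl").images[0]
image.save("render.png")
```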

So, how did Stable Diffusion get from A to B? It is not the most flattering ‘before and after’ juxtaposition, for sure; I would not recommend the treatment. It looks rather mean, or even demeaning; but this is precisely the point. The question is, what mean? Whose mean? Which one? Stable Diffusion renders this portrait of me in a state of frozen age range, produced by internal, unknown processes, spuriously related to the training data. It is not a ‘black box’ algorithm that is to blame, as Stable Diffusion’s actual code is known. Instead, we might call it a white box algorithm, or a social filter. This is an approximation of how society, through a filter of average internet garbage, sees me. All it takes is to remove the noise of reality from my photos and extract the social signal instead; the result is a ‘mean image’, a rendition of correlated averages—or: different shades of mean.

The English word ‘mean’ has several meanings, all of which apply here. ‘Mean’ may refer to minor or shabby origins, to the norm, to the stingy or to nastiness. It is connected to meaning as signifying, to ideas of the common, but also to financial or instrumental means. The term itself is a composite, which blurs and superimposes seemingly incompatible layers of signification. It bakes moral, statistical, financial and aesthetic values as well as common and lower-class positions into one dimly compressed setting. Mean images are far from random hallucinations. They are predictable products of data populism. They pick up on latent social patterns that encode conflicting significations as vector coordinates. They visualize real existing social attitudes that align the common with lower-class status, mediocrity and nasty behaviour. They are after-images, burnt into screens and retinas long after their source has been erased. They perform a psychoanalysis without either psyche or analysis for an age of automation in which production is augmented by wholesale fabrication. Mean images are social dreams without sleep, processing society’s irrational functions to their logical conclusions. They are documentary expressions of society’s views of itself, seized through the chaotic capture and large-scale kidnapping of data. And they rely on vast infrastructures of polluting hardware and menial and disenfranchised labour, exploiting political conflict as a resource.

When a text-to-3d tool called Dreamfusion was trialled in autumn 2022, users began to spot an interesting flaw. The ml-generated 3d models often had multiple faces, pointing in different directions (Figure 3). This glitch was dubbed the Janus problem.footnote2 What was its cause? One possible answer is the over-emphasis on faces in machine-learning image recognition and analysis: training data contains proportionally more faces than other body parts. The two faces of Janus, Roman god of beginnings and endings, look towards the past and towards the future; he is also the god of war and peace, of transition from one social state to another.

Figure 3: Squirrel image with multiple faces.

The machine-learning Janus problem touches on a crucial issue—the relation between the individual and the multitude. How to portray the crowd as one? Or conversely the one as crowd, as collective, group, class or Leviathan? What is the relation between the individual and the group, between private and common interests (and property), especially in an era in which statistical renders are averaged group compositions?