Emulations of photography generated by artificial intelligence are beginning to look so convincing that they can no longer be distinguished from images made by a film camera.footnote1 This is a new development, and the same cannot yet be said for video; we are, nevertheless, at an intermediate inflection point at which ai ‘photographs’, when skilfully and specifically prompted, can, as it were, pass the Turing test. What are the consequences for our visual culture?

In recent essays, artist and theorist Hito Steyerl has explored the character of ai image generation and the bids by would-be makers of Artificial General Intelligence to capture the ‘general intellect’—and even hegemonic common sense, as Gramsci termed it, with all its virtues, faults and contradictions—thus establishing the ultimate monopoly.footnote2 The effort towards total capture is explicit, for instance, in the assembly by ImageNet of a hierarchically ordered universal map of objects that can be identified by ai.footnote3 Steyerl’s description of such striving for dominance rightly highlights data-mining, privatization, exploitative conditions of labour, and invidious attempts to identify and classify individuals by race. It may be that a look at the interrelation of ai and photography can reveal more about the character of this nascent hegemony and its relation to commercial culture.

Two connected concepts—entropy and déjà vu—may be of use here. I’m using ‘entropy’ in an information-science sense, which doesn’t entirely coincide with an intuitive understanding of its meaning in the Second Law of Thermodynamics—that is: an increase of disorder, a loss of complexity and structure, as when heat is generated from matter, crystalline coal turning to mere ash. In the foundational 1940s work of the American mathematicians and information-science theorists Claude Shannon and Warren Weaver, who were trying to solve the problem of separating a signal from interfering noise in communication systems, ‘entropy’ refers to a high level of information.footnote4 In calculating how much data could be transmitted along a channel while still remaining distinguishable from noise, Shannon had the insight that any message is a choice taken from a field determined by its symbols, and that information, randomness and complexity were aligned.footnote5 In this sense, an entirely predictable data sequence—abababab, etc. or a chequerboard pattern—has very low entropy, and thus carries very little information since we soon know what is coming next. A sequence of maximum entropy, by contrast, carries so much information that it is unreadable, lacking the structure and redundancy necessary to distinguish its message from the random noise introduced during its transmission.
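Shannon’s measure can be made concrete with a small illustration (a sketch of my own, not drawn from the essay or from Shannon’s papers), using the standard formula H = −Σ p·log₂ p over the frequencies of the blocks observed. Read symbol by symbol, the alternation abababab still registers a full bit per symbol; read in pairs, its redundancy is exposed and the entropy collapses to zero:

```python
# Illustrative sketch only: estimating Shannon entropy from the
# frequencies of blocks of length k, using H = -sum(p * log2(p)).
from collections import Counter
from math import log2

def block_entropy(sequence: str, k: int = 1) -> float:
    """Entropy in bits per block of length k, estimated from frequencies."""
    blocks = [sequence[i:i + k] for i in range(0, len(sequence), k)]
    counts = Counter(blocks)
    total = len(blocks)
    return 0.0 - sum((n / total) * log2(n / total) for n in counts.values())

# A single repeated symbol is wholly predictable: zero bits.
print(block_entropy("aaaaaaaa"))        # 0.0

# Symbol by symbol, the alternation still looks like a fair coin: 1 bit.
print(block_entropy("abababab"))        # 1.0

# Read in pairs, the structure is exposed and the entropy collapses.
print(block_entropy("abababab", k=2))   # 0.0
```

A truly random sequence over many symbols would score near the maximum for its alphabet at every block length, which is why, as above, it resists compression and prediction alike.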

But the concept of low entropy can equally be applied to cultural predictability and cliché. When the hero of a Hollywood movie is shown lying on the ground, apparently dead, with people gathered around him in distress, viewers will anticipate that a miraculous resurrection is about to take place. In the cultural field, capitalism has long encouraged producers to make conformist works in standard formats; successful models are copied with minor variations, franchises churn out predictable products and tv tends to turn everything into soap. Algorithms are now being used to test pop songs and, increasingly, to help write them. Such general standardization was one of the main themes of Donald Sassoon’s monumental examination of European cultural markets, from novels to operas, cinema to comic strips.footnote6 Surveying the uniformity of content and tone across hundreds of American tv channels in the 1990s, Bill McKibben called it the ‘pleasant tract-housing development of the mind’.footnote7 This is what low entropy feels like.

‘Photographs’ taken by phone cameras are already extensively governed by ai processes, of course. The user’s choice of when to press the shutter marks only a mid-point in a burst of images, taken before and after, that are melded to make the resulting ‘photograph’, using hdr effects to increase tonal range and resolution, and to decrease ‘noise’, or lower entropy. In the early days of digital photography, the cultural theorist Lev Manovich compared the manipulation of its surfaces to the uncannily smoothed-out half-photography, half-painting portraits of Soviet luminaries.footnote8 A similar ‘de-noising’ effect can be seen in most phone-camera portraits. The raw images produced by the tiny sensors and (mostly) plastic lenses of phone cameras are processed by algorithms that recognize the generic subject—person, landscape, food—and tailor the images accordingly, adding sharpness, emulating differential focus, smoothing surfaces and increasing colour saturation. ai image-generation programmes are trained on online images; since the vast majority of these are taken in their billions on phone cameras and uploaded to social media sites, they have already been ai-enhanced.

The latest wave of ai is based partly on neural networks which emulate aspects of organic brains. As with our own minds, their functioning is opaque—often compounded by the secrecy that guards proprietary softwarefootnote9—and they are error-prone. They use many layers of processing, hierarchically organized, so that those layers closest to the input device (a camera, for instance) deal with the most basic procedures, such as edge detection, while increasingly nuanced matters are dealt with further back.footnote10 Diffusion models like dall-e, which generate images in response to verbal prompts, are generally trained by gradually adding ‘noise’ to the image of a defined subject in many small steps, until it reaches the end-point of total randomness, a visual white noise. The algorithm analyses each step and learns how to reverse the process—to move from noisy to defined images, one stage at a time.footnote11
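The forward half of that training procedure can be sketched in a few lines (an illustration of my own, with an arbitrary constant noise schedule, not the code of any particular model): a tiny ‘image’, here just a list of pixel values, is pushed step by small step towards Gaussian white noise. A real diffusion model is then trained to estimate and subtract the noise added at each step, so that running the chain backwards from pure noise yields a defined image.

```python
# Illustrative sketch only: the forward ('noising') process that diffusion
# models are trained to reverse. The noise schedule here is an arbitrary
# constant beta, chosen for simplicity rather than taken from any model.
import math
import random

def forward_diffusion(image, steps=1000, beta=0.02, seed=0):
    """Return the 'image' (a list of pixel values) after `steps` rounds
    of Gaussian noising. Each step keeps sqrt(1 - beta) of the signal
    and adds noise of variance beta, so the original signal decays
    geometrically towards pure white noise."""
    rng = random.Random(seed)
    x = list(image)
    keep, add = math.sqrt(1.0 - beta), math.sqrt(beta)
    for _ in range(steps):
        x = [keep * v + add * rng.gauss(0.0, 1.0) for v in x]
    return x

clean = [0.9, 0.1, 0.5, 0.7]        # a 'defined subject'
noised = forward_diffusion(clean)   # effectively visual white noise
# In amplitude terms the surviving fraction of the original signal is
# (1 - beta) ** (steps / 2) -- vanishingly small after a thousand steps.
```

Because each step is small, the reverse direction is learnable: undoing a little noise at a time is far easier than conjuring a clean image from randomness in one leap.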

While a de-noising process can be used, say, to train an algorithm on a database of faces so that it generates plausible new faces from a random field of noise, the best results usually involve extensive human intervention. Diffusion models are guided using labels, classifiers, texts, target images, semantic maps and graphs. Since they are trained to predict what is likely across a vast database of photographic images, ais are indeed anti-entropy machines, removing ‘noise’ or complexity from the source material, smoothing surfaces and cultivating the clichéd. The resulting images look how most people think photography should look. The anti-entropic effect is plain both in the ai processes governing phone cameras and in the programmes used to ‘improve’ existing photographs. When very new to photography, I took a rather incompetent picture of a friend. I sent a scan of it to him recently, and he ran it through the ai image-generation programme Leonardo.ai to improve it. The programme ironed out my errors in exposure and focusing, and cleaned up my friend’s clothes, making them look sharper and more fashionable. When the algorithmic filter was applied strongly, it made him look like a model, in the style of most digital avatars—whether the idealized products of Photoshop editing or complete ai fabrications.