Emulations of photography generated by artificial intelligence are beginning to look so convincing that they can no longer be distinguished from images made by a film camera.footnote1 This is a new development, and the same cannot yet be said for video; we are, nevertheless, at an intermediate inflection point at which ai ‘photographs’, when skilfully and specifically prompted, can, as it were, pass the Turing test. What are the consequences for our visual culture?

In recent essays, artist and theorist Hito Steyerl has explored the character of ai image generation and the bids by would-be makers of Artificial General Intelligence to capture the ‘general intellect’—and even hegemonic common sense, as Gramsci termed it, with all its virtues, faults and contradictions—thus establishing the ultimate monopoly.footnote2 The effort towards total capture is explicit, for instance, in the assembly by ImageNet of a hierarchically ordered universal map of objects that can be identified by ai.footnote3 Steyerl’s description of such striving for dominance rightly highlights data-mining, privatization, exploitative conditions of labour, and invidious attempts to identify and classify individuals by race. It may be that a look at the interrelation of ai and photography can reveal more about the character of this nascent hegemony and its relation to commercial culture.

Two connected concepts—entropy and déjà vu—may be of use here. I’m using ‘entropy’ in an information-science sense, which doesn’t entirely coincide with an intuitive understanding of its meaning in the Second Law of Thermodynamics—that is: an increase of disorder, a loss of complexity and structure, as when heat is generated from matter, crystalline coal turning to mere ash. In the foundational 1940s work of the American mathematicians and information-science theorists Claude Shannon and Warren Weaver, who were trying to solve the problem of separating a signal from interfering noise in communication systems, ‘entropy’ refers to a high level of information.footnote4 In calculating how much data could be transmitted along a channel while still remaining distinguishable from noise, Shannon had the insight that any message is a choice taken from a field determined by its symbols, and that information, randomness and complexity were aligned.footnote5 In this sense, an entirely predictable data sequence—abababab, etc. or a chequerboard pattern—has very low entropy, and thus carries very little information since we soon know what is coming next. A sequence of maximum entropy, by contrast, carries so much information that it is unreadable, lacking the structure and redundancy necessary to distinguish its message from the random noise introduced during its transmission.
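Shannon’s measure can be made concrete with a small illustration (a sketch of my own, not drawn from the essay or from Shannon’s papers), using the standard formula H = −Σ p·log₂ p over the frequencies of the blocks observed. Read symbol by symbol, the alternation abababab still registers a full bit per symbol; read in pairs, its redundancy is exposed and the entropy collapses to zero:

```python
# Illustrative sketch only: estimating Shannon entropy from the
# frequencies of blocks of length k, using H = -sum(p * log2(p)).
from collections import Counter
from math import log2

def block_entropy(sequence: str, k: int = 1) -> float:
    """Entropy in bits per block of length k, estimated from frequencies."""
    blocks = [sequence[i:i + k] for i in range(0, len(sequence), k)]
    counts = Counter(blocks)
    total = len(blocks)
    return 0.0 - sum((n / total) * log2(n / total) for n in counts.values())

# A single repeated symbol is wholly predictable: zero bits.
print(block_entropy("aaaaaaaa"))        # 0.0

# Symbol by symbol, the alternation still looks like a fair coin: 1 bit.
print(block_entropy("abababab"))        # 1.0

# Read in pairs, the structure is exposed and the entropy collapses.
print(block_entropy("abababab", k=2))   # 0.0
```

A truly random sequence over many symbols would score near the maximum for its alphabet at every block length, which is why, as above, it resists compression and prediction alike.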

But the concept of low entropy can equally be applied to cultural predictability and cliché. When the hero of a Hollywood movie is shown lying on the ground, apparently dead, with people gathered around him in distress, viewers will anticipate that a miraculous resurrection is about to take place. In the cultural field, capitalism has long encouraged producers to make conformist works in standard formats; successful models are copied with minor variations, franchises churn out predictable products and tv tends to turn everything into soap. Algorithms are now being used to test pop songs and, increasingly, to help write them. Such general standardization was one of the main themes of Donald Sassoon’s monumental examination of European cultural markets, from novels to operas, cinema to comic strips.footnote6 Surveying the uniformity of content and tone across hundreds of American tv channels in the 1990s, Bill McKibben called it the ‘pleasant tract-housing development of the mind’.footnote7 This is what low entropy feels like.

‘Photographs’ taken by phone cameras are already extensively governed by ai processes, of course. The user’s choice of when to press the shutter marks only a mid-point in a burst of images, taken before and after, that are melded to make the resulting ‘photograph’, using hdr effects to increase tonal range and resolution, and to decrease ‘noise’, or lower entropy. In the early days of digital photography, the cultural theorist Lev Manovich compared the manipulation of its surfaces to the uncannily smoothed-out half-photography, half-painting portraits of Soviet luminaries.footnote8 A similar ‘de-noising’ effect can be seen in most phone-camera portraits. The raw images produced by the tiny sensors and (mostly) plastic lenses of phone cameras are processed by algorithms that recognize the generic subject—person, landscape, food—and tailor the images accordingly, adding sharpness, emulating differential focus, smoothing surfaces and increasing colour saturation. ai image-generation programmes are trained on online images; since the vast majority of these are taken in their billions on phone cameras and uploaded to social media sites, they have already been ai-enhanced.

The latest wave of ai is based partly on neural networks which emulate aspects of organic brains. As with our own minds, their functioning is opaque—often compounded by the secrecy that guards proprietary softwarefootnote9—and they are error-prone. They use many layers of processing, hierarchically organized, so that those layers closest to the input device (a camera, for instance) deal with the most basic procedures, such as edge detection, while increasingly nuanced matters are dealt with further back.footnote10 Diffusion models like dall-e, which generate images in response to verbal prompts, are generally trained by gradually adding ‘noise’ to the image of a defined subject in many small steps, until it reaches the end-point of total randomness, a visual white noise. The algorithm analyses each step and learns how to reverse the process—to move from noisy to defined images, one stage at a time.footnote11
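The forward half of that training procedure can be sketched in a few lines (an illustration of my own, with an arbitrary constant noise schedule, not the code of any particular model): a tiny ‘image’, here just a list of pixel values, is pushed step by small step towards Gaussian white noise. A real diffusion model is then trained to estimate and subtract the noise added at each step, so that running the chain backwards from pure noise yields a defined image.

```python
# Illustrative sketch only: the forward ('noising') process that diffusion
# models are trained to reverse. The noise schedule here is an arbitrary
# constant beta, chosen for simplicity rather than taken from any model.
import math
import random

def forward_diffusion(image, steps=1000, beta=0.02, seed=0):
    """Return the 'image' (a list of pixel values) after `steps` rounds
    of Gaussian noising. Each step keeps sqrt(1 - beta) of the signal
    and adds noise of variance beta, so the original signal decays
    geometrically towards pure white noise."""
    rng = random.Random(seed)
    x = list(image)
    keep, add = math.sqrt(1.0 - beta), math.sqrt(beta)
    for _ in range(steps):
        x = [keep * v + add * rng.gauss(0.0, 1.0) for v in x]
    return x

clean = [0.9, 0.1, 0.5, 0.7]        # a 'defined subject'
noised = forward_diffusion(clean)   # effectively visual white noise
# In amplitude terms the surviving fraction of the original signal is
# (1 - beta) ** (steps / 2) -- vanishingly small after a thousand steps.
```

Because each step is small, the reverse direction is learnable: undoing a little noise at a time is far easier than conjuring a clean image from randomness in one leap.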

While a de-noising process can be used, say, to train an algorithm on a database of faces so that it generates plausible new faces from a random field of noise, the best results usually involve extensive human intervention. Diffusion models are guided using labels, classifiers, texts, target images, semantic maps and graphs. Since they are trained to predict what is likely across a vast database of photographic images, ais are indeed anti-entropy machines, removing ‘noise’ or complexity from the source material, smoothing surfaces and cultivating the clichéd. The resulting images look how most people think photography should look. The anti-entropic effect is plain both in the ai processes governing phone cameras and in the programmes used to ‘improve’ existing photographs. When very new to photography, I took a rather incompetent picture of a friend. I sent a scan of it to him recently, and he ran it through the ai image-generation programme Leonardo.ai to improve it. The programme ironed out my errors in exposure and focusing, and cleaned up my friend’s clothes, making them look sharper and more fashionable. When the algorithmic filter was applied strongly, it made him look like a model, in the style of most digital avatars—whether the idealized products of Photoshop editing or complete ai fabrications.