Retouching before Photoshop
If you’ve ever admired the flawless complexions in vintage portraits and assumed your great-grandparents had impossibly smooth skin – think again. Image manipulation has been around since the earliest days of photography. Angus Hamilton (Librarian, Digital Access) shared some insights, inspired by the Library’s Make Believe: Encounters with Misinformation exhibition.
Long before Photoshop, photo editing was a hands-on craft, carried out with an array of fine tools at specially designed retouching desks with slanted surfaces. It was meticulous work, typically performed on portraits in the dark rooms of photographic studios, and most commonly by women, who were paid significantly less than their male counterparts.
[Image: A woman at a retouching desk]
[Image: Perry Winkle writes about the art of retouching]
While retouching was widespread, it’s rare to find surviving glass negatives that show markings left behind by the craft. However, these can be seen in two of State Library Victoria’s collections: the glass plate negatives of the Spencer Shier collection and the Rosenberg collection: studio of Vincent Kelly.
[Image: Miss J Barber]
[Image: Mrs F McDonald]
[Image: Close-up of Mrs F McDonald]
Spotting fakes
When we're presented with an image, how can we tell what's real and what isn't?
The Library’s Lead Developer (and keen birder) Nick Paustian explored the growing challenge of identifying manipulated images in the age of AI.
Nick noted that we’ve reached a tipping point: distinguishing real from fake now requires more time and skill. While our eyes can still catch telltale signs – like distorted features or overly smooth textures – more sophisticated methods rely on metadata and online detection tools.
[Image: Three birds]
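As a small illustration of the metadata route Nick mentioned, here is a hedged Python sketch using the Pillow library: it prints whatever EXIF tags an image carries. The filename is hypothetical. Editing software often records itself in the ‘Software’ tag, while AI-generated or re-encoded images frequently carry no EXIF at all, so even absence is a (weak) signal.

```python
from PIL import Image, ExifTags

def inspect_metadata(path: str) -> None:
    """Print an image's EXIF tags as weak provenance signals."""
    with Image.open(path) as im:
        exif = im.getexif()
        if not exif:
            # Many AI-generated or re-encoded images carry no EXIF at all.
            print("No EXIF metadata found.")
            return
        for tag_id, value in exif.items():
            # Map numeric tag IDs to readable names, e.g. 305 -> 'Software'.
            name = ExifTags.TAGS.get(tag_id, tag_id)
            print(f"{name}: {value}")

inspect_metadata("downloaded-photo.jpg")  # hypothetical filename
```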
Platforms like Meta – the parent company of Facebook and Instagram – have begun adding ‘AI info’ labels to AI-generated images, video and audio; but their detectors often misfire, flagging authentic content edited with AI-powered tools like Photoshop. This is also the case for many online AI detector tools, like wasitai.com.
[Image: Detail of a bird image being checked with wasitai.com]
Instead of trying to spot fakes, what if we could authenticate what's real?
Nick introduced the concept of cryptographic fingerprinting, a technique borrowed from software development. Generate a unique hash for an image and any subsequent modification – even to a single pixel – will produce a different hash, revealing tampering.
[Image: Hatted birbs]
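To make the fingerprinting idea concrete, here is a minimal Python sketch (the filename is hypothetical, and real systems hash more than raw pixels): fingerprint an image's pixel data with SHA-256, alter one pixel, and the fingerprint changes.

```python
import hashlib
from PIL import Image

def pixel_fingerprint(path: str) -> str:
    """Return a SHA-256 hex digest of an image's raw pixel data."""
    with Image.open(path) as im:
        return hashlib.sha256(im.convert("RGB").tobytes()).hexdigest()

original = pixel_fingerprint("portrait.png")  # hypothetical file

# Alter a single pixel and fingerprint the result.
with Image.open("portrait.png") as im:
    im = im.convert("RGB")
    r, g, b = im.getpixel((0, 0))
    im.putpixel((0, 0), (255 - r, g, b))  # guaranteed to differ from the original
    tampered = hashlib.sha256(im.tobytes()).hexdigest()

print(original == tampered)  # False: a one-pixel edit changes the hash
```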
Companies like Truepic are pioneering this space, working with Adobe and others through the Coalition for Content Provenance and Authenticity (C2PA). Their goal is to embed cryptographic signatures and metadata directly into images, allowing platforms and users to verify authenticity.
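The C2PA specification itself is far richer, but its cryptographic core can be sketched in a few lines. Assuming the third-party `cryptography` package and a hypothetical image file, a publisher signs the image bytes with a private key, and anyone holding the matching public key can verify that the bytes are untouched:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

with open("portrait.png", "rb") as f:  # hypothetical file
    image_bytes = f.read()

signing_key = Ed25519PrivateKey.generate()   # held by the camera or publisher
signature = signing_key.sign(image_bytes)    # travels with the image

# Verification passes on the untouched bytes...
verify_key = signing_key.public_key()
verify_key.verify(signature, image_bytes)    # no exception: authentic

# ...and fails if even one byte has been altered.
try:
    verify_key.verify(signature, image_bytes + b"\x00")
except InvalidSignature:
    print("Tampering detected")
```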
This technology could be especially valuable for journalism and archival institutions, where knowing when and where an image was captured is crucial. Nick suggested that organisations like State Library Victoria might consider adopting such systems to enhance the provenance of their collections.
Optical Character Recognition (OCR)
SLV LAB’s Innovation Lead Sotirios Alpanis spoke about Optical Character Recognition (OCR) – a technology that has quietly shaped how we access and interpret historical collections.
OCR is the process of extracting text from images. It sounds simple, but as Sotirios put it, programming a computer to do this is ‘like teaching a rock how to think’. We can glance at a newspaper and instantly understand its structure – columns, headlines, captions – but for a computer, it’s chaos. Different fonts, sizes and layouts make segmentation and recognition a complex task.
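In code, the task looks deceptively simple. As a hedged sketch using pytesseract, a common Python wrapper for the open-source Tesseract engine (the filename is hypothetical), the single call below hides all of the segmentation and recognition work Sotirios described:

```python
import pytesseract
from PIL import Image

# One line of code, decades of research: the engine must first segment
# the page into blocks, lines and words before recognising characters.
text = pytesseract.image_to_string(Image.open("newspaper-page.png"))
print(text)
```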
Despite these challenges, OCR has transformed research. It doesn’t just make scanned documents searchable; it allows us to slice and analyse data in new ways, opening possibilities for both scholarship and creative projects. Sotirios shared examples ranging from his own playful ‘Primary Source Ransom Note Generator’ to Olivia Vane’s Steptext, a tool for visualising text patterns in digitised archives.
[Image: Primary Source Ransom Note Generator]
[Image: Steptext screenshot]
Sotirios reflected on his early career at the British Newspaper Archive, where OCR dramatically accelerated digitisation. What once took years could now be done in days. But this progress came with hidden costs: segmentation was still manual, and OCR training was often outsourced to low-paid workers in Southeast Asia – a practice that echoes the way modern AI models are trained today. As Sotirios noted, ‘OCR has a slightly murky past.’
The story of OCR is also tied to the big tech race of the 2000s. Google and Microsoft competed to digitise vast libraries, including the British Library’s collections. Microsoft eventually abandoned its project, returning images with minimal metadata and documentation. Google persisted, digitising millions of pages – but kept its methods opaque and focused on English-language material, leaving gaps for non-Latin scripts.
One bright spot was Tesseract, an OCR engine open-sourced by HP in 2005 and later developed by Google. Today, it underpins many OCR tools, including those used by cultural institutions.
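Because Tesseract is open source, the layout analysis it performs is easy to inspect. As a sketch (again via pytesseract, with a hypothetical filename), its `image_to_data` call exposes the segmentation step directly, returning each detected word with its block, line, position and confidence:

```python
import pytesseract
from PIL import Image

page = Image.open("newspaper-page.png")  # hypothetical scan

# Tesseract's own segmentation: one entry per detected word, with block,
# paragraph and line numbers plus pixel coordinates and confidence.
data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)

for i, word in enumerate(data["text"]):
    if word.strip():
        print(
            f"block {data['block_num'][i]}, line {data['line_num'][i]}: "
            f"{word!r} at ({data['left'][i]}, {data['top'][i]}), "
            f"confidence {data['conf'][i]}"
        )
```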
Printed text was only the beginning. Handwritten materials posed an even greater challenge, leading to the development of Transkribus, a platform born from EU-funded projects. It allows users to upload documents, correct transcriptions and train custom AI models – a powerful tool for cultural heritage institutions. Sotirios contributed to its early stages by helping to prepare datasets of Arabic scientific manuscripts.
[Image: Transkribus with an Arabic manuscript]
Despite the hype around AI, OCR remains a reliable workhorse – one Sotirios has often returned to throughout his career. Its evolution still raises important concerns, though: uneven datasets dominated by English-language material build bias into the tools and leave other languages behind; the human labour involved is often invisible; and OCR output now feeds large language models (LLMs), adding another layer of bias to digital systems.
_________________
Code Club is a grassroots initiative within State Library Victoria that provides space for staff to learn and engage with technology. With an inclusive and accessible approach, the club aims to increase digital literacy, demystify technology and foster cross-departmental connections. Learn more