Automatic subject indexing of oral history interviews with Whisper and Claude
2025 (English). Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]
In the archival media trinity of text, image and sound, the last presents particular challenges for users’ searching and browsing. While a user can manually or digitally skim texts and images to locate items of interest, sound must be played more or less in real time to do the same. Audio files can of course be described and transcribed – digital or digitized audio files of speech even automatically – thus facilitating full-text search. However, the familiar Achilles’ heel of full-text search, namely the ambiguity of natural language, remains.
Enter the hallmark trade of librarianship: subject indexing. When subject index terms have been assigned to individual sections of an audio file, a user searching for these or similar terms can locate precisely where in the retrieved audio files the subject is discussed. With state-of-the-art AI systems, even this can be done automatically, cutting the time needed to index an audio file from its real-time playback duration to however fast the system can process it. This heralds a new future for the accessibility and searchability of oral history archives.
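The section-level lookup described above can be illustrated with a small inverted index. This is a minimal sketch, assuming each indexed section is stored with a start time, an end time and a set of assigned terms. The `Section` type, the `build_index` and `locate` helpers, and the sample data are all hypothetical and are not taken from the pilot study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One indexed stretch of an audio file (hypothetical structure)."""
    audio_file: str
    start_s: float  # offset from the start of the file, in seconds
    end_s: float
    terms: frozenset


def build_index(sections):
    """Map each case-folded subject term to the sections it was assigned to."""
    index = defaultdict(list)
    for section in sections:
        for term in section.terms:
            index[term.casefold()].append(section)
    return index


def locate(index, term):
    """Return (file, start, end) for every section indexed under the term."""
    hits = index.get(term.casefold(), [])
    return sorted((s.audio_file, s.start_s, s.end_s) for s in hits)


# Illustrative data only; not from the study.
sections = [
    Section("interview_01.wav", 0.0, 95.0, frozenset({"Childhood", "Schools"})),
    Section("interview_01.wav", 95.0, 240.0, frozenset({"Emigration"})),
    Section("interview_02.wav", 30.0, 180.0, frozenset({"Schools"})),
]
index = build_index(sections)
print(locate(index, "schools"))
```

A search for a term thus returns playback offsets rather than whole files, so a player could jump straight to the relevant passage. Matching "similar terms", as the abstract mentions, would need an extra step such as a thesaurus lookup, which this sketch leaves out.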
This poster presents a pilot study on automatically transcribing Swedish-language interviews from oral history archives using OpenAI’s Whisper, describing the content and assigning subject index terms to sections using Claude from Anthropic, and visualizing the results. The accuracy of the results depends on many factors, including sound quality, the speakers’ accents and the amount of language mixing. The results are nevertheless very promising, suggesting that automatic subject indexing of interviews is a worthwhile research direction.
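The pipeline outlined above can be sketched in outline. The abstract does not say how sections are delimited or how the model is prompted, so everything here is an assumption made for illustration: segments are plain dicts with `start`, `end` and `text` keys (the fields Whisper's transcription output provides per segment), sections are formed by a simple maximum-duration rule, and `assign_terms` is a hypothetical callable standing in for a request to a language model such as Claude.

```python
def group_into_sections(segments, max_seconds=120.0):
    """Merge consecutive timestamped segments into sections of limited length."""
    sections = []
    current = None
    for seg in segments:
        # Close the current section if adding this segment would exceed the limit.
        if current is not None and seg["end"] - current["start"] > max_seconds:
            sections.append(current)
            current = None
        if current is None:
            current = {"start": seg["start"], "end": seg["end"],
                       "text": seg["text"].strip()}
        else:
            current["end"] = seg["end"]
            current["text"] += " " + seg["text"].strip()
    if current is not None:
        sections.append(current)
    return sections


def index_sections(sections, assign_terms):
    """Attach subject terms to each section using the supplied callable."""
    return [{**sec, "terms": assign_terms(sec["text"])} for sec in sections]


def keyword_stub(text):
    """Placeholder term assigner; a real pipeline would query a language model here."""
    vocabulary = {"skola": "Schools", "fabrik": "Factories"}  # hypothetical terms
    lowered = text.casefold()
    return sorted(term for word, term in vocabulary.items() if word in lowered)


# Illustrative segments only; not from the study.
segments = [
    {"start": 0.0, "end": 50.0, "text": " Jag gick i skola i byn."},
    {"start": 50.0, "end": 110.0, "text": " Läraren var sträng."},
    {"start": 110.0, "end": 170.0, "text": " Sedan började jag på en fabrik."},
]
for sec in index_sections(group_into_sections(segments), keyword_stub):
    print(f'{sec["start"]:.0f}-{sec["end"]:.0f} s: {sec["terms"]}')
```

Keeping the term assigner as a plain callable means the sectioning logic can be tested without network access, and the keyword stub can later be swapped for a model-backed function.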
Place, publisher, year, edition, pages
2025.
Keywords [en]
artificial intelligence, archives, oral history, indexing
National Category
Information Studies
Identifiers
URN: urn:nbn:se:lnu:diva-138419
OAI: oai:DiVA.org:lnu-138419
DiVA, id: diva2:1956997
Conference
Digital Dreams and Practices, Digital Humanities in Nordic and Baltic Countries 9th Conference, Tartu, Estonia, 5-7 March 2025
Part of project
Artificial Intelligence as a risk and opportunity for the authenticity of archives, Wallenberg Foundations
Funder
Wallenberg AI, Autonomous Systems and Software Program – Humanity and Society (WASP-HS)
2025-05-08, Bibliographically approved