Evaluating the Feasibility of Handwritten Text Recognition for Historic Maps
2025 (English). In: Presented at the Huminfra Conference, Stockholm, November 13-14, 2025, 2025. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]
Textual elements are important features of any map, yet computational identification of words and characters, namely optical character recognition (OCR), can be challenging given the non-textual features, varying text orientations, overlaid elements, and other complicating aspects of maps. Despite these challenges, OCR has been explored for printed maps with typeface text. However, little work is currently being undertaken to apply handwritten text recognition (HTR) to non-printed, handwritten maps. Several openly available HTR tools, such as Transkribus or HTR Flow, are able to capture text from handwritten documents, but these tools are usually applied to predominantly textual documents (e.g., letters, manuscripts, diaries). There is little insight into their efficacy on cartographic documents.
This ongoing project explores the feasibility of current artificial intelligence models for HTR on the historical maps of Danish cartographer Johannes Mejer (1606-1674). Besides assessing the capabilities of current technologies for this type of material, digitization of Mejer’s collection can offer insights into a crucial period in Nordic history, preceding the Swedish acquisition of Skåne, which Mejer was the first to chart during this time. Several machine learning applications for HTR are trained and tested: specialized systems such as Transkribus and HTR Flow, as well as general large language models such as GPT-5 and Sonnet 4.
After outlining the problem and the methods, including the preparation of AI training/testing material, this presentation reports the findings regarding the performance of currently available machine learning models. Following this, we propose subsequent steps for improving output. We also share preliminary historical insights gleaned from processing Mejer’s works, and discuss the overall challenge of applying HTR machine learning to difficult material such as historical maps. In so doing, the project hopes to encourage exploration of machine learning applications for unconventional material with textual elements.
Place, publisher, year, edition, pages
2025.
Keywords [en]
maps, transcription, Transkribus, generative AI, digital humanities
National Category
Cultural Studies
Research subject
Humanities, Human Geography
Identifiers
URN: urn:nbn:se:lnu:diva-142473
OAI: oai:DiVA.org:lnu-142473
DiVA, id: diva2:2013762
Conference
Huminfra Conference, Stockholm, Sweden, November 13-14, 2025
Part of project
Huminfra/DARIAH-SE
2025-11-14 2025-11-14 2026-05-27 Bibliographically approved