Exploring the Capabilities of GPT4-Vision as OCR Engine

Alex Ghiriti, Wolfgang Göderle, Roman Kern

Publication: Contribution to book/report/conference proceedings; conference paper; peer-reviewed

Abstract

Many museums and libraries have undertaken efforts to digitize their assets, and many historic documents are now available as digital images. However, these documents are not directly accessible to retrieval systems that rely on written text rather than images. This study examines the optical character recognition (OCR) ability of the novel GPT4-Vision in cases where established methods, such as Tesseract, may have difficulties. We find that GPT4-Vision provides excellent results even in cases where humans struggle. We also identify a number of key limitations, including the long runtime, which implies high energy requirements, the lack of handling of rotated images, the necessity for layout hints, and limitations regarding image size. Even with these limitations, large language models and vision transformers are expected to play an important role in making historical documents more accessible, either for further processing or directly to users.
Original language: English
Title of host publication: Linking Theory and Practice of Digital Libraries - 28th International Conference on Theory and Practice of Digital Libraries, TPDL 2024, Proceedings
Editors: Apostolos Antonacopoulos, Annika Hinze, Nicholas Vanderschantz, Benjamin Piwowarski, Mickaël Coustaty, Giorgio Maria Di Nunzio, Francesco Gelati
Pages: 3-12
Number of pages: 10
ISBN (electronic): 978-3-031-72440-4
Publication status: Published - 25 Sept. 2024
Event: 28th International Conference on Theory and Practice of Digital Libraries, TPDL 2024 - Ljubljana, Slovenia
Duration: 24 Sept. 2024 to 27 Sept. 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15178 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 28th International Conference on Theory and Practice of Digital Libraries, TPDL 2024
Country/Territory: Slovenia
City: Ljubljana
Period: 24/09/24 to 27/09/24

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
