Information extraction from German radiological reports for general clinical text and language understanding

Michael Jantscher, Felix Gunzer, Roman Kern, Eva Hassler, Sebastian Tschauner, Gernot Reishofer

Research output: Contribution to journalArticlepeer-review

Abstract

Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, the use of modern text processing applications that require a large amount of training data proves to be difficult, as only few data sets are available mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.
Original languageEnglish
Article number2353
JournalScientific Reports
Volume13
Issue number1
Early online date9 Feb 2023
DOIs
Publication statusPublished - Dec 2023

ASJC Scopus subject areas

  • General

Fingerprint

Dive into the research topics of 'Information extraction from German radiological reports for general clinical text and language understanding'. Together they form a unique fingerprint.

Cite this