Large Scale Slide Digitalisation for Machine Learning in Computational Pathology

Markus Plass

Research output: Thesis (Master's Thesis)


Computational pathology is a thriving research domain that uses large volumes of imaging data, accompanied by sensitive clinical data collected according to heterogeneous data models, to fundamentally improve how histopathological and clinical diagnosis and the oncological treatment of patients are performed. Recent high-throughput slide scanners make it possible to render the information contained in the glass slides stored in biobanks available to machine learning algorithms. Ensuring storage of and access to digital slides, also called whole slide images (WSI), will overcome the current limitations on accessing and sharing pathology material together with the associated metadata.
This work describes the design and implementation of the digitization workflow, which consists of the following steps: (i) selection and description of a scanning cohort by its metadata, (ii) retrieval of the physical slides from the archive, (iii) cleaning and pre-processing, (iv) scanning with different scanners, (v) quality control, (vi) generation of technical scan metadata, (vii) linkage to phenotypic descriptions, and (viii) cataloguing and long-term storage.
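The abstract lists the eight steps but not how their order is enforced for an individual slide. The following is a minimal Python sketch, assuming the steps are strictly sequential per slide; the names `Step`, `Slide`, `next_step`, and `complete` are hypothetical and are not taken from the thesis software.

```python
from __future__ import annotations

from enum import IntEnum


class Step(IntEnum):
    """Hypothetical encoding of the eight workflow steps (i)-(viii), in order."""
    COHORT_SELECTION = 1   # (i) select and describe the cohort by its metadata
    RETRIEVAL = 2          # (ii) retrieve physical slides from the archive
    CLEANING = 3           # (iii) cleaning and pre-processing
    SCANNING = 4           # (iv) scanning with different scanners
    QUALITY_CONTROL = 5    # (v) quality control
    SCAN_METADATA = 6      # (vi) generate technical scan metadata
    PHENOTYPE_LINKAGE = 7  # (vii) link to phenotypic descriptions
    CATALOGUING = 8        # (viii) cataloguing and long-term storage


class Slide:
    """Tracks one physical glass slide as it moves through the workflow."""

    def __init__(self, slide_id: str) -> None:
        self.slide_id = slide_id
        self.completed: list[Step] = []

    @property
    def next_step(self) -> Step | None:
        """Return the step that must be completed next, or None when finished."""
        if not self.completed:
            return Step.COHORT_SELECTION
        last = self.completed[-1]
        return Step(last + 1) if last < Step.CATALOGUING else None

    def complete(self, step: Step) -> None:
        """Mark `step` as done; reject out-of-order steps such as QC before scanning."""
        expected = self.next_step
        if step != expected:
            raise ValueError(f"{self.slide_id}: expected {expected!r}, got {step!r}")
        self.completed.append(step)


slide = Slide("S-0001")  # hypothetical slide identifier
for step in Step:
    slide.complete(step)
print(slide.next_step)  # None
```

A production workflow would also need to send a slide back to scanning after a failed quality control, which this strictly linear sketch deliberately omits.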
The results were published in two papers and implemented as a scanning infrastructure (software and hardware) consisting of a central database and several web interfaces that model the business logic of the scanning workflow. The solution is currently in production and has already been used to scan and manage more than 300,000 slides.
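The abstract does not describe the schema of the central database. As a hedged illustration of how such a database can hold workflow state, the sketch below assumes a hypothetical SQLite schema with a `slide` table and a `workflow_event` table, where steps are numbered 1 to 8 in the order of workflow steps (i) to (viii); the table names and the helpers `open_db`, `record_step`, and `slides_per_step` are illustrative only.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: one row per slide, one row per completed workflow step.
SCHEMA = """
CREATE TABLE slide (
    slide_id TEXT PRIMARY KEY,
    cohort   TEXT NOT NULL
);
CREATE TABLE workflow_event (
    slide_id TEXT NOT NULL REFERENCES slide(slide_id),
    step     INTEGER NOT NULL CHECK (step BETWEEN 1 AND 8),
    done_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (slide_id, step)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def record_step(conn: sqlite3.Connection, slide_id: str, step: int) -> None:
    """Record a finished step, but only if all earlier steps are already recorded."""
    (earlier,) = conn.execute(
        "SELECT COUNT(*) FROM workflow_event WHERE slide_id = ? AND step < ?",
        (slide_id, step),
    ).fetchone()
    if earlier != step - 1:
        raise ValueError(f"{slide_id}: step {step} requested before steps 1..{step - 1} are done")
    # Duplicate steps and unknown slide IDs raise sqlite3.IntegrityError via the keys.
    conn.execute("INSERT INTO workflow_event (slide_id, step) VALUES (?, ?)", (slide_id, step))
    conn.commit()


def slides_per_step(conn: sqlite3.Connection) -> dict[int, int]:
    """Count slides by their current step (the highest step recorded so far)."""
    rows = conn.execute(
        """
        SELECT current_step, COUNT(*)
        FROM (SELECT slide_id, MAX(step) AS current_step
              FROM workflow_event GROUP BY slide_id)
        GROUP BY current_step
        ORDER BY current_step
        """
    ).fetchall()
    return dict(rows)


conn = open_db()
conn.executemany("INSERT INTO slide VALUES (?, ?)", [("S-1", "demo"), ("S-2", "demo")])
for step in (1, 2, 3):
    record_step(conn, "S-1", step)
record_step(conn, "S-2", 1)
print(slides_per_step(conn))  # {1: 1, 3: 1}
```

Deriving each slide's current step from an event log, rather than overwriting a single status column, keeps a record of when each slide passed each step. Slides without any recorded step are not counted by `slides_per_step`.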
Original language: English
Qualification: Master of Science
Awarding Institution
  • Graz University of Technology (90000)
Supervisors/Advisors
  • Stollberger, Rudolf, Supervisor
  • Holzinger, Andreas, Supervisor
Publication status: Published - 2020


Keywords
  • Digital Pathology
  • Whole Slide Image
  • Digitisation
  • Machine Learning
  • Biobanking
