TY - JOUR
T1 - Health data space nodes for privacy-preserving linkage of medical data to support collaborative secondary analyses
AU - Baumgartner, Martin
AU - Kreiner, Karl
AU - Lauschensky, Aaron
AU - Jammerbund, Bernhard
AU - Donsa, Klaus
AU - Hayn, Dieter
AU - Wiesmüller, Fabian
AU - Demelius, Lea
AU - Modre-Osprian, Robert
AU - Neururer, Sabrina
AU - Slamanig, Gerald
AU - Prantl, Sarah
AU - Brunelli, Luca
AU - Pfeifer, Bernhard
AU - Pölzl, Gerhard
AU - Schreier, Günter
N1 - Publisher Copyright: Copyright © 2024 Baumgartner, Kreiner, Lauschensky, Jammerbund, Donsa, Hayn, Wiesmüller, Demelius, Modre-Osprian, Neururer, Slamanig, Prantl, Brunelli, Pfeifer, Pölzl and Schreier.
PY - 2024
Y1 - 2024
N2 - Introduction: The potential for secondary use of health data to improve healthcare is currently not fully exploited. Health data is largely kept in isolated data silos, and key infrastructure to aggregate these silos into standardized bodies of knowledge is underdeveloped. We describe the development, implementation, and evaluation of a federated infrastructure to facilitate versatile secondary use of health data based on Health Data Space nodes. Materials and methods: Our proposed nodes are self-contained units that digest data through an extract-transform-load framework that pseudonymizes the data, links it with privacy-preserving record linkage, and harmonizes it into a common data model (OMOP CDM). To support collaborative analyses, a multi-level feature store is also implemented. A feasibility experiment was conducted to test the infrastructure's potential for machine learning operations and deployment of other apps (e.g., visualization). Nodes can be operated in a network at different levels of sharing according to the level of trust within the network. Results: In a proof-of-concept study, a privacy-preserving registry for heart failure patients has been implemented as a real-world showcase for Health Data Space nodes at the highest trust level, linking multiple data sources including (a) electronic medical records from hospitals, (b) patient data from a telemonitoring system, and (c) data from Austria’s national register of deaths. The registry is deployed at the tirol kliniken, a hospital operator in the Austrian state of Tyrol, and currently includes 5,004 patients, with over 2.9 million measurements, over 574,000 observations, more than 63,000 clinical free-text notes, and in total over 5.2 million data points. Data curation and harmonization processes are executed semi-automatically at each individual node according to data sharing policies to ensure data sovereignty, scalability, and privacy. As a feasibility test, a natural language processing model for classification of clinical notes was deployed and tested. Discussion: The presented Health Data Space node infrastructure has proven to be practicable in a real-world implementation in a live production registry for heart failure. The present work was inspired by the European Health Data Space initiative and its spirit of interconnecting health data silos for versatile secondary use of health data.
AB - Introduction: The potential for secondary use of health data to improve healthcare is currently not fully exploited. Health data is largely kept in isolated data silos, and key infrastructure to aggregate these silos into standardized bodies of knowledge is underdeveloped. We describe the development, implementation, and evaluation of a federated infrastructure to facilitate versatile secondary use of health data based on Health Data Space nodes. Materials and methods: Our proposed nodes are self-contained units that digest data through an extract-transform-load framework that pseudonymizes the data, links it with privacy-preserving record linkage, and harmonizes it into a common data model (OMOP CDM). To support collaborative analyses, a multi-level feature store is also implemented. A feasibility experiment was conducted to test the infrastructure's potential for machine learning operations and deployment of other apps (e.g., visualization). Nodes can be operated in a network at different levels of sharing according to the level of trust within the network. Results: In a proof-of-concept study, a privacy-preserving registry for heart failure patients has been implemented as a real-world showcase for Health Data Space nodes at the highest trust level, linking multiple data sources including (a) electronic medical records from hospitals, (b) patient data from a telemonitoring system, and (c) data from Austria’s national register of deaths. The registry is deployed at the tirol kliniken, a hospital operator in the Austrian state of Tyrol, and currently includes 5,004 patients, with over 2.9 million measurements, over 574,000 observations, more than 63,000 clinical free-text notes, and in total over 5.2 million data points. Data curation and harmonization processes are executed semi-automatically at each individual node according to data sharing policies to ensure data sovereignty, scalability, and privacy. As a feasibility test, a natural language processing model for classification of clinical notes was deployed and tested. Discussion: The presented Health Data Space node infrastructure has proven to be practicable in a real-world implementation in a live production registry for heart failure. The present work was inspired by the European Health Data Space initiative and its spirit of interconnecting health data silos for versatile secondary use of health data.
KW - advanced analytics
KW - artificial intelligence
KW - data-driven healthcare
KW - European Health Data Space
KW - interoperability
KW - machine learning
KW - privacy-preservation
KW - record linkage
UR - http://www.scopus.com/inward/record.url?scp=85191083597&partnerID=8YFLogxK
U2 - 10.3389/fmed.2024.1301660
DO - 10.3389/fmed.2024.1301660
M3 - Article
AN - SCOPUS:85191083597
SN - 2296-858X
VL - 11
JO - Frontiers in Medicine
JF - Frontiers in Medicine
M1 - 1301660
ER -