TY - JOUR
T1 - Accuracy and Precision of Mandible Segmentation and Its Clinical Implications
T2 - Virtual Reality, Desktop Screen and Artificial Intelligence
AU - Gruber, Lennart Johannes
AU - Egger, Jan
AU - Bönsch, Andrea
AU - Kraeima, Joep
AU - Ulbrich, Max
AU - van den Bosch, Vincent
AU - Motmaen, Ila
AU - Wilpert, Caroline
AU - Ooms, Mark
AU - Isfort, Peter
AU - Hölzle, Frank
AU - Puladi, Behrus
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2024/4/1
Y1 - 2024/4/1
N2 - Objective: 3D modeling is a major challenge in computer-assisted surgery (CAS). Manual segmentation, the gold standard, is tedious, time-consuming, and particularly challenging for the mandible, whereas artificial intelligence (AI)-based segmentation is a promising, time-saving alternative. However, little is known about the clinical implications of the various segmentation methods. Method: In this cross-over study, ten mandibles were segmented by five experts in virtual reality (VR) and on a desktop screen (DS), and by five AI models. The exported mandible models were evaluated using quantitative metrics, a public reference (PUBDS), and blinded assessments by two radiologists. Results: Average segmentation-to-volume accuracy (1 = poor, 5 = perfect) was comparable across human segmentations (VR: 4.56; DS: 4.33; PUBDS: 4.55) and significantly better than AI-based segmentation (AI: 3.80). Average segmentation-to-segmentation accuracy showed that DS (91.4 %/0.37 mm [Dice coefficient/average Hausdorff distance]) agreed more closely with PUBDS than VR did (90.1 %/0.44 mm). The precision of VR (96.8 %/0.14 mm) and DS (96.6 %/0.15 mm) was superior to that of PUBDS (94.1 %/0.21 mm) and the AI method (89.2 %/0.60 mm). Among the manual methods, VR was significantly faster than DS and PUBDS (p = 0.007/< 0.001), whereas the AI method is not time-sensitive because it can scale with hardware. Conclusion: The accuracy and precision of mandible segmentation depend primarily on CT quality and anatomical site; these factors could negatively impact CAS and should be considered in clinical applications and when generating AI training data. Although current AI models have perfect intra-model reliability, they show higher inter-model variability and produce invalid outliers, so human review is still necessary.
In summary, manual segmentation in VR showed high overall accuracy and precision while saving time and, owing to its good usability, is preferable to DS.
AB - Objective: 3D modeling is a major challenge in computer-assisted surgery (CAS). Manual segmentation, the gold standard, is tedious, time-consuming, and particularly challenging for the mandible, whereas artificial intelligence (AI)-based segmentation is a promising, time-saving alternative. However, little is known about the clinical implications of the various segmentation methods. Method: In this cross-over study, ten mandibles were segmented by five experts in virtual reality (VR) and on a desktop screen (DS), and by five AI models. The exported mandible models were evaluated using quantitative metrics, a public reference (PUBDS), and blinded assessments by two radiologists. Results: Average segmentation-to-volume accuracy (1 = poor, 5 = perfect) was comparable across human segmentations (VR: 4.56; DS: 4.33; PUBDS: 4.55) and significantly better than AI-based segmentation (AI: 3.80). Average segmentation-to-segmentation accuracy showed that DS (91.4 %/0.37 mm [Dice coefficient/average Hausdorff distance]) agreed more closely with PUBDS than VR did (90.1 %/0.44 mm). The precision of VR (96.8 %/0.14 mm) and DS (96.6 %/0.15 mm) was superior to that of PUBDS (94.1 %/0.21 mm) and the AI method (89.2 %/0.60 mm). Among the manual methods, VR was significantly faster than DS and PUBDS (p = 0.007/< 0.001), whereas the AI method is not time-sensitive because it can scale with hardware. Conclusion: The accuracy and precision of mandible segmentation depend primarily on CT quality and anatomical site; these factors could negatively impact CAS and should be considered in clinical applications and when generating AI training data. Although current AI models have perfect intra-model reliability, they show higher inter-model variability and produce invalid outliers, so human review is still necessary.
In summary, manual segmentation in VR showed high overall accuracy and precision while saving time and, owing to its good usability, is preferable to DS.
KW - Artificial intelligence
KW - Computer-assisted surgery
KW - Oral and maxillofacial surgery
KW - Segmentation
KW - Virtual reality
UR - http://www.scopus.com/inward/record.url?scp=85176304182&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.122275
DO - 10.1016/j.eswa.2023.122275
M3 - Article
AN - SCOPUS:85176304182
SN - 0957-4174
VL - 239
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 122275
ER -