Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models
Abstract: Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
Main Authors: | Albert T. Young, Kristen Fernandez, Jacob Pfau, Rasika Reddy, Nhat Anh Cao, Max Y. von Franque, Arjun Johal, Benjamin V. Wu, Rachel R. Wu, Jennifer Y. Chen, Raj P. Fadadu, Juan A. Vasquez, Andrew Tam, Michael J. Keiser, Maria L. Wei |
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2021 |
Subjects: | Computer applications to medicine. Medical informatics (R858-859.7) |
Online Access: | https://doaj.org/article/89fc976ac492433b853698240fbfcfbb |
id: | oai:doaj.org-article:89fc976ac492433b853698240fbfcfbb |
DOI: | 10.1038/s41746-020-00380-6 |
Published in: | npj Digital Medicine, Vol 4, Iss 1, Pp 1-8 (2021) |
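The rotation stress test described in the abstract can be sketched in a few lines: apply simple, label-preserving transformations (here, 90-degree rotations) to one image and check whether the classifier's decision stays the same. This is a minimal illustration, not the authors' pipeline; the `toy_predict` scorer is a hypothetical stand-in for a real CNN, deliberately orientation-sensitive so the inconsistency is visible.

```python
import numpy as np

def rotations(image):
    """Yield the four 90-degree rotations of an image array."""
    for k in range(4):
        yield np.rot90(image, k)

def stress_test(predict, image, threshold=0.5):
    """Check whether a binary classifier's decision is stable under rotation.

    Returns (labels, is_consistent): the per-rotation decisions and a flag
    indicating whether they all agree.
    """
    labels = [predict(img) >= threshold for img in rotations(image)]
    return labels, len(set(labels)) == 1

def toy_predict(image):
    """Hypothetical stand-in model: scores a lesion by the mean intensity of
    the top-left quadrant, so its output depends on image orientation."""
    h, w = image.shape[:2]
    return float(image[: h // 2, : w // 2].mean())

# A deterministic intensity gradient makes the orientation dependence obvious.
image = np.linspace(0.0, 1.0, 64).reshape(8, 8)
labels, consistent = stress_test(toy_predict, image)
print(labels, consistent)  # four rotations of the same lesion disagree
```

A model ready for the clinic should give `consistent == True` for essentially every lesion; the paper reports that off-the-shelf dermatologist-level CNNs flipped to a false positive or false negative on 6.5–22% of lesions under such transformations.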