Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models

Abstract: Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
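To make the stress-test idea concrete, the sketch below shows one way such a check might look in practice: apply simple rotations to a lesion image and flag lesions whose benign/malignant call flips across orientations. This is a minimal illustration, not the authors' code; load_melanoma_model, the preprocessing pipeline, and the 0.5 decision threshold are placeholder assumptions.

```python
# Minimal sketch of a rotation "stress test" for a binary melanoma classifier.
# Illustrative only: load_melanoma_model and the 0.5 threshold are placeholders,
# not the model or settings used in the paper.
import torch
from PIL import Image
from torchvision import transforms

def load_melanoma_model():
    """Placeholder: return any model mapping a 3x224x224 tensor to a melanoma logit."""
    raise NotImplementedError("supply your own trained CNN here")

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def predict(model, img: Image.Image) -> float:
    """Return the model's predicted probability of melanoma for one image."""
    x = preprocess(img).unsqueeze(0)           # add batch dimension
    with torch.no_grad():
        return torch.sigmoid(model(x)).item()  # assumes a single-logit output

def rotation_stress_test(model, img: Image.Image, threshold: float = 0.5):
    """Check whether simple rotations flip the benign/malignant call."""
    calls = []
    for angle in (0, 90, 180, 270):
        p = predict(model, img.rotate(angle, expand=True))
        calls.append(p >= threshold)
    # A robust model should give the same call for every orientation.
    return {"consistent": len(set(calls)) == 1, "calls": calls}

if __name__ == "__main__":
    model = load_melanoma_model()
    result = rotation_stress_test(model, Image.open("lesion.jpg").convert("RGB"))
    print(result)
```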

Bibliographic Details
Main Authors: Albert T. Young, Kristen Fernandez, Jacob Pfau, Rasika Reddy, Nhat Anh Cao, Max Y. von Franque, Arjun Johal, Benjamin V. Wu, Rachel R. Wu, Jennifer Y. Chen, Raj P. Fadadu, Juan A. Vasquez, Andrew Tam, Michael J. Keiser, Maria L. Wei
Format: article
Language: EN
Published: Nature Portfolio, 2021
Subjects: Computer applications to medicine. Medical informatics (R858-859.7)
Online Access: https://doaj.org/article/89fc976ac492433b853698240fbfcfbb
DOI: 10.1038/s41746-020-00380-6 (https://doi.org/10.1038/s41746-020-00380-6)
ISSN: 2398-6352
Published in: npj Digital Medicine, Vol 4, Iss 1, Pp 1-8 (2021)