Overcoming the limitations of patch-based learning to detect cancer in whole slide images

Abstract: Whole slide images (WSIs) pose unique challenges when training deep learning models. They are very large, which makes it necessary to break each image down into smaller patches for analysis; image features have to be extracted at multiple scales in order to capture both detail and context; and extreme class imbalances may exist. Significant progress has been made in the analysis of these images, thanks largely to the availability of public annotated datasets. We postulate, however, that even if a method scores well on a challenge task, this success may not translate to good performance in a more clinically relevant workflow. Many datasets consist of image patches which may suffer from data curation bias; other datasets are labelled only at the whole-slide level, and the lack of annotations across an image may mask erroneous local predictions so long as the final decision is correct. In this paper, we outline the differences between patch- or slide-level classification and methods that need to localize or segment cancer accurately across the whole slide, and we experimentally verify that best practices differ in the two cases. We apply a binary cancer detection network to post-neoadjuvant-therapy breast cancer WSIs to find the tumor bed outlining the extent of cancer, a task which requires sensitivity and precision across the whole slide. We extensively study multiple design choices and their effects on the outcome, including architectures and augmentations. We propose a negative data sampling strategy which drastically reduces the false positive rate (25% of false positives versus 62.5%) and improves each metric pertinent to our problem, with a 53% reduction in the error of tumor extent. Our results indicate that classification performance on image patches versus WSIs is inversely related when the same negative data sampling strategy is used. Specifically, injecting negatives into the training data degrades performance for image patch classification, whereas it improves performance for slide- and pixel-level WSI classification tasks. Furthermore, we find that applying extensive augmentations helps more in WSI-based tasks than in patch-level image classification.
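The tiling and negative-sampling details live in the full paper (see the DOI below), not in this record. Purely as an illustration of the patch-based setup the abstract describes, the sketch below tiles a WSI at two pyramid levels with OpenSlide and mixes extra tumor-free ("negative") patches into the training set. The file path, patch size, levels, tissue heuristic, and mixing ratio are assumptions made for illustration, not the authors' actual settings.

# Illustrative sketch only (not the authors' released code): tile a WSI at
# two pyramid levels and inject extra tumor-free patches into the training
# set. Patch size, levels, and the tissue heuristic are placeholder choices.
import random
import numpy as np
import openslide

PATCH_SIZE = 256      # patch side length in pixels (assumed)
LEVELS = [0, 2]       # pyramid levels: fine detail plus wider context

def is_tissue(patch_rgb, sat_thresh=20, min_frac=0.1):
    # Crude tissue filter: keep a patch if enough pixels are saturated.
    hsv = np.array(patch_rgb.convert("HSV"))
    return (hsv[..., 1] > sat_thresh).mean() > min_frac

def tile_slide(path):
    slide = openslide.OpenSlide(path)
    patches = []
    for level in LEVELS:
        downsample = slide.level_downsamples[level]
        width, height = slide.level_dimensions[level]
        for y in range(0, height - PATCH_SIZE + 1, PATCH_SIZE):
            for x in range(0, width - PATCH_SIZE + 1, PATCH_SIZE):
                # read_region expects level-0 coordinates for its location.
                region = slide.read_region(
                    (int(x * downsample), int(y * downsample)),
                    level, (PATCH_SIZE, PATCH_SIZE)).convert("RGB")
                if is_tissue(region):
                    patches.append((region, level))
    return patches

def inject_negatives(labelled_patches, negative_pool, ratio=0.5):
    # Add tumor-free patches (label 0) to the training set; the abstract
    # reports this helps slide- and pixel-level WSI tasks even as patch-level
    # classification accuracy drops.
    n_extra = min(int(len(labelled_patches) * ratio), len(negative_pool))
    extras = [(patch, 0) for patch in random.sample(negative_pool, n_extra)]
    return labelled_patches + extras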


Saved in:
Bibliographic Details
Main Authors: Ozan Ciga, Tony Xu, Sharon Nofech-Mozes, Shawna Noy, Fang-I Lu, Anne L. Martel
Format: Article
Language: English
Published: Nature Portfolio, 2021
Subjects: Medicine (R), Science (Q)
Online Access: https://doaj.org/article/8f311b2bd8d8422bbe478508c0c6ee33
DOI: 10.1038/s41598-021-88494-z
ISSN: 2045-2322
Publication Date: 2021-04-01
Full Text: https://doi.org/10.1038/s41598-021-88494-z
Journal TOC: https://doaj.org/toc/2045-2322
Journal Reference: Scientific Reports, Vol 11, Iss 1, Pp 1-10 (2021)