Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Bibliographic Details
Main authors: Benjamin Hunter, Sara Reis, Des Campbell, Sheila Matharu, Prashanthi Ratnakumar, Luca Mercuri, Sumeet Hindocha, Hardeep Kalsi, Erik Mayer, Ben Glampson, Emily J. Robinson, Bisan Al-Lazikani, Lisa Scerri, Susannah Bloch, Richard Lee
Format: Article
Language: English
Published: Frontiers Media S.A., 2021
Online access: https://doaj.org/article/6d37d85908834092835964ec5db718bb
id oai:doaj.org-article:6d37d85908834092835964ec5db718bb
record_format dspace
datestamp 2021-11-04T06:04:31Z
issn 2296-858X
doi 10.3389/fmed.2021.748168
date 2021-11-01T00:00:00Z
fulltext https://www.frontiersin.org/articles/10.3389/fmed.2021.748168/full
toc https://doaj.org/toc/2296-858X
source Frontiers in Medicine, Vol 8 (2021)
topic lung nodule
informatics
structured query language (SQL)
natural language processing (NLP)
machine learning
Medicine (General)
R5-920
description Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well described. Manual service evaluation and research cohort curation can be time-consuming and could potentially be improved by automation.
Objective: To automate lung nodule identification in a tertiary cancer centre.
Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients.
Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear support vector machine model predicted concerning sentence features with 94% accuracy.
Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
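The abstract does not spell out the rules of the SQL/NLP tool. As a rough illustration only, here is one common pattern: a keyword-plus-negation rule applied sentence by sentence. The sketch also shows how sensitivity and specificity follow from the confusion matrix obtained against reference labels. The regular expressions, the negation cue list and the names `has_nodule`, `evaluate` and `Confusion` are hypothetical. They are not the authors' implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; not the published tool's rules.
NODULE_RE = re.compile(r"\bnodul(?:e|es|ar)\b", re.IGNORECASE)
# A negation cue followed, later in the same sentence, by a nodule term.
NEGATION_RE = re.compile(
    r"\b(?:no|without|negative for)\b.*\bnodul(?:e|es|ar)\b",
    re.IGNORECASE | re.DOTALL,
)


def split_sentences(report: str) -> list[str]:
    """Split on '.', '!' or '?' followed by whitespace, so '4.5 mm' stays intact."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def has_nodule(report: str) -> bool:
    """Flag a report if any sentence mentions a nodule without a preceding negation cue."""
    return any(
        NODULE_RE.search(s) is not None and NEGATION_RE.search(s) is None
        for s in split_sentences(report)
    )


@dataclass
class Confusion:
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    @property
    def sensitivity(self) -> float:
        # True positives / all reference-positive reports (NaN if there are none).
        positives = self.tp + self.fn
        return self.tp / positives if positives else float("nan")

    @property
    def specificity(self) -> float:
        # True negatives / all reference-negative reports (NaN if there are none).
        negatives = self.tn + self.fp
        return self.tn / negatives if negatives else float("nan")


def evaluate(reports: list[str], labels: list[bool]) -> Confusion:
    """Compare rule-based predictions with reference labels (e.g. manual review)."""
    if len(reports) != len(labels):
        raise ValueError("reports and labels must have the same length")
    c = Confusion()
    for report, truth in zip(reports, labels):
        predicted = has_nodule(report)
        if predicted and truth:
            c.tp += 1
        elif predicted:
            c.fp += 1
        elif truth:
            c.fn += 1
        else:
            c.tn += 1
    return c


if __name__ == "__main__":
    # Invented example sentences, not data from the study.
    reports = [
        "A 4.5 mm nodule in the right upper lobe. Recommend follow-up CT.",
        "No pulmonary nodules. Lungs are clear.",
        "Stable subpleural nodules bilaterally.",
        "Clear lungs. No pleural effusion.",
        "Nodule not seen on this study.",
    ]
    labels = [True, False, True, False, False]
    c = evaluate(reports, labels)
    print(c.sensitivity, round(c.specificity, 2))  # 1.0 0.67
```

The last example report is a false positive. The word "not" is absent from the hypothetical cue list and follows the noun, so the rule misses the negation and specificity drops to 0.67. The same counting applies unchanged to any classifier, including a sentence-level model such as the linear support vector machine reported in the abstract.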