Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre
Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation. Objective: To automate l...
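Main authors: | Benjamin Hunter, Sara Reis, Des Campbell, Sheila Matharu, Prashanthi Ratnakumar, Luca Mercuri, Sumeet Hindocha, Hardeep Kalsi, Erik Mayer, Ben Glampson, Emily J. Robinson, Bisan Al-Lazikani, Lisa Scerri, Susannah Bloch, Richard Lee |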
---|---|
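Format: | article |
Language: | EN |
Published: | Frontiers Media S.A., 2021 |
Subjects: | lung nodule; informatics; structured query language (SQL); natural language processing (NLP); machine learning; Medicine (General); R5-920 |
Online access: | https://doaj.org/article/6d37d85908834092835964ec5db718bb |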
id |
oai:doaj.org-article:6d37d85908834092835964ec5db718bb |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:6d37d85908834092835964ec5db718bb; 2021-11-04T06:04:31Z; ISSN 2296-858X; DOI 10.3389/fmed.2021.748168; https://doaj.org/article/6d37d85908834092835964ec5db718bb; 2021-11-01T00:00:00Z; https://www.frontiersin.org/articles/10.3389/fmed.2021.748168/full; https://doaj.org/toc/2296-858X; Frontiers in Medicine, Vol 8 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
lung nodule; informatics; structured query language (SQL); natural language processing (NLP); machine learning; Medicine (General); R5-920 |
description |
Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation.
Objective: To automate lung nodule identification in a tertiary cancer centre.
Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients.
Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy.
Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition. |
format |
article |
author |
Benjamin Hunter; Sara Reis; Des Campbell; Sheila Matharu; Prashanthi Ratnakumar; Luca Mercuri; Sumeet Hindocha; Hardeep Kalsi; Erik Mayer; Ben Glampson; Emily J. Robinson; Bisan Al-Lazikani; Lisa Scerri; Susannah Bloch; Richard Lee |
author_sort |
Benjamin Hunter |
title |
Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre |
publisher |
Frontiers Media S.A. |
publishDate |
2021 |
url |
https://doaj.org/article/6d37d85908834092835964ec5db718bb |