A Text Mining Approach in the Classification of Free-Text Cancer Pathology Reports from the South African National Health Laboratory Services

A cancer pathology report is a valuable medical document that provides information for clinical management of the patient and evaluation of health care. However, there are variations in the quality of reporting in free-text style formats, ranging from comprehensive to incomplete reporting. Moreover,...

Bibliographic Details
Main Authors: Okechinyere J. Achilonu, Victor Olago, Elvira Singh, René M. J. C. Eijkemans, Gideon Nimako, Eustasius Musenge
Format: article
Language: EN
Published: MDPI AG 2021
Subjects: pathology reports; breast; colorectal; prostate; text mining; machine learning; Information technology; T58.5-58.64
Online Access: https://doaj.org/article/611d2a80aac04663b70ecec288748ed4
id oai:doaj.org-article:611d2a80aac04663b70ecec288748ed4
record_format dspace
spelling oai:doaj.org-article:611d2a80aac04663b70ecec288748ed4
2021-11-25T17:58:29Z
A Text Mining Approach in the Classification of Free-Text Cancer Pathology Reports from the South African National Health Laboratory Services
10.3390/info12110451
2078-2489
https://doaj.org/article/611d2a80aac04663b70ecec288748ed4
2021-10-01T00:00:00Z
https://www.mdpi.com/2078-2489/12/11/451
https://doaj.org/toc/2078-2489
A cancer pathology report is a valuable medical document that provides information for clinical management of the patient and evaluation of health care. However, there are variations in the quality of reporting in free-text style formats, ranging from comprehensive to incomplete reporting. Moreover, the increasing incidence of cancer has generated a high throughput of pathology reports. Hence, manual extraction and classification of information from these reports can be intrinsically complex and resource-intensive. This study aimed to (i) evaluate the quality of over 80,000 breast, colorectal, and prostate cancer free-text pathology reports and (ii) assess the effectiveness of random forest (RF) and variants of support vector machine (SVM) in the classification of reports into benign and malignant classes. The study approach comprises data preprocessing, visualisation, feature selections, text classification, and evaluation of performance metrics. The performance of the classifiers was evaluated across various feature sizes, which were jointly selected by four filter feature selection methods. The feature selection methods identified established clinical terms, which are synonymous with each of the three cancers. Uni-gram tokenisation using the classifiers showed that the predictive power of RF model was consistent across various feature sizes, with overall F-scores of 95.2%, 94.0%, and 95.3% for breast, colorectal, and prostate cancer classification, respectively. The radial SVM achieved better classification performance compared with its linear variant for most of the feature sizes. The classifiers also achieved high precision, recall, and accuracy. This study supports a nationally agreed standard in pathology reporting and the use of text mining for encoding, classifying, and production of high-quality information abstractions for cancer prognosis and research.
Okechinyere J. Achilonu
Victor Olago
Elvira Singh
René M. J. C. Eijkemans
Gideon Nimako
Eustasius Musenge
MDPI AG
article
pathology reports
breast
colorectal
prostate
text mining
machine learning
Information technology
T58.5-58.64
EN
Information, Vol 12, Iss 451, p 451 (2021)
institution DOAJ
collection DOAJ
language EN
topic pathology reports
breast
colorectal
prostate
text mining
machine learning
Information technology
T58.5-58.64
description A cancer pathology report is a valuable medical document that provides information for clinical management of the patient and evaluation of health care. However, there are variations in the quality of reporting in free-text style formats, ranging from comprehensive to incomplete reporting. Moreover, the increasing incidence of cancer has generated a high throughput of pathology reports. Hence, manual extraction and classification of information from these reports can be intrinsically complex and resource-intensive. This study aimed to (i) evaluate the quality of over 80,000 breast, colorectal, and prostate cancer free-text pathology reports and (ii) assess the effectiveness of random forest (RF) and variants of support vector machine (SVM) in the classification of reports into benign and malignant classes. The study approach comprises data preprocessing, visualisation, feature selections, text classification, and evaluation of performance metrics. The performance of the classifiers was evaluated across various feature sizes, which were jointly selected by four filter feature selection methods. The feature selection methods identified established clinical terms, which are synonymous with each of the three cancers. Uni-gram tokenisation using the classifiers showed that the predictive power of RF model was consistent across various feature sizes, with overall F-scores of 95.2%, 94.0%, and 95.3% for breast, colorectal, and prostate cancer classification, respectively. The radial SVM achieved better classification performance compared with its linear variant for most of the feature sizes. The classifiers also achieved high precision, recall, and accuracy. This study supports a nationally agreed standard in pathology reporting and the use of text mining for encoding, classifying, and production of high-quality information abstractions for cancer prognosis and research.
format article
author Okechinyere J. Achilonu
Victor Olago
Elvira Singh
René M. J. C. Eijkemans
Gideon Nimako
Eustasius Musenge
title A Text Mining Approach in the Classification of Free-Text Cancer Pathology Reports from the South African National Health Laboratory Services
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/611d2a80aac04663b70ecec288748ed4
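
The description above reports precision, recall, F-score, and accuracy for classifying reports as benign or malignant. As a minimal sketch of how these four metrics relate to one another, the Python below computes them from a pair of label lists. The function name `binary_metrics`, the choice of "malignant" as the positive class, and the toy labels are assumptions made for illustration; they are not taken from the study, and the record does not say how the authors computed their scores.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="malignant"):
    """Return precision, recall, F-score and accuracy for one positive class.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Count (is_actually_positive, is_predicted_positive) pairs.
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # malignant reports predicted malignant
    fn = pairs[(True, False)]   # malignant reports predicted benign
    fp = pairs[(False, True)]   # benign reports predicted malignant
    tn = pairs[(False, False)]  # benign reports predicted benign

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score here is the harmonic mean of precision and recall (F1).
    f_score = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Toy labels invented for this example; they are not data from the study.
y_true = ["malignant", "malignant", "malignant", "benign", "benign", "benign"]
y_pred = ["malignant", "malignant", "benign", "benign", "benign", "malignant"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are two true positives, one false negative, one false positive, and two true negatives, so all four metrics come out to two thirds. The sketch uses the balanced F1 form of the F-score; whether the study used this form or a weighted variant is not stated in the record.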