On the information hidden in a classifier distribution

Abstract: Classification tasks are a common challenge in every field of science. To correctly interpret the results provided by a classifier, we need to know the performance indices of the classifier, including its sensitivity, specificity, the most appropriate cut-off value (for continuous classifier...

Bibliographic Details
Main authors: Farrokh Habibzadeh, Parham Habibzadeh, Mahboobeh Yadollahie, Hooman Roozbehi
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects: R, Q
Online access: https://doaj.org/article/6337f968ed55456f97c7e88bd7135a15
id oai:doaj.org-article:6337f968ed55456f97c7e88bd7135a15
record_format dspace
spelling oai:doaj.org-article:6337f968ed55456f97c7e88bd7135a15
2021-12-02T14:12:10Z
On the information hidden in a classifier distribution
10.1038/s41598-020-79548-9
2045-2322
https://doaj.org/article/6337f968ed55456f97c7e88bd7135a15
2021-01-01T00:00:00Z
https://doi.org/10.1038/s41598-020-79548-9
https://doaj.org/toc/2045-2322
Farrokh Habibzadeh
Parham Habibzadeh
Mahboobeh Yadollahie
Hooman Roozbehi
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract: Classification tasks are a common challenge in every field of science. To correctly interpret the results provided by a classifier, we need to know the performance indices of the classifier, including its sensitivity, specificity, the most appropriate cut-off value (for continuous classifiers), etc. Typically, several studies must be conducted to find all these indices. Herein, we show that they already exist, hidden in the distribution of the variable used to classify, and can readily be harvested. An educated guess about the distribution of the variable used to classify in each class helps us decompose the frequency distribution of the variable in the population into its components: the probability density function of the variable in each class. Based on the harvested parameters, we can then calculate the performance indices of the classifier. As a case study, we applied the technique to the relative frequency distribution of prostate-specific antigen, a biomarker commonly used in medicine for the diagnosis of prostate cancer. We used nonlinear curve fitting to decompose the relative frequency distribution of the variable into the probability density functions of the non-diseased and diseased people. The functions were then used to determine the performance indices of the classifier. Sensitivity, specificity, the most appropriate cut-off value, and likelihood ratios were calculated. The reference range of the biomarker and the prevalence of prostate cancer for various age groups were also calculated. The indices obtained were in good agreement with the values reported in previous studies. All of this was done without knowledge of the real health status of the individuals studied. The method is applicable even to conditions with no definite definition (e.g., hypertension). We believe the method has a wide range of applications in many scientific fields.
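As a rough illustration of the last step described in the abstract (computing performance indices once the two class densities are known), the following minimal sketch assumes the decomposition has already been done and produced two normal components. The component parameters, the `prevalence` value, the function names, and the choice of the Youden index as the cut-off criterion are all assumptions made for this example; they are not taken from the paper, which may use different distributions and a different cut-off criterion.

```python
from statistics import NormalDist

# Hypothetical fitted components; illustrative values, not taken from the paper.
non_diseased = NormalDist(mu=1.0, sigma=0.5)
diseased = NormalDist(mu=3.0, sigma=1.0)
prevalence = 0.2  # hypothetical mixing weight of the diseased class


def mixture_pdf(x):
    """Population density: prevalence-weighted sum of the two class densities."""
    return (1 - prevalence) * non_diseased.pdf(x) + prevalence * diseased.pdf(x)


def sensitivity(cutoff):
    """Fraction of the diseased class above the cutoff."""
    return 1 - diseased.cdf(cutoff)


def specificity(cutoff):
    """Fraction of the non-diseased class at or below the cutoff."""
    return non_diseased.cdf(cutoff)


def youden_cutoff(lo, hi, steps=2000):
    """Grid search for the cutoff maximizing sensitivity + specificity - 1."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda c: sensitivity(c) + specificity(c) - 1)


cutoff = youden_cutoff(0.0, 5.0)
sens, spec = sensitivity(cutoff), specificity(cutoff)
lr_pos = sens / (1 - spec)   # positive likelihood ratio
lr_neg = (1 - sens) / spec   # negative likelihood ratio
print(f"cutoff={cutoff:.2f} sens={sens:.3f} spec={spec:.3f} "
      f"LR+={lr_pos:.1f} LR-={lr_neg:.2f}")
```

In this sketch, `mixture_pdf` is the model that would be fitted to the observed relative frequency distribution; the fitting itself (the nonlinear curve fitting mentioned in the abstract) is deliberately left out.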
format article
author Farrokh Habibzadeh
Parham Habibzadeh
Mahboobeh Yadollahie
Hooman Roozbehi
title On the information hidden in a classifier distribution
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/6337f968ed55456f97c7e88bd7135a15