Text classification to streamline online wildlife trade analyses.

Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practition...

Bibliographic Details
Main authors: Oliver C Stringham, Stephanie Moncayo, Katherine G W Hill, Adam Toomes, Lewis Mitchell, Joshua V Ross, Phillip Cassey
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects:
R
Q
Online access: https://doaj.org/article/ee7657fc034648dc94b59b6399e461e4
id oai:doaj.org-article:ee7657fc034648dc94b59b6399e461e4
record_format dspace
spelling oai:doaj.org-article:ee7657fc034648dc94b59b6399e461e4
2021-12-02T20:09:25Z
Text classification to streamline online wildlife trade analyses.
1932-6203
10.1371/journal.pone.0254007
https://doaj.org/article/ee7657fc034648dc94b59b6399e461e4
2021-01-01T00:00:00Z
https://doi.org/10.1371/journal.pone.0254007
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 16, Iss 7, p e0254007 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance. From our sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. 
Further work to integrate other machine learning tools, such as image classification, may improve predictive performance when streamlining data processing for online wildlife-trade data.
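
The workflow summarised in the description (train a text classifier on labelled listings, score it with ROC AUC and F1, then shrink the training set to see how performance changes) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the record does not say which classifiers the suite contained, so a small multinomial Naive Bayes model written with the Python standard library is swapped in. The toy listings, the helper names (`make_toy_listings`, `sensitivity_analysis`) and the training fractions are all hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing; labels are 0 or 1."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def score(self, text):
        """Log-odds that a listing is relevant (label 1); higher = more relevant."""
        n_docs = sum(self.docs.values())
        logp = {}
        for c in (0, 1):
            total = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log((self.docs[c] + 1) / (n_docs + 2))
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    lp += math.log((self.counts[c][tok] + 1) / total)
            logp[c] = lp
        return logp[1] - logp[0]


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    """Chance that a random relevant listing outscores a random irrelevant one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_listings(n, seed=0):
    """Hypothetical stand-in data: 1 = live bird for sale, 0 = anything else."""
    rng = random.Random(seed)
    relevant = ["budgie for sale", "hand raised cockatiel pair",
                "young lorikeet available", "breeding pair of finches"]
    irrelevant = ["bird cage for sale", "wanted aviary wire",
                  "parrot toys and perches", "seed mix bulk bag"]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.choice(relevant if label else irrelevant), label))
    return data


def sensitivity_analysis(train, test, fractions=(0.1, 0.33, 1.0), seed=0):
    """Refit on random subsets of the training data; return {fraction: (auc, f1)}."""
    rng = random.Random(seed)
    test_texts, test_labels = zip(*test)
    results = {}
    for frac in fractions:
        subset = rng.sample(train, max(2, int(len(train) * frac)))
        model = NaiveBayes().fit(*zip(*subset))
        scores = [model.score(t) for t in test_texts]
        preds = [1 if s > 0 else 0 for s in scores]
        results[frac] = (roc_auc(test_labels, scores), f1_score(test_labels, preds))
    return results


if __name__ == "__main__":
    train = make_toy_listings(200, seed=1)
    test = make_toy_listings(100, seed=2)
    for frac, (auc, f1) in sensitivity_analysis(train, test).items():
        print(f"fraction={frac:.2f}  ROC AUC={auc:.3f}  F1={f1:.3f}")
```

On real listings the labels would come from manual annotation, and the fraction at which the scores level off would indicate how much labelled data is needed for that particular website.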
format article
author Oliver C Stringham
Stephanie Moncayo
Katherine G W Hill
Adam Toomes
Lewis Mitchell
Joshua V Ross
Phillip Cassey
author_sort Oliver C Stringham
title Text classification to streamline online wildlife trade analyses.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/ee7657fc034648dc94b59b6399e461e4
_version_ 1718375097754976256