Alignment-free classification of COI DNA barcode data with the Python package Alfie

Characterization of biodiversity from environmental DNA samples and bulk metabarcoding data is hampered by off-target sequences that can confound conclusions about a taxonomic group of interest. Existing methods for isolation of target sequences rely on alignment to existing referenc...

Bibliographic Details
Main authors: Cameron M. Nugent, Sarah J. Adamowicz
Format: article
Language: EN
Published: Pensoft Publishers 2020
Subjects: Ecology
Online access: https://doaj.org/article/16decd04a40640b6a013b95e42f93d30
id oai:doaj.org-article:16decd04a40640b6a013b95e42f93d30
record_format dspace
spelling oai:doaj.org-article:16decd04a40640b6a013b95e42f93d30
2021-12-02T11:16:57Z
Alignment-free classification of COI DNA barcode data with the Python package Alfie
10.3897/mbmg.4.55815
2534-9708
https://doaj.org/article/16decd04a40640b6a013b95e42f93d30
2020-09-01T00:00:00Z
https://mbmg.pensoft.net/article/55815/download/pdf/
https://mbmg.pensoft.net/article/55815/download/xml/
https://mbmg.pensoft.net/article/55815/
https://doaj.org/toc/2534-9708
Cameron M. Nugent
Sarah J. Adamowicz
Pensoft Publishers
article
Ecology
QH540-549.5
EN
Metabarcoding and Metagenomics, Vol 4, Iss , Pp 81-89 (2020)
institution DOAJ
collection DOAJ
language EN
topic Ecology
QH540-549.5
description Characterization of biodiversity from environmental DNA samples and bulk metabarcoding data is hampered by off-target sequences that can confound conclusions about a taxonomic group of interest. Existing methods for isolation of target sequences rely on alignment to existing reference barcodes, but this can bias results against novel genetic variants. Effectively parsing targeted DNA barcode data from off-target noise improves the quality of biodiversity estimates and biological conclusions by limiting subsequent analyses to a relevant subset of available data. Here, we present Alfie, a Python package for the alignment-free classification of cytochrome c oxidase subunit I (COI) DNA barcode sequences to taxonomic kingdoms. The package determines k-mer frequencies of DNA sequences, and the frequencies serve as input for a neural network classifier that was trained and tested using ~58,000 publicly available COI sequences. The classifier was designed and optimized through a series of tests that allowed for the optimal set of DNA k-mer features and optimal machine learning algorithm to be selected. The neural network classifier rapidly assigns COI sequences of varying lengths to kingdoms with greater than 99% accuracy and is shown to generalize effectively and make accurate predictions about data from previously unseen taxonomic classes. The package contains an application programming interface that allows the Alfie package’s functionality to be extended to different DNA sequence classification tasks to suit a user’s need, including classification of different genes and barcodes, and classification to different taxonomic levels. Alfie is free and publicly available through GitHub (https://github.com/CNuge/alfie) and the Python package index (https://pypi.org/project/alfie/).
format article
author Cameron M. Nugent
Sarah J. Adamowicz
title Alignment-free classification of COI DNA barcode data with the Python package Alfie
publisher Pensoft Publishers
publishDate 2020
url https://doaj.org/article/16decd04a40640b6a013b95e42f93d30
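
The description above says the package determines k-mer frequencies of DNA sequences and passes them to a neural network classifier. As a rough illustration of that feature-extraction step only, the sketch below computes a fixed-length k-mer frequency vector in plain Python. The function name `kmer_frequencies`, the default `k=4`, and the choice to skip windows containing ambiguous bases are assumptions made for this example; it does not use or reproduce Alfie's own API, and it does not include the classifier.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return the relative frequency of every DNA k-mer in `sequence`.

    Hypothetical helper for illustration; this is not Alfie's API.
    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    alphabet = "ACGT"
    # Fixed lexicographic feature order, so every sequence maps to a
    # vector of length 4**k regardless of its own length.
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(alphabet)
    )
    total = sum(counts.values())
    # Normalising by the number of valid windows keeps sequences of
    # different lengths on the same scale.
    return [counts[kmer] / total if total else 0.0 for kmer in all_kmers]


features = kmer_frequencies("ACGTACGTNACGT", k=2)
print(len(features))  # 16, i.e. 4**2 possible 2-mers
```

Because the vector length depends only on `k`, sequences of varying lengths produce inputs of the same shape, which is what a fixed-input classifier needs.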