Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks

Estimating the taxonomic composition of viral sequences in a biological sample processed by next-generation sequencing is an important step in comparative metagenomics. Mapping sequencing reads against a database of known viral reference genomes, however, fails to classify reads from novel viruses...

Bibliographic Details
Main Authors:	Moritz Kohls, Magdalena Kircher, Jessica Krepel, Pamela Liebig, Klaus Jung
Format:	article
Language:	EN
Published:	MDPI AG 2021
Subjects:	artificial neural networks; classification; machine learning; metagenomics; next-generation sequencing; viruses; Genetics; QH426-470
Online Access:	https://doaj.org/article/50bbe16f0f614b6893a74c2fd88ca9aa

id	oai:doaj.org-article:50bbe16f0f614b6893a74c2fd88ca9aa
record_format	dspace
spelling	oai:doaj.org-article:50bbe16f0f614b6893a74c2fd88ca9aa; 2021-11-25T17:41:34Z; Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks; 10.3390/genes12111755; 2073-4425; https://doaj.org/article/50bbe16f0f614b6893a74c2fd88ca9aa; 2021-10-01T00:00:00Z; https://www.mdpi.com/2073-4425/12/11/1755; https://doaj.org/toc/2073-4425; Moritz Kohls; Magdalena Kircher; Jessica Krepel; Pamela Liebig; Klaus Jung; MDPI AG; article; artificial neural networks; classification; machine learning; metagenomics; next-generation sequencing; viruses; Genetics; QH426-470; EN; Genes, Vol 12, Iss 1755, p 1755 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	artificial neural networks; classification; machine learning; metagenomics; next-generation sequencing; viruses; Genetics; QH426-470
description	Estimating the taxonomic composition of viral sequences in a biological sample processed by next-generation sequencing is an important step in comparative metagenomics. Mapping sequencing reads against a database of known viral reference genomes, however, fails to classify reads from novel viruses whose reference sequences are not yet available in public databases. Instead of a mapping approach, and in order to classify sequencing reads at least to a taxonomic level, the performance of artificial neural networks and other machine learning models was studied. Taxonomic and genomic data from the NCBI database were used to sample labelled sequencing reads as training data. The fitted neural network was applied to classify unlabelled reads of simulated and real-world test sets. Additional auxiliary test sets of labelled reads were used to estimate the conditional class probabilities and to correct the prior estimation of the taxonomic distribution in the actual test set. Among the taxonomic levels, the biological order of viruses provided the most comprehensive database from which to generate training data. The prediction accuracy of the artificial neural network in classifying test reads to their viral order was considerably higher than that of a random classification. Posterior estimation of taxa frequencies could correct the primary classification results.
format	article
author	Moritz Kohls; Magdalena Kircher; Jessica Krepel; Pamela Liebig; Klaus Jung
author_sort	Moritz Kohls
title	Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks
publisher	MDPI AG
publishDate	2021
url	https://doaj.org/article/50bbe16f0f614b6893a74c2fd88ca9aa
_version_	1718412102996066304
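
The description field says that conditional class probabilities estimated on labelled auxiliary reads are used to correct the taxa distribution first obtained from the classifier. The record does not give the formula, so the following is only a minimal sketch of one common way to do this, not the authors' implementation. It assumes that classes are encoded as integers 0..K-1 and that the correction solves "observed predicted-class frequencies ≈ confusion matrix transposed × true frequencies" with an EM-style iteration. The function names `estimate_confusion` and `correct_distribution` are hypothetical.

```python
import numpy as np


def estimate_confusion(true_labels, predicted_labels, n_classes):
    """Row-normalised confusion matrix from labelled auxiliary reads.

    Entry [i, j] estimates P(predicted class = j | true class = i).
    """
    counts = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("each class needs at least one labelled auxiliary read")
    return counts / row_sums


def correct_distribution(predicted_labels, confusion, n_iter=1000):
    """Corrected class frequencies for an unlabelled test set (assumed method).

    Starts from a uniform distribution and applies the EM update for mixture
    proportions with known conditional probabilities, so the estimate stays
    non-negative and sums to one.
    """
    n_classes = confusion.shape[0]
    # Uncorrected estimate: relative frequency of each predicted class.
    observed = np.bincount(predicted_labels, minlength=n_classes) / len(predicted_labels)
    q = np.full(n_classes, 1.0 / n_classes)
    for _ in range(n_iter):
        implied = confusion.T @ q  # predicted-class frequencies implied by q
        ratio = np.divide(observed, implied,
                          out=np.zeros_like(observed), where=implied > 0)
        q = q * (confusion @ ratio)
        q = q / q.sum()
    return q
```

In this sketch the uncorrected estimate is simply the share of reads assigned to each class. The corrected estimate shifts that share according to how often the classifier confuses one class with another on the auxiliary reads.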
