Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method

Bibliographic Details
Main authors: Yuhua Yao, Xianhong Li, Bo Liao, Li Huang, Pingan He, Fayou Wang, Jiasheng Yang, Hailiang Sun, Yulong Zhao, Jialiang Yang
Format: article
Language: EN
Published: Nature Portfolio 2017
Subjects:
R
Q
Online access: https://doaj.org/article/f8cb3b40e97f44c5835c20e89822d6b7
id oai:doaj.org-article:f8cb3b40e97f44c5835c20e89822d6b7
record_format dspace
DOI: 10.1038/s41598-017-01699-z (https://doi.org/10.1038/s41598-017-01699-z)
ISSN: 2045-2322 (journal table of contents: https://doaj.org/toc/2045-2322)
Source: Scientific Reports, Vol 7, Iss 1, Pp 1-10 (2017)
Publication date: 2017-05-01
Record timestamp: 2021-12-02T11:40:51Z
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method depends on the choice of amino acid substitution matrix. In this study, we first compared a comprehensive set of 95 substitution matrices, reflecting various amino acid properties, for predicting the antigenicity of influenza viruses with a random forest model. We then proposed a novel algorithm, joint random forest regression (JRFR), that jointly considers the top-ranked substitution matrices. We applied JRFR to human H3N2 seasonal influenza data from 1968 to 2003. A 10-fold cross-validation shows that JRFR outperforms other popular methods in predicting antigenic variants. In addition, our results suggest that structural features are the most relevant to influenza antigenicity. By restricting the analysis to data involving two adjacent antigenic clusters, we inferred a few key amino acid mutations driving the 11 historical antigenic drift events, which point to experimentally validated mutations. Finally, we constructed an antigenic cartography of all H3N2 viruses with hemagglutinin (the glycoprotein on the surface of the influenza virus responsible for its binding to host cells) sequences available from the NCBI flu database, and showed an overall correspondence, alongside local inconsistencies, between the genetic and antigenic evolution of H3N2 influenza viruses.
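The abstract does not say how a pair of hemagglutinin sequences is turned into model input. The Python sketch below is a hypothetical illustration, not the authors' JRFR code: it scores each aligned position of two sequences with a substitution matrix and, as an assumed stand-in for the "joint" step, concatenates the features obtained from several matrices. The two tiny matrices (`TOY_MATRIX`, `TOY_MATRIX_2`), the per-position encoding, and the helper names `score`, `encode_pair` and `joint_encode` are all invented for illustration; a real analysis would use full 20×20 matrices such as the 95 compared in the study.

```python
from typing import Dict, List, Sequence, Tuple

# A substitution matrix stored as {(residue_a, residue_b): score}.
# Only one orientation of each pair is stored; lookups fall back to the reverse.
Matrix = Dict[Tuple[str, str], float]

# Hypothetical toy matrices covering only K/E and N/D; values are illustrative.
TOY_MATRIX: Matrix = {
    ("K", "K"): 5.0, ("E", "E"): 5.0, ("K", "E"): 1.0,
    ("N", "N"): 6.0, ("D", "D"): 6.0, ("N", "D"): 1.0,
}
TOY_MATRIX_2: Matrix = {
    ("K", "K"): 4.0, ("E", "E"): 4.0, ("K", "E"): -1.0,
    ("N", "N"): 4.0, ("D", "D"): 4.0, ("N", "D"): 2.0,
}


def score(matrix: Matrix, a: str, b: str) -> float:
    """Return the substitution score for residues a and b (symmetric lookup)."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def encode_pair(seq1: str, seq2: str, matrix: Matrix) -> List[float]:
    """Encode two aligned sequences as one dissimilarity value per position.

    Assumed encoding: mean self-score minus cross-score, so identical
    residues give 0.0 and less favorable substitutions give larger values.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    features = []
    for a, b in zip(seq1, seq2):
        self_mean = (score(matrix, a, a) + score(matrix, b, b)) / 2
        features.append(self_mean - score(matrix, a, b))
    return features


def joint_encode(seq1: str, seq2: str, matrices: Sequence[Matrix]) -> List[float]:
    """Concatenate per-matrix feature vectors (an assumed 'joint' scheme)."""
    combined: List[float] = []
    for matrix in matrices:
        combined.extend(encode_pair(seq1, seq2, matrix))
    return combined


if __name__ == "__main__":
    # Two hypothetical two-residue fragments differing at both positions.
    print(encode_pair("KN", "ED", TOY_MATRIX))  # prints [4.0, 5.0]
    print(joint_encode("KN", "ED", [TOY_MATRIX, TOY_MATRIX_2]))  # prints [4.0, 5.0, 5.0, 2.0]
```

Feature vectors built this way could be regressed against antigenic distances with a random forest implementation such as scikit-learn's `RandomForestRegressor` and evaluated with `KFold(n_splits=10)`; how JRFR actually weights or combines the top matrices is not described in this record.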
format article
author Yuhua Yao
Xianhong Li
Bo Liao
Li Huang
Pingan He
Fayou Wang
Jiasheng Yang
Hailiang Sun
Yulong Zhao
Jialiang Yang
title Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method
publisher Nature Portfolio
publishDate 2017
url https://doaj.org/article/f8cb3b40e97f44c5835c20e89822d6b7