iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree

An enhancer is a non-coding DNA fragment that can be bound by proteins to activate transcription of a gene, and hence plays an important role in regulating gene expression. Enhancer identification is very challenging and more complicated than that of other genetic factors due to their position variation and free...

Bibliographic Details
Main authors: Yunyun Liang, Shengli Zhang, Huijuan Qiao, Yinan Cheng
Format: article
Language: EN
Published: AIMS Press 2021
Subjects:
Online access: https://doaj.org/article/16255e1ecfdb4bcc9d4ab534177f7a41
id oai:doaj.org-article:16255e1ecfdb4bcc9d4ab534177f7a41
record_format dspace
spelling oai:doaj.org-article:16255e1ecfdb4bcc9d4ab534177f7a41
2021-11-29T02:37:31Z
iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
10.3934/mbe.2021434
1551-0018
https://doaj.org/article/16255e1ecfdb4bcc9d4ab534177f7a41
2021-10-01T00:00:00Z
https://www.aimspress.com/article/doi/10.3934/mbe.2021434?viewType=HTML
https://doaj.org/toc/1551-0018
An enhancer is a non-coding DNA fragment that can be bound by proteins to activate transcription of a gene, and hence plays an important role in regulating gene expression. Enhancer identification is very challenging and more complicated than that of other genetic factors due to their position variation and free scattering. In addition, it has been proved that genetic variation in enhancers is related to human diseases. Therefore, identification of enhancers and their strength has important biological meaning. In this paper, a novel model named iEnhancer-MFGBDT is developed to identify enhancers and their strength by fusing multiple features and gradient boosting decision tree (GBDT). The multiple features include k-mer and reverse complement k-mer nucleotide composition based on the DNA sequence, and second-order moving average, normalized Moreau-Broto auto-cross correlation and Moran auto-cross correlation based on the dinucleotide physical structural property matrix. Then we use GBDT to select features and perform classification successively. The accuracies reach 78.67% and 66.04% for identifying enhancers and their strength on the benchmark dataset, respectively. Compared with other models, the results show that our model is a useful and effective intelligent tool for identifying enhancers and their strength; the datasets and source code are available at https://github.com/shengli0201/iEnhancer-MFGBDT1.
Yunyun Liang
Shengli Zhang
Huijuan Qiao
Yinan Cheng
AIMS Press
article
identification
enhancers
multiple features
gradient boosting decision tree
Biotechnology
TP248.13-248.65
Mathematics
QA1-939
EN
Mathematical Biosciences and Engineering, Vol 18, Iss 6, Pp 8797-8814 (2021)
institution DOAJ
collection DOAJ
language EN
topic identification
enhancers
multiple features
gradient boosting decision tree
Biotechnology
TP248.13-248.65
Mathematics
QA1-939
spellingShingle identification
enhancers
multiple features
gradient boosting decision tree
Biotechnology
TP248.13-248.65
Mathematics
QA1-939
Yunyun Liang
Shengli Zhang
Huijuan Qiao
Yinan Cheng
iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
description An enhancer is a non-coding DNA fragment that can be bound by proteins to activate transcription of a gene, and hence plays an important role in regulating gene expression. Enhancer identification is very challenging and more complicated than that of other genetic factors due to their position variation and free scattering. In addition, it has been proved that genetic variation in enhancers is related to human diseases. Therefore, identification of enhancers and their strength has important biological meaning. In this paper, a novel model named iEnhancer-MFGBDT is developed to identify enhancers and their strength by fusing multiple features and gradient boosting decision tree (GBDT). The multiple features include k-mer and reverse complement k-mer nucleotide composition based on the DNA sequence, and second-order moving average, normalized Moreau-Broto auto-cross correlation and Moran auto-cross correlation based on the dinucleotide physical structural property matrix. Then we use GBDT to select features and perform classification successively. The accuracies reach 78.67% and 66.04% for identifying enhancers and their strength on the benchmark dataset, respectively. Compared with other models, the results show that our model is a useful and effective intelligent tool for identifying enhancers and their strength; the datasets and source code are available at https://github.com/shengli0201/iEnhancer-MFGBDT1.
format article
author Yunyun Liang
Shengli Zhang
Huijuan Qiao
Yinan Cheng
author_facet Yunyun Liang
Shengli Zhang
Huijuan Qiao
Yinan Cheng
author_sort Yunyun Liang
title iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
title_short iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
title_full iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
title_fullStr iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
title_full_unstemmed iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
title_sort ienhancer-mfgbdt: identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree
publisher AIMS Press
publishDate 2021
url https://doaj.org/article/16255e1ecfdb4bcc9d4ab534177f7a41
work_keys_str_mv AT yunyunliang ienhancermfgbdtidentifyingenhancersandtheirstrengthbyfusingmultiplefeaturesandgradientboostingdecisiontree
AT shenglizhang ienhancermfgbdtidentifyingenhancersandtheirstrengthbyfusingmultiplefeaturesandgradientboostingdecisiontree
AT huijuanqiao ienhancermfgbdtidentifyingenhancersandtheirstrengthbyfusingmultiplefeaturesandgradientboostingdecisiontree
AT yinancheng ienhancermfgbdtidentifyingenhancersandtheirstrengthbyfusingmultiplefeaturesandgradientboostingdecisiontree
_version_ 1718407631930916864
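
The description field above names k-mer and reverse complement k-mer nucleotide composition as two of the sequence-based features. The following is a minimal Python sketch of how such features are commonly computed; it is not taken from the authors' repository. The function names, the normalization by the number of valid windows, and the choice to skip windows containing characters other than A, C, G and T are all assumptions made for illustration, and the record does not state which values of k the paper uses.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string, e.g. 'AC' -> 'GT'."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def kmer_composition(seq, k):
    """Normalized frequency of each of the 4**k possible k-mers.

    Assumption: windows containing characters outside ACGT (such as N)
    are skipped, and frequencies are divided by the number of valid windows.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if all(base in ALPHABET for base in w)]
    if not valid:
        raise ValueError("sequence has no valid k-mer windows")
    counts = Counter(valid)
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {kmer: counts[kmer] / len(valid) for kmer in all_kmers}


def rc_kmer_composition(seq, k):
    """Merge each k-mer with its reverse complement.

    Each pair is keyed by its lexicographically smaller member, so a
    k-mer and its reverse complement contribute to the same feature.
    """
    merged = {}
    for kmer, freq in kmer_composition(seq, k).items():
        key = min(kmer, reverse_complement(kmer))
        merged[key] = merged.get(key, 0.0) + freq
    return merged


if __name__ == "__main__":
    example = "ACGT"  # toy sequence, not from the benchmark dataset
    print(kmer_composition(example, 2)["AC"])
    print(rc_kmer_composition(example, 2)["AC"])
```

For k = 2 the plain composition has 16 entries, while the reverse complement version has 10, because the 12 non-palindromic dinucleotides collapse into 6 pairs and the 4 self-complementary ones (AT, TA, CG, GC) remain on their own. Feature vectors of this kind could then be passed to a gradient boosting classifier for the selection and classification steps the description mentions.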