Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS.

Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this st...

Bibliographic Details
Main authors: Bi-Qing Li, Kai-Yan Feng, Lei Chen, Tao Huang, Yu-Dong Cai
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2012
Subjects: R (Medicine), Q (Science)
Online access: https://doaj.org/article/e8eab8813c9f453a958baa213f02f01d
id oai:doaj.org-article:e8eab8813c9f453a958baa213f02f01d
record_format dspace
spelling oai:doaj.org-article:e8eab8813c9f453a958baa213f02f01d
2021-11-18T07:07:27Z
Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS.
ISSN 1932-6203
DOI 10.1371/journal.pone.0043927
https://doaj.org/article/e8eab8813c9f453a958baa213f02f01d
2012-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22937126/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 7, Iss 8, p e43927 (2012)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and an MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. Site-specific feature analysis also showed that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
format article
author Bi-Qing Li
Kai-Yan Feng
Lei Chen
Tao Huang
Yu-Dong Cai
title Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS.
publisher Public Library of Science (PLoS)
publishDate 2012
url https://doaj.org/article/e8eab8813c9f453a958baa213f02f01d
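
The abstract in this record names three steps: mRMR feature ranking, incremental feature selection (IFS), and evaluation by MCC. The sketch below is a minimal illustration of how those steps might fit together. It is not the authors' implementation. It assumes discrete-valued features, the common "difference" form of mRMR (relevance minus mean redundancy), and a caller-supplied `evaluate` function standing in for a cross-validated Random Forest. All function names, the toy data, and the toy scores are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Greedy mRMR ranking (difference form, an assumption of this sketch).

    `features` maps a feature name to its list of discrete values.
    At each step, pick the feature with the highest relevance to the labels
    minus its mean redundancy with the features already ranked.
    """
    remaining = list(features)
    relevance = {f: mutual_information(features[f], labels) for f in remaining}
    ranked = []
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in ranked
            ) / len(ranked)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """IFS: score the nested subsets ranked[:1], ranked[:2], ... and keep the best."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        current = evaluate(ranked[:k])
        if current > best_score:
            best_subset, best_score = ranked[:k], current
    return best_subset, best_score


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data (hypothetical): "b" duplicates "a"; "c" is independent of both.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 0],
    "b": [0, 0, 0, 1, 1, 1, 1, 0],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
ranked = mrmr_rank(features, labels)

# Stand-in scores by subset size; in practice this would be the
# cross-validated MCC of a classifier trained on the subset.
toy_scores = {1: 0.2, 2: 0.5, 3: 0.4}
subset, score = incremental_feature_selection(ranked, lambda s: toy_scores[len(s)])
```

On this toy data the duplicate feature `b` is ranked last, because its redundancy with `a` outweighs its relevance. In a real pipeline, `evaluate` would train and cross-validate a classifier on each nested subset, which is where a Random Forest would plug in.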