Feature Selection Methods Based on Symmetric Uncertainty Coefficients and Independent Classification Information

Feature selection is a critical step in the data preprocessing phase in the field of pattern recognition and machine learning. The core of feature selection is to analyze and quantify the relevance, irrelevance, and redundancy between features and class labels. While existing feature selection metho...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Li Zhang, Xiaobo Chen
Formato: article
Lenguaje:EN
Publicado: IEEE 2021
Materias:
Acceso en línea:https://doaj.org/article/1897ab04eefd4547a0398a1341baba7e
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
Descripción
Sumario:Feature selection is a critical step in the data preprocessing phase in the field of pattern recognition and machine learning. The core of feature selection is to analyze and quantify the relevance, irrelevance, and redundancy between features and class labels. While existing feature selection methods give multiple explanations for these relationships, they ignore the multi-value bias of class-independent features and the redundancy of class dependent features. Therefore, a feature selection method (Maximal independent classification information and minimal redundancy, MICIMR) is proposed in this paper. Firstly, the relevance and redundancy terms of class independent characteristics are calculated respectively based on the symmetric uncertainty coefficient. Secondly, it calculates the relevance and redundancy terms of class-dependent features according to the independent classification information criterion. Finally, the selection criteria for these two characteristics are combined. To verify the effectiveness of the MICIMR algorithm, five feature selection methods are compared with the MICIMR algorithm on fifteen real datasets. The experimental results demonstrate that the MICIMR algorithm outperforms the other feature selection algorithms in terms of redundancy rate as well as classification accuracy (Gmean_macro and F1_macro).