Automatic Search of Cataclysmic Variables Based on LightGBM in LAMOST-DR7

The search for special and rare celestial objects has always played an important role in astronomy. Cataclysmic Variables (CVs) are rare binary systems with accretion disks. Most CVs are in quiescence, and their spectra show emission lines of the Balmer series, He I, and He II. A...

Bibliographic Details
Main authors: Zhiyuan Hu, Jianyu Chen, Bin Jiang, Wenyu Wang
Format: article
Language: EN
Published: MDPI AG 2021
Subjects: sky survey, cataclysmic variables, LightGBM, data mining, Elementary particle physics, QC793-793.5
Online access: https://doaj.org/article/088ac8d3c2a343fe8b432d296f7ecc4b
id oai:doaj.org-article:088ac8d3c2a343fe8b432d296f7ecc4b
record_format dspace
spelling oai:doaj.org-article:088ac8d3c2a343fe8b432d296f7ecc4b
2021-11-25T19:09:47Z
10.3390/universe7110438
2218-1997
2021-11-01T00:00:00Z
https://www.mdpi.com/2218-1997/7/11/438
https://doaj.org/toc/2218-1997
Universe, Vol 7, Iss 11, p 438 (2021)
institution DOAJ
collection DOAJ
language EN
topic sky survey
cataclysmic variables
LightGBM
data mining
Elementary particle physics
QC793-793.5
description The search for special and rare celestial objects has always played an important role in astronomy. Cataclysmic Variables (CVs) are rare binary systems with accretion disks. Most CVs are in quiescence, and their spectra show emission lines of the Balmer series, He I, and He II. A few CVs in outburst instead show Balmer absorption lines. Because CVs are so scarce, expanding their spectral sample is valuable for studying the formation of accretion disks and the evolution models of binary star systems. Research on astronomical spectra has now entered the era of Big Data. The Large Sky Area Multi-Object Fiber Spectroscopy Telescope (LAMOST) has produced tens of millions of spectra, and its latest release, LAMOST-DR7, includes 10.6 million low-resolution spectra in 4926 sky regions, providing ideal data for searching for CV candidates. To process and analyze this massive volume of spectra, this study used the Light Gradient Boosting Machine (LightGBM), an algorithm based on ensembles of decision trees, to search LAMOST-DR7 automatically. In total, 225 CV candidates were found, and four new CV candidates were verified against SIMBAD and published catalogs. The study also built Gradient Boosting Decision Tree (GBDT), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) models and compared all four models using Accuracy, Precision, Recall, the F1-score, and the ROC curve. The experimental results showed that LightGBM is more efficient than the other three models. The LightGBM-based search not only enriches the existing CV spectral library but also provides a reference for mining other rare celestial objects from massive spectral data.
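The abstract names the evaluation metrics but not their formulas. The following is a minimal sketch, not the authors' code, of how Accuracy, Precision, Recall, F1-score, and the area under the ROC curve are conventionally computed for a binary CV / non-CV classifier. The function names and the toy labels and scores are hypothetical, and this record does not describe the paper's actual evaluation protocol (data split, decision threshold).

```python
# Sketch only: conventional binary-classification metrics for a CV search.
# Assumption: label 1 = CV spectrum, 0 = any other spectrum.
# All names and example data here are hypothetical, not from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1 from hard (thresholded) predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Area under the ROC curve via its rank (Mann-Whitney) form: the probability
    that a random positive scores above a random negative (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 3 CVs among 8 spectra.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Because CVs are rare among LAMOST spectra, a classifier that labels everything as non-CV can still reach high accuracy, which is presumably why Precision, Recall, F1, and the ROC curve are reported alongside it. The ROC AUC above uses predicted scores rather than thresholded labels, so it summarizes performance across all possible thresholds.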
format article
author Zhiyuan Hu
Jianyu Chen
Bin Jiang
Wenyu Wang
title Automatic Search of Cataclysmic Variables Based on LightGBM in LAMOST-DR7
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/088ac8d3c2a343fe8b432d296f7ecc4b