Automatic light curve processing for exoplanet identification using machine learning algorithms
Keywords:
astronomy. exoplanet. machine learning. light curve.
Abstract
Approaches based on machine learning techniques have been proposed in the literature to assist in detecting exoplanets through the automated processing of light curves. Despite these advances, traditional machine learning algorithms have not yet been thoroughly studied for this task. In this work, we therefore establish a baseline through an extensive experimental evaluation of 16 algorithms under different parameter settings. The evaluation uses data from the Kepler telescope: 5302 light curves, each with 60000 records. The LightGBM algorithm achieved the best performance, with an accuracy of 82.92%.
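To make the idea of a baseline evaluation concrete, the following minimal sketch compares candidate classifiers by mean accuracy under k-fold cross-validation. It is illustrative only and is not the study's pipeline: the light curves are synthetic, and the two toy classifiers (`MajorityClass`, `DepthThreshold`), the `depth_feature` helper, the transit period and depth, and the use of 5-fold cross-validation are all assumptions introduced here. The abstract does not state the evaluation protocol.

```python
import random
from statistics import mean, median


def make_light_curve(has_transit, rng, n_points=200):
    """Synthetic normalized flux: noise around 1.0, plus box-shaped dips if has_transit."""
    flux = [1.0 + rng.gauss(0.0, 0.001) for _ in range(n_points)]
    if has_transit:
        for start in range(20, n_points, 50):      # hypothetical period: 50 samples
            for i in range(start, min(start + 5, n_points)):
                flux[i] -= 0.01                    # hypothetical transit depth: 1%
    return flux


def depth_feature(flux):
    """Single hand-made feature: how far the deepest point falls below the median."""
    return median(flux) - min(flux)


class MajorityClass:
    """Toy baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label] * len(X)


class DepthThreshold:
    """Toy classifier: thresholds the depth feature halfway between the class means."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.threshold = (mean(pos) + mean(neg)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold else 0 for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(make_model, X, y, k=5):
    """Mean accuracy over k folds; fold f holds out indices f, f + k, f + 2k, ..."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        X_test = [x for i, x in enumerate(X) if i in test_idx]
        y_test = [t for i, t in enumerate(y) if i in test_idx]
        model = make_model().fit(X_train, y_train)
        scores.append(accuracy(y_test, model.predict(X_test)))
    return mean(scores)


rng = random.Random(0)                             # fixed seed for repeatability
labels = [i % 2 for i in range(100)]               # balanced synthetic labels
features = [depth_feature(make_light_curve(bool(t), rng)) for t in labels]

candidates = {"majority": MajorityClass, "depth_threshold": DepthThreshold}
results = {name: cross_val_accuracy(cls, features, labels) for name, cls in candidates.items()}
for name, score in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2%}")
```

In a real experiment the toy classes would be replaced by library implementations of the algorithms under study, and the synthetic curves by features extracted from the Kepler light curves. The harness itself (same folds, same metric for every candidate) is what makes the resulting numbers comparable as a baseline.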
License
Copyright (c) 2024 Revista Brasileira de Iniciação Científica
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.