Predicting sunspot number from topological features in spectral images I: Machine learning approach

Authors:
Sierra Porta, David
Tarazona Alvarado, Miguel
Herrera Acevedo, Daniel
Resource type:
Publication date:
2024
Institution:
Universidad Tecnológica de Bolívar
Repository:
Repositorio Institucional UTB
Language:
eng
OAI Identifier:
oai:repositorio.utb.edu.co:20.500.12585/12701
Online access:
https://hdl.handle.net/20.500.12585/12701
Keywords:
Machine learning
Sunspots prediction
Spectral images
Sun’s dynamics
Fractal features
LEMB
Rights
openAccess
License
http://creativecommons.org/publicdomain/zero/1.0/
Description
Summary: This study presents a machine learning approach to predicting the number of sunspots using a dataset derived from solar images provided by the Solar and Heliospheric Observatory (SOHO). The dataset spans several spectral bands, capturing the complex dynamics of solar activity and facilitating interdisciplinary analyses with other solar phenomena. We employed five machine learning models to predict sunspot numbers: Random Forest Regressor, Gradient Boosting Regressor, Extra Trees Regressor, AdaBoost Regressor, and Hist Gradient Boosting Regressor. These models used four key heliospheric variables, namely Proton Density, Temperature, Bulk Flow Speed, and Interplanetary Magnetic Field (IMF), alongside 14 newly introduced topological variables. The topological features were extracted from solar images taken with different filters: HMIIGR, HMIMAG, EIT171, EIT195, EIT284, and EIT304. In total, 60 models were constructed, some incorporating the topological variables and some excluding them. Our analysis reveals that models incorporating the topological variables achieved significantly higher accuracy, with the R² score improving on average from approximately 0.30 to 0.93. The Extra Trees Regressor (ET) emerged as the best-performing model, demonstrating superior predictive capabilities across all datasets. These results underscore the potential of combining machine learning models with additional topological features from spectral analysis, offering deeper insight into the complex dynamics of solar activity and improving the precision of sunspot number predictions. This approach provides a novel methodology for improving space weather forecasting and contributes to a more comprehensive understanding of solar-terrestrial interactions.