
Citation: Li, Y. Y., Huang, S. Y., Xu, S. B., Yuan, Z. G., Jiang, K., Xiong, Q. Y., and Lin, R. T. (2025). Solar flare forecasting based on a Fusion Model. Earth Planet. Phys., 9(1), 171–181. DOI: 10.26464/epp2024058
Solar flare prediction is an important subject in the field of space weather, and deep learning has greatly advanced it. In this study, we propose a novel solar flare forecasting model that integrates a Deep Residual Network (ResNet) and a Support Vector Machine (SVM) for both ≥ C-class (C, M, and X classes) and ≥ M-class (M and X classes) flares. We collected magnetogram samples from May 1, 2010 to September 13, 2018 from Space-weather Helioseismic and Magnetic Imager (HMI) Active Region Patches and then used a cross-validation method to obtain seven independent datasets. We then used five metrics to evaluate our fusion model, which feeds the intermediate output extracted by ResNet to an SVM with a Gaussian kernel function. Our results show that the primary metric, the true skill statistic (TSS), reaches 0.708 ± 0.027 for ≥ C-class prediction and 0.758 ± 0.042 for ≥ M-class prediction; these values indicate that our approach outperforms most previous studies. The metrics of our fusion model on the seven datasets indicate that the model is stable and robust, suggesting that fusion models that integrate an excellent baseline network with SVM can achieve improved performance in solar flare prediction. We also discuss the effect of this architectural innovation on the performance of our fusion model.
A solar flare is a phenomenon in which a portion of the solar atmosphere suddenly releases huge quantities of energy due to magnetic reconnection (Priest and Forbes, 2002). Solar flares, together with the coronal mass ejections, usually blast from the solar active regions (ARs), areas with especially strong magnetic fields, compared to the Sun’s average magnetic field. Intense solar activity leads to instantaneous localized heating of gas that sends large numbers of energetic particles into interplanetary space, resulting in increased ionization of Earth’s upper atmosphere, affecting the GPS positioning system and short-wave communications, and threatening the safety of astronauts and the integrity of satellites and their instruments (Wheatland, 2005; Huang X et al., 2018). Thus, the modeling and predicting of solar flares has become an important subject in the field of space weather.
In early studies, many flare prediction models were built on statistical methods. Bornmann and Shaw (1994) discussed how McIntosh classification parameters, based on multiple linear regression analysis, could be applied to solar flares. Gallagher et al. (2002) developed a Poisson statistical model for monitoring solar active regions and predicting flares. Song H et al. (2009) proposed a solar flare prediction model using an ordinal logistic regression method. Mason and Hoeksema (2010) analyzed correlations between different parameters and flare events and found a statistical relationship between the solar magnetic field and flares, based on a superposed epoch analysis method. Bloomfield et al. (2012) investigated the performance of Poisson probabilities in predicting X-ray flares by adopting a performance measure named the true skill statistic (TSS).
In the past few decades, machine learning technology has developed rapidly and has been applied in many fields of space weather (e.g., Xu SB et al., 2020; Li YY et al., 2022), especially in solar flare prediction (e.g., Camporeale, 2019). Bradshaw et al. (1988) constructed a three-layer back-propagation connectionist network to forecast solar flares. Wang HN et al. (2009) proposed an artificial neural network model using solar magnetic field measurements as inputs, and reported a 69% ratio of correct flare forecasts. Since then, a series of additional algorithms, including random forest, ensemble learning, ordinal logistic regression, support vector machine, and lasso methods, have been used to build forecast models (e.g., Yuan Y et al., 2010; Bobra and Couvidat, 2015; Guerra et al., 2015; Nishizuka et al., 2017; Liu C et al., 2017; Sadykov et al., 2017; Campi et al., 2019).
Widespread availability of GPUs with expanded parallel computing capabilities has allowed deep learning technology to make significant progress. Huang X et al. (2018) presented a Convolutional Neural Network (CNN) model with which they have successfully obtained forecasting patterns from line-of-sight magnetograms of solar active regions, but the Heidke skill score (HSS) performance of their results indicates that their model needs to be improved. The DeFN Model put forward by Nishizuka et al. (2018) calculates the probability of flares and carries out a binary classification, using inputs that are extracted manually. Park et al. (2018) designed a combined model inspired by the GoogLeNet (Szegedy et al., 2015) and DenseNet (Huang G et al., 2017) and compared its binary prediction of ≥ C-class flares to the outputs of two other pre-trained models; however, their results did not include predictions of ≥ M-class flares. Li XB et al. (2020) proposed a cross-validation method to split datasets, and obtained improved classification results based on their model that combines the VGGNet (Simonyan and Zisserman, 2015) and the AlexNet (Krizhevsky et al., 2012). However, in order to extract and classify flare patterns accurately from massive data, it is necessary to use more complex networks. Tang RX et al. (2021) combined three independent models to build a stronger solar flare prediction model; it performed better than the original models, but there is still room for improvement in data segregation. Sun PC et al. (2022) considered the spatial distribution and evolution of active regions and built a prediction model using 3D CNN. Vysakh and Mayank (2023) used a gradient-boosted decision-tree classifier to predict solar flares based on observed magnetic and flaring-history parameters. Grim and Gradvohl (2024) adopted improved multiscale vision transformers to forecast solar flares based on image sequences, but their work lacks prediction of ≥ C-class flares. 
Donahue and Inceoglu (2024) used a transformer network as the prediction model, suggesting the potential effectiveness of transformer networks in flare forecasting.
It should be noted that although deep learning methods have achieved good performance in solar flare prediction, some urgent problems remain to be solved, such as enhancing the generalization ability and stability of the models, and structural innovation for specific research objectives.
In this paper, we propose a fusion model designed to improve binary classification of both ≥ C-class (C, M, and X classes) flares and ≥ M-class (M and X classes) flares. The data used in our model are line-of-sight (LOS) magnetograms of active regions (ARs) obtained from the Helioseismic and Magnetic Imager onboard the Solar Dynamics Observatory (SDO/HMI) (Scherrer et al., 2012). To improve the model's stability, we applied to the data a method that combines active region segregation with cross-validation. The classification results of our model are compared with the observation labels from the Geostationary Operational Environmental Satellite (GOES).
The remainder of this paper is organized as follows. Section 2 introduces the Deep Residual Network (ResNet) and the Support Vector Machine (SVM). Section 3 describes how the datasets are split and presents the fusion model that integrates ResNet and SVM. Results of our model and further evaluation are given in Section 4. Finally, Section 5 presents conclusions and discussion.
A traditional convolutional neural network generally extracts information from a target image by stacking convolutional layers and pooling layers. In the early development of deep learning, the general consensus was that a network's learning ability and performance would improve as the number of layers increased, because a large number of convolutional layers and convolution kernels can comprehensively extract features of different regions in the target image (e.g., Szegedy et al., 2015; Simonyan and Zisserman, 2015). However, blindly increasing the depth of a network does not always improve the prediction performance of a deep learning model, because of the vanishing/exploding gradient problem and the degradation problem. To solve these problems, He et al. (2016) proposed the Deep Residual Network.
Supposing that there is a shallow network, one can get a deep network according to the traditional idea of adding layers. If the newly added layers do not learn anything, but only copy features of the shallow network, the performance of the deep network should not be worse than that of the original shallow network, which means that the degradation problem will not appear in identity mapping. The structure that implements this function in ResNet is called identity block. For a few stacked layers, the learned features are denoted as H(x) when the input is x, and ResNet actually lets these layers approximate a residual function:
F(x)=H(x)−x. |
The reason for this is that the residual learning is easier than the original feature learning. When the residual is 0, it is equivalent to implementing an identity mapping, so the performance of the network will not be degraded. Since the residual will not be 0 in most cases, the stacking layer can learn new features based on the input features, and thus can result in better performances of the model.
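The residual idea can be illustrated with a minimal sketch. This is not the ResNet50 implementation used in the study: the function names, the plain-list feature vectors, and the toy residual branches are all assumptions made for illustration. It only shows that a block computing F(x) + x reduces to an identity mapping when the residual branch outputs zero.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return H(x) = F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        # An identity shortcut only works when both branches have the same shape.
        raise ValueError("residual branch must preserve the input shape")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    """Hypothetical branch that has learned nothing: F(x) = 0."""
    return [0.0 for _ in x]


def small_residual(x: Vector) -> Vector:
    """Hypothetical branch that has learned a small correction: F(x) = 0.1 * x."""
    return [0.1 * v for v in x]


features = [0.5, -1.0, 2.0]
identity_out = residual_block(features, zero_residual)   # equals the input
refined_out = residual_block(features, small_residual)   # input plus a correction
```

With a zero residual the added layers cannot make the representation worse than the shallow network's, which is the argument made above against degradation.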
The plain baselines of ResNet are inspired mainly by the VGGNet; shortcut connections are inserted into these baselines to form residual blocks. Figure 1 shows the network structure of ResNet50, the ResNet version used in our study. Stacking the two block types, labeled CVBK and IDBK, avoids the degradation problem while deepening the network.
The Support Vector Machine is a widely used classification algorithm in the field of machine learning (Cortes and Vapnik, 1995). The basic idea of SVM is to find the hyperplane with the largest margin in the feature space, adopting the concept of a soft margin when the training data are not linearly separable, so that the data can be efficiently binary-classified. The corresponding optimization objective is:
\underset {w,b,\xi} {\mathrm{min}}\frac{1}{2}\|{\boldsymbol{w}}\|^2+C \sum_{i=1}^n \xi_i, |
where w is the normal vector of hyperplane, b is the intercept of hyperplane, ξ is the slack variable, the error of the decision function is calculated by hinge loss, and the penalty term C is used to indicate the decision function’s tolerance to the error.
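For reference, this objective is usually written together with its constraints. The following is the standard soft-margin formulation (Cortes and Vapnik, 1995), which we assume is the one intended here; the labels y_i ∈ {−1, +1} and the sample index i are notation introduced for this sketch:

\text{s.t.}\quad y_i\left(\boldsymbol{w}^{\mathrm{T}} x_i + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n.

At the optimum each slack variable equals the hinge loss of its sample, ξi = max(0, 1 − yi(wTxi + b)), and a new sample x is classified by the sign of wTx + b.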
When the samples are linearly inseparable in the original space, a kernel function is introduced to map the samples from the original space to a higher-dimensional feature space, in which the optimal separation hyperplane is constructed. There are many kernel functions in SVM. In this study, the Gaussian kernel function is utilized to process the features of the samples. The Gaussian kernel function is shown as follows:
\kappa\left(x_{i}, x_{j}\right)=\exp \left(-\frac{\| x_{i}-x_{j} \|^{2}}{2 \sigma^{2}}\right). |
Here, σ is a hyperparameter that controls the local scope (width) of the Gaussian kernel function.
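As a small worked illustration of the kernel formula, the sketch below evaluates it for two feature vectors. The function name and the example vectors are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def gaussian_kernel(x_i: Sequence[float], x_j: Sequence[float], sigma: float) -> float:
    """Evaluate exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


same = gaussian_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0)   # identical vectors
apart = gaussian_kernel([0.0, 0.0], [3.0, 4.0], sigma=5.0)  # squared distance 25
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the distance grows relative to σ. Libraries that parameterize the same kernel as exp(−γ‖x_i − x_j‖²) correspond to γ = 1/(2σ²).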
The line-of-sight magnetograms of active regions (ARs) used in this study were obtained from Space-weather HMI Active Region Patches (SHARP; Bobra et al., 2014) at the Joint Science Operations Center (http://jsoc.stanford.edu/ajax/lookdata.html). Although vector magnetograms contain more information with which to parameterize active regions (e.g., Bobra and Couvidat, 2015), LOS magnetograms are used here because the magnetogram samples serve directly as inputs for forecasting solar flares, without manually selected feature parameters to assist prediction. We collected samples of magnetograms from May 1, 2010 to September 13, 2018, covering solar cycle 24. The temporal cadence of HMI data is 12 minutes. Projection effects were also considered, in accordance with Bobra et al. (2014): samples of LOS magnetograms included in our study were limited to those located within ±45° of the central meridian. The image sizes in the original dataset vary; all images were resized to 128 × 128 pixels to meet the fixed size requirement of the network. Input data were normalized during the training process to ensure that all values were in the range of 0 to 1. The formula is as follows:
{X_{{\mathrm{nor}}}} = \frac{{X - {X_{{\mathrm{min}} }}}}{{{X_{{\mathrm{max}} }} - {X_{{\mathrm{min}} }}}} , |
where Xnor represents the normalized data, X represents the data samples that needed to be normalized, and Xmax and Xmin represent the maximum and minimum values, respectively.
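The normalization formula can be sketched as follows. The text does not state whether the minimum and maximum are taken per image or over the whole dataset; this sketch assumes per-image values, and the function name, the toy 2 × 2 patch, and the handling of a constant image are likewise assumptions.

```python
from typing import List

Image = List[List[float]]


def min_max_normalize(image: Image) -> Image:
    """Rescale pixel values to [0, 1] using X_nor = (X - X_min) / (X_max - X_min)."""
    flat = [value for row in image for value in row]
    x_min, x_max = min(flat), max(flat)
    if x_max == x_min:
        # Assumption: a constant image carries no contrast, so map it to zeros
        # instead of dividing by zero.
        return [[0.0 for _ in row] for row in image]
    scale = x_max - x_min
    return [[(value - x_min) / scale for value in row] for row in image]


# Toy patch of line-of-sight field values (hypothetical numbers).
patch = [[-200.0, 0.0],
         [100.0, 300.0]]
normalized = min_max_normalize(patch)
```

Here the strongest negative field maps to 0 and the strongest positive field to 1, so the polarity information is preserved as an offset around the value that zero field maps to.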
The flares in the active regions have been categorized into four grades, No-flare (N), C, M, and X, according to the peak magnitude of the soft X-ray flux observed by the Geostationary Operational Environmental Satellite (GOES). Specifically, label "X" means that an X-class flare originated in the AR sample within 24 h after the observation time; similarly, label "M" means that an M-class flare, but no X-class flare, originated in the AR sample within 24 h after the observation time; label "C" means that a C-class flare, but neither an M-class nor an X-class flare, originated in the AR sample within 24 h after the observation time; label "N" applied to an AR sample indicates that within 24 h after the observation time, either no flares, or flare(s) weaker than a C1.0 flare, originated in that AR. It should be noted that if more than one flare occurred in an active region, the magnetogram samples were classified according to the occurrence time span of each flare. During the period from the beginning of each flare to its disappearance, the corresponding label is assigned to the magnetogram sample according to the above classification method. Following this procedure, we identified 1072 solar active regions and 211,305 magnetogram samples.
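The labeling rule amounts to taking the strongest flare class within the 24 h window. The sketch below is one way to express it, assuming the flares in the window are given as GOES class strings such as "C3.2" or "M1.0"; the function names and that input format are hypothetical. It also shows how the four labels collapse into the two binary tasks.

```python
from typing import Iterable

# Ordering of the four labels from weakest to strongest.
RANK = {"N": 0, "C": 1, "M": 2, "X": 3}


def label_sample(flares_next_24h: Iterable[str]) -> str:
    """Return 'N', 'C', 'M' or 'X' for the strongest flare in the 24 h window.

    Flares weaker than C1.0 (GOES classes A and B) count as no flare.
    """
    label = "N"
    for goes_class in flares_next_24h:
        letter = goes_class.strip()[:1].upper()
        if letter in ("C", "M", "X") and RANK[letter] > RANK[label]:
            label = letter
    return label


def is_positive(label: str, threshold: str) -> bool:
    """Binary target: threshold 'C' for the >= C task, 'M' for the >= M task."""
    return RANK[label] >= RANK[threshold]


quiet = label_sample(["B5.0"])            # weaker than C1.0
mixed = label_sample(["C1.0", "M2.3"])    # M flare but no X flare
strong = label_sample(["X1.0", "C5.0"])   # any X flare wins
```

A sample labeled "C" is therefore positive for the ≥ C-class task and negative for the ≥ M-class task.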
In solar flare prediction, it is difficult to guarantee the accuracy and stability of a model with a traditional dataset partition method, because of the number and quality of samples. To address this problem, we adopted a cross-validation (CV) method based on active region (AR) segregation, as proposed by Li XB et al. (2020). We shuffle the AR numbers of each level of flare, and then use 80% of them as the model-training AR set, reserving the other 20% for testing the model. All the magnetogram samples belonging to the training AR set are defined as the magnetogram training set, and all the magnetogram samples belonging to the testing AR set are used as the magnetogram test set. The samples are divided by AR number so that magnetograms from any given active region cannot appear both in the training set and in the test set. We obtained seven independent datasets according to the above method. The numbers of different types of magnetogram samples are shown in Figure 2.
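A minimal sketch of such an AR-segregated split is given below. It assumes that each active region is listed under exactly one flare level and that each sample is a pair of AR number and sample identifier; the function names, the toy AR numbers, and the use of a random seed to generate different splits are assumptions for illustration, not the authors' code.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Sample = Tuple[int, str]  # (AR number, sample identifier)


def split_active_regions(
    ars_by_class: Dict[str, Sequence[int]],
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[Set[int], Set[int]]:
    """Shuffle the AR numbers of each flare level and split them 80/20."""
    rng = random.Random(seed)
    train_ars: Set[int] = set()
    test_ars: Set[int] = set()
    for flare_class in sorted(ars_by_class):
        ar_numbers = list(ars_by_class[flare_class])
        rng.shuffle(ar_numbers)
        cut = round(train_fraction * len(ar_numbers))
        train_ars.update(ar_numbers[:cut])
        test_ars.update(ar_numbers[cut:])
    return train_ars, test_ars


def split_samples(
    samples: Sequence[Sample], train_ars: Set[int], test_ars: Set[int]
) -> Tuple[List[Sample], List[Sample]]:
    """Assign every magnetogram sample to the set that owns its AR."""
    train = [s for s in samples if s[0] in train_ars]
    test = [s for s in samples if s[0] in test_ars]
    return train, test


# Toy AR numbers per flare level (hypothetical).
ars_by_class = {
    "N": list(range(100, 110)),
    "C": list(range(200, 210)),
    "M": list(range(300, 305)),
    "X": list(range(400, 405)),
}
samples = [(ar, f"{ar}_{k}") for ars in ars_by_class.values() for ar in ars for k in range(3)]

train_ars, test_ars = split_active_regions(ars_by_class, seed=0)
train_samples, test_samples = split_samples(samples, train_ars, test_ars)
```

Repeating the split with different seeds would give several train/test partitions; whether the seven datasets of the study were generated exactly this way is not stated in the text.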
It can be seen from Figure 2 that the numbers of flare samples differ greatly between levels, so we used data augmentation and downsampling to alleviate this class-imbalance problem. For X-level and M-level samples, we performed supervised data augmentation by applying geometric transformations to the images, such as flipping and rotation. We note that more sophisticated data augmentation techniques exist. For example, Amar and Ben-Shahar (2024) focus on oversampling X-level samples using the synthetic minority oversampling technique (SMOTE). That method, however, requires a considerable amount of time and memory, and in our study a relatively large number of M-level samples also need to be oversampled. In the end, the number of X samples increased to 12 times the original, and the number of M samples increased to 6 times the original. For C-level and N-level samples, we applied downsampling: we picked one sample randomly out of every 8 C-level samples, and one out of every 5 N-level samples. In addition, when testing the model, in order to prevent overlap between the ARs of the training and testing samples, we selected, as the test data, samples that belonged only to a particular AR. Through data augmentation and downsampling during model training, we deliberately assigned more weight and attention to positive-class samples with complex features. This approach was intended to alleviate the negative impact of the class-imbalance problem and to improve the model's generalization ability.
Compared with traditional methods, using deep learning to predict solar flares avoids manual extraction of physical parameters from images, which simplifies the prediction process and should enhance the objectivity of the research (e.g., Huang X et al., 2018). However, there is still room for improvement in deep learning models. We hold the opinion that the widely used deep neural network architecture does not have sufficient generality in solar flare prediction; the application of deep learning should not be limited to parameter adjustment of existing networks. Rather, it is important to change network structures considering the particular characteristics of each research objective.
The softmax classifier used in plain ResNet does not explicitly optimize the feature embedding to increase the similarity of intra-class samples and the diversity of inter-class samples (Deng JK et al., 2019), and thus may prevent the model from achieving sufficient generalization ability. SVM is a classifier based on structural risk minimization; it is relatively stable and has less risk of overfitting when the sample size is moderate. SVM obtains the global optimum, which helps to avoid the local extremum problem that may occur in deep learning (Cortes and Vapnik, 1995; Goodfellow et al., 2016). For these reasons, we propose a novel solar flare forecasting model that integrates ResNet with SVM in order to make reliable binary classification possible for both ≥ C-class and ≥ M-class flares.
An SVMWrapper module is designed to implement model integration. The whole feature extraction process of ResNet is retained, and the softmax classifier is replaced by the SVM classifier through the SVMWrapper module. In other words, the features extracted by ResNet are fed to the SVMWrapper module rather than to the softmax classifier. First, the flare data are input into the ResNet network, and we capture the feature image, also called the intermediate output, after it has been extracted by ResNet and before it is fed to the softmax classifier. Second, we pass this intermediate output as input to the SVM classifier, ultimately obtaining classification results that differ from those delivered by plain ResNet. Because the classifier is replaced, the performance of the model also changes; this study shows that the effect of replacing the classifier is positive. That the classifier can be embedded in this way confirms the viability of the SVMWrapper module. To combine the two algorithms, it is crucial that the feature images extracted by ResNet be accurately captured, and that the vector size of the intermediate output match the input size expected by the SVM.
In this study, Model(N_CMX) and Model(NC_MX) represent the fusion model’s binary classification of ≥C-class flares and ≥M-class flares, respectively. They have the same structure but different hyperparameters. In deep learning, loss functions can be used to evaluate the accuracy of a model. For ResNet, considering the active region and the samples at each level, we used weighted cross entropy loss as follows:
{\mathrm{loss}} = -\sum\limits_{m = 1}^M {\sum\limits_{n = 1}^N {{w_n}{y_{mn}}{\mathrm{ln}}({{\hat y }_{mn}})} } , |
where M is the number of samples passed to the model for training at one time; N is the number of classes; wn is the weight of the nth class, which is determined by an optimized parameter obtained through experiment and by the number of active regions and samples of each class; ymn represents the observed value, and ŷmn represents the corresponding value predicted by the model. For the SVM, the hinge loss is used:
{\mathrm{ loss}} = {\mathrm{max}} (0,1 - {{\boldsymbol{y}}_i}({{\boldsymbol{w}}^{\mathrm{T}}}{{\boldsymbol{x}}_i} + {\boldsymbol{b}})) , |
where xi is the input vector, yi is the target, and w and b are the normal vector and the intercept of the hyperplane, respectively.
To minimize the loss function, the stochastic gradient descent (SGD) optimizer is adopted for 80 epochs of training with a batch size of 16 and a learning rate of 0.0001; a momentum of 0.9 is used to approximate second-order gradient information and to speed up optimization. In addition, an early stopping strategy and a dynamic learning rate adjustment module are adopted to prevent overfitting and to improve training efficiency. Training is stopped when the validation loss has not improved for 20 consecutive epochs, and the learning rate is halved when the validation loss has not improved for 10 consecutive epochs. Our model is trained on an NVIDIA Tesla V100 with TensorFlow 1.10.0. With these hyperparameter settings, the model begins to converge within the first 10 to 20 epochs, during which the training and validation losses decrease significantly; thereafter the losses decrease more slowly. On some datasets, training triggered early stopping before 80 epochs. The validation losses fluctuated during training, but the overall trend was a gradual decrease.
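The two loss functions used for training can be made concrete with a short sketch. It is written in plain Python rather than TensorFlow, the function names and toy numbers are hypothetical, and the cross-entropy is written with the conventional leading minus sign so that its value is non-negative and can be minimized.

```python
import math
from typing import Sequence


def weighted_cross_entropy(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
    class_weights: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Sum of -w_n * y_mn * ln(y_hat_mn) over a batch of M samples and N classes.

    y_true holds one-hot observed labels, y_pred the predicted probabilities.
    """
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for weight, y, p in zip(class_weights, true_row, pred_row):
            # eps guards against ln(0) for a confidently wrong prediction.
            total -= weight * y * math.log(max(p, eps))
    return total


def hinge_loss(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """max(0, 1 - y * (w^T x + b)) for a single sample with target y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


# One sample of class 0 predicted with probability 0.5, class 0 weighted by 2.
ce = weighted_cross_entropy([[1.0, 0.0]], [[0.5, 0.5]], class_weights=[2.0, 1.0])

# The same point scored by a toy hyperplane, once with each target.
loss_correct = hinge_loss([1.0, -1.0], 0.5, [2.0, 1.0], y=+1)  # outside the margin
loss_wrong = hinge_loss([1.0, -1.0], 0.5, [2.0, 1.0], y=-1)    # on the wrong side
```

The class weight multiplies the penalty for misclassifying a sample of that class, which is how the cross-entropy term can compensate for classes with few active regions or samples.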
In this study, the ≥ C-class and ≥ M-class are defined as positive classes; the < C-class and < M-class are negative classes. The magnetogram samples that are correctly classified as positive are considered true positives (TP); the magnetogram samples correctly classified as negative are considered true negatives (TN). In addition, false positive (FP) means that the samples are wrongly classified as positive; false negative (FN) refers to those samples that are wrongly classified as negative. To evaluate the model's prediction performance, several metrics are defined as follows.
{\mathrm{Recall}} = \frac{{{\mathrm{TP}}}}{{{\mathrm{TP}} + {\mathrm{FN}}}}, |
{\mathrm{Accuracy}} = \frac{{{\mathrm{TP}} + {\mathrm{TN}}}}{{{\mathrm{TP}} + {\mathrm{FP}} + {\mathrm{TN}} + {\mathrm{FN}}}}, |
{\mathrm{Precision}} = \frac{{{\mathrm{TP}}}}{{{\mathrm{TP }}+ {\mathrm{FP}}}}, |
{\mathrm{TSS}} = \frac{{{\mathrm{TP}}}}{{{\mathrm{TP}} + {\mathrm{FN}}}} - \frac{{{\mathrm{FP}}}}{{{\mathrm{FP}} + {\mathrm{TN}}}}, |
{\mathrm{HSS}} = \frac{{2\times [({\mathrm{TP}}\times {\mathrm{TN}}) - ({\mathrm{FN}}\times {\mathrm{FP}})]}}{{({\mathrm{TP}} +{\mathrm{ FN}})({\mathrm{FN}} + {\mathrm{TN}}) + ({\mathrm{TP}} + {\mathrm{FP}})({\mathrm{FP}} + {\mathrm{TN}})}}. |
Note that when dealing with the class-imbalance problem, according to Bloomfield et al. (2012), only the true skill statistic (TSS) is unbiased, so we take TSS as the primary evaluation metric, though we employ the other metrics as well when evaluating the performance of the model.
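As a worked example of these metrics, the sketch below computes all five from the entries of a confusion matrix. The function name and the counts are hypothetical, HSS is computed in its standard two-class form, and the sketch assumes that both classes are present and that at least one sample is predicted positive, so that no denominator is zero.

```python
from typing import Dict


def skill_scores(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute recall, accuracy, precision, TSS and HSS from confusion-matrix counts."""
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    # TSS: true positive rate minus false positive rate.
    tss = recall - fp / (fp + tn)
    # HSS: improvement over the agreement expected by chance.
    hss = 2.0 * (tp * tn - fn * fp) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )
    return {
        "recall": recall,
        "accuracy": accuracy,
        "precision": precision,
        "tss": tss,
        "hss": hss,
    }


# Hypothetical test set: 100 positive and 100 negative samples.
scores = skill_scores(tp=80, fp=10, tn=90, fn=20)
```

For these counts the recall is 0.8 and the false positive rate is 0.1, so TSS is 0.7. Because TSS is a difference of two rates that are each normalized within one class, it does not change if the number of negative samples is scaled up with the same false positive rate, which is why it is preferred under class imbalance.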
Error matrices summarizing test results of Model(N_CMX) and Model(NC_MX) are shown in Figure 3. All performance metrics presented above can be obtained from Figure 3. In each rectangle, there are more TP and TN on the main diagonal, indicating that the models classified most of the test samples correctly. It should be noted that the number of samples that are classified as TN in the lower right corner of each matrix of Model(N_CMX) is small. This is because the augmentation and downsampling strategies were applied simultaneously to both Model(N_CMX) and Model(NC_MX); in Model(N_CMX), the percentage of positive samples was higher than the percentage of negative samples after augmentation and downsampling. Although such a dataset construction seems uncommon, this pattern has also appeared in previous solar flare forecasting studies (e.g., Li XB et al., 2020).
To comprehensively evaluate the performance of the model, all five metrics are calculated and compared in Table 1, where possible, with those of several previous studies. The model of Bloomfield et al. (2012) is based on statistics; the other comparison studies summarized in Table 1 used deep learning in their forecasting method. Nishizuka et al. (2018) and Tang RX et al. (2021) both extracted solar flare features manually; the former then used a single deep learning network, the latter employed a fusion of multiple deep-learning models. Huang X et al. (2018) and Li XB et al. (2020) take the magnetograms directly as input and automatically extract features of the images through a convolutional neural network, which is similar to the approach in our study. In addition, Li XB et al. (2020) also used the method of cross-validation. Donahue and Inceoglu (2024) used a transformer network as their prediction model. It should be noted that model performance is highly dependent on the selected dataset and dataset construction. Due to factors such as the time range of a dataset, data augmentation methods, data preprocessing methods, and model selection, performance of our model and performances reported in previous works cannot be fully measured by metrics and directly compared; comparisons given here are just for reference.
Metric | Model | ≥ C-class | ≥ M-class |
TSS | This work | 0.708 ± 0.027 | 0.758 ± 0.042 |
TSS | Donahue and Inceoglu (2024) | 0.589 | 0.661 |
TSS | Tang RX et al. (2021) | 0.639 | 0.720 |
TSS | Li XB et al. (2020) | 0.679 ± 0.045 | 0.749 ± 0.079 |
TSS | Huang X et al. (2018) | 0.487 | 0.662 |
TSS | Nishizuka et al. (2018) | 0.634 | 0.804 |
TSS | Bloomfield et al. (2012) | 0.456 | 0.539 |
HSS | This work | 0.658 ± 0.043 | 0.744 ± 0.041 |
HSS | Donahue and Inceoglu (2024) | 0.537 | 0.158 |
HSS | Tang RX et al. (2021) | — | — |
HSS | Li XB et al. (2020) | 0.671 ± 0.040 | 0.759 ± 0.071 |
HSS | Huang X et al. (2018) | 0.339 | 0.143 |
HSS | Nishizuka et al. (2018) | 0.528 | 0.265 |
HSS | Bloomfield et al. (2012) | 0.315 | 0.190 |
Recall | This work | 0.892 ± 0.031 | 0.853 ± 0.031 |
Recall | Donahue and Inceoglu (2024) | 0.810 | 0.876 |
Recall | Tang RX et al. (2021) | 0.817 | 0.878 |
Recall | Li XB et al. (2020) | 0.889 ± 0.029 | 0.817 ± 0.084 |
Recall | Huang X et al. (2018) | 0.726 | 0.850 |
Recall | Nishizuka et al. (2018) | 0.809 | 0.947 |
Recall | Bloomfield et al. (2012) | 0.753 | 0.704 |
Accuracy | This work | 0.876 ± 0.022 | 0.875 ± 0.020 |
Accuracy | Donahue and Inceoglu (2024) | 0.788 | 0.788 |
Accuracy | Tang RX et al. (2021) | — | — |
Accuracy | Li XB et al. (2020) | 0.861 ± 0.022 | 0.891 ± 0.024 |
Accuracy | Huang X et al. (2018) | 0.756 | 0.813 |
Accuracy | Nishizuka et al. (2018) | 0.822 | 0.860 |
Accuracy | Bloomfield et al. (2012) | 0.712 | 0.830 |
Precision | This work | 0.946 ± 0.015 | 0.923 ± 0.036 |
Precision | Donahue and Inceoglu (2024) | 0.608 | 0.115 |
Precision | Tang RX et al. (2021) | 0.464 | 0.131 |
Precision | Li XB et al. (2020) | 0.906 ± 0.026 | 0.889 ± 0.056 |
Precision | Huang X et al. (2018) | 0.352 | 0.101 |
Precision | Nishizuka et al. (2018) | 0.529 | 0.182 |
Precision | Bloomfield et al. (2012) | 0.351 | 0.146 |
Overall, the prediction performance of ≥ M-class in most studies is better than that of ≥ C-class, which is reasonable because the higher the flare level, the more intense the flare activity, and thus the easier it is for a model to capture features that are helpful for classification. For ≥ C-class, it can be seen that our model has better TSS performance than the models in Donahue and Inceoglu (2024), Tang RX et al. (2021), Nishizuka et al. (2018), Huang X et al. (2018) and Bloomfield et al. (2012), and is comparable to the TSS performance reported by Li XB et al. (2020). For ≥ M-class, the TSS of our model is comparable to that in Li XB et al. (2020) and Tang RX et al. (2021), but overall is inferior to that of Nishizuka et al. (2018); however, in the single ≥ M-class data set CV6, our model’s TSS score was 0.822, which is higher than that of Nishizuka et al. (2018). It should be noted that the method of cross-validation used in our model clearly shows the mean and standard deviation of these metrics, which helps to assess the stability and robustness of the model. It is exciting that, despite the class-imbalance problem, our model nevertheless performs well for ≥ C-class. In addition, we also conducted an adaptive test that is closer to the operational case by modifying the proportion of flares of different classes in the dataset; using this modified dataset to test our trained model, we found that it was still competitive when compared with other models, although its metrics were not as good as in Table 1. This suggests that our model can effectively handle operational cases in which there are more N-level samples but fewer samples at the X-level and M-level. Detailed test results are presented in Supplemental Materials.
One can conclude that flare patterns can be predicted more accurately by using deep learning methods that automatically extract feature parameters from images through the network, which should make the prediction process simpler and reduce the subjective impact of extracting features manually. In addition, our fusion model achieves good results compared to previous studies. We suggest that, because training the SVM is a convex optimization problem, the global optimum it obtains improves the performance of the fusion model.
Although we have made improvements to plain ResNet and propose a novel flare forecasting model integrating ResNet and SVM, questions remain whether this structural change has a reliably positive impact on network performance, and whether other deep learning networks could benefit from similar approaches. To discuss these issues, we design four models with ResNet and VGGNet as examples, and conduct solar flare prediction experiments on the same dataset as above; results are shown in Table 2. The reason we use VGGNet is that it has demonstrated strong performance and has been successfully applied in solar flare classification studies (e.g., Li XB et al., 2020). Fusion-ResNet represents our proposed model integrating ResNet and SVM; Plain-ResNet represents the original ResNet50 model; Fusion-VGG represents the model integrating VGGNet and SVM; Plain-VGG represents the original VGGNet model. It can be seen clearly that, compared to Plain-VGG, Plain-ResNet performs better; this is not surprising because ResNet has a deeper network structure and the degradation problem of the network is solved. The fusion model integrated with SVM performs better than either of the original networks, whether ResNet or VGGNet. However, performance of Fusion-VGG seems inferior to that of Plain-ResNet, which indicates that the backbone network choice is still crucial.
Metric | Model | ≥ C-class | ≥ M-class |
TSS | Fusion-ResNet | 0.708 ± 0.027 | 0.758 ± 0.042 |
TSS | Plain-ResNet | 0.653 ± 0.025 | 0.688 ± 0.048 |
TSS | Fusion-VGG | 0.628 ± 0.057 | 0.653 ± 0.070 |
TSS | Plain-VGG | 0.608 ± 0.037 | 0.636 ± 0.052 |
HSS | Fusion-ResNet | 0.658 ± 0.043 | 0.744 ± 0.041 |
HSS | Plain-ResNet | 0.630 ± 0.021 | 0.669 ± 0.045 |
HSS | Fusion-VGG | 0.605 ± 0.027 | 0.639 ± 0.074 |
HSS | Plain-VGG | 0.585 ± 0.042 | 0.631 ± 0.055 |
Recall | Fusion-ResNet | 0.892 ± 0.031 | 0.853 ± 0.031 |
Recall | Plain-ResNet | 0.905 ± 0.027 | 0.783 ± 0.032 |
Recall | Fusion-VGG | 0.901 ± 0.033 | 0.784 ± 0.084 |
Recall | Plain-VGG | 0.886 ± 0.052 | 0.789 ± 0.100 |
Accuracy | Fusion-ResNet | 0.876 ± 0.022 | 0.875 ± 0.020 |
Accuracy | Plain-ResNet | 0.871 ± 0.014 | 0.836 ± 0.022 |
Accuracy | Fusion-VGG | 0.862 ± 0.018 | 0.822 ± 0.040 |
Accuracy | Plain-VGG | 0.853 ± 0.034 | 0.820 ± 0.033 |
Precision | Fusion-ResNet | 0.946 ± 0.015 | 0.923 ± 0.036 |
Precision | Plain-ResNet | 0.929 ± 0.018 | 0.917 ± 0.027 |
Precision | Fusion-VGG | 0.923 ± 0.029 | 0.892 ± 0.051 |
Precision | Plain-VGG | 0.922 ± 0.015 | 0.882 ± 0.034 |
The performance of the model cannot be comprehensively evaluated only by the error (confusion) matrix and five metrics, so we construct receiver operating characteristic (ROC) curves, as shown in Figure 4. ROC curves are not sensitive to the class-imbalance problem and can thus be useful in comparing the performance of several different models. The horizontal axis of the curve presents the false positive rate (FPR), the proportion of FP in the total number of negative samples. The vertical axis of the curve presents the true positive rate (TPR), which is the proportion of TP in the total number of positive samples. Each point on the ROC curve represents a classifier under a certain threshold; its abscissa and ordinate represent the performance of this classifier. The closer the ROC curve is to the upper left corner, the better the performance of the model. For a model that can implement perfect classification, FPR equals 0 and TPR equals 1. The area under the curve (AUC) is defined as the area under the ROC curve bounded by the axes, and can be used to evaluate the performance of different models numerically. Note that our proposed Fusion-ResNet has a larger AUC than Plain-ResNet for both ≥ C-class and ≥ M-class classification, indicating improved classification performance. Results in Figure 4 and Table 2 both indicate that the improvement of Fusion-ResNet over Plain-ResNet is greater for ≥ M-class than for ≥ C-class. The numbers of positive and negative samples are more unbalanced for ≥ M-class than for ≥ C-class; we conjecture that Fusion-ResNet shows a stronger classification ability than Plain-ResNet when the models are confronted with datasets in which the sample sizes of the classes are not well balanced.
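The construction of an ROC curve and its AUC can be sketched as follows. The function names, labels, and scores are hypothetical; the scores stand for any continuous classifier output where a larger value means "more likely positive" (for an SVM this could be the signed distance to the hyperplane, which is an assumption about how the curves were produced).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (FPR, TPR)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Sweep the decision threshold from high to low and record (FPR, TPR)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points: Sequence[Point]) -> float:
    """Integrate TPR over FPR with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
```

In this toy example one negative sample outranks one positive sample, so 8 of the 9 positive-negative pairs are ordered correctly and the AUC is 8/9; a classifier that ranks every positive above every negative reaches an AUC of 1.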
In this paper, we propose a novel solar flare forecasting model that integrates ResNet and SVM to implement binary classification for both ≥ C-class and ≥ M-class flares. We collected samples from May 1, 2010 to September 13, 2018, including 1072 solar active regions and
The main results of this study are as follows. First, we have successfully built a fusion model that combines the intermediate output extracted by ResNet with an SVM whose Gaussian kernel function projects the features of the samples; this is an architectural innovation compared to previous flare forecasting models. Second, the seven independent datasets generated by AR segregation were input to the fusion model after downsampling and data augmentation, without manual feature extraction; this approach helped to mitigate the negative impact of class imbalance and reduced the degree of human subjectivity in the prediction process. Third, our model’s TSS value was 0.708 ± 0.027 for ≥ C-class, which is higher than those achieved by most previous models and is comparable with that of Li XB et al. (2020). For ≥ M-class, the value is 0.758 ± 0.042, which is comparable with those of Li XB et al. (2020) and Tang RX et al. (2021). For the other four metrics, our fusion model also achieved good values, with relatively small standard deviations. These results indicate that the proposed fusion model integrated with SVM performs better than the original network; we note that to achieve good prediction results it is crucial to select an excellent baseline network (e.g., ResNet).
Deep learning technology, made possible by efficient computing power and powerful new learning abilities, promises to be an important tool for scientific research. We believe that deep learning network architectures should not remain static, and that architectural innovation should not become a stumbling block that limits network performance; greater potential may remain to be discovered. Introducing SVM into the main framework of ResNet, as we have done, appears to take advantage of both the powerful feature-extraction ability of ResNet and the efficient classification performance of SVM, yielding encouraging results in this application of deep learning techniques to the study of space weather. This method may also be portable and widely applicable.
This work was supported by the National Key R&D Program of China (Grant No.2022YFF0503700) and the National Natural Science Foundation of China (
The Adaptive Test
We have modified the proportion of flares of different classes in the dataset to form new test data that are closer to the operational case, on which to conduct an adaptive test. In the end, the number of X samples was increased to 12 times the original; the number of M samples was increased to 2 times the original. We downsampled the C-level samples by randomly picking one sample out of every 8 samples, and the N-level samples by randomly picking one out of every 2 samples. The new test data contain 204 X-level, 1212 M-level, 1965 C-level, and 3914 N-level samples. X-level samples account for 2.80% of the test data, M-level samples account for 16.61%, C-level samples account for 26.94%, and N-level samples account for 53.65%.
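The class proportions quoted above can be checked with a short calculation. The sketch below uses only the four sample counts given in the text; the variable names, and the grouping into positive fractions for the two binary tasks, are introduced here for illustration.

```python
# Sample counts of the new test data, as stated in the text.
counts = {"X": 204, "M": 1212, "C": 1965, "N": 3914}
total = sum(counts.values())

# Percentage share of each class, rounded to two decimals.
shares = {level: round(100.0 * n / total, 2) for level, n in counts.items()}

# Fraction of positive samples for the two binary classification tasks.
pos_ge_m = counts["X"] + counts["M"]   # >= M-class positives
pos_ge_c = pos_ge_m + counts["C"]      # >= C-class positives
frac_ge_m = pos_ge_m / total
frac_ge_c = pos_ge_c / total

print(total, shares)
print(round(frac_ge_m, 4), round(frac_ge_c, 4))
```

The four counts sum to 7295 samples and reproduce the stated shares (2.80%, 16.61%, 26.94%, and 53.65%). Positive samples therefore make up about 19.4% of the new test data for ≥ M-class and about 46.3% for ≥ C-class.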
Table S1 presents the results of testing the trained model described above on this new dataset, using the same five metrics as in Table 1. In Table S1, "The adaptive test" denotes the model's performance on the new test data, and "This work" denotes the tests presented in Figure 3. Most performance metrics of the adaptive test are worse than those of this work (for which the proportion of X-level and M-level samples was small); the exceptions are HSS for ≥ C-class and accuracy for ≥ M-class. However, the new test data used in the adaptive test are closer to the operational case. The performance difference is probably due to the lower proportion of positive classes in the new test data; there are also differences in sample distribution between the training data and the new test data, which place higher demands on the model, especially for the ≥ M-class. Nevertheless, our model remains competitive when compared to previous models, indicating that the trained model has strong generalization ability and robustness. Through data augmentation and downsampling during model training, we deliberately give more weight and attention to positive samples with complex features. This approach helps alleviate the negative impact of class imbalance and improves the model's generalization ability.
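To illustrate how the proportion of positive samples alone can change some metrics, the sketch below evaluates the five metrics on two hypothetical confusion matrices that share the same recall and false positive rate. The counts and rates are invented for the example and are not results from this study. TSS is taken as recall minus false positive rate, and HSS is written in the common two-class Heidke form, which is assumed here to match the definition used in this paper.

```python
def skill_scores(tp, fn, fp, tn):
    """Compute the five metrics from the entries of a 2x2 confusion matrix."""
    recall = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    hss_denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return {
        "tss": recall - false_positive_rate,
        "hss": 2.0 * (tp * tn - fp * fn) / hss_denominator,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
    }


def confusion_from_rates(n_pos, n_neg, tpr, fpr):
    """Build (tp, fn, fp, tn) for given class sizes and fixed TPR/FPR."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp, n_pos - tp, fp, n_neg - fp


# Hypothetical classifier with TPR = 0.8 and FPR = 0.1 on two test sets
# that differ only in the share of positive samples (50% versus 10%).
balanced = skill_scores(*confusion_from_rates(500, 500, 0.8, 0.1))
imbalanced = skill_scores(*confusion_from_rates(100, 900, 0.8, 0.1))

for name, result in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With the rates held fixed, TSS and recall are identical on both sets, whereas precision and HSS are lower on the set with fewer positives. This is one reason why TSS is a convenient primary metric when test sets with different class proportions are compared.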
In summary, after this adaptive test, we believe that the model proposed in this work has good predictive performance and can achieve competitive flare prediction.
Metric | Model | ≥ C-class | ≥ M-class |
TSS | The adaptive test | 0.676 | 0.693 |
TSS | This work | 0.708 ± 0.027 | 0.758 ± 0.042 |
HSS | The adaptive test | 0.680 | 0.670 |
HSS | This work | 0.658 ± 0.043 | 0.744 ± 0.041 |
Recall | The adaptive test | 0.790 | 0.770 |
Recall | This work | 0.892 ± 0.031 | 0.853 ± 0.031 |
Accuracy | The adaptive test | 0.842 | 0.893 |
Accuracy | This work | 0.876 ± 0.022 | 0.875 ± 0.020 |
Precision | The adaptive test | 0.857 | 0.706 |
Precision | This work | 0.946 ± 0.015 | 0.923 ± 0.036 |
Amar, E., and Ben-Shahar, O. (2024). Image synthesis for solar flare prediction. Astrophys. J. Suppl. Ser., 271(1), 29. https://doi.org/10.3847/1538-4365/ad1dd4
Bloomfield, D. S., Higgins, P. A., McAteer, R. T. J., and Gallagher, P. T. (2012). Toward reliable benchmarking of solar flare forecasting methods. Astrophys. J. Lett., 747(2), L41. https://doi.org/10.1088/2041-8205/747/2/L41
Bobra, M. G., Sun, X., Hoeksema, J. T., Turmon, M., Liu, Y., Hayashi, K., Barnes, G., and Leka, K. D. (2014). The Helioseismic and Magnetic Imager (HMI) vector magnetic field pipeline: SHARPs–space-weather HMI active region patches. Sol. Phys., 289(9), 3549–3578. https://doi.org/10.1007/s11207-014-0529-3
Bobra, M. G., and Couvidat, S. (2015). Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm. Astrophys. J., 798(2), 135. https://doi.org/10.1088/0004-637X/798/2/135
Bornmann, P. L., and Shaw, D. (1994). Flare rates and the McIntosh active-region classifications. Sol. Phys., 150(1), 127–146. https://doi.org/10.1007/BF00712882
Bradshaw, G. F. R., and Ceci, L. (1988). A Connectionist Expert System that Actually Works. Proceedings of the 1st International Conference on Neural Information Processing Systems. Neural Information Processing Systems, 1, 248–255. https://papers.nips.cc/paper/135-a-connectionist-expert-system-that-actually-works.pdf
Campi, C., Benvenuto, F., Massone, A. M., Bloomfield, D. S., Georgoulis, M. K., and Piana, M. (2019). Feature ranking of active region source properties in solar flare forecasting and the uncompromised stochasticity of flare occurrence. Astrophys. J., 883(2), 150. https://doi.org/10.3847/1538-4357/ab3c26
Camporeale, E. (2019). The challenge of machine learning in space weather: Nowcasting and forecasting. Space Wea., 17(8), 1166–1207. https://doi.org/10.1029/2018SW002061
Cortes, C., and Vapnik, V. (1995). Support-vector networks. Mach. Learn., 20(3), 273–297. https://doi.org/10.1023/A:1022627411411
Deng, J. K., Guo, J., Xue, N. N., and Zafeiriou, S. (2019). Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4685–4694). Long Beach, CA, USA: IEEE. https://doi.org/10.1109/CVPR.2019.00482
Donahue, K. P., and Inceoglu, F. (2024). Forecasting solar flares with a transformer network. Frontiers in Astronomy and Space Sciences, 10, 1298609. https://doi.org/10.3389/fspas.2023.1298609
Gallagher, P. T., Moon, Y. J., and Wang, H. M. (2002). Active-region monitoring and flare forecasting–I. Data processing and first results. Sol. Phys., 209(1), 171–183. https://doi.org/10.1023/A:1020950221179
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. Cambridge, Massachusetts, MIT Press. http://www.deeplearningbook.org
Grim, L. F. L., and Gradvohl, A. L. S. (2024). Solar flare forecasting based on magnetogram sequences learning with multiscale vision transformers and data augmentation techniques. Sol. Phys., 299(3), 33. https://doi.org/10.1007/s11207-024-02276-0
Guastavino, S., Marchetti, F., Benvenuto, F., Campi, C., and Piana, M. (2022). Implementation paradigm for supervised flare forecasting studies: A deep learning application with video data. Astron. Astrophys., 662, A105. https://doi.org/10.1051/0004-6361/202243617
Guerra, J. A., Pulkkinen, A., and Uritsky, V. M. (2015). Ensemble forecasting of major solar flares: First results. Space Wea., 13(10), 626–642. https://doi.org/10.1002/2015SW001195
He, K. M., Zhang, X. Y., Ren, S. Q., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778). Las Vegas, NV, USA: IEEE. https://doi.org/10.1109/CVPR.2016.90
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (pp. 2261–2269). Honolulu, HI, USA: IEEE. https://doi.org/10.1109/CVPR.2017.243
Huang, X., Wang, H. N., Xu, L., Liu, J. F., Li, R., and Dai, X. H. (2018). Deep learning based solar flare forecasting model. I. Results for line-of-sight magnetograms. Astrophys. J., 856(1), 7. https://doi.org/10.3847/1538-4357/aaae00
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (pp. 1097–1105). Lake Tahoe, Nevada: Curran Associates Inc.
Li, X. B., Zheng, Y. F., Wang, X. S., and Wang, L. L. (2020). Predicting solar flares using a novel deep convolutional neural network. Astrophys. J., 891(1), 10. https://doi.org/10.3847/1538-4357/ab6d04
Li, Y. Y., Huang, S. Y., Xu, S. B., Yuan, Z. G., Jiang, K., Wei, Y. Y., Zhang, J., Xiong, Q. Y., Wang, Z., ... Yu, L. (2022). Selection of the main control parameters for the Dst index prediction model based on a layer-wise relevance propagation method. Astrophys. J. Suppl. Ser., 260(1), 6. https://doi.org/10.3847/1538-4365/ac616c
Liu, C., Deng, N., Wang, J. T. L., and Wang, H. M. (2017). Predicting solar flares using SDO/HMI vector magnetic data products and the random forest algorithm. Astrophys. J., 843(2), 104. https://doi.org/10.3847/1538-4357/aa789b
Mason, J. P., and Hoeksema, J. T. (2010). Testing automated solar flare forecasting with 13 years of Michelson Doppler Imager magnetograms. Astrophys. J., 723(1), 634–640. https://doi.org/10.1088/0004-637X/723/1/634
Nishizuka, N., Sugiura, K., Kubo, Y., Den, M., Watari, S., and Ishii, M. (2017). Solar flare prediction model with three machine-learning algorithms using ultraviolet brightening and vector magnetograms. Astrophys. J., 835(2), 156. https://doi.org/10.3847/1538-4357/835/2/156
Nishizuka, N., Sugiura, K., Kubo, Y., Den, M., and Ishii, M. (2018). Deep flare net (DeFN) model for solar flare prediction. Astrophys. J., 858(2), 113. https://doi.org/10.3847/1538-4357/aab9a7
Park, E., Moon, Y. J., Shin, S., Yi, K., Lim, D., Lee, H., and Shin, G. (2018). Application of the deep convolutional neural network to the forecast of solar flare occurrence using full-disk solar magnetograms. Astrophys. J., 869(2), 91. https://doi.org/10.3847/1538-4357/aaed40
Priest, E. R., and Forbes, T. G. (2002). The magnetic nature of solar flares. Astron. Astrophys. Rev., 10(4), 313–377. https://doi.org/10.1007/s001590100013
Sadykov, V. M., and Kosovichev, A. G. (2017). Relationships between characteristics of the line-of-sight magnetic field and solar flare forecasts. Astrophys. J., 849(2), 148. https://doi.org/10.3847/1538-4357/aa9119
Scherrer, P. H., Schou, J., Bush, R. I., Kosovichev, A. G., Bogart, R. S., Hoeksema, J. T., Liu, Y., Duvall, T. L. Jr., Zhao, J., ... Tomczyk, S. (2012). The Helioseismic and Magnetic Imager (HMI) investigation for the Solar Dynamics Observatory (SDO). Sol. Phys., 275, 207–227. https://doi.org/10.1007/s11207-011-9834-2
Simonyan, K., and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: 1409.1556. https://doi.org/10.48550/arXiv.1409.1556
Song, H., Tan, C. Y., Jing, J., Wang, H. M., Yurchyshyn, V., and Abramenko, V. (2009). Statistical assessment of photospheric magnetic features in imminent solar flare predictions. Sol. Phys., 254(1), 101–125. https://doi.org/10.1007/s11207-008-9288-3
Sun, P. C., Dai, W., Ding, W. Q., Feng, S., Cui, Y. M., Liang, B., Dong, Z. Y., and Yang, Y. F. (2022). Solar flare forecast using 3D convolutional neural networks. Astrophys. J., 941(1), 1. https://doi.org/10.3847/1538-4357/ac9e53
Szegedy, C., Liu, W., Jia, Y. Q., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9). Boston, MA, USA: IEEE. https://doi.org/10.1109/CVPR.2015.7298594
Tang, R. X., Liao, W. T., Chen, Z., Zeng, X. W., Wang, J. S., Luo, B. X., Chen, Y. H., Cui, Y. M., Zhou, M., ... Wu, Z. P. (2021). Solar flare prediction based on the fusion of multiple deep-learning models. Astrophys. J. Suppl. Ser., 257(2), 50. https://doi.org/10.3847/1538-4365/ac249e
Vysakh, P. A., and Mayank, P. (2023). Solar flare prediction and feature selection using a light-gradient-boosting machine algorithm. Sol. Phys., 298(11), 137. https://doi.org/10.1007/s11207-023-02223-5
Wang, H. N., Cui, Y. M., and He, H. (2009). A logistic model for magnetic energy storage in solar active regions. Res. Astron. Astrophys., 9(6), 687–693. https://doi.org/10.1088/1674-4527/9/6/007
Wheatland, M. S. (2005). A statistical solar flare forecast method. Space Wea., 3(7), S07003. https://doi.org/10.1029/2004SW000131
Xu, S. B., Huang, S. Y., Yuan, Z. G., Deng, X. H., and Jiang, K. (2020). Prediction of the Dst index with bagging ensemble-learning algorithm. Astrophys. J. Suppl. Ser., 248(1), 14. https://doi.org/10.3847/1538-4365/ab880e
Yuan, Y., Shih, F. Y., Jing, J., and Wang, H. M. (2010). Automated flare forecasting using a statistical learning technique. Res. Astron. Astrophys., 10(8), 785–796. https://doi.org/10.1088/1674-4527/10/8/008
Metric | Model | ≥ C-class | ≥ M-class |
TSS | This work | 0.708 ± 0.027 | 0.758 ± 0.042 |
Donahue and Inceoglu (2024) | 0.589 | 0.661 | |
Tang RX et al. (2021) | 0.639 | 0.720 | |
Li XB et al. (2020) | 0.679 ± 0.045 | 0.749 ± 0.079 | |
Huang X et al. (2018) | 0.487 | 0.662 | |
Nishizuka et al. (2018) | 0.634 | 0.804 | |
Bloomfield et al. (2012) | 0.456 | 0.539 | |
HSS | This work | 0.658 ± 0.043 | 0.744 ± 0.041 |
Donahue and Inceoglu (2024) | 0.537 | 0.158 | |
Tang RX et al. (2021) | — | — | |
Li XB et al. (2020) | 0.671 ± 0.040 | 0.759 ± 0.071 | |
Huang X et al. (2018) | 0.339 | 0.143 | |
Nishizuka et al. (2018) | 0.528 | 0.265 | |
Bloomfield et al. (2012) | 0.315 | 0.190 | |
Recall | This work | 0.892 ± 0.031 | 0.853 ± 0.031 |
Donahue and Inceoglu (2024) | 0.810 | 0.876 | |
Tang RX et al. (2021) | 0.817 | 0.878 | |
Li XB et al. (2020) | 0.889 ± 0.029 | 0.817 ± 0.084 | |
Huang X et al. (2018) | 0.726 | 0.850 | |
Nishizuka et al. (2018) | 0.809 | 0.947 | |
Bloomfield et al. (2012) | 0.753 | 0.704 | |
Accuracy | This work | 0.876 ± 0.022 | 0.875 ± 0.020 |
Donahue and Inceoglu (2024) | 0.788 | 0.788 | |
Tang RX et al. (2021) | — | — | |
Li XB et al. (2020) | 0.861 ± 0.022 | 0.891 ± 0.024 | |
Huang X et al. (2018) | 0.756 | 0.813 | |
Nishizuka et al. (2018) | 0.822 | 0.860 | |
Bloomfield et al. (2012) | 0.712 | 0.830 | |
Precision | This work | 0.946 ± 0.015 | 0.923 ± 0.036 |
Donahue and Inceoglu (2024) | 0.608 | 0.115 | |
Tang RX et al. (2021) | 0.464 | 0.131 | |
Li XB et al. (2020) | 0.906 ± 0.026 | 0.889 ± 0.056 | |
Huang X et al. (2018) | 0.352 | 0.101 | |
Nishizuka et al. (2018) | 0.529 | 0.182 | |
Bloomfield et al. (2012) | 0.351 | 0.146 |