Identifying Flares with a CNN:

The reference catalog provides the foundation for constructing the training and validation datasets. To prevent leakage, that is, the use of validation observations in training, the catalog is partitioned by day in chronological order, with the first 80% of the 145 days allocated to training and the remainder reserved for validation. Within each day, the Geostationary Operational Environmental Satellites (GOES) soft X-ray (SXR) flux signal is segmented into overlapping windows of fixed length (600 samples) with a stride of 120 samples, corresponding to 20% of the window size. Each window is normalized using z-score standardization (i.e., the mean is subtracted and the result is divided by the standard deviation, producing inputs with zero mean and unit variance). The corresponding label sequences are derived directly from the reference catalog: rise episodes are labeled as class 1, whereas all other intervals are assigned to class 0. This procedure yields 65,955 training and 15,727 validation windows, spanning a wide range of flare morphologies and activity levels.
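The segmentation and per-window standardization described above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; the function name and the synthetic day-long signal are assumptions for demonstration.

```python
import numpy as np

def make_windows(flux, labels, window=600, stride=120):
    """Segment one day's SXR flux into overlapping, z-scored windows.

    flux, labels: 1-D arrays of equal length; labels are 0 (background)
    or 1 (rise episode), taken from the reference catalog.
    """
    X, y = [], []
    for start in range(0, len(flux) - window + 1, stride):
        seg = flux[start:start + window]
        # z-score standardization: zero mean, unit variance per window
        seg = (seg - seg.mean()) / seg.std()
        X.append(seg)
        y.append(labels[start:start + window])
    return np.stack(X), np.stack(y)

# Synthetic day of 3000 samples with one labeled rise interval
flux = np.random.lognormal(size=3000)
labels = np.zeros(3000, dtype=int)
labels[1200:1350] = 1
X, y = make_windows(flux, labels)
print(X.shape)  # (21, 600): (3000 - 600) // 120 + 1 = 21 windows
```

Because the stride (120) is 20% of the window length (600), consecutive windows overlap by 80%, so each flare rise appears in several training examples at different offsets.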

The network architecture integrates convolutional, recurrent, and attention-based modules to capture both local and global temporal patterns in the GOES SXR flux signal. The architecture is implemented in PyTorch and comprises three principal components: convolutional feature extraction, sequential modeling with a bidirectional long short-term memory (BiLSTM), and contextual encoding with a Transformer. The technical details of each stage are summarized below.

The input windows are processed in parallel through four convolutional branches with kernel sizes of 3, 5, 7, and 11. Each branch contains four convolutional layers with channel depths of 16, 32, 64, and 128, with rectified linear unit (ReLU) activations applied after each layer. The outputs of the four branches are concatenated along the channel dimension to form the convolutional representation of the signal. No pooling or downsampling is applied, in order to preserve the temporal resolution of the data. To mitigate overfitting, a single dropout layer (rate 0.2) is introduced after the convolutional blocks.
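A minimal PyTorch sketch of this multi-scale front end, assuming "same" padding to keep the 600-sample resolution (the paper does not state the padding scheme, so that choice is an assumption consistent with the no-downsampling design):

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Four parallel Conv1d branches (kernel sizes 3, 5, 7, 11), each a
    stack of layers with 16, 32, 64, 128 channels and ReLU activations.
    Branch outputs are concatenated along the channel axis; odd kernels
    with padding=k//2 preserve the temporal length (no pooling)."""

    def __init__(self, kernels=(3, 5, 7, 11), depths=(16, 32, 64, 128)):
        super().__init__()
        self.branches = nn.ModuleList()
        for k in kernels:
            layers, in_ch = [], 1
            for out_ch in depths:
                layers += [nn.Conv1d(in_ch, out_ch, k, padding=k // 2),
                           nn.ReLU()]
                in_ch = out_ch
            self.branches.append(nn.Sequential(*layers))
        self.dropout = nn.Dropout(0.2)  # single dropout after the conv blocks

    def forward(self, x):               # x: (batch, 1, time)
        out = torch.cat([b(x) for b in self.branches], dim=1)
        return self.dropout(out)        # (batch, 4 * 128 = 512, time)

out = MultiScaleConv()(torch.randn(2, 1, 600))
print(out.shape)  # torch.Size([2, 512, 600])
```

Concatenating four 128-channel branches yields a 512-channel feature map, matching the input dimension of the recurrent stage that follows.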

The concatenated features are then passed to a BiLSTM layer with 256 hidden units in each direction, producing a 512-dimensional sequence representation. To complement this recurrent encoding, a Transformer encoder is subsequently applied. The encoder consists of a single layer with a 512-dimensional embedding space, eight attention heads, and a feed-forward dimension of 1024. Standard sinusoidal positional encoding is used to provide temporal context prior to attention.
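The recurrent and attention stages can be sketched in PyTorch as below. The class name and the choice of adding the positional encoding to the BiLSTM output (rather than the raw features) are assumptions; the layer sizes follow the text.

```python
import math
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    """BiLSTM (256 units per direction -> 512-dim output) followed by a
    single Transformer encoder layer (8 heads, feed-forward dim 1024),
    with standard sinusoidal positional encoding added before attention."""

    def __init__(self, in_dim=512, hidden=256, max_len=600):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True,
                              bidirectional=True)
        d = 2 * hidden  # 512-dimensional sequence representation
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=8,
                                           dim_feedforward=1024,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # precompute the sinusoidal positional encoding table
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d, 2) * (-math.log(10000.0) / d))
        pe = torch.zeros(max_len, d)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                 # x: (batch, time, 512)
        h, _ = self.bilstm(x)             # (batch, time, 512)
        return self.encoder(h + self.pe[: x.size(1)])

out = SequenceEncoder()(torch.randn(2, 600, 512))
print(out.shape)  # torch.Size([2, 600, 512])
```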

Finally, the sequence representation is mapped to class probabilities through a fully connected linear layer, yielding binary predictions for flare versus background. Training employs a weighted cross-entropy loss to address class imbalance. Optimization is performed using the Adam algorithm with a learning rate of 10⁻⁵ and a mini-batch size of 16.
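The classification head and training setup above can be illustrated as follows. The class-weight vector here is a placeholder up-weighting the rarer rise class; the paper specifies weighted cross-entropy but not the exact weights, so those values are an assumption.

```python
import torch
import torch.nn as nn

# Linear head mapping the 512-dim sequence features to two classes
head = nn.Linear(512, 2)

# Weighted cross-entropy: hypothetical weights favoring class 1 (rise)
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))
optimizer = torch.optim.Adam(head.parameters(), lr=1e-5)

features = torch.randn(16, 600, 512)       # mini-batch of 16 windows
labels = torch.randint(0, 2, (16, 600))    # per-sample class labels
logits = head(features)                    # (16, 600, 2)

# CrossEntropyLoss expects (N, C, ...), so move the class axis forward
loss = criterion(logits.permute(0, 2, 1), labels)
loss.backward()
optimizer.step()
```

Applying the linear layer at every time step yields a per-sample probability sequence, so flare rise intervals are delimited directly in the window rather than assigned one label per window.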