The Reference Catalog

Supervised learning relies on a well-defined reference database whose labels are assigned with uniform criteria across all samples \citep{zhu2005semi, haeusser2017learning}. In the context of flare detection, the principal identifying features include the start, peak, and end times of events, the morphology of the flux profile during flares, and the characterization of the background level to distinguish quiescent from flaring phases. In practice, however, the construction of such a database presents significant challenges. Solar SXR emissions vary with time, and no clear-cut definition of the background flux exists that unambiguously distinguishes events from the quiescent level \citep{veronig2004solar, Sadykov_2019, adithya2021solar}. Furthermore, flare lightcurves display substantial differences in their temporal profiles \citep{Benz2008, gryciuk2017flare, reep2023understanding}. Several flare catalogs provide start, peak, and end times; however, significant discrepancies arise among them owing to the arbitrary criteria embedded in each FDA (see Appendix~\ref{app3} for more details).

The use of an incomplete catalog as a training reference risks introducing systematic bias into the CNN identifications \citep{blanzeisky2021algorithmic}. To mitigate this risk, a new, internally consistent reference catalog is constructed. A total of 145 days are randomly selected from the interval 01 January 2018 to 31 May 2025, spanning solar minimum to solar maximum. The data for each day are visually inspected, and flaring events are annotated according to the authors' judgment and definition of a flare. To fulfill the objectives of this study and address the obscuration effect, particular emphasis is placed on the identification of rise episodes rather than complete flare durations.
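The day selection described above can be illustrated with a minimal sketch. It assumes uniform sampling without replacement over calendar days; the function name \texttt{sample\_days} and the fixed seed are hypothetical and serve only to make the example reproducible. They are not taken from the procedure used to build the catalog.

```python
import random
from datetime import date, timedelta


def sample_days(start, end, n_days, seed=0):
    """Draw n_days distinct calendar days uniformly from [start, end].

    Hypothetical helper: the seed and the uniform-sampling choice are
    assumptions made for this illustration only.
    """
    n_total = (end - start).days + 1  # inclusive interval length in days
    rng = random.Random(seed)
    # Sample distinct day offsets, then convert them back to dates.
    offsets = rng.sample(range(n_total), n_days)
    return sorted(start + timedelta(days=k) for k in offsets)


selected = sample_days(date(2018, 1, 1), date(2025, 5, 31), 145)
```

Sampling offsets rather than dates keeps the draw uniform over the interval and guarantees that no day is selected twice.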

Restricting the analysis to the rise episode facilitates the detection of consecutive and overlapping events. Most existing automated FDAs do not detect overlapping flare intervals because they enforce the slow-driving assumption, that is, one flare must terminate before the next begins \citep{Aschwanden_2012, lu2024automatic, valluvan2024solar}. During periods of high solar activity, however, multiple flares frequently occur in close succession, making overlapping events common. Concentrating on the rise episodes partially relaxes the non-overlap constraint: overlap is not permitted within the rise episode of an ongoing flare, but subsequent events that begin during the decay of a preceding flare are permitted.
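The relaxed constraint can be expressed as a simple consistency check on annotated rise episodes. The sketch below assumes that each event is stored as a (start, peak) pair of times in common units, and that a new event starting exactly at the peak of the previous one is allowed; both the representation and the function name \texttt{rise\_episodes\_valid} are hypothetical.

```python
def rise_episodes_valid(rises):
    """Return True if no rise episode overlaps another.

    rises: list of (start, peak) times with start <= peak.
    An event may begin during the decay of the previous flare
    (at or after its peak), but not before that peak.
    """
    ordered = sorted(rises)  # sort by start time
    if any(start > peak for start, peak in ordered):
        return False  # malformed episode: peak precedes start
    # After sorting, checking neighbouring pairs is sufficient:
    # any overlap implies an overlap with the next-starting event.
    for (_, prev_peak), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_peak:
            return False
    return True
```

Only the interval between start and peak is compared, so the end time of a flare plays no role in this check, which is what allows events on the decay phase of a preceding flare to be retained.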

This manual procedure yields a reference catalog of 7,700 solar flares, providing the foundation for supervised training and evaluation of the CNN framework. Figure \ref{fig_ref_cat} shows the probability distribution functions (PDFs) of the raw peak flux and waiting times of events in the reference catalog.
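For illustration, waiting times and a normalized histogram can be computed as in the sketch below. It assumes that the waiting time is the interval between the start times of successive events, which is a common convention but is not specified here, and it uses a plain histogram estimate of the PDF; the function names and bin edges are hypothetical.

```python
def waiting_times(start_times):
    """Intervals between successive event start times (assumed definition)."""
    ordered = sorted(start_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def empirical_pdf(values, edges):
    """Histogram estimate of a PDF on the bins defined by edges.

    Each bin is half-open, [edges[i], edges[i+1]). Counts are divided by
    (number of binned values * bin width) so the estimate integrates to one
    over the binned range. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / (total * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
```

Dividing by the bin width, and not only by the number of events, is what allows bins of unequal width, such as logarithmically spaced bins, to be compared on a common density scale.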