Page 483 - ISC PROCEEDINGS 21.4

Operational suitability for infrastructure: can the generator be measured, versioned,
                  served, and audited inside the production pipelines used by firms and supervisors?
                        The subsections below summarize the most common datasets and the task-oriented metric
                  suites that together form a repeatable benchmarking recipe for infrastructure-grade
                  Financial GANs.
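                        One way to make "measured, versioned, and audited" concrete is to attach a
                  deterministic fingerprint to every benchmark run. The sketch below is our own illustration,
                  not a method taken from the reviewed studies: the `BenchmarkRecord` class and its fields are
                  hypothetical, and the run ID is simply a SHA-256 hash of the canonical JSON of the generator
                  name, version, configuration, seed, and dataset hash (metrics are excluded so that re-scoring
                  a run does not change its identity).

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class BenchmarkRecord:
    """Hypothetical audit record for one generator benchmark run."""
    generator_name: str
    generator_version: str
    config: dict          # hyperparameters used for training and sampling
    seed: int             # random seed, needed to reproduce the run
    dataset_sha256: str   # hash of the exact input data (see dataset manifests)
    metrics: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        """Hash every input field except metrics: identical inputs give identical IDs."""
        payload = {k: v for k, v in asdict(self).items() if k != "metrics"}
        # sort_keys also sorts nested dicts, so config key order does not matter.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        """Serialize the record together with its run ID for an audit log."""
        return json.dumps({**asdict(self), "run_id": self.fingerprint()}, sort_keys=True)
```

                  Two runs with the same generator version, configuration, seed, and data therefore share a run
                  ID, which lets a supervisor check that reported metrics refer to a reproducible configuration.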
                        5.1. Datasets
                        Choose datasets that match the target task (market simulation, fraud detection,
                  limit-order realism, scenario generation). For infrastructure work, prefer datasets that are
                  accompanied by clear provenance and schema metadata so results are reproducible and
                  auditable.
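                        A minimal sketch of such provenance and schema metadata, assuming the dataset is a
                  single file on disk: the manifest layout (field names such as `source`, `license_terms`, and
                  `schema`) is hypothetical and chosen for illustration only. The content hash lets a benchmark
                  run confirm that it used exactly the data described in the manifest.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in fixed-size chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(path: Path, source: str, license_terms: str, schema: dict) -> dict:
    """Hypothetical manifest layout: provenance fields plus a column -> type schema."""
    return {
        "file": path.name,
        "sha256": sha256_of_file(path),
        "size_bytes": path.stat().st_size,
        "source": source,                # provenance: vendor, exchange, or study
        "license_terms": license_terms,  # proprietary sets may not be redistributable
        "schema": schema,                # e.g. {"timestamp": "datetime", "price": "float64"}
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def save_manifest(manifest: dict, out_path: Path) -> None:
    """Store the manifest next to the data so it can be versioned and audited."""
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def verify_manifest(path: Path, manifest: dict) -> bool:
    """Re-hash before each benchmark run; a mismatch means the data has changed."""
    return sha256_of_file(path) == manifest["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "prices.csv"
        data.write_text("timestamp,price\n2024-01-02,100.0\n")
        m = build_manifest(data, "example source", "internal use",
                           {"timestamp": "date", "price": "float64"})
        save_manifest(m, Path(tmp) / "prices.manifest.json")
        print(verify_manifest(data, m))  # True until prices.csv is modified
```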
                        Table 2 presents the representative financial datasets used in the reviewed studies.
                  The choice of dataset is important because it influences realism, benchmark
                  comparability, and the extent to which synthetic data can support downstream financial
                  tasks.
                    Table 2. Representative financial datasets for benchmarking infrastructure readiness

Dataset name       Type / description              Application in synthetic generation     Reference
-----------------  ------------------------------  --------------------------------------  --------------------
American Express   Proprietary transaction-level   Benchmarking models for                 Efimov et al. (2020)
Sets               records                         transaction-level pattern replication

DJIA / S&P 500     Historical index-level price    Contextual/conditional generation       Xia et al. (2024);
series             time series                     and statistical fidelity testing        Mehri et al. (2024)

Contextual Market  Multi-attribute market          Evaluating alignment between            Xia et al. (2024)
Dataset            dynamics and ticker identities  conditional inputs and generated
                                                   output

FI2010             High-frequency limit order      Predicting stock price movements        Labiad et al. (2024)
                   book (LOB) data                 and high-fidelity microstructure
                                                   simulation
                        Source: Authors' compilation based on the studies reviewed in this paper.
                        5.2. Evaluation metrics
                        A robust evaluation combines complementary checks: statistical similarity, time-series
                  diagnostics, downstream utility, and risk-oriented measures.
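                        As one example of a risk-oriented measure, the sketch below compares historical
                  Value-at-Risk (VaR) and Expected Shortfall (ES) computed on real and synthetic returns. It
                  is an illustration under stated assumptions, not the procedure of any cited study: VaR is
                  taken as the empirical alpha-quantile of losses (one of several quantile conventions), ES as
                  the mean loss at or beyond that quantile, and the relative-error summary in the hypothetical
                  `risk_gap` helper is our own choice.

```python
import math
import random


def historical_var_es(returns, alpha=0.99):
    """Historical VaR and Expected Shortfall at confidence level alpha.

    Both are reported as positive losses. VaR is the empirical alpha-quantile
    of losses; ES is the mean of losses at or beyond that quantile.
    """
    returns = list(returns)
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted(-r for r in returns)  # ascending losses
    n = len(losses)
    # Smallest order statistic whose empirical CDF reaches alpha; the small
    # epsilon guards against float round-off in alpha * n.
    k = min(n - 1, max(0, math.ceil(alpha * n - 1e-9) - 1))
    tail = losses[k:]
    return losses[k], sum(tail) / len(tail)


def risk_gap(real, synthetic, alpha=0.99):
    """Relative VaR/ES error of the synthetic sample against the real one."""
    var_r, es_r = historical_var_es(real, alpha)
    var_s, es_s = historical_var_es(synthetic, alpha)
    eps = 1e-12  # avoids division by zero when a real-data risk figure is 0
    return {
        "var_rel_err": abs(var_s - var_r) / max(abs(var_r), eps),
        "es_rel_err": abs(es_s - es_r) / max(abs(es_r), eps),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    real = [rng.gauss(0.0, 0.01) for _ in range(5000)]
    # Stand-in for generator output: Gaussian returns with too little volatility.
    synthetic = [rng.gauss(0.0, 0.008) for _ in range(5000)]
    print(risk_gap(real, synthetic, alpha=0.99))
```

                  Because both measures focus on the loss tail, a generator can score well on means and
                  variances yet still show large VaR/ES errors if it under-represents extreme losses.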
                        Table 3 summarizes, in a compact and task-oriented form that can be adopted directly,
                  the evaluation metrics most frequently used in the literature. It shows that a complete
                  assessment of synthetic financial data should combine distributional fidelity, temporal
                  realism, downstream utility, and adversarial quality.
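                        For the distributional and temporal families, a minimal comparison can be sketched
                  as follows. The code is our own illustration, not taken from the cited studies: it uses
                  population (biased) moment estimators and the standard sample autocorrelation function
                  (ACF), and it also compares the ACF of absolute returns as one common proxy for volatility
                  clustering. The helper names and the worst-lag summary are hypothetical choices.

```python
import random
from statistics import fmean


def moments(x):
    """Mean, variance, skewness and excess kurtosis (population estimators).

    Assumes len(x) >= 2 and non-zero variance.
    """
    n = len(x)
    m = fmean(x)
    d = [v - m for v in x]
    m2 = sum(v * v for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    return {"mean": m, "var": m2, "skew": m3 / m2 ** 1.5, "ex_kurt": m4 / m2 ** 2 - 3.0}


def acf(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag (standard biased estimator)."""
    n = len(x)
    m = fmean(x)
    d = [v - m for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def max_acf_gap(a, b, max_lag):
    """Largest absolute ACF difference over lags 1..max_lag."""
    return max(abs(u - v) for u, v in zip(acf(a, max_lag), acf(b, max_lag)))


def fidelity_report(real, synth, max_lag=5):
    """Absolute moment gaps plus worst-lag ACF gaps for returns and |returns|."""
    mr, ms = moments(real), moments(synth)
    return {
        "moment_gap": {k: abs(mr[k] - ms[k]) for k in mr},
        "acf_gap_returns": max_acf_gap(real, synth, max_lag),
        # Raw returns are close to uncorrelated; the ACF of |r| is where
        # volatility clustering shows up.
        "acf_gap_abs_returns": max_acf_gap([abs(v) for v in real],
                                           [abs(v) for v in synth], max_lag),
    }


if __name__ == "__main__":
    rng = random.Random(1)
    real = [rng.gauss(0.0, 0.01) for _ in range(2000)]
    synth = [rng.gauss(0.0, 0.01) for _ in range(2000)]
    print(fidelity_report(real, synth, max_lag=5))
```

                  Small moment gaps combined with a large `acf_gap_abs_returns` would indicate a generator that
                  reproduces the marginal distribution but not the temporal dependence structure.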
                                      Table 3. Evaluation metrics for synthetic financial data
Metric or       Description                                         References
metric family
--------------  --------------------------------------------------  ----------------------
Distributional  Matches marginal distributions (mean, variance,     (Xia, H. et al., 2024)
moments and     skew, kurtosis) and full-distribution properties
characteristic  between real and generated series
functions

Temporal        Preserves autocorrelation, cross-correlation,       (Allen, D. E. et
dependence      sequential structure, and path properties

