SQLQueryHelperjs - v1.2.6

    Benchmark History Guide

    This guide describes the versioned benchmark history layout now emitted by the benchmark runner and the operating model around it.

    A single benchmark snapshot is useful for a point-in-time read, but it cannot answer regression questions on its own.

    Teams eventually need to know:

    • whether a scenario got slower between releases
    • which engine regressed
    • whether a slowdown is within noise or operationally relevant
    • which release introduced the change

    The current runner keeps latest-results.json and latest-results.md for the newest snapshot and also persists versioned snapshots.

    Current layout:

    • benchmarks/history/<version>/results.json
    • benchmarks/history/<version>/results.md
    • benchmarks/history/<version>/comparison.json when a prior baseline exists
    • benchmarks/history/<version>/comparison.md when a prior baseline exists
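
    The layout above can be expressed as a small path helper. This is a minimal sketch: the file names come from the list above, while `historyPathsFor`, the `HistoryPaths` shape, and the `root` parameter are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: only the directory layout and file names are taken
// from this guide; the function and interface names are illustrative.
interface HistoryPaths {
  resultsJson: string;
  resultsMd: string;
  comparisonJson: string;
  comparisonMd: string;
}

function historyPathsFor(version: string, root = "benchmarks/history"): HistoryPaths {
  // Forward slashes are used on purpose so the result matches the documented layout.
  const dir = `${root}/${version}`;
  return {
    resultsJson: `${dir}/results.json`,
    resultsMd: `${dir}/results.md`,
    // The comparison files only exist when a prior baseline was found.
    comparisonJson: `${dir}/comparison.json`,
    comparisonMd: `${dir}/comparison.md`,
  };
}
```

    Keeping the path logic in one place means later tooling does not need to hard-code the layout.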

    The release version is the simplest stable partition key.

    The default comparison target should be the previous released version of the package.

    When a previous release is not available, compare against:

    • the previous benchmark snapshot on the same branch, or
    • the first stable baseline agreed by the team
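
    The baseline selection rule above could be sketched as follows. The sketch assumes plain `major.minor.patch` version strings (pre-release tags are not handled) and covers only the "previous released version" case plus an explicit fallback; `pickBaselineVersion` and its parameters are hypothetical and do not describe how the runner actually resolves its baseline.

```typescript
type ParsedVersion = [number, number, number];

// Parse "1.2.3" or "v1.2.3"; anything else is ignored by this sketch.
function parseVersion(version: string): ParsedVersion | undefined {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(version);
  return match ? [Number(match[1]), Number(match[2]), Number(match[3])] : undefined;
}

// Negative when a < b, positive when a > b, zero when equal.
function compareParsed(a: ParsedVersion, b: ParsedVersion): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Hypothetical helper: returns the highest available version strictly below
// `current`, or the team's agreed baseline when no earlier version exists.
function pickBaselineVersion(
  current: string,
  available: string[],
  agreedBaseline?: string,
): string | undefined {
  const cur = parseVersion(current);
  if (cur === undefined) return agreedBaseline;

  let best: { version: string; parsed: ParsedVersion } | undefined;
  for (const version of available) {
    const parsed = parseVersion(version);
    if (parsed === undefined || compareParsed(parsed, cur) >= 0) continue;
    if (best === undefined || compareParsed(parsed, best.parsed) > 0) {
      best = { version, parsed };
    }
  }
  return best?.version ?? agreedBaseline;
}
```

    Comparing numeric components rather than raw strings matters here, because a string sort would place `1.10.0` before `1.9.0`.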

    Baseline fields in the comparison artifact:

    • baselineVersion
    • currentVersion
    • baselineGeneratedAt
    • currentGeneratedAt

    A useful comparison artifact should show:

    • scenario name
    • engine name
    • previous avg ms/op
    • current avg ms/op
    • absolute delta
    • percentage delta
    • classification such as improved, neutral, or regressed

    This makes it possible to review performance movement as part of a release.
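
    For release review, the fields listed above map naturally onto a markdown table. The sketch below is one possible rendering; the column titles, the `n/a` placeholder, and the `renderComparisonTable` name are assumptions, since the actual layout of comparison.md is not shown in this guide.

```typescript
// Row shape mirrors the fields listed above; optional fields are absent
// when no baseline measurement exists for the scenario.
interface ComparisonRow {
  scenario: string;
  engine: string;
  baselineAvgMs?: number;
  currentAvgMs: number;
  deltaMs?: number;
  deltaPct?: number;
  classification: string;
}

// Format a number with fixed decimals, or a placeholder when it is missing.
function formatValue(value: number | undefined, decimals: number): string {
  return value === undefined ? "n/a" : value.toFixed(decimals);
}

// Hypothetical renderer: produces a markdown table, one row per comparison.
function renderComparisonTable(rows: ComparisonRow[]): string {
  const header = [
    "| Scenario | Engine | Previous avg ms/op | Current avg ms/op | Delta ms | Delta % | Classification |",
    "| --- | --- | ---: | ---: | ---: | ---: | --- |",
  ];
  const body = rows.map((row) =>
    [
      "",
      row.scenario,
      row.engine,
      formatValue(row.baselineAvgMs, 3),
      formatValue(row.currentAvgMs, 3),
      formatValue(row.deltaMs, 3),
      formatValue(row.deltaPct, 2),
      row.classification,
      "",
    ].join(" | ").trim(),
  );
  return header.concat(body).join("\n");
}
```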

    Current machine-readable snapshot shape:

    {
      "version": "1.1.0",
      "generatedAt": "2026-04-22T16:59:34.621Z",
      "host": {
        "platform": "win32",
        "release": "10.0.26100",
        "arch": "x64",
        "node": "v22.12.0",
        "cpu": "AMD Ryzen 5 5600GT with Radeon Graphics"
      },
      "measurements": [
        {
          "engine": "mysql",
          "scenario": "schema.reflect.noop.singleEntity",
          "avgMs": 29.173,
          "opsPerSec": 34.278,
          "iterations": 10,
          "totalMs": 291.73
        }
      ]
    }

    Current machine-readable comparison shape when a baseline exists:

    {
      "baselineVersion": "1.0.0",
      "currentVersion": "1.1.0",
      "baselineGeneratedAt": "2026-04-10T12:00:00.000Z",
      "currentGeneratedAt": "2026-04-22T16:59:34.621Z",
      "comparisons": [
        {
          "engine": "mysql",
          "scenario": "runtime.dml.roundtrip",
          "baselineAvgMs": 19.800,
          "currentAvgMs": 22.281,
          "deltaMs": 2.481,
          "deltaPct": 12.53,
          "classification": "regressed"
        },
        {
          "engine": "postgres",
          "scenario": "runtime.dml.roundtrip",
          "baselineAvgMs": 8.120,
          "currentAvgMs": 7.743,
          "deltaMs": -0.377,
          "deltaPct": -4.64,
          "classification": "improved"
        }
      ]
    }

    The point is not perfect precision in the first version. The point is a stable shape that later tooling can consume.

    The following interface set matches the current first implementation.

    type BenchmarkEngine = "sqlite" | "postgres" | "mysql";

    type BenchmarkComparisonClassification =
      | "improved"
      | "neutral"
      | "regressed"
      | "missing-baseline"
      | "new-scenario";

    interface VersionedBenchmarkMeasurement {
      engine: BenchmarkEngine;
      scenario: string;
      avgMs: number;
      opsPerSec: number;
      iterations: number;
      totalMs: number;
    }

    interface VersionedBenchmarkSnapshot {
      version: string;
      generatedAt: string;
      host: {
        platform: string;
        release: string;
        arch: string;
        node: string;
        cpu: string;
      };
      measurements: VersionedBenchmarkMeasurement[];
    }

    interface BenchmarkComparisonEntry {
      engine: BenchmarkEngine;
      scenario: string;
      baselineAvgMs?: number;
      currentAvgMs: number;
      deltaMs?: number;
      deltaPct?: number;
      classification: BenchmarkComparisonClassification;
    }

    interface BenchmarkComparisonArtifact {
      baselineVersion?: string;
      currentVersion: string;
      baselineGeneratedAt?: string;
      currentGeneratedAt: string;
      comparisons: BenchmarkComparisonEntry[];
    }

    This keeps the first version small enough to generate from the current benchmark runner without redesigning the whole benchmark report.
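
    To illustrate how comparison entries could be derived from two sets of measurements, here is a minimal sketch. It assumes `deltaMs = current - baseline` and `deltaPct = deltaMs / baseline * 100`, which is consistent with the sample artifact above. The `compareMeasurements` name, the `neutralBandPct` parameter, the rounding, and the reading of `missing-baseline` (no baseline snapshot at all) versus `new-scenario` (baseline exists but lacks the engine and scenario pair) are assumptions, not the runner's confirmed behavior.

```typescript
type Classification = "improved" | "neutral" | "regressed" | "missing-baseline" | "new-scenario";

interface Measurement {
  engine: string;
  scenario: string;
  avgMs: number;
}

interface ComparisonEntry {
  engine: string;
  scenario: string;
  baselineAvgMs?: number;
  currentAvgMs: number;
  deltaMs?: number;
  deltaPct?: number;
  classification: Classification;
}

// Round to a fixed number of decimals so archived artifacts stay diff-friendly.
function roundTo(value: number, decimals: number): number {
  const factor = Math.pow(10, decimals);
  return Math.round(value * factor) / factor;
}

const keyOf = (m: Measurement): string => `${m.engine}::${m.scenario}`;

// Hypothetical helper. `neutralBandPct` is a policy input supplied by the caller;
// baseline avgMs values are assumed to be greater than zero.
function compareMeasurements(
  current: Measurement[],
  baseline: Measurement[] | undefined,
  neutralBandPct: number,
): ComparisonEntry[] {
  const baselineByKey = new Map<string, Measurement>();
  for (const m of baseline ?? []) baselineByKey.set(keyOf(m), m);

  return current.map((cur): ComparisonEntry => {
    const base = baselineByKey.get(keyOf(cur));
    if (base === undefined) {
      return {
        engine: cur.engine,
        scenario: cur.scenario,
        currentAvgMs: cur.avgMs,
        // Assumed reading of the two "no data" classifications.
        classification: baseline === undefined ? "missing-baseline" : "new-scenario",
      };
    }
    const deltaMs = cur.avgMs - base.avgMs;
    const deltaPct = (deltaMs / base.avgMs) * 100;
    const classification: Classification =
      Math.abs(deltaPct) < neutralBandPct ? "neutral" : deltaPct > 0 ? "regressed" : "improved";
    return {
      engine: cur.engine,
      scenario: cur.scenario,
      baselineAvgMs: base.avgMs,
      currentAvgMs: cur.avgMs,
      deltaMs: roundTo(deltaMs, 3),
      deltaPct: roundTo(deltaPct, 2),
      classification,
    };
  });
}
```

    With a neutral band of 3, this sketch reproduces the two sample entries shown earlier. That band is chosen only to match the sample; the band the runner actually applies is not stated in this guide.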

    Thresholds should be simple and explicit.

    Suggested starting point:

    • under 5%: treat as noise unless repeated
    • 5% to 15%: review and explain if the path is hot
    • above 15%: flag as a release-note item or investigate before release

    These thresholds are policy, not science. Teams should tune them by hardware stability and benchmark volatility.
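
    The suggested bands can be captured as a small, tunable policy object. This is a sketch only: `ThresholdPolicy`, `reviewActionFor`, and the action labels are hypothetical, and applying the bands to the magnitude of the change (so that large improvements are flagged too) is an assumption rather than something this guide prescribes.

```typescript
type ReviewAction = "noise" | "review" | "flag";

interface ThresholdPolicy {
  noiseBelowPct: number; // changes smaller than this are treated as noise
  flagAbovePct: number;  // changes larger than this are flagged
}

// Starting point suggested in this guide; teams are expected to tune it.
const suggestedPolicy: ThresholdPolicy = { noiseBelowPct: 5, flagAbovePct: 15 };

// Hypothetical helper mapping a percentage delta onto a review action.
function reviewActionFor(deltaPct: number, policy: ThresholdPolicy = suggestedPolicy): ReviewAction {
  const magnitude = Math.abs(deltaPct);
  if (magnitude < policy.noiseBelowPct) return "noise";
  if (magnitude <= policy.flagAbovePct) return "review";
  return "flag";
}
```

    Passing the policy as data keeps the thresholds explicit and lets a noisier benchmark host use wider bands without code changes.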

    Suggested machine-readable classifications:

    • improved
    • neutral
    • regressed
    • missing-baseline
    • new-scenario

    Recommended workflow:

    1. run benchmarks for the release candidate
    2. store the latest snapshot
    3. compare against the previous released version
    4. publish the comparison summary with release notes when changes matter
    5. archive the versioned snapshot as release evidence

    That gives benchmark data operational meaning instead of leaving it as a one-off report.

    The repository now has a dedicated GitHub Actions benchmark workflow for release-time benchmark publication.

    Current workflow shape:

    • CI keeps merge validation separate from benchmark publication
    • Release Benchmarks runs the benchmark suite with PostgreSQL and MySQL service containers
    • benchmark artifacts are uploaded as workflow artifacts on every run
    • release runs also attach the benchmark files to the GitHub release and publish the regression summary into the release notes

    When run manually, the workflow can accept an optional version override so the next benchmark baseline can be generated deliberately for an upcoming release.

    As automation is extended, CI should:

    • generate benchmark artifacts
    • compare current versus baseline
    • publish the comparison as an artifact or summary
    • optionally fail or warn when regression thresholds are exceeded

    The benchmark system does not need to block every slowdown, but it should at least surface them automatically.
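
    A warn-or-fail step of that kind could look roughly like this. It is a sketch under stated assumptions: `evaluateGate`, the `GateResult` shape, and both threshold parameters are hypothetical, and nothing here reflects an existing check in the workflow.

```typescript
interface GateEntry {
  engine: string;
  scenario: string;
  deltaPct?: number;
  classification: string;
}

type GateStatus = "pass" | "warn" | "fail";

interface GateResult {
  status: GateStatus;
  offenders: GateEntry[];
}

// Hypothetical gate: only entries classified as "regressed" are considered.
// Entries above warnAbovePct are surfaced; any above failAbovePct fails the gate.
function evaluateGate(entries: GateEntry[], warnAbovePct: number, failAbovePct: number): GateResult {
  const offenders = entries.filter(
    (entry) => entry.classification === "regressed" && (entry.deltaPct ?? 0) > warnAbovePct,
  );
  const status: GateStatus = offenders.some((entry) => (entry.deltaPct ?? 0) > failAbovePct)
    ? "fail"
    : offenders.length > 0
      ? "warn"
      : "pass";
  return { status, offenders };
}
```

    Returning a result object instead of exiting the process keeps the policy testable; a CI wrapper can decide whether `warn` prints an annotation and whether `fail` sets a non-zero exit code.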

    The current implementation already covers the first useful release workflow:

    • generate the current snapshot
    • load the previous version snapshot if present
    • produce comparison.json and comparison.md
    • upload those files as workflow artifacts
    • publish them with the release when the run is triggered from a release event

    The next CI implementation can stay simple:

    • optionally gate release promotion on benchmark workflow success
    • add threshold-aware warnings or failures when regression policy matures

    Benchmark regressions should be interpreted with context.

    • builder-only regressions are different from live DML regressions
    • engine-emulated paths such as MySQL returning(...) deserve separate attention
    • environment and Docker differences can distort raw numbers

    That is why every comparison should keep hardware and runtime context visible.
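
    One way to keep that context visible is to diff the `host` blocks of the two snapshots before trusting a delta. The sketch below uses the host fields from the snapshot shape; the `hostDifferences` name and the output format are hypothetical.

```typescript
interface HostInfo {
  platform: string;
  release: string;
  arch: string;
  node: string;
  cpu: string;
}

// Hypothetical helper: lists every host field that changed between snapshots.
// An empty result means both runs report the same host context.
function hostDifferences(baseline: HostInfo, current: HostInfo): string[] {
  const fields: (keyof HostInfo)[] = ["platform", "release", "arch", "node", "cpu"];
  return fields
    .filter((field) => baseline[field] !== current[field])
    .map((field) => `${field}: ${baseline[field]} -> ${current[field]}`);
}
```

    A non-empty result does not invalidate a comparison, but it is a useful caveat to print next to the regression summary.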

    The minimum useful first version of benchmark history is now in place:

    • versioned snapshot folders
    • one machine-readable comparison artifact
    • one human-readable markdown comparison
    • release notes that mention meaningful regressions or improvements

    That is enough to start trend tracking without building a dedicated dashboard.

    Once the versioned history exists, useful next steps would be:

    • percentiles instead of avg-only reporting
    • environment pinning metadata for more trustworthy comparisons
    • scenario grouping by builder, runtime, and schema workload
    • release dashboards fed from archived comparison JSON
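
    As a sketch of the first item, percentiles could be computed with the nearest-rank method. This assumes the runner would record per-iteration timings, which the current snapshot shape does not include; `percentile` and `samplesMs` are hypothetical names.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("at least one sample is required");
  if (p <= 0 || p > 100) throw new RangeError("p must be in the range (0, 100]");

  // Sort a copy so the caller's array is left untouched.
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // Multiply before dividing to avoid fractional rounding surprises.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

    Nearest-rank always returns an observed sample, which keeps the numbers easy to explain; interpolating methods are an alternative when iteration counts are small.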

    The next likely extensions are percentiles, richer environment metadata, and release dashboards fed from archived comparison JSON.