Data backfill
When to use: bootstrapping a new environment with historical market data, triggering a gap-fill after an ingest outage, or auditing coverage.
gordon-data is the sole writer of market_data.*. All historical data flows through its backfill CLI — not through gordon-lab or direct SQL inserts. gordon-lab is read-only at the DB layer.
Prerequisites
- gordon-data is running and reachable.
- GORDON_DATABASE_URL is set.
- For dev stack: make dev-up is running.
- For production: the v7 stack on srv-apps is healthy (GET /healthz returns 200 on port 8081).
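The prerequisites above can be checked in one step before a backfill is started. The following is a minimal sketch, not part of gordon-data: the function names and the HEALTH_URL variable are hypothetical, and the default host (localhost) is an assumption, since the source only states the path and port.

```shell
#!/usr/bin/env bash
# Preflight sketch: verify the prerequisites before starting a backfill.
# HEALTH_URL is an assumption; point it at wherever gordon-data is exposed.
HEALTH_URL="${HEALTH_URL:-http://localhost:8081/healthz}"

# Succeeds only when GORDON_DATABASE_URL is set and non-empty.
check_env() {
  [ -n "${GORDON_DATABASE_URL:-}" ]
}

# Succeeds only when the health endpoint answers with HTTP 200.
check_health() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$HEALTH_URL") || return 1
  [ "$code" = "200" ]
}

# Runs both checks and reports the first one that fails.
preflight() {
  check_env    || { echo "GORDON_DATABASE_URL is not set" >&2; return 1; }
  check_health || { echo "health check failed: $HEALTH_URL" >&2; return 1; }
  echo "preflight ok"
}
```

Sourced into a shell, it can gate a command, for example: preflight && gordon-data backfill funding trigger.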
One-shot bootstrap
Run make dev-seed for the first-time bootstrap on a fresh dev environment. This triggers the full historical fetch for BTC+ETH:
make dev-seed
First run: approximately 20 minutes for 1y of klines + funding + OI + metrics + Fear and Greed + stablecoin supply + GEX snapshot + FRED macro.
Re-runs are idempotent: only the gap since the last fetch is retrieved, and rows that already exist are skipped on insert (ON CONFLICT DO NOTHING).
Skip stages:
SKIP_GEX=1 SKIP_MACRO=1 make dev-seed
Override the window:
FROM_DATE=2025-01-01 make dev-seed

gordon-data backfill CLI
For targeted backfills, gap-fills, or production use, call the gordon-data backfill CLI directly via the running container or process.
Trigger a backfill
gordon-data backfill <source> trigger
Sources:
| Source | What it fetches |
|---|---|
| klines | 1m spot and perp klines for all configured symbols |
| funding | 8h funding rates |
| open_interest | Hourly OI snapshots |
| metrics | Long/short ratios |
| sentiment | Fear and Greed index + stablecoin supply ratio |
| macro | FRED macro indicators |
| gex | Gamma exposure snapshots |
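When several sources need a backfill, the source names in the table above can be validated before any command is issued. This is a sketch only: is_valid_source, trigger_sources, and the DRY_RUN switch are hypothetical helpers, not gordon-data features, and the source does not say whether trigger blocks until the job finishes.

```shell
#!/usr/bin/env bash
# Sketch: trigger backfills for several sources, one command after another.
# The source names come from the table above; the helper names are hypothetical.

# Succeeds when $1 is one of the documented source names.
is_valid_source() {
  case "$1" in
    klines|funding|open_interest|metrics|sentiment|macro|gex) return 0 ;;
    *) return 1 ;;
  esac
}

# Validates every argument first, so an unknown name aborts before any call.
# Set DRY_RUN=1 to print the commands instead of running them.
trigger_sources() {
  local src
  for src in "$@"; do
    is_valid_source "$src" || { echo "unknown source: $src" >&2; return 2; }
  done
  for src in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gordon-data backfill $src trigger"
    else
      gordon-data backfill "$src" trigger || return 1
    fi
  done
}
```

For example, DRY_RUN=1 trigger_sources funding open_interest prints the two commands it would run without touching gordon-data.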
Example — trigger a funding backfill:
gordon-data backfill funding trigger

Coverage audit
gordon-data backfill report
Prints a coverage table per source: symbol, timeframe, earliest row, latest row, gap count. Use this after an ingest outage to identify what needs to be re-fetched.
Expected output (abridged):
Source       Symbol   Earliest             Latest               Gaps
klines_spot  BTCUSDT  2024-01-01 00:00:00  2026-05-17 12:00:00  0
klines_spot  ETHUSDT  2024-01-01 00:00:00  2026-05-17 12:00:00  2
funding      BTCUSDT  2024-01-01 00:00:00  2026-05-17 08:00:00  0
Gap count > 0 means there are missing 1m bars in that symbol's window. Trigger a klines backfill to fill them; gordon-data fills gaps inline during ingest.
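After an outage, the report can be filtered down to the rows that still need work. The sketch below assumes the whitespace-separated layout of the abridged output above, with the gap count as the last field; the real report may have more columns, and rows_with_gaps is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the source/symbol pairs whose gap count is above zero.
# Reads the report on stdin. The header line is skipped because its last
# field ("Gaps") is not a number.
rows_with_gaps() {
  awk '$NF ~ /^[0-9]+$/ && $NF > 0 { print $1, $2, $NF }'
}
```

Piping the report through it, as in gordon-data backfill report | rows_with_gaps, would print klines_spot ETHUSDT 2 for the sample output above and nothing when coverage is complete.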
Job control
Active backfill jobs can be monitored and cancelled via the gordon-data REST API.
List active jobs
curl -fsS http://localhost:8081/backfill/jobs | jq .
Expected output:
[
  {
    "id": "01JXXXXXXXXXXXXXXXXXXXXXXXXX",
    "source": "klines",
    "status": "running",
    "rows_fetched": 14400,
    "started_at": "2026-05-17T12:00:00Z"
  }
]

Cancel a job
curl -X DELETE http://localhost:8081/backfill/jobs/<id>
Expected: HTTP 200 with {"status":"cancelled"}. The job stops at the next checkpoint; rows already written are not rolled back (writes are idempotent).
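The list and cancel calls above can be combined to stop every active job for one source. This is a sketch under assumptions: API_BASE and both function names are hypothetical, and the JSON fields (id, source) are taken from the example response rather than from a published schema. It needs curl and jq, both already used above.

```shell
#!/usr/bin/env bash
# Sketch: cancel every active job for one source.
# API_BASE is an assumption; the endpoints follow the examples above.
API_BASE="${API_BASE:-http://localhost:8081}"

# Reads the job list JSON on stdin and prints the ids of jobs for source $1.
job_ids_for_source() {
  jq -r --arg src "$1" '.[] | select(.source == $src) | .id'
}

# Fetches the active jobs, then sends a DELETE for each job of the given source.
cancel_source_jobs() {
  local src="$1" jobs id
  jobs=$(curl -fsS "$API_BASE/backfill/jobs") || return 1
  printf '%s\n' "$jobs" | job_ids_for_source "$src" | while read -r id; do
    curl -fsS -X DELETE "$API_BASE/backfill/jobs/$id" || echo "cancel failed: $id" >&2
    echo
  done
}
```

For example, cancel_source_jobs klines cancels the running klines jobs and leaves other sources alone. Because cancelled jobs keep the rows they already wrote, a later trigger for the same source picks up from there.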
Verify
After a backfill completes:
gordon-data backfill report
All gap counts should be 0 for the targeted source/symbol combination.
For klines specifically:
docker compose exec postgres psql -U gordon -d gordon -c "
SET search_path = market_data;
SELECT symbol, COUNT(*) AS rows, MIN(open_time) AS earliest, MAX(open_time) AS latest
FROM spot_klines
GROUP BY symbol
ORDER BY symbol;
"

Related
- Dev stack: make dev-seed details
- Make targets: data-info and dev-seed targets
- Schema restore: recovering trading.* from restic snapshots