Data Pipeline
gordon-data is the sole writer of all market_data.* tables. No other service, no lab script, and no operator query inserts into these tables. This boundary is enforced at the database layer: the gordon_data_writer role is INSERT-only and is granted exclusively to gordon-data. Runtime services read market data through named views; their direct SELECT on the underlying tables is revoked.
Sources
| Source | Data | Frequency | Tables |
|---|---|---|---|
| Binance spot WS | Spot klines (OHLCV) | 1m live + historical backfill | market_data.spot_klines |
| Binance futures WS | Perp klines (OHLCV) | 1m live + historical backfill | market_data.perp_klines |
| Binance futures REST | Funding rates | 8h | market_data.funding_rates |
| Binance futures REST | Open interest | Hourly snapshots | market_data.open_interest |
| Binance futures REST | Long/short ratio | Periodic | market_data.metrics |
| alt.me | Fear & Greed index | Daily | market_data.fear_greed |
| DefiLlama | Stablecoin supply (SSR) | Daily | market_data.stablecoin_supply |
| Deribit options REST | Gamma exposure (GEX) | Periodic | market_data.gamma_exposure, market_data.gamma_exposure_strikes |
| FRED | Macro indicators | Daily | market_data.macro_data |
Pipeline
1m is canonical
All higher timeframes are derived from 1m candles, which are the canonical resolution:
- Live ingest writes 1m rows as Binance pushes them.
- Higher timeframes (5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 1w) are precomputed by gordon-data and stored in the same tables partitioned by timeframe.
- A higher-TF candle is only emitted when at least 98% of the expected 1m bars are present. Sparse windows are never published.
Aggregation rules (applied identically for all timeframes):
| Field | Rule |
|---|---|
| open | First 1m candle's open |
| high | Maximum of all highs |
| low | Minimum of all lows |
| close | Last 1m candle's close |
| volume | Sum of all volumes |
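The aggregation rules and the 98% completeness threshold can be sketched as a small fold over one window of 1m bars. This is a minimal illustration, not gordon-data's implementation: the `Candle` struct, its field names, and the `aggregate` function with its `window_start_ms` and `expected` parameters are all assumptions made for the example.

```rust
/// One OHLCV candle. Field names are illustrative, not gordon-data's schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Candle {
    pub open_time_ms: i64,
    pub open: f64,
    pub high: f64,
    pub low: f64,
    pub close: f64,
    pub volume: f64,
}

/// Folds the 1m candles of one window into a higher-timeframe candle.
///
/// `bars` must be sorted by open time and all fall inside the window that
/// starts at `window_start_ms`; `expected` is the bar count of a full window
/// (60 for 1h, 1440 for 1d). Returns `None` for windows below 98% coverage.
pub fn aggregate(window_start_ms: i64, bars: &[Candle], expected: usize) -> Option<Candle> {
    // Integer form of `bars.len() / expected >= 0.98`, avoiding float rounding.
    if expected == 0 || bars.len() * 100 < expected * 98 {
        return None;
    }
    let first = bars.first()?;
    let last = bars.last()?;
    Some(Candle {
        open_time_ms: window_start_ms,
        open: first.open,
        high: bars.iter().map(|c| c.high).fold(f64::NEG_INFINITY, f64::max),
        low: bars.iter().map(|c| c.low).fold(f64::INFINITY, f64::min),
        close: last.close,
        volume: bars.iter().map(|c| c.volume).sum(),
    })
}
```

For a 1h window, 59 of 60 bars (98.3%) passes the threshold and a candle is produced, while 58 of 60 (96.7%) returns `None`.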
Clock alignment: hourly candles start at :00, daily at UTC midnight, weekly at Monday 00:00 UTC.
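One way to compute these window boundaries is to floor a Unix timestamp to the timeframe length, with a fixed offset for weeks. The sketch below assumes millisecond Unix timestamps and assumes that intermediate timeframes (2h, 4h, and so on) align to multiples of their length since UTC midnight; the function and constant names are hypothetical.

```rust
pub const HOUR_MS: i64 = 3_600_000;
pub const DAY_MS: i64 = 86_400_000;
pub const WEEK_MS: i64 = 7 * DAY_MS;

/// The Unix epoch (1970-01-01) was a Thursday, so the first Monday 00:00 UTC
/// falls four days later, on 1970-01-05.
const FIRST_MONDAY_MS: i64 = 4 * DAY_MS;

/// Start of the window containing `ts_ms`, for a timeframe whose length
/// divides a day evenly (1m through 1d). Hourly windows land on :00 and
/// daily windows on UTC midnight.
pub fn bucket_start(ts_ms: i64, timeframe_ms: i64) -> i64 {
    // rem_euclid keeps the result correct for pre-epoch timestamps too.
    ts_ms - ts_ms.rem_euclid(timeframe_ms)
}

/// Start of the weekly window (Monday 00:00 UTC) containing `ts_ms`.
pub fn week_start(ts_ms: i64) -> i64 {
    ts_ms - (ts_ms - FIRST_MONDAY_MS).rem_euclid(WEEK_MS)
}
```

Weeks need the separate function because the epoch did not fall on a Monday, so flooring to a multiple of seven days would produce Thursday-aligned windows.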
Backfill
Historical data is loaded via the gordon-data CLI. The old make seed-* targets were deleted on 2026-04-19. The replacement is:
gordon-data backfill <source> trigger
Examples:
gordon-data backfill binance-spot trigger
gordon-data backfill binance-perp trigger
gordon-data backfill funding-rates trigger
gordon-data backfill open-interest trigger
gordon-data backfill fear-greed trigger
gordon-data backfill gex trigger
gordon-data backfill macro trigger
Backfill is idempotent (ON CONFLICT DO NOTHING on all market data inserts). Re-running a backfill for a range that is already present is a no-op.
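The idempotency guarantee can be modelled in memory to show what a re-run does and does not change. The sketch below is only an analogy for ON CONFLICT DO NOTHING on a unique key: the `(symbol, timeframe, open time)` key, the `KlineTable` type, and the `backfill` function are assumptions for illustration and do not reflect the real schema or CLI internals.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical natural key for a kline row: (symbol, timeframe, open time ms).
pub type KlineKey = (String, String, i64);

/// In-memory stand-in for a kline table with a unique key.
#[derive(Default)]
pub struct KlineTable {
    rows: BTreeMap<KlineKey, f64>, // value: close price only, for brevity
}

impl KlineTable {
    /// Mimics INSERT ... ON CONFLICT DO NOTHING: writes only when the key is
    /// absent and never overwrites an existing row. Returns true on a write.
    pub fn insert_do_nothing(&mut self, key: KlineKey, close: f64) -> bool {
        match self.rows.entry(key) {
            Entry::Vacant(slot) => {
                slot.insert(close);
                true
            }
            Entry::Occupied(_) => false,
        }
    }

    pub fn len(&self) -> usize {
        self.rows.len()
    }

    pub fn close_of(&self, key: &KlineKey) -> Option<f64> {
        self.rows.get(key).copied()
    }
}

/// Replays a batch and returns how many rows were actually written.
pub fn backfill(table: &mut KlineTable, batch: &[(KlineKey, f64)]) -> usize {
    let mut written = 0;
    for (key, close) in batch {
        if table.insert_do_nothing(key.clone(), *close) {
            written += 1;
        }
    }
    written
}
```

Replaying the same batch a second time writes zero rows. Note the consequence of DO NOTHING semantics: a re-run also leaves existing rows untouched when the incoming values differ, so it fills gaps but does not correct stored data.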
Named views
Runtime services never SELECT directly on market_data.* underlying tables. Migration 0019 (latency exception) revoked direct grants and replaced them with stable named views:
| View | Covers |
|---|---|
| v_klines_reader | Spot and perp klines across all timeframes |
| v_metrics_reader | Derivatives sentiment (VPIN, long/short ratio) |
| v_macro_reader | FRED macro indicators |
| v_funding_rates_reader | 8h funding rates |
| v_open_interest_reader | OI snapshots |
These views are owned by gordon-migrate and are the stable read contract. Adding a column to an underlying table does not break consumers as long as the view projection is updated in the same migration.
Live ingest path
During live operation gordon-data maintains persistent WebSocket connections. On each 1m candle close, the ingest module:
- Inserts the candle row into market_data.spot_klines (or perp_klines).
- Publishes a KlineEvent to market.klines.binance.{spot|perp}.{symbol}.1m via NatsPublisher::publish_within(&mut tx, ...), in the same transaction as the insert (outbox pattern).
The outbox publisher loop (leader-elected in gordon-data) drains pending bus.outbox rows and delivers them to NATS JetStream. See Event Flow for the full outbox mechanics.
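The property the outbox pattern provides is that the candle row and its event are committed together or not at all, with delivery happening later. The following in-memory sketch illustrates that property and the subject naming only. It does not use the real `NatsPublisher` API or a database: `Market`, `kline_subject`, `Store`, and `StagedTx` are hypothetical names, and the symbol casing in the subject is an assumption.

```rust
/// Market segment, matching the {spot|perp} part of the subject.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Market {
    Spot,
    Perp,
}

/// Builds the subject for a closed 1m candle.
pub fn kline_subject(market: Market, symbol: &str) -> String {
    let segment = match market {
        Market::Spot => "spot",
        Market::Perp => "perp",
    };
    format!("market.klines.binance.{}.{}.1m", segment, symbol)
}

/// Committed state: kline rows plus pending outbox rows (subject, payload).
#[derive(Default)]
pub struct Store {
    pub klines: Vec<String>,
    pub outbox: Vec<(String, String)>,
}

/// Writes staged inside one transaction; dropping it uncommitted discards both.
#[derive(Default)]
pub struct StagedTx {
    klines: Vec<String>,
    outbox: Vec<(String, String)>,
}

impl StagedTx {
    /// Stages the candle row and its event together.
    pub fn ingest_candle(&mut self, market: Market, symbol: &str, payload: &str) {
        self.klines.push(payload.to_string());
        self.outbox.push((kline_subject(market, symbol), payload.to_string()));
    }
}

impl Store {
    /// Applies both staged writes at once.
    pub fn commit(&mut self, tx: StagedTx) {
        self.klines.extend(tx.klines);
        self.outbox.extend(tx.outbox);
    }

    /// One pass of the publisher loop: hands every pending row to `deliver`
    /// and clears the outbox. Returns the number of rows handed over.
    pub fn drain_outbox(&mut self, mut deliver: impl FnMut(&str, &str)) -> usize {
        let pending = std::mem::take(&mut self.outbox);
        for (subject, payload) in &pending {
            deliver(subject.as_str(), payload.as_str());
        }
        pending.len()
    }
}
```

This sketch clears the outbox unconditionally. A production publisher would typically remove or mark a row only after the broker acknowledges it, so that a crash between delivery and cleanup results in a redelivery rather than a lost event.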
Invariants
- gordon-data is the sole writer of market_data.*. No other service, role, or script has INSERT/UPDATE/DELETE on these tables.
- 1m is canonical. Higher TFs are always derived, never independently sourced.
- All reads go through named views. Direct table SELECT grants are revoked for runtime roles.
- Backfill is idempotent. Re-runs are safe.
- The gordon_lab_reader role has SELECT on views and underlying tables but INSERT/UPDATE/DELETE are revoked DB-side.
- market_data.* is not backed up (re-fetchable). The hourly restic backup covers only trading.*.
Related
- Event Flow — how kline events flow from gordon-data to gordon-bot over NATS.
- Architecture — database role model.
- Strategies — how strategies consume klines via gordon-bot.