
Data Pipeline

gordon-data is the sole writer of all market_data.* tables. No other service, no lab script, and no operator query inserts into these tables. This boundary is enforced at the database layer via the gordon_data_writer role — INSERT-only, granted to gordon-data exclusively. Runtime services read market data through named views; direct SELECT on underlying tables is revoked.

Sources

| Source | Data | Frequency | Tables |
|---|---|---|---|
| Binance spot WS | Spot klines (OHLCV) | 1m live + historical backfill | market_data.spot_klines |
| Binance futures WS | Perp klines (OHLCV) | 1m live + historical backfill | market_data.perp_klines |
| Binance futures REST | Funding rates | 8h | market_data.funding_rates |
| Binance futures REST | Open interest | Hourly snapshots | market_data.open_interest |
| Binance futures REST | Long/short ratio | Periodic | market_data.metrics |
| alt.me | Fear & Greed index | Daily | market_data.fear_greed |
| DefiLlama | Stablecoin supply (SSR) | Daily | market_data.stablecoin_supply |
| Deribit options REST | Gamma exposure (GEX) | Periodic | market_data.gamma_exposure, market_data.gamma_exposure_strikes |
| FRED | Macro indicators | Daily | market_data.macro_data |

Pipeline

1m is canonical

All higher timeframes are derived from 1m candles; 1m is the canonical resolution:

  • Live ingest writes 1m rows as Binance pushes them.
  • Higher timeframes (5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 1w) are precomputed by gordon-data and stored in the same tables partitioned by timeframe.
  • A higher-TF candle is only emitted when at least 98% of the expected 1m bars are present. Sparse windows are never published.
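The 98% gate is easy to get wrong at the boundary. Below is a minimal sketch of the check. It assumes two things: the expected count is the number of minutes in the window, and the comparison is inclusive ("at least"). is_publishable and min_bars_required are hypothetical names. Integer arithmetic avoids floating-point rounding exactly at 98%.

```rust
/// Minimum completeness, in percent, for a higher-TF candle to be emitted.
const MIN_COMPLETENESS_PCT: u64 = 98;

/// True when `present` 1m bars out of `expected` meet the gate (inclusive).
/// Integer math: present / expected >= 98 / 100  <=>  present * 100 >= expected * 98.
fn is_publishable(present: u64, expected: u64) -> bool {
    expected > 0 && present * 100 >= expected * MIN_COMPLETENESS_PCT
}

/// Smallest number of 1m bars that satisfies the gate (ceiling division).
fn min_bars_required(expected: u64) -> u64 {
    (expected * MIN_COMPLETENESS_PCT + 99) / 100
}

fn main() {
    // (label, expected 1m bars per candle) for each precomputed timeframe.
    let timeframes = [
        ("5m", 5), ("15m", 15), ("30m", 30), ("1h", 60), ("2h", 120),
        ("4h", 240), ("6h", 360), ("8h", 480), ("12h", 720),
        ("1d", 1_440), ("1w", 10_080),
    ];
    for (label, expected) in timeframes {
        let need = min_bars_required(expected);
        println!("{label:>4}: need {need}/{expected}, may miss {}", expected - need);
    }
}
```

With this arithmetic, 5m, 15m and 30m candles need every 1m bar, because a single missing bar out of 30 already drops completeness to 96.7%. A 1h candle tolerates one missing bar, 1d tolerates 28 and 1w tolerates 201.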

Aggregation rules (applied identically for all timeframes):

| Field | Rule |
|---|---|
| open | First 1m candle's open |
| high | Maximum of all highs |
| low | Minimum of all lows |
| close | Last 1m candle's close |
| volume | Sum of all volumes |
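The aggregation rules reduce to a single fold over the 1m bars in a window. This is a sketch that assumes f64 prices and an epoch-seconds open_time field. The Candle struct and aggregate function are hypothetical, and the production schema may use decimal types instead. Open and close come from the bars with the earliest and latest open_time, so input order does not matter. The completeness gate and bucket assignment are left to the caller.

```rust
/// A 1m OHLCV candle (hypothetical shape; f64 used only for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candle {
    open_time: i64, // epoch seconds of the bar's open
    open: f64,
    high: f64,
    low: f64,
    close: f64,
    volume: f64,
}

/// Folds the 1m bars of one window into a higher-TF candle stamped `bucket_start`.
/// Returns None for an empty slice.
fn aggregate(bars: &[Candle], bucket_start: i64) -> Option<Candle> {
    let first = bars.iter().min_by_key(|c| c.open_time)?; // source of `open`
    let last = bars.iter().max_by_key(|c| c.open_time)?; // source of `close`
    Some(Candle {
        open_time: bucket_start,
        open: first.open,
        high: bars.iter().map(|c| c.high).fold(f64::NEG_INFINITY, f64::max),
        low: bars.iter().map(|c| c.low).fold(f64::INFINITY, f64::min),
        close: last.close,
        volume: bars.iter().map(|c| c.volume).sum(),
    })
}

fn main() {
    // Three 1m bars, deliberately out of order.
    let bars = [
        Candle { open_time: 120, open: 108.0, high: 109.0, low: 95.0, close: 96.0, volume: 0.5 },
        Candle { open_time: 0, open: 100.0, high: 105.0, low: 99.0, close: 104.0, volume: 1.0 },
        Candle { open_time: 60, open: 104.0, high: 110.0, low: 103.0, close: 108.0, volume: 2.0 },
    ];
    if let Some(c) = aggregate(&bars, 0) {
        println!("{c:?}");
    }
}
```

For these three bars the result is open 100.0, high 110.0, low 95.0, close 96.0 and volume 3.5.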

Clock alignment: hourly candles start at :00, daily at UTC midnight, weekly at Monday 00:00 UTC.
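Alignment amounts to flooring a timestamp to its bucket. This sketch assumes Unix epoch seconds in UTC, and bucket_start is a hypothetical helper. Every timeframe from 5m to 1d divides 86,400 s evenly, and the epoch starts at UTC midnight. A plain floor therefore already lands on :00 and on midnight. Weeks need an offset because 1970-01-01 was a Thursday, which puts the first Monday 00:00 UTC 4 days (345,600 s) after the epoch.

```rust
const MINUTE: i64 = 60;
const HOUR: i64 = 60 * MINUTE;
const DAY: i64 = 24 * HOUR;
const WEEK: i64 = 7 * DAY;
/// Epoch day 0 (1970-01-01) was a Thursday, so Monday 1970-01-05 is epoch day 4.
const FIRST_MONDAY: i64 = 4 * DAY;

/// Start (epoch seconds, UTC) of the candle containing `ts` for a `tf_secs` timeframe.
/// div_euclid keeps the floor correct for negative (pre-1970) timestamps as well.
fn bucket_start(ts: i64, tf_secs: i64) -> i64 {
    if tf_secs == WEEK {
        // Shift so weeks are counted from a Monday, floor, then shift back.
        FIRST_MONDAY + (ts - FIRST_MONDAY).div_euclid(WEEK) * WEEK
    } else {
        // Valid for timeframes that divide a day evenly (5m through 1d).
        ts.div_euclid(tf_secs) * tf_secs
    }
}

fn main() {
    let ts = 1_700_000_000; // 2023-11-14 22:13:20 UTC, a Tuesday
    for (label, tf) in [("5m", 5 * MINUTE), ("1h", HOUR), ("1d", DAY), ("1w", WEEK)] {
        println!("{label}: {}", bucket_start(ts, tf));
    }
}
```

For ts = 1_700_000_000 this yields 1_699_999_200 for 1h, 1_699_920_000 for 1d, and 1_699_833_600 (Monday 2023-11-13 00:00 UTC) for 1w.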

Backfill

Historical data is loaded via the gordon-data CLI. The old make seed-* targets were deleted on 2026-04-19. The replacement is:

```bash
gordon-data backfill <source> trigger
```

Examples:

```bash
gordon-data backfill binance-spot trigger
gordon-data backfill binance-perp trigger
gordon-data backfill funding-rates trigger
gordon-data backfill open-interest trigger
gordon-data backfill fear-greed trigger
gordon-data backfill gex trigger
gordon-data backfill macro trigger
```

Backfill is idempotent (ON CONFLICT DO NOTHING on all market data inserts). Re-running a backfill for a range that is already present is a no-op.
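Because every insert carries ON CONFLICT DO NOTHING, a re-run can only add rows that are missing. The sketch below models that behaviour with an ordered map. The (symbol, timeframe, open_time) conflict key, the Table type and insert_on_conflict_do_nothing are all hypothetical; the actual unique constraints are defined in the migrations.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical conflict key for a kline row: (symbol, timeframe, open_time).
type Key = (String, String, i64);

/// In-memory stand-in for one market_data table with a unique constraint on Key.
#[derive(Default)]
struct Table {
    rows: BTreeMap<Key, f64>, // value: close price, enough for the demo
}

impl Table {
    /// Mirrors INSERT ... ON CONFLICT DO NOTHING: rows whose key already exists are
    /// skipped, never updated. Returns how many rows were actually inserted.
    fn insert_on_conflict_do_nothing(&mut self, batch: &[(Key, f64)]) -> usize {
        let mut inserted = 0;
        for (key, close) in batch {
            if let Entry::Vacant(slot) = self.rows.entry(key.clone()) {
                slot.insert(*close);
                inserted += 1;
            }
        }
        inserted
    }
}

fn main() {
    let key = |t: i64| ("BTCUSDT".to_string(), "1m".to_string(), t);
    let mut table = Table::default();
    let batch = vec![(key(0), 100.0), (key(60), 101.0)];
    println!("first run:  {}", table.insert_on_conflict_do_nothing(&batch)); // 2
    println!("second run: {}", table.insert_on_conflict_do_nothing(&batch)); // 0
    // Overlapping range: key(60) already exists (stays 101.0), key(120) is new.
    let overlap = vec![(key(60), 999.0), (key(120), 102.0)];
    println!("overlap:    {}", table.insert_on_conflict_do_nothing(&overlap)); // 1
}
```

The third call shows a consequence of DO NOTHING: a re-run never overwrites an existing row, so it will not correct a stored value that differs from upstream.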

Named views

Runtime services never SELECT directly on market_data.* underlying tables. Migration 0019 (latency exception) revoked direct grants and replaced them with stable named views:

| View | Covers |
|---|---|
| v_klines_reader | Spot and perp klines across all timeframes |
| v_metrics_reader | Derivatives sentiment (VPIN, long/short ratio) |
| v_macro_reader | FRED macro indicators |
| v_funding_rates_reader | 8h funding rates |
| v_open_interest_reader | OI snapshots |

These views are owned by gordon-migrate and are the stable read contract. Adding a column to an underlying table does not break consumers as long as the view projection is updated in the same migration.

Live ingest path

During live operation gordon-data maintains persistent WebSocket connections. On each 1m candle close, the ingest module:

  1. Inserts the candle row into market_data.spot_klines (or market_data.perp_klines).
  2. Enqueues a KlineEvent for market.klines.binance.{spot|perp}.{symbol}.1m via NatsPublisher::publish_within(&mut tx, ...). This call runs in the same transaction as the insert (outbox pattern), so the event is written to bus.outbox rather than sent to NATS directly.

The outbox publisher loop (leader-elected in gordon-data) drains bus.outbox rows and delivers them to NATS JetStream. See Event Flow for the full outbox mechanics.
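The two guarantees of the outbox pattern can be shown in a minimal in-memory sketch. First, the kline row and its outbox row commit or roll back together. Second, a row leaves the outbox only after a successful send. Db, Tx, OutboxRow, drain_outbox and the send closure are hypothetical stand-ins, not the real NatsPublisher API; a production version uses a Postgres transaction and a JetStream client. Leader election is not modelled.

```rust
use std::collections::VecDeque;

/// Subject from the ingest steps: market.klines.binance.{spot|perp}.{symbol}.1m.
/// Symbol casing (BTCUSDT vs btcusdt) is an assumption.
fn kline_subject(market: &str, symbol: &str) -> String {
    format!("market.klines.binance.{market}.{symbol}.1m")
}

#[derive(Debug, Clone, PartialEq)]
struct OutboxRow {
    subject: String,
    payload: String,
}

/// In-memory stand-ins for a klines table and bus.outbox.
#[derive(Default)]
struct Db {
    klines: Vec<String>,
    outbox: VecDeque<OutboxRow>,
}

/// Buffers writes; nothing reaches Db until commit(). Dropping a Tx without
/// committing discards both the kline and the outbox row (rollback).
struct Tx<'a> {
    db: &'a mut Db,
    klines: Vec<String>,
    outbox: Vec<OutboxRow>,
}

impl Db {
    fn begin(&mut self) -> Tx<'_> {
        Tx { db: self, klines: Vec::new(), outbox: Vec::new() }
    }
}

impl Tx<'_> {
    fn insert_kline(&mut self, row: &str) {
        self.klines.push(row.to_string());
    }
    /// Hypothetical analogue of publish_within: enqueue in the transaction, don't send.
    fn publish_within(&mut self, subject: String, payload: &str) {
        self.outbox.push(OutboxRow { subject, payload: payload.to_string() });
    }
    /// Applies both buffered writes together.
    fn commit(self) {
        let Tx { db, klines, outbox } = self;
        db.klines.extend(klines);
        db.outbox.extend(outbox);
    }
}

/// One pass of the publisher loop: deliver oldest first and delete a row only after
/// `send` succeeds, so a failed delivery is retried on the next pass.
fn drain_outbox<F>(db: &mut Db, mut send: F) -> usize
where
    F: FnMut(&OutboxRow) -> Result<(), String>,
{
    let mut delivered = 0;
    while let Some(row) = db.outbox.front() {
        if send(row).is_err() {
            break; // leave the row in place for the next pass
        }
        db.outbox.pop_front();
        delivered += 1;
    }
    delivered
}

fn main() {
    let mut db = Db::default();
    let mut tx = db.begin();
    tx.insert_kline("BTCUSDT 1m close");
    tx.publish_within(kline_subject("spot", "BTCUSDT"), "{\"close\":101.0}");
    tx.commit();
    let sent = drain_outbox(&mut db, |row| {
        println!("-> {}", row.subject);
        Ok(())
    });
    println!("delivered {sent}, outbox now {}", db.outbox.len());
}
```

Deleting only after a successful send makes this sketch at-least-once: a crash between send and delete re-delivers the event on the next pass, so consumers must tolerate duplicates.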

Invariants

  • gordon-data is the sole writer of market_data.*. No other service, role, or script has INSERT/UPDATE/DELETE on these tables.
  • 1m is canonical. Higher TFs are always derived, never independently sourced.
  • All runtime reads go through named views. Direct table SELECT grants are revoked for runtime roles.
  • Backfill is idempotent. Re-runs are safe.
  • The gordon_lab_reader role has SELECT on both views and underlying tables, but its INSERT/UPDATE/DELETE privileges are revoked DB-side.
  • market_data.* is not backed up (re-fetchable). The hourly restic backup covers only trading.*.

See also

  • Event Flow: how kline events flow from gordon-data to gordon-bot over NATS.
  • Architecture: database role model.
  • Strategies: how strategies consume klines via gordon-bot.

Gordon — keep compounding without blowing up