Bot warmup fallback
When to use: diagnosing a gordon-bot that is stuck warming up, or configuring degraded mode for a managed outage.
At boot, gordon-bot fills its indicator windows from NATS JetStream retention and, where retention falls short, a single POST /warmup call to gordon-data. This dual-source warmup (NATS retention + REST historical) was introduced in Wave 3 Phase 1 (2026-05-09). This page describes what the bot does when warmup yields anything other than a fully complete payload, or when gordon-data is unreachable.
Why this matters
A bot that silently boots on partial data will trade on a stale or truncated indicator window. A 200-bar SMA computed over 150 bars is a different indicator; a funding-z computed on 3 of 8 observations is wrong, not "approximate". Live capital is at risk, so "mostly there" is not an acceptable boot state.
Strict mode (default)
gordon-bot does not enter live trading unless all three conditions hold for its warmup result:
| Condition | Check |
|---|---|
| HTTP status | 200 OK |
| Every dataset complete | response.datasets[].is_complete == true for every entry |
| Every dataset fresh | response.datasets[].freshness_ts within the strategy's staleness budget |
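The three conditions form a single all-or-nothing gate. The sketch below assumes the HTTP status and the per-dataset results have already been extracted from the response. It also assumes a hypothetical input format of one "is_complete:fresh" pair per dataset. The function name `strict_gate` is illustrative and does not come from gordon-bot.

```bash
# Sketch of the strict-mode boot gate (hypothetical helper, not gordon-bot source).
# Inputs: HTTP status, then one "is_complete:fresh" pair per dataset, e.g. "true:true".

# Usage: strict_gate HTTP_STATUS PAIR...
# Returns 0 only if every condition in the table holds.
strict_gate() {
  local status=$1 pair
  shift
  [[ $status == 200 ]] || return 1      # condition 1: HTTP 200 OK
  (( $# > 0 )) || return 1              # assumption: an empty dataset list is not "complete"
  for pair in "$@"; do
    # conditions 2 and 3: every dataset complete AND fresh
    [[ $pair == true:true ]] || return 1
  done
  return 0
}

# Example: one complete+fresh dataset, one complete but stale dataset.
if strict_gate 200 true:true true:false; then
  echo "gate passed: transition to Live"
else
  echo "gate failed: bot would exit non-zero here" >&2
fi
```

A single failing dataset is enough to fail the whole gate. That is the point of strict mode: there is no "partial pass".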
Any other outcome — HTTP 503, any is_complete=false, any stale freshness_ts — causes the bot to crash-loop with exponential backoff:
- Initial retry: 1s
- Growth: doubles on each retry, capped at 60s
- Jitter: 20% of the current interval
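The retry schedule above can be sketched in bash as follows. The page does not say whether the 20% jitter is one-sided or symmetric, so the sketch assumes symmetric jitter of up to ±20%. The function names are hypothetical.

```bash
# Sketch of the strict-mode retry schedule: 1s initial interval, doubling per
# retry, capped at 60s, with jitter. Function names are hypothetical.

# Print the un-jittered interval in milliseconds for a 0-based attempt number.
base_interval_ms() {
  local attempt=$1
  local interval=1000 cap=60000 i
  for (( i = 0; i < attempt; i++ )); do
    interval=$(( interval * 2 ))
    if (( interval >= cap )); then
      interval=$cap                     # stop doubling once the 60s cap is hit
      break
    fi
  done
  echo "$interval"
}

# Print the interval with jitter applied.
# Assumption: jitter is symmetric, i.e. +/- up to 20% of the current interval.
jittered_interval_ms() {
  local base jitter offset
  base=$(base_interval_ms "$1")
  jitter=$(( base / 5 ))                              # 20% of the current interval
  offset=$(( RANDOM % (2 * jitter + 1) - jitter ))    # roughly uniform in [-jitter, +jitter]
  echo $(( base + offset ))
}

# Print the first eight base intervals.
for n in 0 1 2 3 4 5 6 7; do
  printf 'attempt %d: %d ms\n' "$n" "$(base_interval_ms "$n")"
done
```

The base schedule runs 1s, 2s, 4s, 8s, 16s, 32s and then stays at 60s. Jitter spreads out restarts when several bots fail against the same outage, so they do not hit gordon-data in lockstep.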
The bot process does not skip the check and does not proceed with the partial data it has. It exits non-zero so the container restarts. gordon-manager tracks the container restart count and surfaces a "bot stuck warming up" alert when that count crosses 5 within a 5-minute window.
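The manager-side rule can be expressed as a trailing-window count. This is a minimal sketch under two assumptions the page does not confirm: restart times are Unix seconds, and "crosses 5" means strictly more than 5 restarts. The function name `stuck_warming_up` is hypothetical.

```bash
# Sketch of the "bot stuck warming up" rule: count restarts inside a trailing
# 5-minute window and alert past the threshold. Hypothetical helper.
WINDOW_S=300     # 5-minute window
THRESHOLD=5      # assumption: alert when the count is strictly greater than 5

# Usage: stuck_warming_up NOW_S RESTART_TS...
# Returns 0 (raise alert) or 1 (no alert).
stuck_warming_up() {
  local now=$1 count=0 ts
  shift
  for ts in "$@"; do
    if (( now - ts <= WINDOW_S )); then
      count=$(( count + 1 ))            # restart falls inside the window
    fi
  done
  (( count > THRESHOLD ))
}

# Example: six restarts in the last four minutes.
now=$(date +%s)
if stuck_warming_up "$now" $(( now - 240 )) $(( now - 200 )) $(( now - 150 )) \
    $(( now - 90 )) $(( now - 40 )) $(( now - 5 )); then
  echo 'alert: bot stuck warming up'
fi
```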
Strict mode is the production default. Do not disable it on a bot running real capital.
Dual-source warmup (Wave 3 Phase 1, 2026-05-09)
Since Wave 3 Phase 1, the bot's warmup phase uses two sources in order:
- NATS JetStream retention: replays recent candles from the gordon-bus stream's retention window (168h). Fast path; no REST call is needed for recently active symbols.
- REST historical: POST /warmup on gordon-data. Used for symbols not yet in NATS retention, or when NATS retention is too short to satisfy the strategy's lookback window.
If NATS retention covers the full lookback, the REST call is skipped entirely. If NATS retention is partial, the REST call fills the gap. The bot always validates the merged result against the same strict-mode criteria before transitioning to Live.
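The source decision reduces to comparing what retention holds against the lookback. In the sketch below, bar counts stand in for the real window bounds, and the function name and output strings are hypothetical. As a worked example, 168h of retention holds at most 168 H1 candles. A 200-bar H1 lookback therefore leaves a 32-bar gap for REST to fill.

```bash
# Sketch of the dual-source plan: NATS retention first, REST /warmup only for
# the gap. Bar counts approximate the real time windows; names are hypothetical.

# Usage: plan_warmup LOOKBACK_BARS NATS_BARS
plan_warmup() {
  local lookback=$1 nats=$2
  if (( nats >= lookback )); then
    echo "nats-only"                          # retention covers the lookback; skip REST
  else
    echo "rest-fill $(( lookback - nats ))"   # REST supplies the missing older bars
  fi
}

plan_warmup 200 168   # H1 lookback vs. 168h of retention
plan_warmup 200 0     # symbol not yet in NATS retention
```

Either way, the merged window then goes through the same strict-mode gate. A REST gap-fill that comes back incomplete fails warmup just like a direct REST miss.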
Freshness checks
Each datasets[].freshness_ts in the warmup response is the latest data-clock timestamp in the returned window — not the wall-clock moment gordon-data last wrote a row. Wall-clock liveness is a separate signal served by /sources/health. The two answer different questions and must not be conflated.
Staleness budgets per dataset kind:
| Dataset kind | Staleness budget | Rationale |
|---|---|---|
| spot_klines (D1) | 26h | One bar + 2h slack for clock skew |
| spot_klines (H1) | 70 min | One bar + 10 min slack |
| perp_klines | Same as spot equivalent | |
| funding_rates | 9h | One 8h window + 1h slack |
| open_interest | 70 min | Hourly cadence |
| long_short_ratio | 70 min | Hourly cadence |
| fear_greed | 26h | Daily cadence |
| macro | 26h | FRED publishes daily on business days |
| liquidations | N/A (event stream) | Quiet markets are legitimate |
A bot finding freshness_ts older than its budget must treat the dataset as unavailable — same outcome as is_complete=false.
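The budget table can be applied as a lookup plus a single comparison. freshness_ts is a data-clock timestamp, so subtracting it from the current time measures how far the returned window lags. This sketch makes several assumptions: timestamps are Unix milliseconds, the kline interval arrives as a separate "D1"/"H1" argument, and an unknown kind counts as unavailable. Function and variable names are hypothetical.

```bash
# Sketch of the per-dataset staleness check from the budget table.
# Hypothetical helpers; timestamps are Unix milliseconds.
HOUR_MS=3600000
MIN_MS=60000

# Print the staleness budget in ms, or "exempt" for event streams.
# Returns 1 for an unknown kind/interval combination.
budget_ms() {
  local kind=$1 interval=${2:-}
  case "$kind:$interval" in
    spot_klines:D1|perp_klines:D1)       echo $(( 26 * HOUR_MS )) ;;
    spot_klines:H1|perp_klines:H1)       echo $(( 70 * MIN_MS )) ;;
    funding_rates:*)                     echo $(( 9 * HOUR_MS )) ;;
    open_interest:*|long_short_ratio:*)  echo $(( 70 * MIN_MS )) ;;
    fear_greed:*|macro:*)                echo $(( 26 * HOUR_MS )) ;;
    liquidations:*)                      echo exempt ;;   # quiet markets are legitimate
    *)                                   return 1 ;;
  esac
}

# Usage: is_fresh KIND INTERVAL FRESHNESS_TS_MS NOW_MS
# Returns 0 if fresh; 1 if stale or unknown (same outcome as is_complete=false).
is_fresh() {
  local budget
  budget=$(budget_ms "$1" "$2") || return 1
  [[ $budget == exempt ]] && return 0
  (( $4 - $3 <= budget ))
}

# Example: a D1 window whose last bar is 27h old is stale.
now=$(( $(date +%s) * 1000 ))
is_fresh spot_klines D1 $(( now - 27 * HOUR_MS )) "$now" || echo "spot_klines D1: unavailable"
```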
Degraded mode (opt-in)
For managed outages (data-provider incident, planned maintenance, testnet-only validation), the bot can be booted with:
```
GORDON_BOT_WARMUP_MODE=degraded
```

Behavior:
- Bot proceeds after the /warmup response even when is_complete=false or some datasets are missing.
- Bot logs a structured warning with event=warmup_degraded and the full list of incomplete datasets.
- Bot refuses to open new positions for the duration of the session. Existing positions are maintained and managed (stops, take-profits, reconciliation), but no new entries are emitted.
- On the next restart, strict mode returns unless the env var is still present. Degraded mode does not persist across restarts without explicit operator action.
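Degraded mode is a last resort. Record a written justification in the incident, and clear the env var as soon as upstream data recovers. Then restart the bot: a degraded session keeps new entries disabled until a strict-mode warmup succeeds on a fresh boot.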
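Mode selection at boot can be sketched as a fail-closed switch on the variable above. The assumption that any value other than exactly "degraded" (including unset, empty, or a typo) selects strict mode is not stated on this page. The helper names are also hypothetical.

```bash
# Sketch of boot-time warmup mode selection. GORDON_BOT_WARMUP_MODE is the
# documented variable; the fail-closed parsing and helper names are assumptions.
warmup_mode() {
  case "${GORDON_BOT_WARMUP_MODE:-strict}" in
    degraded) echo degraded ;;
    *)        echo strict ;;     # unset, empty, or unrecognized: fail closed
  esac
}

# A degraded session manages existing positions but never opens new ones.
allow_new_entries() {
  [[ "$(warmup_mode)" == strict ]]
}

echo "warmup mode: $(warmup_mode)"
```

Failing closed means a typo such as GORDON_BOT_WARMUP_MODE=degarded boots in strict mode. The gate is never relaxed by accident.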
Restart semantics
- Warmup runs once, on bot boot. Success transitions the bot from Warming to Live.
- If gordon-data becomes unreachable mid-session, the bot does not re-run warmup. It continues running on the already-fetched window. Live candle data arrives independently via the NATS subscription and rolls the indicator state forward.
- A process-level restart (container crash, OOM, deploy) re-runs warmup from scratch.
- Staged deploys (green/blue) must warm the green instance before cutover. A cold green bot that fails warmup must not take traffic.
Console signal
On the first successful warmup in strict mode, the bot publishes:
```
topic: bot:<bot_id>:warmup
payload: {
  "status": "ready",
  "trace_id": "<uuid>",
  "served_at_ts": <ms>,
  "datasets_served": [<kind>, ...],
  "warnings": []
}
```

gordon-manager listens for this event and flips the bot dashboard state from Warming to Live. Absence of the event after the manager's grace window (default 60s) surfaces a "warmup never completed" alert, independent of the bot's crash-loop backoff.
On a degraded-mode boot, status is "degraded" and the warnings array carries every partial-dataset notice. The manager dashboard distinguishes the two states.
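The manager's grace-window rule is a timeout keyed to the boot time. The sketch assumes three things: millisecond timestamps, that either a "ready" or a "degraded" event counts as completion, and 0 as the marker for "no event seen yet". The function name is hypothetical.

```bash
# Sketch of the "warmup never completed" rule: alert when no warmup event
# arrived within the grace window after boot. Hypothetical helper.
GRACE_MS=${GRACE_MS:-60000}   # default grace window from this page

# Usage: warmup_never_completed BOOT_TS_MS NOW_MS LAST_EVENT_TS_MS
# Pass 0 as LAST_EVENT_TS_MS when no event has been seen. Returns 0 to alert.
warmup_never_completed() {
  local boot=$1 now=$2 event=$3
  (( event >= boot )) && return 1     # a ready/degraded event arrived for this boot
  (( now - boot > GRACE_MS ))         # no event and the grace window has elapsed
}
```

This check is independent of the restart counter, which matches the page. A bot that hangs inside warmup without exiting never restarts, and only the grace window will catch it.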
Troubleshooting stuck warmup
Check gordon-data health:
```bash
curl -fsS http://localhost:8081/healthz
curl -fsS http://localhost:8081/sources/health | jq .
```

Check the bot's crash-loop logs:

```bash
docker compose logs gordon-bot --tail=50 | grep -E "warmup|WARN|ERROR"
```

Verify NATS stream retention:

```bash
nats stream info gordon-bus  # check "Messages" and "Bytes"; both should be non-zero
```

If gordon-data is down but positions are open and you need the bot to maintain them, use degraded mode as a last resort. Log the justification.
Related endpoints
- POST /warmup: the contract this page governs.
- GET /sources/health: wall-clock freshness of gordon-data's ingestion. Answers "is upstream alive?"
- GET /healthz / GET /readyz: service-level liveness and readiness. readyz fails if any source is stale past its cadence and outside the boot warmup window.
Related
- Troubleshooting: service won't start, NATS consumer lag
- Incident response: halt-latch and quarantine procedures
- Monitoring: BOT_WARMUP_INCOMPLETE error code in Loki