gordon-manager
Purpose
gordon-manager is the control plane and the sole authorised console BFF. It owns three things: bot lifecycle (declarative CRUD on bot_configs, with a reconciler that converges actual state via docker-socket-proxy), green/blue deploys (shadow lease, shadow-compare, atomic swap), and backtest execution through the shared ExecutionModel trait, so backtests and live trading run the same execution model (backtest = live, structurally). As the console BFF, it is the only HTTP + WebSocket surface the browser touches for stateful reads from and writes to trading.*. It fans out live events over the /ws endpoint, converting snake_case Postgres NOTIFY payloads to camelCase at the edge. It also writes Prometheus file-SD JSON and proxies AlertManager and Prometheus queries for the console.
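The snake_case to camelCase conversion at the edge can be pictured as a key-renaming pass applied to each NOTIFY payload before it is sent on /ws. This is a minimal, hypothetical sketch: snake_to_camel and camelize_keys are illustrative names, not functions from the codebase, and the sketch handles only flat, top-level keys (nested objects would need recursion). In a serde-based Rust service the same effect is commonly achieved with #[serde(rename_all = "camelCase")] on the outbound type.

```rust
use std::collections::BTreeMap;

/// Convert one snake_case identifier to camelCase (hypothetical helper).
fn snake_to_camel(key: &str) -> String {
    let mut out = String::with_capacity(key.len());
    let mut upper_next = false;
    for ch in key.chars() {
        if ch == '_' {
            // Drop the underscore; capitalise the next char unless we are
            // still at the start of the identifier.
            upper_next = !out.is_empty();
            continue;
        }
        if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Rename the top-level keys of a flat payload; values are passed through.
fn camelize_keys<V>(payload: BTreeMap<String, V>) -> BTreeMap<String, V> {
    payload
        .into_iter()
        .map(|(k, v)| (snake_to_camel(&k), v))
        .collect()
}
```

For example, a halted_at key in a risk_halt_changed payload would reach the browser as haltedAt.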
Version + port + env var
| Field | Value |
|---|---|
| Version | 3.8.0 |
| Port | 8083 |
| Env override | GORDON_MANAGER_BIND_ADDR |
| DB role | gordon_manager |
| Image | ghcr.io/dlepaux/gordon-manager |
HTTP endpoints
Health / ops
| Method | Path | Purpose |
|---|---|---|
| GET | /healthz | Liveness + degraded-state probe (peer_down, ws_reconnect_storm, ws_subscriber_overload) |
| GET | /readyz | Readiness probe |
| GET | /metrics | Prometheus metrics |
| GET | /config | Redacted config dump |
| GET | /ws | WebSocket upgrade — live event fanout to console |
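The /healthz row combines liveness with three named degraded states. A minimal sketch of how such flags could be folded into one probe result follows; the DegradedFlags struct, the Health enum, and the rule that any active flag means "alive but degraded" are assumptions for illustration, and this page does not say which HTTP status code a degraded result maps to.

```rust
/// Hypothetical snapshot of the degraded-state flags named in the /healthz row.
#[derive(Debug, Default, Clone, Copy)]
struct DegradedFlags {
    peer_down: bool,
    ws_reconnect_storm: bool,
    ws_subscriber_overload: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Health {
    Ok,
    /// Process is alive, but one or more degraded conditions are active.
    Degraded(Vec<&'static str>),
}

/// Collect the active degraded reasons in a stable order.
fn evaluate(flags: DegradedFlags) -> Health {
    let mut reasons = Vec::new();
    if flags.peer_down {
        reasons.push("peer_down");
    }
    if flags.ws_reconnect_storm {
        reasons.push("ws_reconnect_storm");
    }
    if flags.ws_subscriber_overload {
        reasons.push("ws_subscriber_overload");
    }
    if reasons.is_empty() {
        Health::Ok
    } else {
        Health::Degraded(reasons)
    }
}
```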
Bot lifecycle (internal control plane)
| Method | Path | Purpose |
|---|---|---|
| GET | /bots | List all bot configs |
| POST | /bots | Create bot config |
| GET | /bots/{id} | Get bot config |
| PUT | /bots/{id} | Update bot config |
| DELETE | /bots/{id} | Delete bot config |
| POST | /bots/{id}/start | Start bot |
| POST | /bots/{id}/pause | Pause bot |
| POST | /bots/{id}/resume | Resume bot |
| POST | /bots/{id}/stop | Stop bot |
| POST | /bots/{id}/promote | Promote shadow bot |
| POST | /bots/{id}/clear-quarantine | Clear reconciler quarantine |
| GET | /bots/{id}/audit | Audit log for bot config changes |
| POST | /bots/{id}/deploy/promote | Start green/blue deploy |
| POST | /bots/{id}/deploy/abort | Abort in-flight deploy |
| GET | /bots/{id}/deploy/status | Deploy status |
| GET | /bots/{id}/deploys | Deploy history |
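Because the lifecycle is declarative, the start/pause/resume/stop endpoints are best read as recording a desired state that the reconciler then converges toward. The sketch below shows one way a reconcile step could diff desired against observed state and choose an action. The three states, the Action enum, and the transition rules are hypothetical; the real reconciler also talks to docker-socket-proxy and handles quarantine, which is omitted here.

```rust
/// Hypothetical desired/observed states for a bot container.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BotState {
    Running,
    Paused,
    Stopped,
}

/// Hypothetical actions a reconcile tick could issue via docker-socket-proxy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Start,
    Pause,
    Unpause,
    Stop,
}

/// Pick the single action that moves `observed` one step toward `desired`.
fn plan(desired: BotState, observed: BotState) -> Option<Action> {
    use BotState::*;
    match (desired, observed) {
        (Running, Running) | (Paused, Paused) | (Stopped, Stopped) => None,
        (Running, Stopped) => Some(Action::Start),
        (Running, Paused) => Some(Action::Unpause),
        (Paused, Running) => Some(Action::Pause),
        // A stopped bot has to be started before it can be paused.
        (Paused, Stopped) => Some(Action::Start),
        (Stopped, Running) | (Stopped, Paused) => Some(Action::Stop),
    }
}

/// Simulate ticks until observed == desired; returns the actions issued.
/// Re-running on an already converged bot issues nothing (idempotent).
fn converge(desired: BotState, mut observed: BotState) -> Vec<Action> {
    let mut issued = Vec::new();
    while let Some(action) = plan(desired, observed) {
        issued.push(action);
        observed = match action {
            Action::Start | Action::Unpause => BotState::Running,
            Action::Pause => BotState::Paused,
            Action::Stop => BotState::Stopped,
        };
    }
    issued
}
```

The design point is that callers only change desired state; repeated ticks are safe because a converged bot produces no actions.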
BFF endpoints (console-facing)
| Method | Path | Purpose |
|---|---|---|
| GET | /runs | Backtest/live run list (cursor-paginated) |
| GET | /runs/{id} | Run detail |
| GET | /runs/{id}/roundtrips | Roundtrip list for a run |
| GET | /runs/{id}/equity | Equity curve for a run |
| GET | /bots/{id}/runs | Runs for a bot |
| GET | /bots/{id}/equity | Equity curve for a bot |
| GET | /bots/{id}/events | Recent events for a bot |
| GET | /bff/bots/{bot_id}/metrics-summary | Per-bot Prometheus rollup (6 queries, proxied) |
| GET | /bff/alerts/firing | Firing AlertManager alerts (proxied) |
| POST | /bff/risk/resume | Proxy resume command to gordon-risk (token-gated; DP-12) |
| POST | /risk/emergency-flatten | Proxy flatten command to gordon-risk (token-gated) |
| GET | /performance | Portfolio performance summary |
| GET | /portfolio | Current portfolio state |
| GET | /strategies | Available strategy list |
| GET | /overlays/decisions | Recent overlay decisions |
| GET | /overlays/summary | Overlay summary |
| GET | /source-health | Data source freshness |
| GET | /data/status | Data pipeline status |
| GET | /data/inventory | Data inventory |
| GET | /data/backfill/jobs | Backfill job list |
| GET | /exchange/ping | Exchange connectivity check |
| GET | /bff/bot-defaults | Default bot config values |
| POST | /deploys/service | Trigger service deploy |
| DELETE | /deploys/service/{deploy_id}/abort | Abort service deploy |
| GET | /deploys/service/{deploy_id}/status | Service deploy status |
| GET | /deploys/service | Service deploy history |
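The run list is cursor-paginated rather than offset-based. A common way to build an opaque cursor is keyset pagination: encode the sort key of the last returned row, and on the next request return only rows strictly after it (in Postgres, a row comparison such as (started_at, id) < ($1, $2) with a matching ORDER BY). The sketch below is an in-memory illustration under assumptions: the Run fields, the newest-first ordering, and the hex encoding (chosen to stay std-only; base64 is more usual) are hypothetical, not the manager's actual format.

```rust
/// Hypothetical run row; only the sort key matters for pagination.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Run {
    started_at: i64, // assumed: unix millis
    id: u64,
}

/// Encode the keyset position as an opaque hex token.
fn encode_cursor(started_at: i64, id: u64) -> String {
    format!("{started_at}:{id}")
        .bytes()
        .map(|b| format!("{b:02x}"))
        .collect()
}

/// Decode a token from `encode_cursor`; None for malformed input.
fn decode_cursor(token: &str) -> Option<(i64, u64)> {
    if token.len() % 2 != 0 || !token.is_ascii() {
        return None;
    }
    let bytes: Option<Vec<u8>> = (0..token.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&token[i..i + 2], 16).ok())
        .collect();
    let raw = String::from_utf8(bytes?).ok()?;
    let (ts, id) = raw.split_once(':')?;
    Some((ts.parse().ok()?, id.parse().ok()?))
}

/// One page of `runs` (sorted newest first) after `cursor`, plus next_cursor.
/// Returns None if the cursor is malformed.
fn page(runs: &[Run], cursor: Option<&str>, limit: usize) -> Option<(Vec<Run>, Option<String>)> {
    let after = match cursor {
        Some(t) => Some(decode_cursor(t)?),
        None => None,
    };
    let mut items: Vec<Run> = runs
        .iter()
        // Keyset filter: strictly older than the cursor position.
        .filter(|r| after.map_or(true, |(ts, id)| (r.started_at, r.id) < (ts, id)))
        // Fetch one extra row to learn whether another page exists.
        .take(limit + 1)
        .cloned()
        .collect();
    let has_more = items.len() > limit;
    items.truncate(limit);
    let next = if has_more {
        items.last().map(|r| encode_cursor(r.started_at, r.id))
    } else {
        None
    };
    Some((items, next))
}
```

Unlike offset/limit, a row inserted at the head of the list between two requests does not shift the second page, because the filter is anchored to a key rather than a position.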
NATS subjects
| Subject | Direction | Durable consumer |
|---|---|---|
| risk.events.> | Consumes breaker state-change events from gordon-risk | manager-risk |
The breaker_state WS channel is a hybrid: snapshot-on-subscribe reads trading.breaker_state (DB), but live deltas forward the typed BreakerEvent directly from NATS without a SQL round-trip (OBS-1, 2026-05-17).
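A snapshot-then-deltas channel has one classic race: a delta published while the snapshot query is running can arrive after the snapshot already reflects it. The sketch below shows one way to drop such stale deltas per subscriber. It assumes each breaker row and event carries a monotonically increasing version, which this page does not state; the BreakerEvent fields, WsMessage, and BreakerSubscription are hypothetical stand-ins for the real types.

```rust
use std::collections::HashMap;

/// Hypothetical delta shape; the real BreakerEvent type may differ.
#[derive(Debug, Clone, PartialEq)]
struct BreakerEvent {
    breaker: String,
    state: String,
    version: u64, // assumed monotonic per breaker
}

#[derive(Debug, PartialEq)]
enum WsMessage {
    Snapshot(Vec<BreakerEvent>),
    Delta(BreakerEvent),
}

/// Per-subscriber view: seeded from the DB snapshot, advanced by NATS deltas.
struct BreakerSubscription {
    seen: HashMap<String, u64>,
}

impl BreakerSubscription {
    /// Seed from snapshot rows and produce the first message for the socket.
    fn subscribe(snapshot: Vec<BreakerEvent>) -> (Self, WsMessage) {
        let seen = snapshot
            .iter()
            .map(|e| (e.breaker.clone(), e.version))
            .collect();
        (Self { seen }, WsMessage::Snapshot(snapshot))
    }

    /// Forward a live delta unless the snapshot or an earlier delta covered it.
    fn on_delta(&mut self, event: BreakerEvent) -> Option<WsMessage> {
        if let Some(&last) = self.seen.get(&event.breaker) {
            if event.version <= last {
                return None; // stale: raced the snapshot read
            }
        }
        self.seen.insert(event.breaker.clone(), event.version);
        Some(WsMessage::Delta(event))
    }
}
```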
Database access
| Action | Detail |
|---|---|
| Writer | trading.bot_configs, trading.bot_deploys, trading.runs, trading.equity_points |
| Reader | trading.* direct |
| Views used | v_klines_reader, v_metrics_reader, v_macro_reader, v_funding_rates_reader, v_open_interest_reader (migration 0019 latency exception — direct SELECT on market_data.* tables revoked for this role) |
| DB role | gordon_manager — least-privilege, migration 0044 |
Prometheus metrics
29 metrics across 7 groups:
- gordon_manager_up, gordon_manager_requests_total, gordon_manager_request_duration_seconds
- gordon_manager_reconcile_ticks_total, gordon_manager_reconcile_duration_seconds, gordon_manager_reconcile_actions_total, gordon_manager_bot_state, gordon_manager_docker_socket_errors_total, gordon_manager_bot_quarantines_total
- gordon_manager_deploys_total, gordon_manager_deploy_shadow_compare_matches_total, gordon_manager_deploy_shadow_compare_divergences_total, gordon_manager_deploy_duration_seconds
- gordon_manager_bff_reads_total, gordon_manager_bff_read_duration_seconds, gordon_manager_operator_flatten_requests_total, gordon_manager_file_sd_updates_total, gordon_manager_file_sd_write_duration_seconds, gordon_manager_notify_payloads_received_total, gordon_manager_portfolio_drift_total
- gordon_manager_ws_active_connections, gordon_manager_ws_messages_sent_total, gordon_manager_ws_overflow_total, gordon_manager_ws_replay_rows_total, gordon_manager_ws_batches_sent_total, gordon_manager_ws_batch_size
- gordon_manager_stack_health_probes_total, gordon_manager_stack_health_transitions_total
- gordon_errors_total
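The file-SD metrics relate to the Prometheus file-SD JSON the manager writes. Prometheus file-based discovery reads a JSON array of target groups, each with a "targets" list of host:port strings and an optional "labels" object. The sketch below renders one group per bot and writes via a temp file plus rename so a reader never sees a half-written file. The BotTarget struct, the bot_id label key, and the example host naming are hypothetical, and the JSON escaping handles only quotes and backslashes (a real implementation would more likely use serde_json).

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Hypothetical scrape target for one bot container.
struct BotTarget {
    bot_id: String,
    host_port: String, // e.g. "gordon-bot-7:9100" (assumed naming)
}

/// Minimal JSON string quoting: escapes backslashes and double quotes only.
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Render the file-SD array: [{"targets":[...],"labels":{...}}, ...].
fn render_file_sd(targets: &[BotTarget]) -> String {
    let groups: Vec<String> = targets
        .iter()
        .map(|t| {
            format!(
                "{{\"targets\":[{}],\"labels\":{{\"bot_id\":{}}}}}",
                json_str(&t.host_port),
                json_str(&t.bot_id)
            )
        })
        .collect();
    format!("[{}]", groups.join(","))
}

/// Write to a sibling temp file, fsync, then rename over the target.
/// On POSIX filesystems the rename replaces the file atomically.
fn write_atomically(path: &Path, contents: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    let mut f = fs::File::create(&tmp)?;
    f.write_all(contents.as_bytes())?;
    f.sync_all()?;
    fs::rename(&tmp, path)
}
```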
Key env vars
| Variable | Purpose |
|---|---|
| GORDON_MANAGER_BIND_ADDR | HTTP bind address (default :8083) |
| GORDON_DATABASE_URL | Postgres connection string |
| GORDON_BUS_NATS_URL | NATS JetStream URL (enables manager-risk consumer) |
| GORDON_MANAGER_OPERATOR_TOKEN | Token required on BFF mutation endpoints |
| GORDON_MANAGER__GORDON_RISK_OPERATOR_TOKEN | Token the manager forwards to gordon-risk when proxying resume/flatten |
| GORDON_MANAGER__GORDON_ALERTMANAGER_URL | AlertManager base URL for /bff/alerts/firing proxy |
| GORDON_MANAGER__GORDON_PROMETHEUS_URL | Prometheus base URL for /bff/bots/{id}/metrics-summary proxy |
| DOCKER_SOCKET_PROXY_URL | docker-socket-proxy URL for bot container lifecycle |
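A few of the variables above carry behaviour: a default bind address of :8083 and a NATS URL whose presence enables the manager-risk consumer. The sketch below reads a subset of them through an injectable lookup so the logic can be tested without touching the process environment. The ManagerEnv struct and the decision to treat GORDON_DATABASE_URL as required are assumptions for illustration; the actual config loader may differ.

```rust
/// Hypothetical subset of manager settings read from the env vars above.
#[derive(Debug, PartialEq)]
struct ManagerEnv {
    bind_addr: String,
    database_url: String,
    nats_url: Option<String>,
}

impl ManagerEnv {
    /// `lookup` abstracts std::env::var so the parsing is testable.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        Ok(Self {
            // Default from the table: :8083.
            bind_addr: lookup("GORDON_MANAGER_BIND_ADDR").unwrap_or_else(|| ":8083".to_string()),
            // Assumed required: the manager cannot serve without Postgres.
            database_url: lookup("GORDON_DATABASE_URL").ok_or("GORDON_DATABASE_URL is required")?,
            nats_url: lookup("GORDON_BUS_NATS_URL"),
        })
    }

    /// Read from the real process environment.
    fn from_process_env() -> Result<Self, String> {
        Self::from_lookup(|k| std::env::var(k).ok())
    }

    /// The manager-risk consumer is only enabled when a NATS URL is set.
    fn risk_consumer_enabled(&self) -> bool {
        self.nats_url.is_some()
    }
}
```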
Invariants
- Declarative bot lifecycle. A bot_config row with desired_state=running is the contract; the reconciler converges reality. Never drive bots imperatively.
- Docker-socket-proxy only. Never mount /var/run/docker.sock raw.
- No trading keys. Orchestrating is not trading. Manager has zero Binance credentials.
- BFF auth on safety-critical endpoints. POST /bff/emergency-flatten and POST /bff/risk/resume require X-Operator-Token, checked via constant-time comparison.
- Cursor pagination on BFF list endpoints. /bff/runs and equity endpoints use an opaque next_cursor. Offset/limit is banned because it is inconsistent under concurrent writes.
- OpenAPI is source of truth. Regenerate openapi.yaml alongside any BFF handler change. Stale spec = stale TS client.
- snake_case → camelCase at the edge. Five WS DirectFromNotify channels (risk_halt_changed, breaker_state, portfolio_state, source_freshness, backfill_jobs) re-serialise Postgres NOTIFY payloads as camelCase before sending them to the browser.
- Backtest = Live. The ExecutionModel trait is shared with gordon-executor. The parity test tests/backtest_live_parity.rs is the enforcement gate.
- DP-12 open (console side). Manager's POST /bff/risk/resume proxy is wired and enforces the BFF rule. The console still calls gordon-risk's POST /risk/resume directly via riskWriterClient; the console side of DP-12 is not yet updated.
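The operator-token rule requires a constant-time comparison, meaning the check must not return early at the first mismatching byte and so leak how much of a guess was correct. Below is a minimal std-only sketch of that technique plus a fail-closed gate. The function names are hypothetical, and failing closed when no token is configured is an assumption rather than documented behaviour. Two caveats: the length check still reveals whether lengths differ, and an optimiser can in principle weaken hand-rolled loops, which is why production Rust code often reaches for an audited crate such as subtle (its ConstantTimeEq trait).

```rust
/// Compare tokens without short-circuiting on the first differing byte.
/// Only the length comparison can exit early.
fn tokens_match(presented: &[u8], expected: &[u8]) -> bool {
    if presented.len() != expected.len() {
        return false;
    }
    // OR together the XOR of every byte pair; any difference leaves a set bit.
    let diff = presented
        .iter()
        .zip(expected)
        .fold(0u8, |acc, (a, b)| acc | (a ^ b));
    diff == 0
}

/// Hypothetical gate for token-gated BFF mutations.
/// Fails closed on a missing header or an unconfigured (empty) token.
fn authorize(header: Option<&str>, configured: &str) -> bool {
    if configured.is_empty() {
        return false;
    }
    match header {
        Some(h) => tokens_match(h.as_bytes(), configured.as_bytes()),
        None => false,
    }
}
```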
Status
Phase 5 Manager: 15/17 stories done. One open story (DP-12 console update). Green/blue deploy, backtests, WS fanout, BFF proxy, file-SD, stack health all shipped. Coverage ≥70% overall, ≥85% on safety-critical paths.
Related
- Architecture: event-bus-topology
- Contracts: halt-latch
- Contracts: operator-tokens
- Reference: error codes
- gordon-bot — spawned via docker-socket-proxy
- gordon-risk — consumes risk events, proxies flatten + resume
- gordon-console — sole browser consumer of this BFF