
gordon-manager

Purpose

gordon-manager is the control plane and the sole authorised console BFF. It owns bot lifecycle (declarative CRUD on bot_configs, with a reconciler that converges actual state via docker-socket-proxy), green/blue deploys (shadow lease, shadow-compare, atomic swap), and backtest execution via the shared ExecutionModel trait (backtest = live, structurally). As the console BFF, it is the only HTTP + WebSocket surface the browser touches for stateful reads and writes against trading.*. It fans out live events over the /ws endpoint, converting snake_case Postgres NOTIFY payloads to camelCase at the edge. It also writes Prometheus file-SD JSON and proxies AlertManager and Prometheus queries for the console.
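The snake_case to camelCase conversion at the edge can be sketched minimally as follows. The sketch assumes a flat string-to-string payload for brevity. A real handler would more likely deserialise the NOTIFY JSON into a typed struct and let serde's #[serde(rename_all = "camelCase")] attribute handle the renaming. The names snake_to_camel and camelize_keys are illustrative and are not taken from the gordon-manager codebase.

```rust
use std::collections::BTreeMap;

/// Convert one snake_case key to camelCase, e.g. "bot_id" -> "botId".
/// Illustrative helper; not gordon-manager's actual implementation.
fn snake_to_camel(key: &str) -> String {
    let mut out = String::with_capacity(key.len());
    let mut upper_next = false;
    for ch in key.chars() {
        if ch == '_' {
            // Drop the underscore and capitalise the next character.
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Re-key a flat payload (assumed string -> string for brevity).
fn camelize_keys(payload: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    payload
        .iter()
        .map(|(k, v)| (snake_to_camel(k), v.clone()))
        .collect()
}
```

Values pass through untouched; only keys are renamed, which matches the "re-serialise as camelCase" contract for the WS channels.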

Version + port + env var

Field | Value
Version | 3.8.0
Port | 8083
Env override | GORDON_MANAGER_BIND_ADDR
DB role | gordon_manager
Image | ghcr.io/dlepaux/gordon-manager

HTTP endpoints

Health / ops

Method | Path | Purpose
GET | /healthz | Liveness + degraded-state probe (peer_down, ws_reconnect_storm, ws_subscriber_overload)
GET | /readyz | Readiness probe
GET | /metrics | Prometheus metrics
GET | /config | Redacted config dump
GET | /ws | WebSocket upgrade; live event fanout to the console

Bot lifecycle (internal control plane)

Method | Path | Purpose
GET | /bots | List all bot configs
POST | /bots | Create bot config
GET | /bots/{id} | Get bot config
PUT | /bots/{id} | Update bot config
DELETE | /bots/{id} | Delete bot config
POST | /bots/{id}/start | Start bot
POST | /bots/{id}/pause | Pause bot
POST | /bots/{id}/resume | Resume bot
POST | /bots/{id}/stop | Stop bot
POST | /bots/{id}/promote | Promote shadow bot
POST | /bots/{id}/clear-quarantine | Clear reconciler quarantine
GET | /bots/{id}/audit | Audit log for bot config changes
POST | /bots/{id}/deploy/promote | Start green/blue deploy
POST | /bots/{id}/deploy/abort | Abort in-flight deploy
GET | /bots/{id}/deploy/status | Deploy status
GET | /bots/{id}/deploys | Deploy history
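The lifecycle endpoints above record intent on bot_configs, and the reconciler is responsible for convergence. Below is a minimal sketch of the planning step of one reconcile tick. It assumes a three-value desired/actual state model (only desired_state=running is confirmed by this page) and an assumed policy of stopping containers that have no bot_config row. All type and function names are hypothetical. The sketch omits quarantine, the docker-socket-proxy calls themselves, and the shadow/deploy states.

```rust
use std::collections::BTreeMap;

// Hypothetical state model: only desired_state=running is confirmed by the docs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DesiredState {
    Running,
    Paused,
    Stopped,
}

// Hypothetical container states as observed through docker-socket-proxy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ActualState {
    Running,
    Paused,
    Stopped,
}

// Hypothetical imperative steps the reconciler would issue, keyed by bot id.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Start(u64),
    Pause(u64),
    Unpause(u64),
    Stop(u64),
}

/// Diff desired rows against observed containers and return the actions
/// needed to converge. Running it again on a converged system yields nothing.
fn plan(
    desired: &BTreeMap<u64, DesiredState>,
    actual: &BTreeMap<u64, ActualState>,
) -> Vec<Action> {
    let mut actions = Vec::new();
    for (&id, &want) in desired {
        // A bot with no observed container is treated as stopped.
        let have = actual.get(&id).copied().unwrap_or(ActualState::Stopped);
        let step = match (want, have) {
            (DesiredState::Running, ActualState::Stopped) => Some(Action::Start(id)),
            (DesiredState::Running, ActualState::Paused) => Some(Action::Unpause(id)),
            (DesiredState::Paused, ActualState::Running) => Some(Action::Pause(id)),
            (DesiredState::Stopped, ActualState::Running | ActualState::Paused) => {
                Some(Action::Stop(id))
            }
            // Already converged, or paused-but-stopped (nothing to pause).
            _ => None,
        };
        actions.extend(step);
    }
    // Containers with no bot_config row are stopped (assumed policy).
    for (&id, &have) in actual {
        if !desired.contains_key(&id) && have != ActualState::Stopped {
            actions.push(Action::Stop(id));
        }
    }
    actions
}
```

Because plan is a pure diff, each tick is idempotent. If an action fails, the next tick simply recomputes it rather than relying on any imperative history.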

BFF endpoints (console-facing)

Method | Path | Purpose
GET | /runs | Backtest/live run list (cursor-paginated)
GET | /runs/{id} | Run detail
GET | /runs/{id}/roundtrips | Roundtrip list for a run
GET | /runs/{id}/equity | Equity curve for a run
GET | /bots/{id}/runs | Runs for a bot
GET | /bots/{id}/equity | Equity curve for a bot
GET | /bots/{id}/events | Recent events for a bot
GET | /bff/bots/{bot_id}/metrics-summary | Per-bot Prometheus rollup (6 queries, proxied)
GET | /bff/alerts/firing | Firing AlertManager alerts (proxied)
POST | /bff/risk/resume | Proxy resume command to gordon-risk (token-gated; DP-12)
POST | /risk/emergency-flatten | Proxy flatten command to gordon-risk (token-gated)
GET | /performance | Portfolio performance summary
GET | /portfolio | Current portfolio state
GET | /strategies | Available strategy list
GET | /overlays/decisions | Recent overlay decisions
GET | /overlays/summary | Overlay summary
GET | /source-health | Data source freshness
GET | /data/status | Data pipeline status
GET | /data/inventory | Data inventory
GET | /data/backfill/jobs | Backfill job list
GET | /exchange/ping | Exchange connectivity check
GET | /bff/bot-defaults | Default bot config values
POST | /deploys/service | Trigger service deploy
DELETE | /deploys/service/{deploy_id}/abort | Abort service deploy
GET | /deploys/service/{deploy_id}/status | Service deploy status
GET | /deploys/service | Service deploy history
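GET /runs is cursor-paginated. Below is a minimal keyset-pagination sketch over an in-memory slice. It assumes runs are ordered newest first by a (started_at, id) pair. The RunKey type, its field names, and the hex token encoding are illustrative assumptions, not the service's actual cursor format. In Postgres, the same predicate is a row comparison, WHERE (started_at, id) < ($1, $2). That predicate stays stable under concurrent inserts, whereas OFFSET would skip or repeat rows.

```rust
/// Hypothetical sort key for trading.runs: newest first, id as tiebreaker.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RunKey {
    started_at_ms: i64,
    run_id: u64,
}

/// Encode a key as an opaque token. Hex keeps this std-only; it is
/// obfuscation, not security, and clients must treat it as opaque.
fn encode_cursor(k: RunKey) -> String {
    format!("{}:{}", k.started_at_ms, k.run_id)
        .bytes()
        .map(|b| format!("{:02x}", b))
        .collect()
}

/// Decode a token; any malformed input yields None (a handler would 400).
fn decode_cursor(token: &str) -> Option<RunKey> {
    if token.len() % 2 != 0 {
        return None;
    }
    let bytes: Option<Vec<u8>> = (0..token.len())
        .step_by(2)
        .map(|i| -> Option<u8> { u8::from_str_radix(token.get(i..i + 2)?, 16).ok() })
        .collect();
    let raw = String::from_utf8(bytes?).ok()?;
    let (ts, id) = raw.split_once(':')?;
    Some(RunKey {
        started_at_ms: ts.parse().ok()?,
        run_id: id.parse().ok()?,
    })
}

/// Keyset page over rows already sorted by (started_at_ms, run_id) DESC.
/// Mirrors SQL: WHERE (started_at, id) < ($1, $2) ORDER BY ... DESC LIMIT n + 1.
fn page(rows: &[RunKey], after: Option<RunKey>, limit: usize) -> (Vec<RunKey>, Option<String>) {
    let mut items: Vec<RunKey> = rows
        .iter()
        .copied()
        .filter(|r| {
            after.map_or(true, |a| (r.started_at_ms, r.run_id) < (a.started_at_ms, a.run_id))
        })
        .take(limit + 1)
        .collect();
    // Fetching one extra row tells us whether another page exists.
    let next = if items.len() > limit {
        items.truncate(limit);
        items.last().map(|&k| encode_cursor(k))
    } else {
        None
    };
    (items, next)
}
```

Fetching limit + 1 rows avoids handing out a next_cursor that leads to an empty page.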

NATS subjects

Subject | Direction | Durable consumer
risk.events.> | Consumes breaker state-change events from gordon-risk | manager-risk

The breaker_state WS channel is a hybrid: snapshot-on-subscribe reads trading.breaker_state (DB), but live deltas forward the typed BreakerEvent directly from NATS without a SQL round-trip (OBS-1, 2026-05-17).
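The hybrid can be sketched with std channels standing in for the NATS subscription and the WebSocket sink. BreakerEvent's fields, the breaker name used in the test, and the subscribe-before-snapshot ordering are assumptions. The point is only the shape: one snapshot frame from the database, followed by deltas forwarded as-is. If the delta queue picks up an event already reflected in the snapshot, replaying it is harmless, provided each delta carries the breaker's full new state (also an assumption).

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{Receiver, Sender};

// Hypothetical payload; the real typed BreakerEvent comes from gordon-risk.
#[derive(Clone, Debug, PartialEq)]
struct BreakerEvent {
    breaker: String,
    tripped: bool,
}

// Frames pushed to one WebSocket subscriber.
#[derive(Debug, PartialEq)]
enum WsFrame {
    Snapshot(BTreeMap<String, bool>), // rows read from trading.breaker_state
    Delta(BreakerEvent),              // forwarded from NATS, no SQL round-trip
}

/// Serve one breaker_state subscription: snapshot first, then live deltas.
/// `deltas` is assumed to be subscribed BEFORE the snapshot query runs, so
/// events that arrive during the query queue in the channel instead of being lost.
fn serve_breaker_channel(
    snapshot: BTreeMap<String, bool>,
    deltas: Receiver<BreakerEvent>,
    out: Sender<WsFrame>,
) {
    if out.send(WsFrame::Snapshot(snapshot)).is_err() {
        return; // subscriber already gone
    }
    // Ends when the NATS side drops its Sender or the subscriber disconnects.
    for ev in deltas {
        if out.send(WsFrame::Delta(ev)).is_err() {
            return;
        }
    }
}
```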

Database access

Access | Detail
Writer | trading.bot_configs, trading.bot_deploys, trading.runs, trading.equity_points
Reader | trading.* (direct)
Views used | v_klines_reader, v_metrics_reader, v_macro_reader, v_funding_rates_reader, v_open_interest_reader (migration 0019 latency exception; direct SELECT on market_data.* tables is revoked for this role)
DB role | gordon_manager (least-privilege, migration 0044)

Prometheus metrics

29 metrics across 7 groups:

  • gordon_manager_up, gordon_manager_requests_total, gordon_manager_request_duration_seconds
  • gordon_manager_reconcile_ticks_total, gordon_manager_reconcile_duration_seconds, gordon_manager_reconcile_actions_total, gordon_manager_bot_state, gordon_manager_docker_socket_errors_total, gordon_manager_bot_quarantines_total
  • gordon_manager_deploys_total, gordon_manager_deploy_shadow_compare_matches_total, gordon_manager_deploy_shadow_compare_divergences_total, gordon_manager_deploy_duration_seconds
  • gordon_manager_bff_reads_total, gordon_manager_bff_read_duration_seconds, gordon_manager_operator_flatten_requests_total, gordon_manager_file_sd_updates_total, gordon_manager_file_sd_write_duration_seconds, gordon_manager_notify_payloads_received_total, gordon_manager_portfolio_drift_total
  • gordon_manager_ws_active_connections, gordon_manager_ws_messages_sent_total, gordon_manager_ws_overflow_total, gordon_manager_ws_replay_rows_total, gordon_manager_ws_batches_sent_total, gordon_manager_ws_batch_size
  • gordon_manager_stack_health_probes_total, gordon_manager_stack_health_transitions_total
  • gordon_errors_total
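The gordon_manager_file_sd_* metrics track the manager's Prometheus file-SD writes. Prometheus' file_sd format is a JSON array of target groups, each with a "targets" list and an optional "labels" object. Below is a minimal std-only sketch of rendering and writing such a file. The one-group-per-bot layout, the bot_id label, the host:port form of the address, and all names are assumptions. Writing to a temporary file and then renaming it means Prometheus never reads a half-written file, because rename is atomic within one POSIX filesystem.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical scrape target for one bot container.
struct BotTarget {
    bot_id: String,
    addr: String, // host:port of the bot's /metrics endpoint (assumed)
}

/// Minimal JSON string escaping (quotes, backslashes, control chars).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render the Prometheus file_sd format: a JSON array of target groups.
fn file_sd_json(targets: &[BotTarget]) -> String {
    let groups: Vec<String> = targets
        .iter()
        .map(|t| {
            format!(
                "{{\"targets\":[{}],\"labels\":{{\"bot_id\":{}}}}}",
                json_str(&t.addr),
                json_str(&t.bot_id)
            )
        })
        .collect();
    format!("[{}]", groups.join(","))
}

/// Write to a temp file, then rename over the target.
fn write_file_sd(path: &Path, targets: &[BotTarget]) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    fs::write(&tmp, file_sd_json(targets))?;
    fs::rename(&tmp, path)
}
```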

Key env vars

Variable | Purpose
GORDON_MANAGER_BIND_ADDR | HTTP bind address (default :8083)
GORDON_DATABASE_URL | Postgres connection string
GORDON_BUS_NATS_URL | NATS JetStream URL (enables the manager-risk consumer)
GORDON_MANAGER_OPERATOR_TOKEN | Token required on BFF mutation endpoints
GORDON_MANAGER__GORDON_RISK_OPERATOR_TOKEN | Token the manager forwards to gordon-risk when proxying resume/flatten
GORDON_MANAGER__GORDON_ALERTMANAGER_URL | AlertManager base URL for the /bff/alerts/firing proxy
GORDON_MANAGER__GORDON_PROMETHEUS_URL | Prometheus base URL for the /bff/bots/{id}/metrics-summary proxy
DOCKER_SOCKET_PROXY_URL | docker-socket-proxy URL for bot container lifecycle
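GORDON_MANAGER_BIND_ADDR defaults to :8083. Rust's SocketAddr parser rejects that bare :port form and does not resolve hostnames, so a sketch that resolves the variable has to expand the shorthand itself. Binding 0.0.0.0 for the shorthand is an assumption here, and the helper names are hypothetical.

```rust
use std::env;
use std::net::{AddrParseError, SocketAddr};

// Default documented for GORDON_MANAGER_BIND_ADDR.
const DEFAULT_BIND: &str = ":8083";

/// Parse a bind address, expanding ":port" to all IPv4 interfaces (assumption).
fn parse_bind_addr(raw: &str) -> Result<SocketAddr, AddrParseError> {
    let full = if raw.starts_with(':') {
        format!("0.0.0.0{raw}")
    } else {
        raw.to_string()
    };
    full.parse()
}

/// Read the env override, falling back to the documented default.
fn bind_addr_from_env() -> Result<SocketAddr, AddrParseError> {
    let raw = env::var("GORDON_MANAGER_BIND_ADDR").unwrap_or_else(|_| DEFAULT_BIND.to_string());
    parse_bind_addr(&raw)
}
```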

Invariants

  • Declarative bot lifecycle. A bot_config row with desired_state=running is the contract; the reconciler converges reality. Never drive bots imperatively.
  • Docker-socket-proxy only. Never mount /var/run/docker.sock raw.
  • No trading keys. Orchestrating is not trading. Manager has zero Binance credentials.
  • BFF auth on safety-critical endpoints. POST /bff/emergency-flatten and POST /bff/risk/resume require an X-Operator-Token header, checked via constant-time comparison.
  • Cursor pagination on BFF list endpoints. /bff/runs and the equity endpoints use an opaque next_cursor. Offset/limit pagination is banned because it returns inconsistent pages under concurrent writes.
  • OpenAPI is the source of truth. Regenerate openapi.yaml alongside any BFF handler change; a stale spec produces a stale TypeScript client.
  • snake_case → camelCase at the edge. Five WS DirectFromNotify channels (risk_halt_changed, breaker_state, portfolio_state, source_freshness, backfill_jobs) re-serialise Postgres NOTIFY payloads as camelCase before sending to the browser.
  • Backtest = Live. The ExecutionModel trait is shared with gordon-executor, and the parity test tests/backtest_live_parity.rs is the enforcement gate.
  • DP-12 open (console side). The manager's POST /bff/risk/resume proxy is wired and enforces the BFF rule, but the console still calls gordon-risk's POST /risk/resume directly via riskWriterClient. The console side of DP-12 has not yet been updated.
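The X-Operator-Token check from the BFF-auth invariant can be sketched as a minimal std-only function. The early return on a length mismatch leaks only the token length. Production Rust code typically relies on a vetted crate such as subtle (its ConstantTimeEq trait) instead of a hand-rolled loop, because the optimiser gives no timing guarantees. The authorize helper, its fail-closed handling of an empty configured token, and the 401/403 status codes are assumptions, not confirmed gordon-manager behaviour.

```rust
/// Compare two byte strings without short-circuiting on the first mismatch.
/// Only the length check returns early (it leaks length, not content).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y; // accumulate every differing bit
    }
    diff == 0
}

/// Hypothetical gate for token-protected BFF mutations.
/// Returns Ok(()) or an HTTP status code (assumed mapping).
fn authorize(header: Option<&str>, expected: &str) -> Result<(), u16> {
    if expected.is_empty() {
        return Err(403); // fail closed when no operator token is configured
    }
    match header {
        Some(tok) if constant_time_eq(tok.as_bytes(), expected.as_bytes()) => Ok(()),
        Some(_) => Err(403), // token present but wrong
        None => Err(401),    // header missing
    }
}
```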

Status

Phase 5 Manager: 15/17 stories done. One open story (DP-12 console update). Green/blue deploy, backtests, WS fanout, BFF proxy, file-SD, stack health all shipped. Coverage ≥70% overall, ≥85% on safety-critical paths.

Gordon — keep compounding without blowing up