
Backtesting

Gordon's backtest engine shares the exact same strategy evaluation code path as the live trading loop. There is no separate backtest implementation. The ExecutionModel trait in gordon-domain is the abstraction that makes this possible: BacktestExecution implements it for simulation; gordon-executor implements it for live orders. Both call Strategy::evaluate identically.

ExecutionModel trait

```rust
pub trait ExecutionModel: Send + Sync {
    async fn submit(&mut self, intent: &OrderIntent) -> Result<ExchangeOrderAck, ExecutionError>;
    async fn fill(&mut self, ack: &ExchangeOrderAck) -> Result<FillEvent, ExecutionError>;
    async fn position(&self, symbol: &str) -> Option<Position>;
}
```

gordon-manager's backtest engine constructs a BacktestExecution instance and drives the replay loop:

  1. Load historical klines from the database.
  2. For each candle, build a MarketContext and call Strategy::evaluate.
  3. Pass the Signal through the overlay pipeline.
  4. If a tradeable signal emerges, call execution_model.submit(intent).
  5. On the next candle, detect fills via price-crossing logic and call execution_model.fill(ack).
  6. Aggregate FillEvent records into a run row in trading.*.

gordon-bot's live loop follows the same steps 2–5, substituting live candles and gordon-executor for the DB replay and BacktestExecution.
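The replay steps above can be sketched as a single loop. This is a simplified, synchronous illustration and not Gordon's code: `Candle`, `Signal`, `Fill`, and `replay` are hypothetical stand-ins, the strategy is passed as a closure in place of Strategy::evaluate, the overlay pipeline and database access are omitted, and only buy limits are modeled. The real ExecutionModel methods are async.

```rust
// Hypothetical, simplified stand-ins for Gordon's types (not the real definitions).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Candle {
    pub open_time: i64,
    pub high: f64,
    pub low: f64,
    pub close: f64,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Signal {
    Hold,
    Buy { limit: f64 },
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Fill {
    pub price: f64,
    pub candle_ts: i64,
}

/// Replays candles in order. An order submitted on one candle can only
/// fill on a later candle, mirroring steps 4 and 5 of the list above.
pub fn replay<S: Fn(&Candle) -> Signal>(candles: &[Candle], strategy: S) -> Vec<Fill> {
    let mut fills = Vec::new();
    let mut pending: Option<f64> = None; // resting buy limit price, if any

    for candle in candles {
        // Step 5: an order from an earlier candle fills when this candle's low touches it.
        if let Some(limit) = pending {
            if candle.low <= limit {
                // Timestamp comes from the candle, never from the wall clock.
                fills.push(Fill { price: limit, candle_ts: candle.open_time });
                pending = None;
            }
        }
        // Steps 2 to 4: evaluate the strategy and submit if nothing is resting.
        if pending.is_none() {
            if let Signal::Buy { limit } = strategy(candle) {
                pending = Some(limit);
            }
        }
    }
    fills
}
```

In this sketch a live driver would differ only in where the candles come from and what happens on submit; the strategy closure is the same value in both cases.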

BacktestExecution

BacktestExecution lives in gordon-strategy::execution::backtest. It simulates fills deterministically:

  • Fee model: maker/taker bps + slippage configured per symbol (mirrors gordon_exchange::FeeModel).
  • Funding cost: the 8h funding rate, applied in proportion to holding duration.
  • Fill detection: per-candle price-crossing (high/low touch). No partial fills in simulation.
  • In-memory ledger: insertion-ordered Vec<OrderId> on the fill path — no non-deterministic HashMap iteration.
  • No SystemTime::now() — all timestamps come from the candle's open_time.
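The fill and cost rules above can be illustrated with a few pure functions. This is a minimal sketch under stated assumptions, not the BacktestExecution source: the names `touches`, `trade_cost`, and `funding_cost` are hypothetical, fees and slippage are assumed to be quoted in basis points of notional, and funding is assumed to be pro-rated linearly over 8-hour intervals.

```rust
/// Side of a resting limit order (hypothetical stand-in type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Side {
    Buy,
    Sell,
}

/// Per-candle price-crossing check: a buy limit fills if the candle's low
/// reaches it, a sell limit fills if the candle's high reaches it.
/// The result is all-or-nothing, so there are no partial fills.
pub fn touches(side: Side, limit: f64, high: f64, low: f64) -> bool {
    match side {
        Side::Buy => low <= limit,
        Side::Sell => high >= limit,
    }
}

/// Fee plus slippage for one fill, both in basis points of notional
/// (1 bps = 0.01%). Pass the maker or taker rate as `fee_bps`.
pub fn trade_cost(notional: f64, fee_bps: f64, slippage_bps: f64) -> f64 {
    notional * (fee_bps + slippage_bps) / 10_000.0
}

/// Funding cost pro-rated linearly over the holding period.
/// Assumption: `rate_per_8h` is the fractional rate per 8-hour interval.
pub fn funding_cost(notional: f64, rate_per_8h: f64, held_secs: i64) -> f64 {
    const EIGHT_HOURS_SECS: f64 = 8.0 * 3600.0;
    notional * rate_per_8h * (held_secs as f64 / EIGHT_HOURS_SECS)
}
```

Because every input is a plain number taken from the candle or the configuration, the same inputs always yield the same outputs, which is what the determinism requirements depend on.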

Byte-parity gate (DP-19)

The shared-code-path invariant is enforced by a CI test, not by policy alone.

gordon-strategy/tests/it_backtest_live_byte_parity.rs routes a 200-candle BTCUSDT/1h fixture through two paths:

  1. A BacktestExecution-driven replay (the backtest path).
  2. A hand-rolled evaluate_tick mock harness that simulates the gordon-bot live driver.

The test asserts the resulting intent sequence is byte-identical on (side, qty, sl_price, entry_price, candle_ts). If the live driver diverges from the backtest path, this test fails at cargo test time — before any code reaches production.

If live diverges from backtest, the bug is in the live driver, not the strategy code. The test makes this diagnosis unambiguous.
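A byte-level comparison of two intent sequences might look like the following. This is a hedged sketch, not the contents of it_backtest_live_byte_parity.rs: `IntentKey`, `to_bytes`, and `first_divergence` are hypothetical names, the field types are assumed, and the real test may encode or compare the five fields differently.

```rust
/// Hypothetical projection of an order intent onto the five compared fields.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct IntentKey {
    pub side: u8, // assumed encoding: 0 = buy, 1 = sell
    pub qty: f64,
    pub sl_price: f64,
    pub entry_price: f64,
    pub candle_ts: i64,
}

impl IntentKey {
    /// Fixed-width little-endian encoding. Floats are compared by bit
    /// pattern, so even a one-ulp drift counts as a divergence.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(33);
        out.push(self.side);
        out.extend_from_slice(&self.qty.to_le_bytes());
        out.extend_from_slice(&self.sl_price.to_le_bytes());
        out.extend_from_slice(&self.entry_price.to_le_bytes());
        out.extend_from_slice(&self.candle_ts.to_le_bytes());
        out
    }
}

/// Index of the first intent where the two paths differ,
/// or None when the sequences are byte-identical.
pub fn first_divergence(backtest: &[IntentKey], live: &[IntentKey]) -> Option<usize> {
    let n = backtest.len().max(live.len());
    (0..n).find(|&i| match (backtest.get(i), live.get(i)) {
        (Some(a), Some(b)) => a.to_bytes() != b.to_bytes(),
        _ => true, // one sequence ended early: a length mismatch is a divergence
    })
}
```

Reporting the first diverging index, rather than a bare pass or fail, points directly at the candle where the live driver departed from the backtest path.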

Walk-forward protocol

Random train/test splits are invalid on time series data. Future data leaks into training and produces inflated results that do not hold in production.

Walk-forward validation uses rolling temporal windows: each window trains on an earlier span of data and tests on the span that immediately follows it.

A strategy must perform consistently across all windows. Walk-forward consistency is the percentage of windows where the strategy is profitable. Minimum threshold: 50%.
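To make the windowing and the consistency metric concrete, here is a minimal sketch. The function names, the choice to advance each window by the test length, and the use of candle indices are assumptions made for illustration; the document does not specify Gordon's window sizes or step.

```rust
use std::ops::Range;

/// Rolling (train, test) index ranges over `n` candles. Each test range
/// starts exactly where its train range ends, so no future data reaches
/// training. Windows advance by `test_len` (an assumption of this sketch).
pub fn walk_forward_windows(
    n: usize,
    train_len: usize,
    test_len: usize,
) -> Vec<(Range<usize>, Range<usize>)> {
    let mut windows = Vec::new();
    if train_len == 0 || test_len == 0 {
        return windows;
    }
    let mut start = 0;
    while start + train_len + test_len <= n {
        let split = start + train_len;
        windows.push((start..split, split..split + test_len));
        start += test_len;
    }
    windows
}

/// Walk-forward consistency: the percentage of windows whose
/// out-of-sample PnL is strictly positive.
pub fn consistency_pct(test_pnls: &[f64]) -> f64 {
    if test_pnls.is_empty() {
        return 0.0;
    }
    let wins = test_pnls.iter().filter(|&&p| p > 0.0).count();
    100.0 * wins as f64 / test_pnls.len() as f64
}
```

For example, 10 candles with a train length of 4 and a test length of 2 yield three windows, and a strategy that is profitable in 3 of 4 windows scores 75%.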

Candidate promotion gates

| Gate | Requirement |
| --- | --- |
| Walk-forward Level 1 (signal quality) | Sharpe > 0.5, WF consistency > 50% |
| Walk-forward Level 2 (realistic simulation) | Max drawdown < 30%, positive CAGR net of all costs |
| Testnet validation | Minimum 30 trades on Binance testnet with results matching backtest expectations |
| Live entry | Micro allocation first, scale up only after live behavior matches |
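The numeric thresholds in the table can be expressed as an ordered check. This sketch is illustrative only: `CandidateStats`, `Gate`, and `first_failed_gate` are hypothetical names, and the qualitative conditions (testnet results matching backtest expectations, and the live micro-allocation step) are not encoded because they require human judgment.

```rust
/// Hypothetical summary of one candidate's metrics.
pub struct CandidateStats {
    pub sharpe: f64,
    pub wf_consistency_pct: f64,
    pub max_drawdown_pct: f64,
    pub cagr_net_of_costs: f64,
    pub testnet_trades: u32,
}

#[derive(Debug, PartialEq)]
pub enum Gate {
    Level1,
    Level2,
    Testnet,
}

/// First gate the candidate fails, or None if it clears every numeric
/// threshold. Writing the checks as `!(a > b)` means a NaN metric fails
/// the gate instead of slipping through.
pub fn first_failed_gate(s: &CandidateStats) -> Option<Gate> {
    if !(s.sharpe > 0.5 && s.wf_consistency_pct > 50.0) {
        return Some(Gate::Level1);
    }
    if !(s.max_drawdown_pct < 30.0 && s.cagr_net_of_costs > 0.0) {
        return Some(Gate::Level2);
    }
    if s.testnet_trades < 30 {
        return Some(Gate::Testnet);
    }
    None
}
```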

Binance testnet is the forward-test gate. There is no paper-trading simulator in v7 — it was removed at gordon-exchange 3.0.0. Testnet provides a real order book and real fills without capital at risk.

Ablation discipline

Every overlay, every filter, every pipeline modification must be tested in isolation before inclusion. The process:

  1. Run the full pipeline, record baseline Sharpe.
  2. Remove exactly one component.
  3. Re-run, record new Sharpe.
  4. If Sharpe drops: the component has proven lift. If Sharpe is flat or improves: the component is dead weight.

An overlay that improves one pair but degrades another is not reliable. An overlay that works on 2020–2021 data but fails on 2022–2023 is regime-fitted. Both are rejected.
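The leave-one-out procedure can be sketched as a small helper. This is an assumption-laden illustration: `ablate` and `Verdict` are hypothetical names, components are identified by plain strings, and `run` stands in for whatever runs the backtest and returns a Sharpe ratio for a given set of enabled components.

```rust
/// Outcome for one removed component.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    ProvenLift, // Sharpe dropped when the component was removed
    DeadWeight, // Sharpe stayed flat or improved without it
}

/// Leave-one-out ablation: run the full pipeline once for the baseline,
/// then re-run with exactly one component removed at a time.
/// `run` is a hypothetical backtest runner returning a Sharpe ratio.
pub fn ablate<'a, F>(components: &[&'a str], run: F) -> Vec<(&'a str, f64, Verdict)>
where
    F: Fn(&[&str]) -> f64,
{
    let baseline = run(components);
    components
        .iter()
        .enumerate()
        .map(|(i, &name)| {
            // Every component except the one at index i.
            let without: Vec<&str> = components
                .iter()
                .enumerate()
                .filter(|(j, _)| *j != i)
                .map(|(_, &c)| c)
                .collect();
            let sharpe = run(&without);
            let verdict = if sharpe < baseline {
                Verdict::ProvenLift
            } else {
                Verdict::DeadWeight
            };
            (name, sharpe, verdict)
        })
        .collect()
}
```

To apply the cross-pair and cross-regime rule from the paragraph above, the same helper would be run once per pair and per date range, and a component kept only if it shows lift in every run.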

Determinism requirements

Backtest results must be reproducible. Given the same candle fixture and the same strategy parameters, BacktestExecution must produce byte-identical output on every run. Any violation is a bug, so the following rules apply:

  • No SystemTime::now() on any fill path.
  • No rand::* (strategies are deterministic functions, not probabilistic samplers).
  • No non-deterministic HashMap iteration on paths that produce intent sequences.
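The HashMap rule is the least obvious of the three, because Rust's standard HashMap iterates in an arbitrary order that can differ between runs. One way to keep fast lookup while fixing the iteration order is to pair the map with an insertion-ordered Vec, as in this sketch. The `Ledger` type, its fields, and the use of `u64` ids are hypothetical and are not taken from BacktestExecution.

```rust
use std::collections::HashMap;

/// Hypothetical order ledger: the map gives O(1) lookup, the Vec fixes
/// the iteration order so replays visit orders identically on every run.
#[derive(Default)]
pub struct Ledger {
    order: Vec<u64>,          // order ids, in insertion order
    by_id: HashMap<u64, f64>, // id -> limit price; never iterated directly
}

impl Ledger {
    /// Inserts or updates an order. An id keeps its original position
    /// in the ordering when its price is updated.
    pub fn insert(&mut self, id: u64, limit: f64) {
        if self.by_id.insert(id, limit).is_none() {
            self.order.push(id);
        }
    }

    /// Iterates (id, limit) pairs in insertion order.
    pub fn iter(&self) -> impl Iterator<Item = (u64, f64)> + '_ {
        self.order.iter().map(move |id| (*id, self.by_id[id]))
    }
}
```

Iterating `by_id` directly would yield the same set of orders, but possibly in a different sequence on each run, which is enough to change the order of emitted intents.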

Invariants

  • Backtest and live call the same Strategy::evaluate. There is exactly one implementation per strategy.
  • BacktestExecution is deterministic: same inputs, same outputs, always.
  • Walk-forward only — no random splits.
  • The byte-parity CI gate (it_backtest_live_byte_parity.rs) is a blocking test. It does not get disabled to unblock a deploy.
  • Testnet validation precedes live capital. No exceptions.
  • All cost components are applied: maker/taker fees, slippage, 8h funding rates.

See also

  • Strategies: Strategy trait and StrategyRegistry.
  • Execution: live order flow through gordon-executor.
  • Data Pipeline: historical klines used as backtest input.
