Backtesting
Gordon's backtest engine shares the exact same strategy evaluation code path as the live trading loop. There is no separate backtest implementation. The ExecutionModel trait in gordon-domain is the abstraction that makes this possible: BacktestExecution implements it for simulation; gordon-executor implements it for live orders. Both call Strategy::evaluate identically.
ExecutionModel trait
    pub trait ExecutionModel: Send + Sync {
        async fn submit(&mut self, intent: &OrderIntent) -> Result<ExchangeOrderAck, ExecutionError>;
        async fn fill(&mut self, ack: &ExchangeOrderAck) -> Result<FillEvent, ExecutionError>;
        async fn position(&self, symbol: &str) -> Option<Position>;
    }

gordon-manager's backtest engine constructs a BacktestExecution instance and drives the replay loop:
1. Load historical klines from the database.
2. For each candle, build a MarketContext and call Strategy::evaluate.
3. Pass the Signal through the overlay pipeline.
4. If a tradeable signal emerges, call execution_model.submit(intent).
5. On the next candle, detect fills via price-crossing logic and call execution_model.fill(ack).
6. Aggregate FillEvent records into a run row in trading.*.
gordon-bot's live loop follows the same steps 2–5, substituting live candles and gordon-executor for the DB replay and BacktestExecution.
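The replay loop can be sketched in a few lines. This is a minimal synchronous illustration, not Gordon's implementation: Candle, Signal, Fill, evaluate, and replay are simplified stand-ins invented for the example, the overlay pipeline is omitted, and the real ExecutionModel methods are async.

```rust
// Simplified stand-ins for illustration; the real Gordon types carry more fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candle {
    pub open_time: i64,
    pub open: f64,
    pub high: f64,
    pub low: f64,
    pub close: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Signal {
    Hold,
    Buy { limit: f64 },
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Fill {
    pub candle_ts: i64,
    pub price: f64,
}

/// Hypothetical strategy: after a down candle, bid 1.0 below the close.
pub fn evaluate(c: &Candle) -> Signal {
    if c.close < c.open {
        Signal::Buy { limit: c.close - 1.0 }
    } else {
        Signal::Hold
    }
}

/// Replays candles in order. An order submitted on one candle can only fill
/// on a later candle whose low touches the limit price.
pub fn replay(candles: &[Candle]) -> Vec<Fill> {
    let mut fills = Vec::new();
    let mut pending: Option<f64> = None;
    for c in candles {
        // Fill detection for an order submitted on an earlier candle.
        if let Some(limit) = pending {
            if c.low <= limit {
                // The timestamp comes from the candle, never from the wall clock.
                fills.push(Fill { candle_ts: c.open_time, price: limit });
                pending = None;
            }
        }
        // Evaluate the strategy and submit at most one resting order.
        if pending.is_none() {
            if let Signal::Buy { limit } = evaluate(c) {
                pending = Some(limit);
            }
        }
    }
    fills
}
```

In this sketch, only the source of candles and the fill logic would differ in a live driver; evaluate is the same function in both.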
BacktestExecution
BacktestExecution lives in gordon-strategy::execution::backtest. It simulates fills deterministically:
- Fee model: maker/taker bps + slippage configured per symbol (mirrors gordon_exchange::FeeModel).
- Funding cost: 8h funding rate applied proportional to holding duration.
- Fill detection: per-candle price-crossing (high/low touch). No partial fills in simulation.
- In-memory ledger: insertion-ordered Vec<OrderId> on the fill path, with no non-deterministic HashMap iteration.
- No SystemTime::now(): all timestamps come from the candle's open_time.
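The fill and cost rules above reduce to a few small pure functions. The sketch below is illustrative only: the function names, the basis-point arithmetic on notional, and the linear pro-rating of funding over 8-hour intervals are assumptions made for the example, not the actual gordon_exchange::FeeModel API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Buy,
    Sell,
}

/// Price-crossing fill detection: a resting limit order fills in full when
/// the candle's high/low range touches its price. No partial fills.
pub fn limit_touched(side: Side, limit: f64, high: f64, low: f64) -> bool {
    match side {
        Side::Buy => low <= limit,
        Side::Sell => high >= limit,
    }
}

/// Fee plus slippage in quote currency, both expressed in basis points
/// of the traded notional (assumed convention).
pub fn trading_cost(notional: f64, fee_bps: f64, slippage_bps: f64) -> f64 {
    notional * (fee_bps + slippage_bps) / 10_000.0
}

/// Funding cost pro-rated linearly over the holding period
/// (assumed interpretation of "proportional to holding duration").
pub fn funding_cost(notional: f64, rate_per_8h: f64, held_ms: i64) -> f64 {
    const EIGHT_HOURS_MS: f64 = 8.0 * 60.0 * 60.0 * 1000.0;
    notional * rate_per_8h * (held_ms as f64 / EIGHT_HOURS_MS)
}
```

For example, under these assumptions a 10,000 notional position with a 4 bps fee and 1 bps slippage costs 5 in quote currency to open, and holding it for 16 hours at a 0.01% 8h funding rate costs about 2.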
Byte-parity gate (DP-19)
The shared-code-path invariant is enforced by a CI test, not by policy alone.
gordon-strategy/tests/it_backtest_live_byte_parity.rs routes a 200-candle BTCUSDT/1h fixture through two paths:
- A BacktestExecution-driven replay (the backtest path).
- A hand-rolled evaluate_tick mock harness that simulates the gordon-bot live driver.
The test asserts the resulting intent sequence is byte-identical on (side, qty, sl_price, entry_price, candle_ts). If the live driver diverges from the backtest path, this test fails at cargo test time — before any code reaches production.
If live diverges from backtest, the bug is in the live driver, not the strategy code. The test makes this diagnosis unambiguous.
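A byte-level comparison of two intent sequences might look like the sketch below. It is not the code in it_backtest_live_byte_parity.rs: IntentRecord, its u8 side encoding, and first_divergence are hypothetical names chosen for illustration. The point it shows is that comparing the little-endian bytes of each field catches differences that a tolerance-based float comparison would hide.

```rust
/// Hypothetical record of the five compared fields.
/// `side` is encoded as 0 = buy, 1 = sell for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct IntentRecord {
    pub side: u8,
    pub qty: f64,
    pub sl_price: f64,
    pub entry_price: f64,
    pub candle_ts: i64,
}

impl IntentRecord {
    /// Serialises the record field by field in a fixed order and endianness.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(33);
        out.push(self.side);
        out.extend_from_slice(&self.qty.to_le_bytes());
        out.extend_from_slice(&self.sl_price.to_le_bytes());
        out.extend_from_slice(&self.entry_price.to_le_bytes());
        out.extend_from_slice(&self.candle_ts.to_le_bytes());
        out
    }
}

/// Returns the index of the first intent whose bytes differ between the two
/// paths, or `None` when the sequences are byte-identical. A length mismatch
/// counts as a divergence at the first missing index.
pub fn first_divergence(backtest: &[IntentRecord], live: &[IntentRecord]) -> Option<usize> {
    let n = backtest.len().max(live.len());
    (0..n).find(|&i| match (backtest.get(i), live.get(i)) {
        (Some(a), Some(b)) => a.to_bytes() != b.to_bytes(),
        _ => true,
    })
}
```

Reporting the first diverging index, rather than a plain pass/fail, points directly at the candle where the two drivers disagree.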
Walk-forward protocol
Random train/test splits are invalid on time series data. Future data leaks into training and produces inflated results that do not hold in production.
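Walk-forward validation uses rolling temporal windows: each window fits parameters on one span of history and tests on the span immediately after it, then the whole window rolls forward in time.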
A strategy must perform consistently across all windows. Walk-forward consistency is the percentage of windows where the strategy is profitable. Minimum threshold: 50%.
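Window generation and the consistency metric can be sketched as follows. The Window struct, the index-based half-open ranges, and the choice to advance by one test length per step are assumptions for illustration; the source does not specify window sizes or step.

```rust
/// Half-open index ranges into a time-ordered candle series:
/// train = [train_start, train_end), test = [train_end, test_end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Window {
    pub train_start: usize,
    pub train_end: usize,
    pub test_end: usize,
}

/// Generates rolling windows over `n` candles. The test span always follows
/// its training span, so no future data reaches training. Each step advances
/// by `test_len` (an assumed step size), which keeps test spans disjoint.
pub fn rolling_windows(n: usize, train_len: usize, test_len: usize) -> Vec<Window> {
    let mut out = Vec::new();
    if train_len == 0 || test_len == 0 {
        return out;
    }
    let mut start = 0;
    while start + train_len + test_len <= n {
        out.push(Window {
            train_start: start,
            train_end: start + train_len,
            test_end: start + train_len + test_len,
        });
        start += test_len;
    }
    out
}

/// Walk-forward consistency: percentage of test windows with positive PnL.
/// Returns 0.0 when there are no windows.
pub fn wf_consistency(test_pnl: &[f64]) -> f64 {
    if test_pnl.is_empty() {
        return 0.0;
    }
    let profitable = test_pnl.iter().filter(|p| **p > 0.0).count();
    100.0 * profitable as f64 / test_pnl.len() as f64
}
```

With 10 candles, a training length of 4, and a test length of 2, this yields three windows; a strategy that is profitable in three of four test windows scores 75%.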
Candidate promotion gates
| Gate | Requirement |
|---|---|
| Walk-forward Level 1 (signal quality) | Sharpe > 0.5, WF consistency > 50% |
| Walk-forward Level 2 (realistic simulation) | Max drawdown < 30%, positive CAGR net of all costs |
| Testnet validation | Minimum 30 trades on Binance testnet with results matching backtest expectations |
| Live entry | Micro allocation first, scale up only after live behavior matches |
Binance testnet is the forward-test gate. There is no paper-trading simulator in v7 — it was removed at gordon-exchange 3.0.0. Testnet provides a real order book and real fills without capital at risk.
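The numeric gates in the table can be expressed as a single check that reports the first gate a candidate fails. This is a hypothetical sketch: CandidateStats, Gate, and first_failed_gate are invented names, the strict comparisons follow the table, and "results matching backtest expectations" is reduced to a boolean that a reviewer would supply, since the source gives no numeric tolerance for it.

```rust
/// Hypothetical summary of a candidate's results. Percentages are in percent units.
#[derive(Debug, Clone, Copy)]
pub struct CandidateStats {
    pub sharpe: f64,
    pub wf_consistency_pct: f64,
    pub max_drawdown_pct: f64,
    pub cagr_net_pct: f64,
    pub testnet_trades: u32,
    /// Reviewer judgement; no numeric tolerance is defined in the source.
    pub testnet_matches_backtest: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate {
    Level1,
    Level2,
    Testnet,
}

/// Returns the first gate the candidate fails, in promotion order,
/// or `None` when it is eligible for a micro-allocation live entry.
pub fn first_failed_gate(s: &CandidateStats) -> Option<Gate> {
    if !(s.sharpe > 0.5 && s.wf_consistency_pct > 50.0) {
        return Some(Gate::Level1);
    }
    if !(s.max_drawdown_pct < 30.0 && s.cagr_net_pct > 0.0) {
        return Some(Gate::Level2);
    }
    if !(s.testnet_trades >= 30 && s.testnet_matches_backtest) {
        return Some(Gate::Testnet);
    }
    None
}
```

Checking the gates in order mirrors the promotion sequence: a candidate that fails Level 1 is never evaluated against testnet criteria.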
Ablation discipline
Every overlay, every filter, every pipeline modification must be tested in isolation before inclusion. The process:
- Run the full pipeline, record baseline Sharpe.
- Remove exactly one component.
- Re-run, record new Sharpe.
- If Sharpe drops: the component has proven lift. If Sharpe is flat or improves: the component is dead weight.
An overlay that improves one pair but degrades another is not reliable. An overlay that works on 2020–2021 data but fails on 2022–2023 is regime-fitted. Both are rejected.
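The leave-one-out procedure can be sketched generically. In the example below, ablation_lift and the component names used in the usage note are hypothetical, and the Sharpe computation is passed in as a closure standing in for a full backtest run.

```rust
/// Leave-one-out ablation. `sharpe_of` runs the pipeline with the given
/// component set and returns its Sharpe ratio. The result pairs each
/// component with its lift: baseline Sharpe minus Sharpe without it.
/// Positive lift means the component earns its place; zero or negative
/// lift marks it as dead weight.
pub fn ablation_lift<F>(components: &[&str], sharpe_of: F) -> Vec<(String, f64)>
where
    F: Fn(&[&str]) -> f64,
{
    let baseline = sharpe_of(components);
    components
        .iter()
        .enumerate()
        .map(|(i, name)| {
            // Remove exactly one component, keeping the order of the rest.
            let without: Vec<&str> = components
                .iter()
                .enumerate()
                .filter(|(j, _)| *j != i)
                .map(|(_, c)| *c)
                .collect();
            (name.to_string(), baseline - sharpe_of(&without))
        })
        .collect()
}
```

Calling it with components such as trend_filter and vol_overlay and a closure that runs the backtest returns one lift value per component. To apply the cross-pair and cross-regime rule, the same call would be repeated per symbol and per date range, and a component kept only if its lift is positive in every run.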
Determinism requirements
Backtest results must be reproducible. Given the same candle fixture and the same strategy parameters, BacktestExecution must produce byte-identical output on every run. Any violation is a bug. The following rules keep the output reproducible:
- No SystemTime::now() on any fill path.
- No rand::* (strategies are deterministic functions, not probabilistic samplers).
- No non-deterministic HashMap iteration on paths that produce intent sequences.
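A ledger that satisfies these rules can be as simple as a Vec scanned in submission order. The sketch below is an assumption-laden illustration, not the BacktestExecution ledger: the Ledger type, the u64 OrderId alias, and the buy-only fill rule are invented for the example. It relies on Vec::retain visiting elements in their original order, whereas std's HashMap makes no guarantee about iteration order.

```rust
/// Hypothetical order identifier for this sketch.
pub type OrderId = u64;

/// Open buy orders kept in submission order as (id, limit price) pairs.
#[derive(Debug, Default)]
pub struct Ledger {
    open_orders: Vec<(OrderId, f64)>,
}

impl Ledger {
    pub fn submit(&mut self, id: OrderId, limit: f64) {
        self.open_orders.push((id, limit));
    }

    /// Fills every open buy order whose limit is touched by `low`.
    /// Fills are emitted in submission order and stamped with the candle's
    /// open time, so repeated runs on the same fixture give identical output.
    pub fn fill_touched(&mut self, candle_open_time: i64, low: f64) -> Vec<(OrderId, i64)> {
        let mut filled = Vec::new();
        // `retain` visits elements in their original order, exactly once.
        self.open_orders.retain(|(id, limit)| {
            if low <= *limit {
                filled.push((*id, candle_open_time));
                false // filled orders leave the ledger
            } else {
                true
            }
        });
        filled
    }
}
```

Because neither the iteration order nor the timestamps depend on the process or the wall clock, two runs over the same candles produce the same fill sequence.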
Invariants
- Backtest and live call the same Strategy::evaluate. There is exactly one implementation per strategy.
- BacktestExecution is deterministic: same inputs, same outputs, always.
- Walk-forward only; no random splits.
- The byte-parity CI gate (it_backtest_live_byte_parity.rs) is a blocking test. It does not get disabled to unblock a deploy.
- Testnet validation precedes live capital. No exceptions.
- All cost components are applied: maker/taker fees, slippage, 8h funding rates.
Related
- Strategies: Strategy trait and StrategyRegistry.
- Execution: live order flow through gordon-executor.
- Data Pipeline: historical klines used as backtest input.