Backtesting Guide

Validate your strategies against historical market data before risking real capital. Podium's backtesting engine supports deterministic Strategy SDK code with concurrent tick execution, realistic fill models, Monte Carlo simulation, walk-forward analysis, and comprehensive strategy scoring.

How It Works

A backtest replays historical market data day by day, asking your agent to make trading decisions at each step. The engine simulates portfolio management, trade execution, fees, and slippage to produce a realistic performance history.

Backtest Execution Flow

Universe Resolution — Resolve symbols from your agent's universe config (index, sectors, market cap filters) against the security master database.
Market Data Fetch — Load daily OHLCV bars for all symbols in the date range from Databento (cached in Neon for performance).
Corporate Actions — Fetch stock splits and dividends. Split adjustments are applied point-in-time to prevent look-ahead bias.
Tick Loop — For each trading day, the engine presents market data and portfolio state to your agent. The agent returns target portfolio weights or individual buy/sell decisions.
Constraint Enforcement — The constraint engine validates decisions against risk limits (max position size, sector concentration, max turnover) and adjusts weights if needed.
Trade Execution — In idealized mode, trades fill at the next day's close price with flat slippage; in realistic mode, fills also reflect estimated spread, market impact, and liquidity limits. The portfolio simulator tracks cash, positions, and equity.
Metrics Calculation — After all days are processed, the engine computes performance metrics, runs Monte Carlo simulation, walk-forward analysis, regime detection, and strategy scoring.
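
The flow above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Podium's engine: `run_backtest`, `decide`, `clamp_weights`, and the `max_weight` limit are hypothetical names, fills use the next bar's close with no slippage, and only a per-position cap is enforced.

```python
from typing import Callable, Dict, List, Optional

Weights = Dict[str, float]

def clamp_weights(target: Weights, max_weight: float) -> Weights:
    """Cap each long position at max_weight; the excess stays in cash."""
    return {sym: min(max(w, 0.0), max_weight) for sym, w in target.items()}

def run_backtest(
    closes: List[Dict[str, float]],                          # one {symbol: close} dict per trading day
    decide: Callable[[Dict[str, float], float], Weights],    # hypothetical agent callback
    initial_capital: float,
    max_weight: float = 0.25,                                # assumed risk limit
) -> List[float]:
    """Return the equity curve, marked to market at each day's close."""
    cash = initial_capital
    shares: Dict[str, float] = {}
    pending: Optional[Weights] = None
    curve: List[float] = []
    for prices in closes:
        equity = cash + sum(qty * prices[sym] for sym, qty in shares.items())
        # Fill the previous day's decision at today's close (next-day execution).
        if pending is not None:
            shares = {sym: equity * w / prices[sym] for sym, w in pending.items()}
            cash = equity - sum(qty * prices[sym] for sym, qty in shares.items())
        curve.append(equity)
        # Decide on today's data; the fill happens on the next bar.
        pending = clamp_weights(decide(prices, equity), max_weight)
    return curve
```

Deferring each fill by one bar keeps the decision from using the price it trades at, which is the same look-ahead concern the corporate actions step addresses.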

Concurrent Tick Execution

When concurrent ticks are enabled (the default), the engine processes ticks in windows of 10 days, with up to 15 parallel LLM calls. All ticks in a window share the same portfolio state snapshot, so a decision does not see trades made earlier in the same window; results are reconciled sequentially afterward. This can reduce a 6-month backtest from 10+ minutes to under 2 minutes.
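
A rough sketch of the windowing pattern, using Python's standard `concurrent.futures`. The function names and callbacks (`take_snapshot`, `decide`, `apply_decision`) are hypothetical and stand in for whatever the engine does internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

def split_windows(days: Sequence[Any], window_size: int = 10) -> List[List[Any]]:
    """Chunk the trading days into consecutive windows of at most window_size."""
    return [list(days[i:i + window_size]) for i in range(0, len(days), window_size)]

def run_windows(
    days: Sequence[Any],
    take_snapshot: Callable[[], Any],
    decide: Callable[[Any, Any], Any],
    apply_decision: Callable[[Any, Any], None],
    window_size: int = 10,
    max_workers: int = 15,
) -> None:
    for window in split_windows(days, window_size):
        snapshot = take_snapshot()  # every tick in the window sees this same state
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() returns results in input order, whatever order they finish in.
            decisions = list(pool.map(lambda day: decide(day, snapshot), window))
        # Reconcile sequentially so portfolio updates happen in date order.
        for day, decision in zip(window, decisions):
            apply_decision(day, decision)
```

A window size of 1 degenerates to fully sequential execution, where each tick sees the state left by the previous one.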

Strategy SDK Backtesting

Strategies are backtested by running your Python code in a sandboxed environment against historical data. The engine records normalized target weights, trade simulation, risk guardrail decisions, and performance metrics in one repeatable run record.

Configuration

When launching a backtest, you configure the following parameters:

Parameter	Range	Description
Start Date	2016-01-01 onward	Earliest date for cached market data availability
End Date	Up to today	Max 365 calendar days from start date (~252 trading days for equities)
Initial Capital	$1,000 — $10,000,000	Starting cash for the simulated portfolio
Slippage	0% — 5%	Flat slippage applied to each trade (idealized mode only)
Execution Model	Idealized / Realistic	How trade fills are simulated (see below)
Max Participation Rate	1% — 50% of ADV	Liquidity cap for realistic mode (default 10%)
Calendar Mode	Equities / Crypto 24/7	Auto-detected from universe. Equities: Mon-Fri (252 days/yr). Crypto: every day (365 days/yr).
Concurrent Ticks	On/Off	Enable parallel LLM calls for faster backtests (default: on)
Window Size	1 — 30 days	How many days to process concurrently (default: 10)
Survivorship Free	On/Off	Include delisted securities and force-liquidate at last available price
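
The ranges in the table can be checked before a run is submitted. The sketch below is illustrative only: `BacktestConfig` and `validate` are hypothetical names, not part of the Strategy SDK, and only the numeric and date limits are covered.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

EARLIEST_START = date(2016, 1, 1)

@dataclass
class BacktestConfig:  # hypothetical container, not an SDK class
    start: date
    end: date
    initial_capital: float
    slippage: float = 0.0            # fraction, e.g. 0.001 = 0.1%
    max_participation: float = 0.10  # fraction of ADV
    window_size: int = 10            # days per concurrent window

def validate(cfg: BacktestConfig, today: date) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config is in range."""
    errors: List[str] = []
    if cfg.start < EARLIEST_START:
        errors.append("start date is before 2016-01-01")
    if cfg.end > today:
        errors.append("end date is in the future")
    if cfg.end <= cfg.start:
        errors.append("end date must be after start date")
    if (cfg.end - cfg.start).days > 365:
        errors.append("date range exceeds 365 calendar days")
    if not 1_000 <= cfg.initial_capital <= 10_000_000:
        errors.append("initial capital must be between $1,000 and $10,000,000")
    if not 0.0 <= cfg.slippage <= 0.05:
        errors.append("slippage must be between 0% and 5%")
    if not 0.01 <= cfg.max_participation <= 0.50:
        errors.append("max participation must be between 1% and 50% of ADV")
    if not 1 <= cfg.window_size <= 30:
        errors.append("window size must be between 1 and 30 days")
    return errors
```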

Execution Models

The execution model determines how trade fills are simulated. Choosing the right model affects the realism and reliability of your backtest results.

Idealized Close (Default)

Trades fill at the next day's closing price with a flat slippage percentage applied. This is the simplest model and is useful for quick iteration and strategy validation.

Fill price = close price × (1 + slippage) for buys, × (1 - slippage) for sells
All orders fill completely (no partial fills)
No spread or market impact modeling
Best for: initial strategy validation, comparing strategy variants
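
The fill-price rule above translates directly into code. The function name is hypothetical; slippage is expressed as a fraction (0.001 for 0.1%).

```python
def idealized_fill_price(close: float, side: str, slippage: float) -> float:
    """Next-day close adjusted by flat slippage: buys pay more, sells receive less."""
    if side == "buy":
        return close * (1 + slippage)
    if side == "sell":
        return close * (1 - slippage)
    raise ValueError(f"unknown side: {side!r}")
```

For example, with a close of 100 and 0.1% slippage, a buy fills at roughly 100.10 and a sell at roughly 99.90.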

Alpaca Daily Realistic

A more realistic execution model that accounts for bid-ask spread, market impact, and liquidity constraints. This produces more conservative (and more accurate) results.

Spread estimation — Uses the Corwin-Schultz estimator to estimate bid-ask spread from daily high/low prices. Wider spreads for less liquid names.
Market impact — Square-root impact model based on trade size relative to average daily volume. Larger orders move the price more.
Partial fills — Orders are capped at the max participation rate (default 10% of daily volume). If your order exceeds this, you get a partial fill.
Fill rate — Calculated based on order size vs. available liquidity. Reported in execution details for each trade.
Best for: final validation before deployment, capacity analysis, realistic P&L estimation
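
The pieces of the realistic model can be sketched as follows. The spread function follows the published Corwin-Schultz high/low estimator for two consecutive daily bars. Everything in `realistic_fill` beyond the participation cap is an assumption for illustration: the source names a square-root impact model but does not give its coefficient or volatility input, so `impact_coeff` and `daily_vol` are hypothetical parameters.

```python
import math
from typing import Tuple

def corwin_schultz_spread(h1: float, l1: float, h2: float, l2: float) -> float:
    """Proportional bid-ask spread estimated from two consecutive daily high/low bars."""
    beta = math.log(h1 / l1) ** 2 + math.log(h2 / l2) ** 2
    gamma = math.log(max(h1, h2) / min(l1, l2)) ** 2
    k = 3 - 2 * math.sqrt(2)
    alpha = (math.sqrt(2 * beta) - math.sqrt(beta)) / k - math.sqrt(gamma / k)
    spread = 2 * (math.exp(alpha) - 1) / (1 + math.exp(alpha))
    return max(spread, 0.0)  # negative estimates are floored at zero

def realistic_fill(
    order_shares: float,
    daily_volume: float,
    close: float,
    spread: float,
    daily_vol: float,                # assumed input: daily return volatility
    side: str,
    max_participation: float = 0.10,
    impact_coeff: float = 0.1,       # assumed coefficient, not a documented value
) -> Tuple[float, float, float]:
    """Return (filled_shares, fill_price, fill_rate) for one order."""
    if order_shares <= 0 or daily_volume <= 0:
        return 0.0, close, 0.0
    filled = min(order_shares, max_participation * daily_volume)  # partial fill if capped
    impact = impact_coeff * daily_vol * math.sqrt(filled / daily_volume)
    cost = spread / 2 + impact       # half the spread plus square-root impact
    sign = 1 if side == "buy" else -1
    return filled, close * (1 + sign * cost), filled / order_shares
```

An order for 20,000 shares in a name trading 100,000 shares a day is capped at 10,000 shares under the default 10% participation rate, giving a fill rate of 50%.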

Crypto Fee Model

Crypto backtests use a volume-tiered fee calculator based on rolling 30-day trading volume, similar to exchange fee schedules. Higher volume earns lower fees. Equity backtests assume zero commission (consistent with commission-free equity trading).
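
A volume-tiered fee lookup might look like the sketch below. The tier thresholds and rates in `FEE_TIERS` are invented for illustration; the actual schedule is not stated in this guide.

```python
from bisect import bisect_right

# Hypothetical schedule: (minimum rolling 30-day volume in USD, fee rate).
FEE_TIERS = [(0, 0.0025), (100_000, 0.0020), (1_000_000, 0.0015), (10_000_000, 0.0010)]

def fee_rate(rolling_30d_volume: float) -> float:
    """Pick the rate of the highest tier whose threshold has been reached."""
    thresholds = [minimum for minimum, _ in FEE_TIERS]
    index = bisect_right(thresholds, max(rolling_30d_volume, 0.0)) - 1
    return FEE_TIERS[index][1]

def trade_fee(notional: float, rolling_30d_volume: float, asset_class: str = "crypto") -> float:
    """Equities are commission-free; crypto pays the tiered rate on trade notional."""
    if asset_class == "equity":
        return 0.0
    return notional * fee_rate(rolling_30d_volume)
```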

Performance Metrics

After a backtest completes, the engine calculates a comprehensive set of performance metrics. These are displayed on the results page and used for strategy scoring.

Cumulative Return

Total return over the backtest period. Calculated as (final equity - initial capital) / initial capital.

Annualized Return

Cumulative return scaled to a yearly rate. Uses 252 trading days for equities, 365 for crypto. Formula: (1 + cumReturn)^(annualizationFactor / tradingDays) - 1.
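
Both return formulas are short enough to write out. The function names are hypothetical; the formulas are the ones given above.

```python
def cumulative_return(final_equity: float, initial_capital: float) -> float:
    """(final equity - initial capital) / initial capital."""
    return (final_equity - initial_capital) / initial_capital

def annualized_return(cum_return: float, trading_days: int, annualization_factor: int = 252) -> float:
    """(1 + cumReturn)^(annualizationFactor / tradingDays) - 1; pass 365 for crypto."""
    return (1 + cum_return) ** (annualization_factor / trading_days) - 1
```

As a worked example, a 10% cumulative return over 126 equity trading days annualizes to 1.10^(252/126) - 1 = 1.10^2 - 1 = 21%.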

Sharpe Ratio

Risk-adjusted return. Measures excess return per unit of total volatility. Uses a 5% annual risk-free rate. A Sharpe above 1.0 is generally considered good; above 2.0 is excellent.

Sortino Ratio

Like Sharpe, but only penalizes downside volatility. More relevant for strategies that have asymmetric return distributions (large gains, small losses).
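
One common way to compute both ratios from daily returns is sketched below. The 5% risk-free rate comes from this guide; the conventions are assumptions, since the engine's exact formula is not documented here: the daily risk-free rate is taken as the annual rate divided by the period count, Sharpe uses the sample standard deviation, and Sortino uses downside deviation around zero excess return.

```python
import math
from statistics import mean, stdev
from typing import List

def sharpe_ratio(daily_returns: List[float], rf_annual: float = 0.05, periods: int = 252) -> float:
    """Annualized mean excess return divided by the volatility of excess returns."""
    excess = [r - rf_annual / periods for r in daily_returns]
    volatility = stdev(excess)
    if volatility == 0:
        return 0.0
    return mean(excess) / volatility * math.sqrt(periods)

def sortino_ratio(daily_returns: List[float], rf_annual: float = 0.05, periods: int = 252) -> float:
    """Like Sharpe, but the denominator only counts returns below the risk-free rate."""
    excess = [r - rf_annual / periods for r in daily_returns]
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    if downside == 0:
        return 0.0
    return mean(excess) / downside * math.sqrt(periods)
```

For a return series with a positive mean and a few large gains, the Sortino ratio comes out higher than the Sharpe ratio because the gains inflate total volatility but not downside deviation.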

Maximum Drawdown

Worst peak-to-trough decline during the backtest. A maximum drawdown of 20% means that, at its worst point, the portfolio was 20% below its previous peak.
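
Maximum drawdown can be computed in a single pass over the equity curve by tracking the running peak. The function name is hypothetical.

```python
from typing import List

def max_drawdown(equity: List[float]) -> float:
    """Largest fractional decline from a running peak (0.2 means 20%)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                  # highest equity seen so far
        worst = max(worst, (peak - value) / peak)
    return worst
```

For the curve 100, 120, 96, 110, the peak is 120 and the trough is 96, so the maximum drawdown is 24 / 120 = 20%.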

Win Rate

Percentage of sell trades that were profitable (sell price > average buy price for that symbol). Calculated using FIFO matching.

Profit Factor

Gross profit divided by gross loss. A profit factor above 1.0 means the strategy is profitable overall. Above 2.0 is strong.

Average Holding Period

Average number of days between buying and selling a position. Calculated using FIFO matching of buy/sell pairs.
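
Win rate, profit factor, and average holding period can all be derived from FIFO-matched round trips. The sketch below is one way to do the matching and is not the engine's code: it counts each matched lot as one round trip and measures holding time in calendar days, both of which are assumptions that may differ from how the engine counts sell trades.

```python
from collections import defaultdict, deque
from datetime import date
from typing import Dict, Iterable, List, Tuple

Trade = Tuple[date, str, str, float, float]  # (day, symbol, side, quantity, price)

def fifo_round_trips(trades: Iterable[Trade]) -> List[Tuple[float, int, float]]:
    """Match each sell against the oldest open buy lots; return (pnl, holding_days, qty) per match."""
    lots = defaultdict(deque)
    trips: List[Tuple[float, int, float]] = []
    for day, symbol, side, qty, price in trades:  # trades must be in chronological order
        if side == "buy":
            lots[symbol].append([day, qty, price])
            continue
        remaining = qty
        while remaining > 0 and lots[symbol]:
            lot = lots[symbol][0]                 # oldest open lot first
            matched = min(remaining, lot[1])
            trips.append(((price - lot[2]) * matched, (day - lot[0]).days, matched))
            lot[1] -= matched
            remaining -= matched
            if lot[1] <= 0:
                lots[symbol].popleft()
    return trips

def summarize(trips: List[Tuple[float, int, float]]) -> Dict[str, float]:
    if not trips:
        raise ValueError("no closed round trips")
    gross_profit = sum(pnl for pnl, _, _ in trips if pnl > 0)
    gross_loss = -sum(pnl for pnl, _, _ in trips if pnl < 0)
    return {
        "win_rate": sum(pnl > 0 for pnl, _, _ in trips) / len(trips),
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "avg_holding_days": sum(days for _, days, _ in trips) / len(trips),
    }
```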

Total Fees

Sum of all trading fees incurred. Zero for equities (commission-free), volume-tiered for crypto.

Total Dividends

Total dividend income received during the backtest period. Dividends are credited on ex-dates based on shares held.

Strategy Scoring

Every completed backtest receives a composite strategy score (0-100) and a letter grade (A through F). The score is computed from four weighted components:

Return Quality (40%)

Evaluates Sharpe ratio, Sortino ratio, and annualized return. Higher risk-adjusted returns score better.

Robustness (25%)

Based on walk-forward consistency and train/test degradation ratio. Strategies that perform well out-of-sample score higher.

Risk Management (20%)

Evaluates maximum drawdown and win rate. Lower drawdowns and higher win rates score better.

Activity (15%)

Evaluates trade count and profit factor. Strategies that trade actively with positive expectancy score higher.

Grade Scale

A = 80-100
B = 60-79
C = 40-59
D = 20-39
F = 0-19
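
The weighting and grade scale combine as in the sketch below. It assumes each component has already been scored on a 0-100 scale; how the engine maps raw metrics to component scores is not described in this guide, and the function names are hypothetical.

```python
from typing import Dict

WEIGHTS = {
    "return_quality": 0.40,
    "robustness": 0.25,
    "risk_management": 0.20,
    "activity": 0.15,
}

def composite_score(components: Dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())

def letter_grade(score: float) -> str:
    """Map a 0-100 score onto the A-F scale."""
    for floor, grade in ((80, "A"), (60, "B"), (40, "C"), (20, "D")):
        if score >= floor:
            return grade
    return "F"
```

Component scores of 90, 60, 70, and 40 give 36 + 15 + 14 + 6 = 71, which is a B.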

Flags

The scorer also generates warning flags for potential issues:

Negative Sharpe ratio
Drawdown exceeding 30%
Win rate below 40%
Fewer than 10 trades (insufficient sample)
Low walk-forward consistency (<50%)
High train/test degradation
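
The flag rules can be expressed as a list of threshold checks. The function name is hypothetical, and the cutoff for "high train/test degradation" is an assumption: this sketch borrows the 0.5 degradation ratio that the walk-forward section calls acceptable.

```python
from typing import List

def warning_flags(
    sharpe: float,
    max_drawdown: float,       # fraction, e.g. 0.35 = 35%
    win_rate: float,           # fraction
    trade_count: int,
    wf_consistency: float,     # fraction of profitable test windows
    degradation: float,        # test / train average daily return
    degradation_floor: float = 0.5,  # assumed cutoff, not documented for flags
) -> List[str]:
    flags: List[str] = []
    if sharpe < 0:
        flags.append("Negative Sharpe ratio")
    if max_drawdown > 0.30:
        flags.append("Drawdown exceeding 30%")
    if win_rate < 0.40:
        flags.append("Win rate below 40%")
    if trade_count < 10:
        flags.append("Fewer than 10 trades (insufficient sample)")
    if wf_consistency < 0.50:
        flags.append("Low walk-forward consistency (<50%)")
    if degradation < degradation_floor:
        flags.append("High train/test degradation")
    return flags
```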

Advanced Analysis

Beyond basic metrics, the backtesting engine produces several advanced analyses that help you understand whether your strategy is robust or potentially overfit.

Monte Carlo Simulation

Generates 1,000 stochastic simulations by bootstrapping from your strategy's historical daily return distribution. This produces confidence bands showing the range of possible outcomes.

P5 / P50 / P95 bands — The 5th, 50th, and 95th percentile equity paths. The P5-P95 range covers the central 90% of simulated outcomes.
Probability of profit — Fraction of simulated paths that end above the starting capital.
Median final equity — The 50th-percentile ending equity across all simulated paths.
Uses a fixed seed (42) for reproducible results across runs.
Requires at least 20 trading days of data to run.
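
A bootstrap of this kind can be sketched with the standard library alone. The path count, seed, and 20-day minimum come from the list above; the rest is assumed: returns are resampled independently with replacement, each path has the same length as the history, and percentiles are taken day by day with a simple nearest-rank rule.

```python
import random
from typing import Dict, List

def monte_carlo(
    daily_returns: List[float],
    initial_capital: float,
    n_paths: int = 1000,
    seed: int = 42,
) -> Dict[str, object]:
    if len(daily_returns) < 20:
        raise ValueError("need at least 20 trading days of returns")
    rng = random.Random(seed)          # fixed seed gives reproducible paths
    horizon = len(daily_returns)
    paths: List[List[float]] = []
    for _ in range(n_paths):
        equity = initial_capital
        path = []
        for _ in range(horizon):
            equity *= 1 + rng.choice(daily_returns)  # resample with replacement
            path.append(equity)
        paths.append(path)

    def nearest_rank(ordered: List[float], q: float) -> float:
        return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

    bands: Dict[str, List[float]] = {"p5": [], "p50": [], "p95": []}
    for t in range(horizon):
        column = sorted(path[t] for path in paths)   # all paths' equity on day t
        for name, q in (("p5", 0.05), ("p50", 0.50), ("p95", 0.95)):
            bands[name].append(nearest_rank(column, q))

    finals = sorted(path[-1] for path in paths)
    return {
        "bands": bands,
        "prob_profit": sum(f > initial_capital for f in finals) / n_paths,
        "median_final": nearest_rank(finals, 0.50),
    }
```

Independent resampling discards any day-to-day autocorrelation in the original returns, so the bands describe the return distribution rather than the specific sequence of events.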

Walk-Forward Analysis

The most important robustness check for detecting overfitting. Splits the equity curve into multiple train/test windows and measures out-of-sample consistency.

Window count — Default 5 non-overlapping windows. Each window is split 70% train / 30% test.
Consistency — Fraction of test windows that were profitable. Above 60% is good.
Degradation ratio — Average test daily return / average train daily return. Above 0.5 is acceptable; below 0.5 suggests overfitting.
Train/test correlation — Do in-sample and out-of-sample returns move together? Above 0.5 is good.
Per-window metrics — Each window shows train return, test return, train Sharpe, test Sharpe, and test max drawdown.
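
The window split and the two headline numbers can be sketched as below. The 5-window, 70/30 layout and the definitions of consistency and degradation follow the list above; the function names and the handling of leftover days (dropped from the end) are assumptions. Correlation and per-window Sharpe are left out for brevity.

```python
from statistics import mean
from typing import Dict, List

def compound(returns: List[float]) -> float:
    """Total return from compounding a list of daily returns."""
    total = 1.0
    for r in returns:
        total *= 1 + r
    return total - 1

def walk_forward(daily_returns: List[float], n_windows: int = 5, train_frac: float = 0.7) -> Dict[str, object]:
    size = len(daily_returns) // n_windows        # trailing leftover days are dropped
    if size < 2:
        raise ValueError("need at least 2 returns per window")
    cut = max(1, min(size - 1, round(size * train_frac)))
    windows = []
    for i in range(n_windows):
        chunk = daily_returns[i * size:(i + 1) * size]
        train, test = chunk[:cut], chunk[cut:]    # first 70% train, last 30% test
        windows.append({
            "train_return": compound(train),
            "test_return": compound(test),
            "train_mean": mean(train),
            "test_mean": mean(test),
        })
    avg_train = mean(w["train_mean"] for w in windows)
    avg_test = mean(w["test_mean"] for w in windows)
    return {
        "windows": windows,
        "consistency": sum(w["test_return"] > 0 for w in windows) / n_windows,
        "degradation": avg_test / avg_train if avg_train != 0 else float("nan"),
    }
```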

Market Regime Analysis

Detects market regimes from benchmark data and measures your strategy's performance in each regime. This reveals whether your strategy only works in bull markets or is robust across conditions.

Regimes detected — Bull, Bear, Sideways, High Volatility, Low Volatility
Per-regime metrics: total return, Sharpe ratio, max drawdown, trading days
Uses SPY for equity benchmarking, BTC/USD for crypto

Performance Attribution

Decomposes your strategy's return into its sources to understand what drove performance.

Alpha return — Return attributable to your strategy's skill (above what the market delivered)
Beta return — Return from market exposure (what you would have earned just holding the benchmark)
Residual return — Unexplained return (noise or factors not captured)
Sector attribution — Return contribution from each sector
Factor exposure — Exposure to common risk factors

Capacity Estimation

Estimates the maximum AUM (assets under management) your strategy can handle before market impact degrades performance.

Capacity AUM — Maximum portfolio size before the binding constraint is hit
Binding symbol — The least liquid symbol that limits capacity
Binding constraint — Whether capacity is limited by volume, spread, or impact
Based on the max participation rate (default 10% of daily volume)
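
A simplified volume-only version of this estimate is sketched below. It assumes the strategy may need to trade a position's full target weight in a single day, so each symbol supports at most participation × dollar ADV / weight of AUM. Spread and impact constraints are not modeled, and the function name and inputs are hypothetical.

```python
from typing import Dict, Tuple

def estimate_capacity(
    weights: Dict[str, float],       # target portfolio weight per symbol
    adv_dollars: Dict[str, float],   # average daily dollar volume per symbol
    max_participation: float = 0.10,
) -> Tuple[float, str]:
    """Return (capacity_aum, binding_symbol) under a volume constraint only."""
    per_symbol = {
        sym: max_participation * adv_dollars[sym] / w
        for sym, w in weights.items()
        if w > 0
    }
    if not per_symbol:
        raise ValueError("no positive weights")
    binding = min(per_symbol, key=per_symbol.get)  # least liquid relative to its weight
    return per_symbol[binding], binding
```

With equal 50% weights in a name trading $1B a day and one trading $1M a day, the smaller name binds: 10% of $1M is $100K of daily volume, which supports about $200K of AUM at a 50% weight.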

Results Visualization

The backtest results page provides several visualizations to help you understand your strategy's behavior:

Equity Curve

Portfolio value over time with Monte Carlo confidence bands (P5/P50/P95) overlaid.

Drawdown Chart

Peak-to-trough decline over time. Shows how deep and how long drawdowns lasted.

Trade Log

Every trade with date, symbol, action, quantity, price, value, slippage, fees, and execution details (for realistic mode).

Decision Log

The LLM's reasoning for each tick: action taken, confidence level, tool calls made, and token usage.

Strategy Score Card

Overall score, grade, component breakdown, and warning flags.

Walk-Forward Windows

Per-window train/test returns and consistency metrics.

Limits & Quotas

Limit	Value	Notes
Daily Quota	30 backtests / day	Resets at midnight UTC
Max Date Range	365 calendar days	~252 trading days for equities
Earliest Start Date	2016-01-01	Market data availability limit
Capital Range	$1K — $10M	Initial capital for simulation
Slippage Range	0% — 5%	Flat slippage (idealized mode)
Concurrent Window	1 — 30 days	Default: 10 days, 15 max concurrency

Tips for Effective Backtesting

Start with idealized, finish with realistic

Use the idealized execution model for rapid iteration while tuning your strategy. Once you're happy with the results, switch to the realistic model for final validation. The realistic model will typically show lower returns due to spread and impact costs.

Watch for overfitting

If your strategy scores well on cumulative return but poorly on walk-forward consistency, it may be overfit to the specific date range. Try different date ranges and check that performance is stable. A walk-forward consistency above 60% and a degradation ratio above 0.5 are good signs.

Use survivorship-free mode for accuracy

Enable survivorship-free mode to include delisted securities in your backtest. Without it, your universe only contains stocks that survived to the present day, which introduces a positive bias. With it enabled, positions in delisted securities are force-liquidated at the last available price.

Check capacity before scaling

The capacity estimation tells you the maximum AUM your strategy can handle. If you plan to increase capital, make sure the capacity estimate supports it. Strategies trading small-cap or low-volume names will have lower capacity.

Realism warnings matter

The backtest results include a warnings section that discloses execution model, calendar mode, shorts policy, corporate actions applied, and any partial fills. Review these to understand the assumptions behind your results.