Backtesting Guide
Validate your strategies against historical market data before risking real capital. Podium's backtesting engine supports both no-code and SDK agents with concurrent tick execution, realistic fill models, Monte Carlo simulation, walk-forward analysis, and comprehensive strategy scoring.
How It Works
A backtest replays historical market data day by day, asking your agent to make trading decisions at each step. The engine simulates portfolio management, trade execution, fees, and slippage to produce a realistic performance history.
Backtest Execution Flow
- Universe Resolution — Resolve symbols from your agent's universe config (index, sectors, market cap filters) against the security master database.
- Market Data Fetch — Load daily OHLCV bars for all symbols in the date range from Alpaca's market data API (cached in Neon for performance).
- Corporate Actions — Fetch stock splits and dividends. Split adjustments are applied point-in-time to prevent look-ahead bias.
- Tick Loop — For each trading day, the engine presents market data and portfolio state to your agent. The agent returns target portfolio weights or individual buy/sell decisions.
- Constraint Enforcement — The constraint engine validates decisions against risk limits (max position size, sector concentration, max turnover) and adjusts weights if needed.
- Trade Execution — Trades are executed at the next day's close price (or with realistic spread/impact modeling). The portfolio simulator tracks cash, positions, and equity.
- Metrics Calculation — After all days are processed, the engine computes performance metrics, runs Monte Carlo simulation, walk-forward analysis, regime detection, and strategy scoring.
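The flow above can be sketched as a simple sequential loop. This is a minimal illustration, not the engine's implementation: `run_backtest`, the `agent` callable, and the `max_weight` cap are hypothetical names, and fees, slippage, and corporate actions are left out. It shows the two ideas that matter most for avoiding look-ahead: the agent decides on day t, and the trade fills at the close of day t+1.

```python
from typing import Callable

Weights = dict[str, float]  # symbol -> target portfolio weight


def run_backtest(
    closes: list[dict[str, float]],  # one {symbol: close} dict per trading day
    agent: Callable[[dict[str, float], dict[str, float], float], Weights],
    initial_capital: float,
    max_weight: float = 0.25,  # hypothetical position-size limit
) -> list[float]:
    cash = initial_capital
    positions: dict[str, float] = {}  # symbol -> share count
    pending = None  # weights decided on the previous day
    equity_curve = []

    for day_closes in closes:
        equity = cash + sum(q * day_closes[s] for s, q in positions.items())
        if pending is not None:
            # Execute yesterday's decision at today's close.
            positions = {s: w * equity / day_closes[s] for s, w in pending.items()}
            cash = equity - sum(q * day_closes[s] for s, q in positions.items())
        equity_curve.append(equity)

        # Ask the agent for target weights, then enforce the position cap.
        target = agent(day_closes, positions, cash)
        pending = {s: min(w, max_weight) for s, w in target.items()}

    return equity_curve
```

With a single symbol, a cap of 0.5, and closes of 100, 100, 110, an agent that always asks for a 100% weight ends up holding half the portfolio from day two onward, so equity moves from 1,000 to 1,050.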
Concurrent Tick Execution
By default, the engine processes ticks in concurrent windows of 10 days, with up to 15 parallel LLM calls. All ticks in a window share the same portfolio state snapshot, so a decision made for one day in a window does not see fills from earlier days in that window. Results are reconciled sequentially afterward. This reduces a 6-month backtest from more than 10 minutes to under 2 minutes.
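A minimal sketch of the windowing pattern, assuming an async `decide` callable that stands in for the LLM call and an `apply_decision` function that stands in for reconciliation. Both names are hypothetical, as is the dict-based portfolio.

```python
import asyncio
from typing import Awaitable, Callable


async def run_windowed(
    days: list[str],
    decide: Callable[[str, dict], Awaitable[dict]],  # hypothetical agent call
    apply_decision: Callable[[dict, dict], dict],    # hypothetical reconciler
    portfolio: dict,
    window_size: int = 10,
    max_concurrency: int = 15,
) -> dict:
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(day: str, snapshot: dict) -> dict:
        # The semaphore caps how many agent calls are in flight at once.
        async with sem:
            return await decide(day, snapshot)

    for start in range(0, len(days), window_size):
        window = days[start:start + window_size]
        snapshot = dict(portfolio)  # every tick in the window sees this state
        # gather() returns results in the order the awaitables were passed.
        decisions = await asyncio.gather(*(bounded(d, snapshot) for d in window))
        # Reconcile one decision at a time so the portfolio evolves in order.
        for decision in decisions:
            portfolio = apply_decision(portfolio, decision)
    return portfolio
```

The trade-off is visible in the snapshot line: larger windows mean more parallelism, but also more decisions made against stale portfolio state.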
No-Code vs SDK Backtesting
Both agent types use the same unified backtest infrastructure. No-code agents are backtested by replaying the LLM decision loop with gpt-4.1-mini. SDK agents are backtested by running your Python code in a sandboxed environment against historical data. Results are stored in the same unified table and produce identical metrics.
Configuration
When launching a backtest, you configure the following parameters:
| Parameter | Range | Description |
|---|---|---|
| Start Date | 2016-01-01 onward | Earliest date for Alpaca market data availability |
| End Date | Up to today | Max 365 calendar days from start date (~252 trading days for equities) |
| Initial Capital | $1,000 — $10,000,000 | Starting cash for the simulated portfolio |
| Slippage | 0% — 5% | Flat slippage applied to each trade (idealized mode only) |
| Execution Model | Idealized / Realistic | How trade fills are simulated (see below) |
| Max Participation Rate | 1% — 50% of ADV | Liquidity cap for realistic mode (default 10%) |
| Calendar Mode | Equities / Crypto 24/7 | Auto-detected from universe. Equities: Mon-Fri (252 days/yr). Crypto: every day (365 days/yr). |
| Concurrent Ticks | On/Off | Enable parallel LLM calls for faster backtests (default: on) |
| Window Size | 1 — 30 days | How many days to process concurrently (default: 10) |
| Survivorship Free | On/Off | Include delisted securities and force-liquidate at last available price |
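The ranges in the table can be checked before a run is submitted. The sketch below is illustrative only: `BacktestConfig` and its field names are hypothetical and do not describe Podium's SDK. Only the limits and the documented defaults come from the table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BacktestConfig:  # hypothetical container, not a Podium SDK class
    start_date: date
    end_date: date
    initial_capital: float
    slippage: float              # fraction, e.g. 0.001 for 0.1%
    survivorship_free: bool
    execution_model: str = "idealized"
    max_participation_rate: float = 0.10
    concurrent_ticks: bool = True
    window_size: int = 10

    def validate(self) -> list[str]:
        errors = []
        if self.start_date < date(2016, 1, 1):
            errors.append("start_date must be 2016-01-01 or later")
        if self.end_date > date.today():
            errors.append("end_date cannot be in the future")
        if self.end_date <= self.start_date:
            errors.append("end_date must be after start_date")
        if (self.end_date - self.start_date).days > 365:
            errors.append("date range cannot exceed 365 calendar days")
        if not 1_000 <= self.initial_capital <= 10_000_000:
            errors.append("initial_capital must be between $1,000 and $10,000,000")
        if not 0.0 <= self.slippage <= 0.05:
            errors.append("slippage must be between 0% and 5%")
        if self.execution_model not in {"idealized", "realistic"}:
            errors.append("execution_model must be 'idealized' or 'realistic'")
        if not 0.01 <= self.max_participation_rate <= 0.50:
            errors.append("max_participation_rate must be between 1% and 50%")
        if not 1 <= self.window_size <= 30:
            errors.append("window_size must be between 1 and 30 days")
        return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every invalid field at once.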
Execution Models
The execution model determines how trade fills are simulated. Choosing the right model affects the realism and reliability of your backtest results.
Idealized Close (Default)
Trades fill at the next day's closing price with a flat slippage percentage applied. This is the simplest model and is useful for quick iteration and strategy validation.
- Fill price = close price × (1 + slippage) for buys, × (1 - slippage) for sells
- All orders fill completely (no partial fills)
- No spread or market impact modeling
- Best for: initial strategy validation, comparing strategy variants
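The fill-price rule above is simple enough to write out directly. The function name is illustrative.

```python
def idealized_fill_price(close: float, side: str, slippage: float) -> float:
    """Next-day close adjusted by flat slippage (a fraction, e.g. 0.001 = 0.1%)."""
    if side == "buy":
        return close * (1 + slippage)   # buyers pay slightly more
    if side == "sell":
        return close * (1 - slippage)   # sellers receive slightly less
    raise ValueError(f"unknown side: {side!r}")
```

For example, with a close of 100.00 and 0.1% slippage, a buy fills at 100.10 and a sell at 99.90.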
Alpaca Daily Realistic
A more realistic execution model that accounts for bid-ask spread, market impact, and liquidity constraints. This produces more conservative (and more accurate) results.
- Spread estimation — Uses the Corwin-Schultz estimator to estimate bid-ask spread from daily high/low prices. Wider spreads for less liquid names.
- Market impact — Square-root impact model based on trade size relative to average daily volume. Larger orders move the price more.
- Partial fills — Orders are capped at the max participation rate (default 10% of daily volume). If your order exceeds this, you get a partial fill.
- Fill rate — Calculated based on order size vs. available liquidity. Reported in execution details for each trade.
- Best for: final validation before deployment, capacity analysis, realistic P&L estimation
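The sketch below shows how these pieces could fit together. The spread function follows the published two-day Corwin-Schultz formula, with negative estimates clamped to zero. The rest is an assumption made for illustration: the function names, the `impact_coeff` value, and the way spread and impact are combined into a fill price are not taken from the engine.

```python
import math


def corwin_schultz_spread(h1: float, l1: float, h2: float, l2: float) -> float:
    """Proportional bid-ask spread estimated from two consecutive daily bars."""
    beta = math.log(h1 / l1) ** 2 + math.log(h2 / l2) ** 2
    gamma = math.log(max(h1, h2) / min(l1, l2)) ** 2
    k = 3 - 2 * math.sqrt(2)
    alpha = (math.sqrt(2 * beta) - math.sqrt(beta)) / k - math.sqrt(gamma / k)
    spread = 2 * (math.exp(alpha) - 1) / (1 + math.exp(alpha))
    return max(spread, 0.0)  # negative estimates are treated as zero


def realistic_fill(
    order_qty: float,
    side: str,
    close: float,
    spread: float,
    daily_volume: float,
    daily_volatility: float,
    max_participation: float = 0.10,
    impact_coeff: float = 0.1,  # hypothetical scaling constant
) -> dict:
    # Partial fill: never take more than the participation cap of the day's volume.
    filled = min(order_qty, max_participation * daily_volume)
    # Square-root impact: cost grows with the square root of size relative to volume.
    impact = impact_coeff * daily_volatility * math.sqrt(filled / daily_volume)
    cost = spread / 2 + impact  # half-spread plus impact, as a fraction of price
    sign = 1 if side == "buy" else -1
    return {
        "filled_qty": filled,
        "fill_rate": filled / order_qty,
        "price": close * (1 + sign * cost),
    }
```

An order for 2,000 shares in a name that trades 10,000 shares a day is capped at 1,000 shares under the default 10% rate, giving a fill rate of 0.5.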
Crypto Fee Model
Crypto backtests use a volume-tiered fee calculator based on rolling 30-day trading volume, similar to exchange fee schedules. Higher volume earns lower fees. Equity backtests assume zero commission (consistent with Alpaca's commission-free trading).
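A volume-tiered lookup can be sketched as follows. The tier thresholds and rates below are placeholders invented for the example and are not Podium's fee schedule. `crypto_fee` is likewise an illustrative name.

```python
from bisect import bisect_right

# Placeholder tiers: (minimum rolling 30-day volume in USD, fee rate).
FEE_TIERS = [(0, 0.0025), (100_000, 0.0020), (1_000_000, 0.0015)]


def crypto_fee(trade_value: float, rolling_30d_volume: float, tiers=FEE_TIERS) -> float:
    thresholds = [minimum for minimum, _ in tiers]
    # Pick the highest tier whose minimum volume has been reached.
    idx = bisect_right(thresholds, rolling_30d_volume) - 1
    return trade_value * tiers[idx][1]
```

Because the tier depends on rolling volume, the same trade can cost less later in a backtest than it did at the start.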
Performance Metrics
After a backtest completes, the engine calculates a comprehensive set of performance metrics. These are displayed on the results page and used for strategy scoring.
Cumulative Return
Total return over the backtest period. Calculated as (final equity - initial capital) / initial capital.
Annualized Return
Cumulative return scaled to a yearly rate. Uses 252 trading days for equities, 365 for crypto. Formula: (1 + cumReturn)^(annualizationFactor / tradingDays) - 1.
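The two return formulas translate directly into code. Function and parameter names are illustrative.

```python
def cumulative_return(initial_capital: float, final_equity: float) -> float:
    return (final_equity - initial_capital) / initial_capital


def annualized_return(
    cum_return: float,
    trading_days: int,
    annualization_factor: int = 252,  # use 365 for crypto
) -> float:
    return (1 + cum_return) ** (annualization_factor / trading_days) - 1
```

For example, a 10% cumulative return over 126 equity trading days (half a year) annualizes to 1.10^2 - 1 = 21%.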
Sharpe Ratio
Risk-adjusted return. Measures excess return per unit of total volatility. Uses a 5% annual risk-free rate. A Sharpe above 1.0 is generally considered good; above 2.0 is excellent.
Sortino Ratio
Like Sharpe, but only penalizes downside volatility. More relevant for strategies that have asymmetric return distributions (large gains, small losses).
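A sketch of both ratios computed from daily returns. The exact conventions the engine uses are not stated here, so this example makes its own assumptions: the annual risk-free rate is divided evenly across periods, Sharpe uses the sample standard deviation, Sortino uses the root mean square of negative excess returns, and both are annualized by the square root of the number of periods.

```python
import math
import statistics


def _excess(daily_returns: list[float], risk_free_annual: float, periods: int) -> list[float]:
    rf_daily = risk_free_annual / periods
    return [r - rf_daily for r in daily_returns]


def sharpe_ratio(daily_returns, risk_free_annual=0.05, periods=252) -> float:
    excess = _excess(daily_returns, risk_free_annual, periods)
    volatility = statistics.stdev(excess)  # total volatility, up and down
    if volatility == 0:
        return 0.0
    return statistics.mean(excess) / volatility * math.sqrt(periods)


def sortino_ratio(daily_returns, risk_free_annual=0.05, periods=252) -> float:
    excess = _excess(daily_returns, risk_free_annual, periods)
    # Downside deviation: only returns below the risk-free rate contribute.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0:
        return 0.0
    return statistics.mean(excess) / downside * math.sqrt(periods)
```

A strategy with a few large up days and small down days will show a Sortino noticeably higher than its Sharpe, because the up days inflate total volatility but not downside deviation.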
Maximum Drawdown
Worst peak-to-trough decline during the backtest. A maximum drawdown of 20% means that, at its lowest point, the portfolio was 20% below its previous peak.
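Maximum drawdown can be computed in a single pass over the equity curve by tracking the running peak. The function name is illustrative.

```python
def max_drawdown(equity_curve: list[float]) -> float:
    """Largest fractional decline from a running peak (0.2 means 20%)."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                      # highest equity seen so far
        worst = max(worst, (peak - value) / peak)    # decline from that peak
    return worst
```

For the curve 100, 120, 96, 110 the peak is 120 and the trough is 96, so the maximum drawdown is 24 / 120 = 20%.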
Win Rate
Percentage of sell trades that were profitable (sell price > average buy price for that symbol). Calculated using FIFO matching.
Profit Factor
Gross profit divided by gross loss. A profit factor above 1.0 means the strategy is profitable overall. Above 2.0 is strong.
Average Holding Period
Average number of days between buying and selling a position. Calculated using FIFO matching of buy/sell pairs.
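FIFO matching drives win rate, profit factor, and average holding period, so one pass over the trade log can produce all three. The sketch below is an assumption about how that matching might look, not the engine's code: a sell counts as a win when its realized P&L across the matched lots is positive, and holding periods are weighted by matched quantity. `fifo_stats` and the tuple layout are hypothetical.

```python
from collections import defaultdict, deque
from datetime import date


def fifo_stats(trades: list[tuple[date, str, str, float, float]]) -> dict:
    """trades: (date, symbol, side, quantity, price) tuples in chronological order."""
    lots = defaultdict(deque)  # symbol -> open lots as [buy_date, remaining_qty, price]
    sell_pnls = []             # realized P&L per sell trade
    holds = []                 # (days held, matched quantity)

    for day, symbol, side, qty, price in trades:
        if side == "buy":
            lots[symbol].append([day, qty, price])
            continue
        pnl, remaining = 0.0, qty
        while remaining > 1e-12 and lots[symbol]:
            lot = lots[symbol][0]  # oldest open lot is consumed first
            matched = min(remaining, lot[1])
            pnl += matched * (price - lot[2])
            holds.append(((day - lot[0]).days, matched))
            lot[1] -= matched
            remaining -= matched
            if lot[1] <= 1e-12:
                lots[symbol].popleft()
        sell_pnls.append(pnl)

    gross_profit = sum(p for p in sell_pnls if p > 0)
    gross_loss = -sum(p for p in sell_pnls if p < 0)
    total_qty = sum(q for _, q in holds)
    return {
        "win_rate": sum(p > 0 for p in sell_pnls) / len(sell_pnls) if sell_pnls else 0.0,
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "avg_holding_days": sum(d * q for d, q in holds) / total_qty if total_qty else 0.0,
    }
```

Note that a single sell can close parts of several buy lots, each with its own cost and holding period.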
Total Fees
Sum of all trading fees incurred. Zero for equities (commission-free via Alpaca), volume-tiered for crypto.
Total Dividends
Total dividend income received during the backtest period. Dividends are credited on ex-dates based on shares held.
Strategy Scoring
Every completed backtest receives a composite strategy score (0-100) and a letter grade (A through F). The score is computed from four weighted components:
Return Quality (40%)
Evaluates Sharpe ratio, Sortino ratio, and annualized return. Higher risk-adjusted returns score better.
Robustness (25%)
Based on walk-forward consistency and train/test degradation ratio. Strategies that perform well out-of-sample score higher.
Risk Management (20%)
Evaluates maximum drawdown and win rate. Lower drawdowns and higher win rates score better.
Activity (15%)
Evaluates trade count and profit factor. Strategies that trade actively with positive expectancy score higher.
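The weighting can be illustrated with a short sketch. It assumes each component has already been scored on a 0-100 scale. How the engine derives those component scores is not described here, and the dictionary keys are illustrative.

```python
WEIGHTS = {
    "return_quality": 0.40,
    "robustness": 0.25,
    "risk_management": 0.20,
    "activity": 0.15,
}


def composite_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())
```

For example, component scores of 80, 60, 70, and 50 combine to 32 + 15 + 14 + 7.5 = 68.5.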
Grade Scale
Flags
The scorer also generates warning flags for potential issues:
- Negative Sharpe ratio
- Drawdown exceeding 30%
- Win rate below 40%
- Fewer than 10 trades (insufficient sample)
- Low walk-forward consistency (<50%)
- High train/test degradation
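These checks amount to a list of threshold comparisons. In the sketch below, the first five thresholds come from the list above. The cutoff for "high degradation" is not stated, so the example borrows the 0.5 degradation-ratio guideline from the walk-forward section as an assumption. The function and parameter names are hypothetical.

```python
def warning_flags(
    sharpe: float,
    max_drawdown: float,       # fraction, e.g. 0.35 for 35%
    win_rate: float,           # fraction
    trade_count: int,
    wf_consistency: float,     # fraction of profitable test windows
    degradation_ratio: float,
    degradation_floor: float = 0.5,  # assumed cutoff, not documented
) -> list[str]:
    flags = []
    if sharpe < 0:
        flags.append("Negative Sharpe ratio")
    if max_drawdown > 0.30:
        flags.append("Drawdown exceeding 30%")
    if win_rate < 0.40:
        flags.append("Win rate below 40%")
    if trade_count < 10:
        flags.append("Fewer than 10 trades (insufficient sample)")
    if wf_consistency < 0.50:
        flags.append("Low walk-forward consistency (<50%)")
    if degradation_ratio < degradation_floor:
        flags.append("High train/test degradation")
    return flags
```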
Advanced Analysis
Beyond basic metrics, the backtesting engine produces several advanced analyses that help you understand whether your strategy is robust or potentially overfit.
Monte Carlo Simulation
Generates 1,000 stochastic simulations by bootstrapping from your strategy's historical daily return distribution. This produces confidence bands showing the range of possible outcomes.
- P5 / P50 / P95 bands — The 5th, 50th, and 95th percentile equity paths. The P5-P95 range represents the 90% confidence interval.
- Probability of profit — Fraction of simulated paths that end above the starting capital.
- Median final equity — The expected outcome under typical conditions.
- Uses a fixed seed (42) for reproducible results across runs.
- Requires at least 20 trading days of data to run.
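A bootstrap of this kind can be sketched with the standard library alone. The path count, seed, and 20-day minimum come from the description above. Everything else is an assumption made for the example: each simulated path resamples daily returns with replacement over the same horizon as the backtest, and the bands use a nearest-rank percentile. `monte_carlo` is an illustrative name.

```python
import random


def monte_carlo(
    daily_returns: list[float],
    initial_capital: float,
    n_paths: int = 1000,
    seed: int = 42,
) -> dict:
    if len(daily_returns) < 20:
        raise ValueError("need at least 20 trading days of returns")
    rng = random.Random(seed)  # fixed seed makes every run reproducible
    horizon = len(daily_returns)

    paths = []
    for _ in range(n_paths):
        equity, path = initial_capital, []
        # Resample daily returns with replacement and compound them.
        for r in rng.choices(daily_returns, k=horizon):
            equity *= 1 + r
            path.append(equity)
        paths.append(path)

    def band(q: float) -> list[float]:
        # Nearest-rank percentile of equity across all paths at each step.
        idx = min(n_paths - 1, int(q * n_paths))
        return [sorted(p[t] for p in paths)[idx] for t in range(horizon)]

    finals = sorted(p[-1] for p in paths)
    return {
        "p5": band(0.05),
        "p50": band(0.50),
        "p95": band(0.95),
        "prob_profit": sum(f > initial_capital for f in finals) / n_paths,
        "median_final_equity": finals[n_paths // 2],
    }
```

Because resampling draws each day independently, this approach discards any autocorrelation or volatility clustering in the original return series.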
Walk-Forward Analysis
Walk-forward analysis is the most important robustness check for detecting overfitting. It splits the equity curve into multiple train/test windows and measures out-of-sample consistency.
- Window count — Default 5 non-overlapping windows. Each window is split 70% train / 30% test.
- Consistency — Fraction of test windows that were profitable. Above 60% is good.
- Degradation ratio — Average test daily return / average train daily return. Above 0.5 is acceptable; below suggests overfitting.
- Train/test correlation — Do in-sample and out-of-sample returns move together? Above 0.5 is good.
- Per-window metrics — Each window shows train return, test return, train Sharpe, test Sharpe, and test max drawdown.
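The consistency and degradation figures can be sketched from a list of daily returns. This example makes assumptions the text leaves open: windows are equal-sized with any leftover days dropped, the split point is rounded to the nearest day, and the degradation ratio pools all test days and all train days across windows. `walk_forward` is an illustrative name.

```python
import statistics


def _total_return(returns: list[float]) -> float:
    equity = 1.0
    for r in returns:
        equity *= 1 + r
    return equity - 1


def walk_forward(daily_returns: list[float], n_windows: int = 5, train_frac: float = 0.70) -> dict:
    size = len(daily_returns) // n_windows
    if size < 2:
        raise ValueError("not enough data for the requested number of windows")

    windows = []
    for i in range(n_windows):
        chunk = daily_returns[i * size:(i + 1) * size]  # non-overlapping slices
        split = round(len(chunk) * train_frac)
        windows.append((chunk[:split], chunk[split:]))

    test_totals = [_total_return(test) for _, test in windows]
    train_mean = statistics.mean(r for train, _ in windows for r in train)
    test_mean = statistics.mean(r for _, test in windows for r in test)
    return {
        "consistency": sum(t > 0 for t in test_totals) / n_windows,
        "degradation_ratio": test_mean / train_mean if train_mean else float("nan"),
        "per_window": [
            {"train_return": _total_return(tr), "test_return": _total_return(te)}
            for tr, te in windows
        ],
    }
```

A strategy whose test segments are profitable in three of five windows has a consistency of 0.6, right at the "good" threshold.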
Market Regime Analysis
Detects market regimes from benchmark data and measures your strategy's performance in each regime. This reveals whether your strategy only works in bull markets or is robust across conditions.
- Regimes detected — Bull, Bear, Sideways, High Volatility, Low Volatility
- Per-regime metrics: total return, Sharpe ratio, max drawdown, trading days
- Uses SPY for equity benchmarking, BTC/USD for crypto
Performance Attribution
Decomposes your strategy's return into its sources to understand what drove performance.
- Alpha return — Return attributable to your strategy's skill (above what the market delivered)
- Beta return — Return from market exposure (what you would have earned just holding the benchmark)
- Residual return — Unexplained return (noise or factors not captured)
- Sector attribution — Return contribution from each sector
- Factor exposure — Exposure to common risk factors
Capacity Estimation
Estimates the maximum AUM (assets under management) your strategy can handle before market impact degrades performance.
- Capacity AUM — Maximum portfolio size before the binding constraint is hit
- Binding symbol — The least liquid symbol that limits capacity
- Binding constraint — Whether capacity is limited by volume, spread, or impact
- Based on the max participation rate (default 10% of daily volume)
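A volume-only version of this estimate can be sketched as follows. It is an assumption-heavy illustration: it ignores the spread and impact constraints, and it presumes you already know each symbol's dollar ADV and the largest single-day trade in that symbol as a fraction of portfolio equity. `estimate_capacity` and its input layout are hypothetical.

```python
def estimate_capacity(
    symbols: dict[str, tuple[float, float]],  # ticker -> (dollar ADV, peak trade / equity)
    max_participation: float = 0.10,
) -> dict:
    # For each symbol, the AUM at which its largest trade would hit the participation cap.
    limits = {
        ticker: max_participation * dollar_adv / trade_fraction
        for ticker, (dollar_adv, trade_fraction) in symbols.items()
        if trade_fraction > 0
    }
    binding = min(limits, key=limits.get)  # least liquid relative to how it is traded
    return {"capacity_aum": limits[binding], "binding_symbol": binding}
```

For example, a symbol with $2M of daily volume that the strategy trades in 2% clips supports at most 0.10 × 2,000,000 / 0.02 = $10M, even if every other holding could support far more.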
Results Visualization
The backtest results page provides several visualizations to help you understand your strategy's behavior:
Equity Curve
Portfolio value over time with Monte Carlo confidence bands (P5/P50/P95) overlaid.
Drawdown Chart
Peak-to-trough decline over time. Shows how deep and how long drawdowns lasted.
Trade Log
Every trade with date, symbol, action, quantity, price, value, slippage, fees, and execution details (for realistic mode).
Decision Log
The LLM's reasoning for each tick: action taken, confidence level, tool calls made, and token usage.
Strategy Score Card
Overall score, grade, component breakdown, and warning flags.
Walk-Forward Windows
Per-window train/test returns and consistency metrics.
Limits & Quotas
| Limit | Value | Notes |
|---|---|---|
| Daily Quota | 30 backtests / day | Resets at midnight UTC |
| Max Date Range | 365 calendar days | ~252 trading days for equities |
| Earliest Start Date | 2016-01-01 | Alpaca data availability limit |
| Capital Range | $1K to $10M | Initial capital for simulation |
| Slippage Range | 0% to 5% | Flat slippage (idealized mode) |
| Concurrent Window | 1 to 30 days | Default: 10 days, up to 15 parallel LLM calls |
Tips for Effective Backtesting
Start with idealized, finish with realistic
Use the idealized execution model for rapid iteration while tuning your strategy. Once you're happy with the results, switch to the realistic model for final validation. The realistic model will typically show lower returns due to spread and impact costs.
Watch for overfitting
If your strategy scores well on cumulative return but poorly on walk-forward consistency, it may be overfit to the specific date range. Try different date ranges and check that performance is stable. A walk-forward consistency above 60% and a degradation ratio above 0.5 are good signs.
Use survivorship-free mode for accuracy
Enable survivorship-free mode to include delisted securities in your backtest. Without it, your universe only contains stocks that survived to the present day, which introduces a positive bias. With it enabled, positions in delisted securities are force-liquidated at the last available price.
Check capacity before scaling
The capacity estimation tells you the maximum AUM your strategy can handle. If you plan to increase capital, make sure the capacity estimate supports it. Strategies trading small-cap or low-volume names will have lower capacity.
Realism warnings matter
The backtest results include a warnings section that discloses execution model, calendar mode, shorts policy, corporate actions applied, and any partial fills. Review these to understand the assumptions behind your results.