Time Series Cross‑Validation in Python: Walk‑Forward Splits for Quants

How to Perform Cross‑Validation for Time‑Series Data in Python

Randomly shuffling data into training and test folds is a cardinal sin in quantitative finance. Time moves forward, and using a future price to predict yesterday’s return—look‑ahead bias—makes any backtest worthless. Yet many beginners reach straight for KFold without realizing they’ve contaminated their model with tomorrow’s information.

To evaluate a trading model properly, you need time‑series cross‑validation: train only on the past and test only on the future. This post shows how to do that in Python with expanding‑window and sliding‑window splits, applied to a LightGBM model.

Why Standard Cross‑Validation Fails for Time Series

Think of a shuffled 5‑fold split as shuffling a deck of cards before dealing: every fold contains a random mix of early and late data points. For a stock price series, that means your model might learn from the 2022 crash and “predict” the 2020 recovery, which is impossible in real trading. Even without shuffling, standard k‑fold still trains every fold except the last on data that comes after its test period. The result is a model that looks brilliant in the lab but loses money the moment it goes live.

Time‑series cross‑validation respects the chronological order of events. It mimics the real‑world flow of data: you build a model on historical data, forecast the next period, then slide forward. This is exactly the philosophy behind walk‑forward testing, which we explored in detail in The Walk‑Forward Test: The Only Backtest That Matters.
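To see the problem concretely, the sketch below counts how many folds train on rows that come after the start of their own test fold. It uses scikit‑learn's KFold with its default shuffle=False, so the leak it exposes is not only a shuffling problem; the 20‑row index is a stand‑in for any chronologically ordered dataset, and count_future_leaks is an illustrative helper.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

rows = np.arange(20)  # row positions 0..19, assumed to be in chronological order


def count_future_leaks(splitter, data):
    """Count folds whose training rows include a row later than the first test row."""
    leaks = 0
    for train_idx, test_idx in splitter.split(data):
        if train_idx.max() > test_idx.min():
            leaks += 1
    return leaks


kfold_leaks = count_future_leaks(KFold(n_splits=5), rows)  # default: no shuffling
tss_leaks = count_future_leaks(TimeSeriesSplit(n_splits=5), rows)
print(f"KFold: {kfold_leaks} of 5 folds train on the future")
print(f"TimeSeriesSplit: {tss_leaks} of 5 folds train on the future")
```

Four of the five KFold folds train on later rows (only the last fold is clean), while TimeSeriesSplit reports zero.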

Expanding‑Window vs Sliding‑Window: Which Split Is Right for You?

There are two main strategies for time‑series splits:

Expanding‑Window (Anchored)
You keep all past data and test on the next chunk. The training set grows over time, like a snowball rolling downhill. This is ideal when you believe older data still contains valuable patterns and you want maximum statistical power.

Sliding‑Window (Rolling)
You maintain a fixed‑size window of the most recent data, discarding old observations as new ones arrive. This is better when market regimes change rapidly and stale data can mislead the model. Think of it as trading with a short memory.

Strategy	Training Size	Best For
Expanding	Grows over time	Stable relationships, long‑term factors
Sliding	Fixed, recent window	Adaptive models, volatile regimes
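The only real difference between the two schemes is where each training window starts. A minimal, library‑free sketch; window_bounds is a name we made up for illustration, and all end positions are exclusive.

```python
def window_bounds(n_rows, train_size, test_size, expanding=True):
    """Return (train_start, train_end, test_end) for each fold; ends are exclusive.

    Expanding: training always starts at row 0 and grows by test_size per fold.
    Sliding: training keeps a fixed length and moves forward by test_size per fold.
    """
    bounds = []
    train_end = train_size
    while train_end + test_size <= n_rows:
        train_start = 0 if expanding else train_end - train_size
        bounds.append((train_start, train_end, train_end + test_size))
        train_end += test_size
    return bounds


print(window_bounds(10, train_size=4, test_size=2, expanding=True))
# [(0, 4, 6), (0, 6, 8), (0, 8, 10)]
print(window_bounds(10, train_size=4, test_size=2, expanding=False))
# [(0, 4, 6), (2, 6, 8), (4, 8, 10)]
```

Both schemes test on the same blocks; only the left edge of the training window differs.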

We’ll implement both and see how they affect model evaluation. For help choosing a backtesting framework once your model passes validation, see our comparison of VectorBT and Backtrader, which explores event‑driven vs vectorized backtesting.

Loading a Time‑Series Dataset

We’ll generate a synthetic dataset with 2,000 trading days and a few features. In practice, you’d load your data from Parquet files or a database—see our Parquet + Python guide for efficient storage.

import pandas as pd
import numpy as np

np.random.seed(42)
n_days = 2000
# Business days (Mon-Fri); exchange holidays are not removed
dates = pd.bdate_range('2020-01-01', periods=n_days)
df = pd.DataFrame({
    'date': dates,
    'momentum_20': np.random.randn(n_days) * 0.02,
    'volume_ratio': np.random.rand(n_days),
    'volatility_60': np.abs(np.random.randn(n_days) * 0.01),
})
# Target: linear in momentum and volume ratio, plus noise and a slow
# downward drift in its level (a crude stand-in for alpha decay)
df['forward_return'] = (
    df['momentum_20'] * 1.1 +
    df['volume_ratio'] * -0.6 +
    np.random.randn(n_days) * 0.02 +
    np.linspace(0, -0.01, n_days)  # drifts from 0 to -0.01 over the sample
)
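Both splitters in this post work on row positions, so they silently assume the rows are sorted by date with one row per date. A cheap guard before splitting catches a frame that was shuffled or merged out of order; check_time_order is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def check_time_order(frame: pd.DataFrame, date_col: str = "date") -> None:
    """Raise ValueError unless rows are in strictly increasing date order.

    Assumes a single time series (one row per date); positional splitters
    such as TimeSeriesSplit treat row order as time order.
    """
    dates = pd.to_datetime(frame[date_col])
    if not dates.is_monotonic_increasing:
        raise ValueError(f"'{date_col}' is not sorted ascending; sort before splitting")
    if dates.duplicated().any():
        raise ValueError(f"'{date_col}' has duplicate dates; split on unique dates instead")


good = pd.DataFrame({"date": pd.bdate_range("2020-01-01", periods=5)})
check_time_order(good)  # sorted, unique dates: passes silently

try:
    check_time_order(good.iloc[::-1])  # reversed rows
except ValueError as err:
    print(err)
```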

Implementing Expanding‑Window Cross‑Validation

Scikit‑learn provides TimeSeriesSplit, which by default performs an expanding‑window split. We’ll loop over the splits, train a LightGBM model on each, and collect the out‑of‑sample predictions. (If you’re new to LightGBM, see our step‑by‑step LightGBM guide.)
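Before training anything, it helps to look at the row ranges TimeSeriesSplit produces for 2,000 rows. The sketch also sets the gap parameter, which drops the last rows of each training window before its test fold; that is useful when a label such as a multi‑day forward return overlaps the test period. The gap of 5 is an illustrative assumption, so match it to your own label horizon.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_rows = 2000
bounds = {}
for gap in (0, 5):
    tscv = TimeSeriesSplit(n_splits=5, gap=gap)
    bounds[gap] = []
    for train_idx, test_idx in tscv.split(np.arange(n_rows)):
        # Record the inclusive first/last row of each training and test block
        bounds[gap].append((int(train_idx[0]), int(train_idx[-1]),
                            int(test_idx[0]), int(test_idx[-1])))
    print(f"gap={gap}")
    for fold, (tr0, tr1, te0, te1) in enumerate(bounds[gap]):
        print(f"  fold {fold}: train rows {tr0}-{tr1}, test rows {te0}-{te1}")
```

With the default gap=0, the first fold trains on rows 0 to 334 and tests on rows 335 to 667; each later fold tests on the next 333 rows while the training window grows to cover everything before it.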

from sklearn.model_selection import TimeSeriesSplit
import lightgbm as lgb

features = ['momentum_20', 'volume_ratio', 'volatility_60']
X = df[features].values
y = df['forward_return'].values

tscv = TimeSeriesSplit(n_splits=5)  # 5 expanding windows
predictions = []

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Early stopping needs its own validation data. Take it from the tail of
    # the training window: stopping on the test fold would leak test labels
    # into the choice of tree count.
    n_val = max(1, int(len(train_idx) * 0.2))
    fit_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]

    model = lgb.LGBMRegressor(
        n_estimators=100,
        learning_rate=0.05,
        num_leaves=31,
        verbosity=-1
    )
    model.fit(
        X[fit_idx], y[fit_idx],
        eval_set=[(X[val_idx], y[val_idx])],
        eval_metric='rmse',
        callbacks=[lgb.early_stopping(10, verbose=False)]
    )
    preds = model.predict(X[test_idx])  # uses the best iteration found above
    predictions.append(pd.DataFrame({
        'date': df['date'].iloc[test_idx].values,
        'y_true': y[test_idx],
        'y_pred': preds
    }))

# Combine all out-of-sample predictions
oos_results = pd.concat(predictions, ignore_index=True).sort_values('date')
rmse = np.sqrt(((oos_results['y_true'] - oos_results['y_pred']) ** 2).mean())
print(f"OOS RMSE: {rmse:.4f}")

This loop trains five models, each tested on a subsequent block of time that was never seen during training. The predictions from each test fold are stitched together to form a continuous out‑of‑sample track record.
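A single pooled RMSE can hide one fold where the model broke down, so it is worth breaking the score out by fold. A minimal sketch, assuming each fold's DataFrame carries a fold column (for example, by adding 'fold': fold to the dictionary built inside the loop above); per_fold_rmse is an illustrative helper and the four rows are made‑up numbers.

```python
import numpy as np
import pandas as pd


def per_fold_rmse(oos: pd.DataFrame) -> pd.Series:
    """RMSE of y_pred against y_true for each label in the 'fold' column."""
    sq_err = (oos["y_true"] - oos["y_pred"]) ** 2
    return np.sqrt(sq_err.groupby(oos["fold"]).mean())


example = pd.DataFrame({
    "fold": [0, 0, 1, 1],
    "y_true": [0.01, -0.02, 0.03, 0.00],
    "y_pred": [0.01, -0.02, 0.00, 0.00],
})
print(per_fold_rmse(example))
```

A fold whose RMSE sits far above the others usually points to a regime the training window never saw.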

Implementing Sliding‑Window Cross‑Validation

Scikit‑learn’s TimeSeriesSplit can produce sliding windows through its max_train_size parameter, but building them manually makes every boundary explicit and easy to adapt. We’ll use a fixed training size of 500 days, test on the next 100 days, and slide the window forward by 100 days.

train_size = 500
test_size = 100
step = 100
val_size = 100  # tail of each training window reserved for early stopping
predictions_sliding = []

for start in range(0, len(df) - train_size - test_size + 1, step):
    end_train = start + train_size
    end_test = end_train + test_size
    end_fit = end_train - val_size  # rows [start, end_fit) fit the trees

    model = lgb.LGBMRegressor(
        n_estimators=100,
        learning_rate=0.05,
        num_leaves=31,
        verbosity=-1
    )
    # Early-stop on the last val_size rows of the training window, never on the test block
    model.fit(
        X[start:end_fit], y[start:end_fit],
        eval_set=[(X[end_fit:end_train], y[end_fit:end_train])],
        eval_metric='rmse',
        callbacks=[lgb.early_stopping(10, verbose=False)]
    )
    preds = model.predict(X[end_train:end_test])
    predictions_sliding.append(pd.DataFrame({
        'date': df['date'].iloc[end_train:end_test].values,
        'y_true': y[end_train:end_test],
        'y_pred': preds
    }))

oos_sliding = pd.concat(predictions_sliding, ignore_index=True).sort_values('date')
rmse_sliding = np.sqrt(((oos_sliding['y_true'] - oos_sliding['y_pred']) ** 2).mean())
print(f"OOS RMSE (sliding): {rmse_sliding:.4f}")

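The key difference: the training window stays a constant length, dropping old data as new data arrives. That can help the model keep up with the slow downward drift we built into the synthetic target. Note that the two RMSE figures cover different test periods (the expanding splits start testing at row 335, the sliding loop at row 500), so compare them over the same dates before drawing conclusions.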
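As a cross‑check, scikit‑learn’s TimeSeriesSplit produces the same windows when max_train_size caps the training length and test_size fixes the test block (test_size is available in recent scikit‑learn versions). The sketch compares the boundaries from both approaches for the 2,000‑row, 500/100/100 setup used above.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples, train_size, test_size, step = 2000, 500, 100, 100

# Manual sliding windows: same arithmetic as the loop above
manual = [
    (start, start + train_size, start + train_size + test_size)
    for start in range(0, n_samples - train_size - test_size + 1, step)
]

# TimeSeriesSplit anchors its test blocks to the end of the data, so pick
# n_splits to cover every test block after the first training window.
n_splits = (n_samples - train_size) // test_size
tscv = TimeSeriesSplit(n_splits=n_splits, max_train_size=train_size, test_size=test_size)
sk = [
    (int(tr[0]), int(tr[-1]) + 1, int(te[-1]) + 1)  # exclusive ends, like `manual`
    for tr, te in tscv.split(np.zeros((n_samples, 1)))
]
print(manual == sk, len(sk))
```

The match relies on step equal to test_size and on (n_samples - train_size) dividing evenly by test_size; otherwise the two approaches place their test blocks differently.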

Avoiding Data Leakage: The Golden Rules

Time‑series cross‑validation is only as good as your discipline. Here are four non‑negotiable practices:

Never normalize using the full dataset. Compute mean and standard deviation only on the training fold, then apply to the test fold.
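Lag all features. Every predictor must use only data available before the prediction is made. A common trap is using the same day’s open price to predict a return measured from the previous close: the open isn’t known yet when that prediction has to be made, so it’s leakage.
Define folds by date, not row index. Row positions track time only if the data is sorted with one row per date; in a panel of many tickers, an index‑based split can put the same date in both the training and test folds.
Don’t peek at the future during feature engineering. Rolling averages, volatility calculations, and sector dummies must be computed point‑in‑time.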
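The lagging and normalization rules can be sketched together. This assumes synthetic daily returns and two illustrative features; shift(1) makes each feature use only data up to the previous day, and the scaler’s mean and standard deviation come from the training fold alone. Tree models such as LightGBM are largely insensitive to feature scaling, so the scaling step matters most for linear models and neural networks.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

# Synthetic daily returns (illustrative only)
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, 300))

# Point-in-time features: shift(1) so row t only uses information up to t-1
data = pd.DataFrame({
    "ret_lag1": returns.shift(1),                  # yesterday's return
    "vol_20": returns.rolling(20).std().shift(1),  # 20-day volatility as of yesterday
    "target": returns,                             # today's return
}).dropna()

X = data[["ret_lag1", "vol_20"]].to_numpy()
scaled_folds = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    # Fit the scaler on the training fold only, then reuse its statistics on the test fold
    scaler = StandardScaler().fit(X[train_idx])
    scaled_folds.append((scaler.transform(X[train_idx]), scaler.transform(X[test_idx])))
```

The first usable row is 20: its vol_20 value is the standard deviation of returns 0 through 19, none of which is the return it helps predict.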

For a deeper treatment of data hygiene, see our article on The Metadata is the Alpha, which shows how column naming and lineage prevent silent leakage.
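The date‑based fold rule matters most for panel data, where many tickers share each date. The sketch below builds a hypothetical three‑ticker panel, shows that a plain row‑based TimeSeriesSplit lets some dates straddle the train/test boundary, and then splits on unique dates instead; date_folds is an illustrative helper, not a library function.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical panel: 3 tickers on each of 10 business days, sorted by date
dates = pd.bdate_range("2024-01-01", periods=10)
panel = pd.DataFrame({
    "date": dates.repeat(3),
    "ticker": ["AAA", "BBB", "CCC"] * len(dates),
})


def date_folds(frame, n_splits, date_col="date"):
    """Yield (train_rows, test_rows) positions, splitting on unique dates, not rows."""
    unique_dates = np.sort(frame[date_col].drop_duplicates().to_numpy())
    splitter = TimeSeriesSplit(n_splits=n_splits)
    for train_d, test_d in splitter.split(np.arange(len(unique_dates))):
        train_rows = np.flatnonzero(frame[date_col].isin(unique_dates[train_d]).to_numpy())
        test_rows = np.flatnonzero(frame[date_col].isin(unique_dates[test_d]).to_numpy())
        yield train_rows, test_rows


# Row-based split: count folds where one date lands in both train and test
row_overlaps = sum(
    bool(set(panel["date"].iloc[tr]) & set(panel["date"].iloc[te]))
    for tr, te in TimeSeriesSplit(n_splits=3).split(panel)
)

# Date-based split: every training date precedes every test date
date_based = list(date_folds(panel, n_splits=3))
clean = all(
    panel["date"].iloc[tr].max() < panel["date"].iloc[te].min()
    for tr, te in date_based
)
print(f"row-based folds with a shared date: {row_overlaps} of 3")
print(f"date-based folds clean: {clean}")
```

In this toy panel, two of the three row‑based folds split a single date across training and test, while every date‑based fold keeps each date on one side.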

From Cross‑Validation to a Live Strategy

Once you’ve validated your model across multiple time windows, you have a realistic estimate of its predictive power. The next step is to embed it in a full backtesting framework—with realistic transaction costs, slippage, and position sizing. Our walk‑forward testing guide shows how to turn these predictions into a proper equity curve.
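Before reaching for a full framework, a crude sanity check is to trade the sign of each out‑of‑sample prediction and charge a flat cost whenever the position changes. This sketch assumes a single asset, one unit long or short, that y_true is the return earned after each prediction, and an arbitrary cost of 10 basis points per unit traded; naive_net_returns is an illustrative helper that ignores slippage and position sizing entirely.

```python
import numpy as np
import pandas as pd


def naive_net_returns(oos: pd.DataFrame, cost_per_unit: float = 0.001) -> pd.Series:
    """Per-period return of trading sign(y_pred), net of a flat cost per unit of turnover.

    Assumes rows are in date order and the position is flat (0) before the first row.
    """
    position = np.sign(oos["y_pred"].to_numpy())       # +1 long, -1 short, 0 flat
    turnover = np.abs(np.diff(position, prepend=0.0))  # units traded each period
    gross = position * oos["y_true"].to_numpy()
    return pd.Series(gross - cost_per_unit * turnover, index=oos.index)


example = pd.DataFrame({
    "y_pred": [0.01, -0.02, 0.03, 0.01],
    "y_true": [0.02, -0.01, -0.01, 0.03],
})
net = naive_net_returns(example)
equity = net.cumsum()  # additive (not compounded) cumulative return
print(equity.round(4).tolist())
```

If a model’s edge disappears under even this crude cost model, a more realistic backtest is unlikely to rescue it.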

For larger datasets, pair this workflow with DuckDB and Parquet to query very large tables quickly, and use LightGBM’s categorical support to handle sector codes, ticker IDs, and other discrete features without one‑hot encoding.
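If you encode tickers or sectors yourself before splitting (for example, when writing features to Parquet), build the category list from the training fold and reuse it for the test fold. That way a code means the same ticker in both folds, and names that first appear after the training window become missing values (pandas code -1) instead of borrowing another ticker’s code. A small pandas sketch with hypothetical tickers:

```python
import pandas as pd

train = pd.DataFrame({"ticker": ["AAA", "BBB", "AAA", "CCC"]})
test = pd.DataFrame({"ticker": ["BBB", "DDD"]})  # DDD first appears after the training window

# Fix the category set from the training fold only, then reuse it for the test fold
ticker_dtype = pd.CategoricalDtype(categories=sorted(train["ticker"].unique()))
train["ticker"] = train["ticker"].astype(ticker_dtype)
test["ticker"] = test["ticker"].astype(ticker_dtype)

print(train["ticker"].cat.codes.tolist())  # codes within the training categories
print(test["ticker"].cat.codes.tolist())   # unseen DDD becomes NaN, i.e. code -1
```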

You now have a cross‑validation toolkit that respects time. Evaluating a model with expanding or sliding windows gives you a far more honest estimate of its out‑of‑sample performance than shuffled k‑fold ever will. Apply this before you risk a single dollar, and you’ll already be ahead of the crowd.