OOS: Out-of-Sample Test

Also known as: out-of-sample validation, holdout test

What is it?

An out-of-sample test checks a strategy on data that it was never optimised on. The idea is simple: before you start tuning a strategy, you set aside a slice of historical data, the holdout, and you do not touch it while you build and adjust the rules. Once the strategy is finished, you run it on that held-back data for the first time. Because the strategy never learned from this period, its performance there is a much fairer indication of whether the edge is real or just fitted to the data you optimised on.

Beat Overfitting

1. In-sample: 2018–2022. Build and tune the strategy only on this slice. Every rule and parameter is optimised against this data.
2. Lock away the holdout. Set aside 2023–2024 and never look at it while tuning. One peek during optimisation destroys the whole test.
3. Out-of-sample: 2023–2024. Run the finished strategy on this unseen slice for the first time. No more changes are allowed from here on.
✓ Holds up → edge is likely real. Similar performance on data it never saw is the strongest pre-live sign the edge is genuine, not curve-fit.
✕ Collapses → overfit red flag. If it falls apart on the holdout, the strategy was fitted to the past, not to a real edge. Back to the drawing board.

Tune on the in-sample years, then test untouched on a held-out slice to expose overfitting.
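
As a rough illustration of the split described above, the sketch below separates dated returns into an in-sample list and a holdout list at a cut-off date. The function name split_by_date, the (date, return) row layout and the sample values are assumptions made for this example only, not part of any particular platform.

```python
from datetime import date

def split_by_date(rows, holdout_start):
    """Split (date, value) rows into in-sample and out-of-sample lists.

    Rows dated before holdout_start are in-sample; the rest are the holdout.
    """
    in_sample = [r for r in rows if r[0] < holdout_start]
    out_of_sample = [r for r in rows if r[0] >= holdout_start]
    return in_sample, out_of_sample

# Hypothetical daily returns (illustrative values, not real results).
rows = [
    (date(2018, 1, 2), 0.004),
    (date(2020, 6, 15), -0.002),
    (date(2022, 12, 30), 0.001),
    (date(2023, 1, 3), 0.003),
    (date(2024, 12, 31), -0.001),
]

# Tune on 2018-2022 only; keep 2023-2024 locked away.
in_sample, out_of_sample = split_by_date(rows, holdout_start=date(2023, 1, 1))
print(len(in_sample), len(out_of_sample))  # 3 2
```

Splitting on a single cut-off date keeps the holdout strictly later than the tuning data, which mirrors the 2018–2022 versus 2023–2024 example.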

Out-of-sample testing is the main defence against overfitting, where a strategy looks perfect only because it was shaped around the exact past it was tested on. For example, you might tune a strategy on data from 2018 to 2022, then test it untouched on 2023 to 2024. If it performs similarly on that unseen period, that is an encouraging sign the edge may be genuine. If it collapses, that is a red flag that the strategy was probably curve-fit.
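
One way to make "performs similarly" concrete is to compare a risk-adjusted score on both slices. The sketch below is a minimal example under stated assumptions: the score is a simple mean-over-standard-deviation ratio (not annualised), the 0.5 threshold in min_ratio is an arbitrary illustration, and all return values are made up. It assumes each list has at least two returns and that they are not all identical.

```python
from statistics import mean, stdev

def sharpe_like(returns):
    """Mean return divided by its sample standard deviation (not annualised)."""
    return mean(returns) / stdev(returns)

def holds_up(in_sample, out_of_sample, min_ratio=0.5):
    """Return True if the out-of-sample score keeps at least min_ratio
    of the in-sample score. min_ratio is a hypothetical threshold."""
    is_score = sharpe_like(in_sample)
    oos_score = sharpe_like(out_of_sample)
    # The comparison only makes sense if the tuned strategy was profitable.
    return is_score > 0 and oos_score >= min_ratio * is_score

# Illustrative per-period returns, not real results.
tuned = [0.01, 0.02, -0.005, 0.015, 0.01]          # in-sample
similar = [0.012, 0.018, -0.004, 0.01, 0.009]      # holdout that holds up
collapsed = [-0.01, 0.005, -0.02, 0.002, -0.008]   # holdout that collapses

print(holds_up(tuned, similar))    # True
print(holds_up(tuned, collapsed))  # False
```

A single ratio is a blunt tool; in practice drawdown, trade count and other measures would usually be compared as well.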

The single most important rule is to keep the out-of-sample data truly untouched: if you peek at it and then go back and adjust the rules, it is no longer independent and its value as a check is destroyed. Even a strong out-of-sample result is not a promise. It is the best pre-live evidence you can gather, but markets change, past performance does not guarantee future results, no strategy is risk-free, and your capital is at risk.
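
The "never peek" rule can be backed by a small guard in code. The following is a sketch only: the Holdout class and its reveal method are hypothetical names invented for this example, and a wrapper like this is a discipline aid, since it cannot stop anyone from opening the underlying data directly.

```python
class Holdout:
    """Wraps held-back data and allows it to be read exactly once."""

    def __init__(self, data):
        self._data = list(data)
        self._used = False

    def reveal(self):
        """Return the holdout data on the first call; refuse afterwards."""
        if self._used:
            raise RuntimeError("holdout already used; it is no longer independent")
        self._used = True
        return list(self._data)

# Illustrative values, not real results.
holdout = Holdout([0.003, -0.001, 0.002])

first_look = holdout.reveal()  # the one permitted out-of-sample run

try:
    holdout.reveal()           # a second look after more tuning
    second_look_blocked = False
except RuntimeError:
    second_look_blocked = True

print(second_look_blocked)  # True
```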

Why it matters: Out-of-sample testing is the main defence against overfitting. A strategy that holds up on unseen data is far likelier to work live.

Trade impact: High

Out-of-sample results are the strongest pre-live signal that an edge is genuine.

Real-world example

A strategy tuned on 2018–2022 is then tested untouched on 2023–2024. Similar performance is a good sign, while a collapse is a red flag.

How SignalBots handles it

SignalBots emphasises out-of-sample and forward results over in-sample backtests when presenting a feed's track record. See /risk-warning.

Pro tip

Hold out a meaningful, untouched slice of data before you start optimising, and never peek at it while tuning.

Common pitfalls

Quietly using the out-of-sample period during tuning, which destroys its value as an independent check.

Frequently asked questions

Why hold out data for testing?

So you can check the strategy on a period it never learned from. Strong out-of-sample results suggest a real edge rather than curve-fitting, though they still do not guarantee future profit.

How much data should I hold out?

A meaningful, untouched slice that includes varied conditions, often a recent chunk of the history. It needs to be large enough that good performance on it is convincing rather than luck.
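
To see why the size of the holdout matters, the sketch below computes a simple t-statistic (mean return divided by its standard error) for a short and a long holdout with the same return pattern. This is an illustration under assumptions: it treats returns as independent, the values are invented, and the common rule of thumb that a t-statistic above roughly 2 looks less like luck is a convention, not a guarantee.

```python
from math import sqrt
from statistics import mean, stdev

def t_stat(returns):
    """Mean return divided by its standard error (sample stdev / sqrt(n))."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / sqrt(n))

# Same alternating win/loss pattern, different holdout lengths (illustrative).
short_holdout = [0.02, -0.01] * 2    # 4 observations
long_holdout = [0.02, -0.01] * 50    # 100 observations

t_short = t_stat(short_holdout)
t_long = t_stat(long_holdout)

# Both have the same average return, but only the longer holdout
# gives a t-statistic above the rule-of-thumb level of 2.
print(round(t_short, 2), round(t_long, 2))
```

The average return is identical in both cases; only the amount of evidence behind it changes.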

What happens if I peek at the out-of-sample data?

Its value as an independent check is destroyed. Once you adjust the rules based on it, the strategy has effectively been fitted to that period too, and the result can no longer serve as evidence that the edge is genuine.

Does a passing out-of-sample test guarantee live profit?

No. It is the strongest pre-live evidence you can gather, but future markets can differ from any tested period. Past performance does not guarantee future results, and your capital is at risk.

How is out-of-sample testing different from forward testing?

Out-of-sample testing uses held-back historical data the strategy never saw. Forward testing runs the strategy on new data as the market unfolds, often on demo. Both check robustness on unseen conditions.

Trading involves substantial risk of loss. Historical and backtested results do not guarantee future performance. Read the full risk warning.