We very often get the question “how do I know that my strategy works?” and there is actually an “easy” answer to that. It all boils down to one question: what is a sufficient sample size to assess the performance of a strategy?
However, because trading takes place in the realm of probabilities, the only mathematically correct answer is that there is no sufficient sample size. Why? Because observed probabilities come closer and closer to the “actual” value as more and more samples are played out, but we can never be certain that they have reached it.
But, mathematically correct does not always mean practical. We have to make assumptions based on probabilities not only when we take trades but also when we assess our performance in the markets. As we are all mortal, impatient, and hungry, we cannot afford to take a trading strategy and trade it for 1,000 trades only to find out that it is losing money. Time is of the essence.
Likewise, every time we make an adjustment to our strategy, we cannot play it out for 1,000 trades to see what those adjustments brought us. Heck, not even 100 trades are realistic if you are not a very active day trader.
Additionally, the higher your win rate climbs above 50%, the lower the variance/standard deviation of your win/loss outcomes, and thus the smaller the sample size you would need. But how can you know your real win rate before playing out lots of trades?
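To put rough numbers on that, here is a minimal sketch. It assumes every trade is an independent win or loss with a fixed win rate, which is a simplification, and the function name is only for illustration. It shows how uncertain an observed win rate still is after a given number of trades.

```python
import math

def win_rate_std_error(win_rate: float, n_trades: int) -> float:
    """Standard error of an observed win rate.

    Assumption: each trade is an independent win/loss with the same
    probability of winning (a simple binomial model).
    """
    return math.sqrt(win_rate * (1.0 - win_rate) / n_trades)

# How wide is the uncertainty for different win rates and sample sizes?
for p in (0.5, 0.7, 0.9):
    for n in (30, 50, 1000):
        print(f"win rate {p:.0%}, {n:>4} trades: +/- {win_rate_std_error(p, n):.1%}")
```

The uncertainty shrinks with the square root of the number of trades, so going from 50 to 1,000 trades narrows it by a factor of about 4.5, not 20.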
You see the dilemma? Everyone has it. We need a practical approach to this problem. Basically, we are looking for the minimum viable sample size that can give us the maximum information possible for us to act on. We need actionable results, not absolute results, as there is no perfection in trading.
Of course, you can also backtest your strategy and get results faster, but you have to be careful with that, because backtesting never fully resembles forward testing due to psychological and other factors. You could also program your strategy and backtest it on years of data in a few minutes, but that is incredibly difficult: first, you have to be a good programmer, and second, you need a 100% non-discretionary strategy, which is even harder to come by.
So what I do is this: I look at trades in batches of 50. Let’s assume I start trading a new strategy. First, I write down the hard rules I can NOT deviate from for 50 trades. Then, I trade. And I enter all the trades into my trading journal. After 50 trades, I filter out all trades where I did not stick to my rules and then get a realistic, objective picture of the strategy’s performance.
Why do I do that? Because when assessing the performance of your strategy, you have to check two things.
- How did my strategy apply to the market? How well did it work given the market conditions during the observed frame of time?
- How well did I apply my strategy to the market conditions during that time?
Only then can you draw objective conclusions. So once that is done, you now have a batch of roughly 40-45 trades where you followed the rules and you know whether the market conditions were favorable or not to your strategy.
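If your trading journal can be exported, the filtering step could look roughly like this. It is a minimal sketch: the `Trade` fields are hypothetical names for whatever your journal records, and the example batch is made up.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    pct_gain: float        # result of the trade in % of the account (hypothetical field)
    followed_rules: bool   # journal flag: did I stick to my hard rules? (hypothetical field)

def split_by_discipline(batch):
    """Separate a batch into rule-following and rule-breaking trades."""
    clean = [t for t in batch if t.followed_rules]
    broken = [t for t in batch if not t.followed_rules]
    return clean, broken

# Made-up example batch: 50 trades, 6 of them taken against the rules.
batch = (
    [Trade(0.8, True)] * 30
    + [Trade(-0.5, True)] * 14
    + [Trade(-1.0, False)] * 6
)
clean, broken = split_by_discipline(batch)
print(f"{len(clean)} of {len(batch)} trades followed the rules "
      f"({len(clean) / len(batch):.0%} adherence)")
```

The clean list is what you judge the strategy on. The broken list tells you how well you applied it.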
Let’s say that batch was profitable. You then compare this batch and its key performance indicators, like average RRR (risk:reward ratio), win rate, and average return per trade, to your backtest data. How does it compare? Are there any significant differences, and if so, why? If there are no significant differences to, let’s say, the last 2 years of backtested data, then good, go ahead. If there are, it is very likely that you messed up the trading process, or that you overlooked something in the backtest, like spread/commissions, trading times, or some other factor. Even the smallest thing can have a HUGE impact on your performance.
Once you have figured out what is going on and made adjustments accordingly, you trade another batch of 50 trades with the same rules. After those 50 trades, you compare them to the 50 trades right before them to see what your adjustments did, and you compare them to the whole performance of your strategy as well.
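Here is a minimal sketch of that comparison. It assumes you have the realized RRR and the %-gain of every trade as plain lists, and that a trade counts as a win when its %-gain is positive. The function names and all numbers are made up for illustration.

```python
from statistics import mean

def batch_kpis(rrrs, pct_gains):
    """Average RRR, win rate, and average % return per trade for one batch.

    Assumption: a trade is a win when its % gain is above zero.
    """
    wins = sum(1 for g in pct_gains if g > 0)
    return {
        "avg_rrr": mean(rrrs),
        "win_rate": wins / len(pct_gains),
        "avg_return_pct": mean(pct_gains),
    }

def kpi_differences(live, backtest):
    """Live KPI minus backtest KPI, for every KPI."""
    return {name: live[name] - backtest[name] for name in live}

# Made-up numbers, only to show the shape of the comparison.
live = batch_kpis(rrrs=[2.0, 1.5, 2.5, 1.0], pct_gains=[1.0, -0.5, 1.2, -0.5])
backtest = batch_kpis(rrrs=[2.0, 2.0, 2.5, 1.5], pct_gains=[1.0, -0.5, 1.3, 0.7])

for name, diff in kpi_differences(live, backtest).items():
    print(f"{name}: live {live[name]:.2f} vs backtest {backtest[name]:.2f} ({diff:+.2f})")
```

Whether a difference is "significant" is still a judgment call here. The sketch only puts the numbers side by side.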
Say we have 250 trades on record and take another 50, for 300 in total. We then compare those 50 trades to the 50 trades right before them, and to the 200 trades before those as a whole. That is called using an out sample. That way you can get a grip on how your latest trading performance compares to your recent trading activity and to all of your historical trading activity. Doing that is very important, as the markets are always changing and so is our performance, since we are not machines. So we have to know whether our most recent performance was way out of line or not, and also whether any adjustments we made had the impact we wanted.
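The slicing itself is simple. A minimal sketch, assuming all trade results sit in one chronological list (oldest first); the function name is hypothetical:

```python
def split_batches(results, batch_size=50):
    """Split chronological trade results into (history, previous_batch, out_sample).

    out_sample     = the most recent `batch_size` trades
    previous_batch = the `batch_size` trades right before those
    history        = everything older than that
    """
    if len(results) < 2 * batch_size:
        raise ValueError("need at least two full batches of trades")
    out_sample = results[-batch_size:]
    previous_batch = results[-2 * batch_size:-batch_size]
    history = results[:-2 * batch_size]
    return history, previous_batch, out_sample

# Placeholder data: 300 trade results, numbered 0..299 so the slices are easy to see.
results = list(range(300))
history, previous_batch, out_sample = split_batches(results)
print(len(history), len(previous_batch), len(out_sample))
```

With 300 results in the list, this gives 200 historical trades, the 50 before the latest batch, and the latest 50 as the out sample.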
If our batch of 50 trades was not profitable, we have to calculate whether our out sample is mathematically still inside realistic standard deviations of our overall performance.
To make a long story short and to not overcomplicate it, we do this: before trading a new strategy, we backtest it for 200 trades up until the point where we start trading. We want the most recent market conditions to be used in our backtest. Then we trade 50 trades in a forward test. We check whether we applied our rules correctly, and see if there are any deviations from the backtest. If there are, we find out why. If not, we keep on trading. From then on, we always compare the next 50 trades to the 50 trades right before them, and to all the trades before those.
Then, we calculate the standard deviation for the whole data set and compare that to our out sample (our most recent 50 trades).
I calculate the standard deviation for Risk:Reward Ratio and %-Gain per trade, and then get an expectation corridor of where my results should be.
To see how to calculate the std. dev., please read this very comprehensive and easy to follow article.
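As a rough sketch of such an expectation corridor: the version below works on %-gain per trade and makes one assumption that goes beyond what is described above, namely that the corridor is meant for the average of a batch. For that reason it scales the standard deviation by the square root of the batch size (the standard error). If you prefer a corridor on single trades, drop that scaling. All names and numbers are made up.

```python
import math
from statistics import mean, stdev

def expectation_corridor(history, batch_size, k=1.0):
    """Corridor in which the average result of a batch is expected to fall.

    Assumption: the corridor is mean +/- k standard errors, where the
    standard error is the per-trade standard deviation of the historical
    results divided by sqrt(batch_size).
    """
    mu = mean(history)
    std_error = stdev(history) / math.sqrt(batch_size)
    return mu - k * std_error, mu + k * std_error

def batch_in_corridor(history, batch, k=1.0):
    """True if the batch average lies inside the corridor."""
    low, high = expectation_corridor(history, len(batch), k)
    return low <= mean(batch) <= high

# Made-up %-gains per trade: 240 historical trades and a 50-trade out sample.
history = [1.5, -1.0, 2.0, -1.0, -1.0, 1.5] * 40
out_sample = [-1.0, 1.5, -1.0, -1.0, 2.0] * 10

low, high = expectation_corridor(history, len(out_sample))
print(f"corridor: {low:.2f}% to {high:.2f}% per trade")
print(f"out sample average: {mean(out_sample):.2f}%")
print("inside corridor:", batch_in_corridor(history, out_sample))
```

The same two functions work for Risk:Reward Ratio if you pass in RRR values instead of %-gains.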
I also have two hard rules for when to stop trading.
- If I lose more than 20% of my account from the last peak, I stop trading, no matter at what point I am in the recent batch of trades, and figure out what is going on.
- If I analyze a batch of 50 trades and it is outside of 1 standard deviation, positive or negative, I stop trading until I figure out why.
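Both stop rules can be checked mechanically. A minimal sketch, assuming you track your account balance after every trade as a list and already have a corridor (low, high) for the batch average; the function names are hypothetical and the balances are made up:

```python
def drawdown_from_peak(equity_curve):
    """Current loss from the highest balance so far, as a fraction of that peak."""
    peak = max(equity_curve)
    return (peak - equity_curve[-1]) / peak

def stop_reasons(equity_curve, batch_mean=None, corridor=None, max_drawdown=0.20):
    """Return the list of stop rules that are currently triggered."""
    reasons = []
    # Rule 1: more than 20% lost from the last peak.
    if drawdown_from_peak(equity_curve) > max_drawdown:
        reasons.append("drawdown from peak above limit")
    # Rule 2: batch result outside the corridor, on either side.
    if batch_mean is not None and corridor is not None:
        low, high = corridor
        if not (low <= batch_mean <= high):
            reasons.append("batch outside expectation corridor")
    return reasons

# Made-up account balances after each trade.
equity_curve = [10_000, 11_000, 12_000, 10_500, 9_500]
print(stop_reasons(equity_curve, batch_mean=0.9, corridor=(0.1, 0.6)))
```

An empty list means neither rule is triggered. Note that an unusually good batch triggers the second rule as well, in line with the "positive or negative" part above.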
I hope this article made sense to you. As traders, we are dealing with uncertainty every day, and getting mathematical and statistical backup and affirmation by using a trading journal and simple calculations like standard deviation can help us a lot in moving forward.
Our sample size does not have to be gigantic; it simply has to deliver actionable results. To me, that easily happens after 50 trades, and some people even go as low as 30 trades for comparing batches. What matters is that you always treat your most recent batch of trades as an out sample and do not include it in the overall performance you compare it against. And of course, the bigger your database, the more reliable your results.
Don’t panic if your recent batch of 50 trades was a loser – calculate standard deviations, see if there is anything abnormal going on, and if not, keep going.
To summarize, there are 4 steps to assess whether your strategy is (still) profitable or not:
- Assess how well your strategy applied to market conditions in the observed timeframe
- Assess how well you applied your rules/strategy to the market in the observed timeframe
- Assess how your most recent batch of 30-50 trades (=out sample) compares to the 30-50 trades just before that, and how it compares to the standard deviation of your overall performance.
- Found any outliers? Find out why, and stop trading until then.
Let me know if you have any questions! Thank you.