Today I’m going to give a recent example of why you shouldn’t stop your tests too early.
In a previous blog entry I discussed some general rules for how long you should run your tests. Not stopping a test too early is important for getting accurate results.
Two good general rules are to run tests for at least two weeks and at least long enough to achieve confidence in your data. Stopping a test before it meets both of these rules risks false positives. There are other factors in deciding on a test winner, but these two apply to every site I've worked with.
So, here's an example of why we follow these two guidelines. Here's the graph of the test's conversion rate at 4 days. The green line is our test recipe and the red line is our control. The graph shows the overall average conversion rate over time.
That's a 156% lift in conversion rate with full confidence! This was also a test where the improvement would have affected 100% of the site's sales. This is huge!
Here's the test at a full two weeks. Now it's a 32% lift. That's still a huge improvement considering it affects all traffic on the site! We are also still at 99.7% statistical significance, which is our confidence measure. However, the projected impact has been significantly reduced.
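As a sketch of where numbers like "32% lift" and "99.7% significance" come from, here is a standard pooled two-proportion z-test. The conversion counts below are made up for illustration — the post doesn't share the raw data — but the arithmetic is the usual one: lift is the relative difference in conversion rates, and confidence is the two-sided probability from the z-score.

```python
# Hypothetical conversion counts for illustration -- not the actual test data.
from math import sqrt, erf

def significance(conv_a, n_a, conv_b, n_b):
    """Lift and two-sided confidence for B (test) vs A (control),
    using a pooled two-proportion z-test."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    confidence = erf(abs(z) / sqrt(2))                  # two-sided normal CDF
    lift = (p_b - p_a) / p_a
    return lift, confidence

# A small early sample can show a big lift at high confidence...
print(significance(conv_a=40, n_a=2000, conv_b=70, n_b=2000))
# ...while two weeks of data tells a much more modest story.
print(significance(conv_a=300, n_a=15000, conv_b=330, n_b=15000))
```

Note how the same underlying site can produce a 75% lift at over 99% confidence on a few days of traffic, then a 10% lift that isn't significant at all once the sample grows — exactly the shape of the graphs above.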
Here's the graph of what happened between the 4th and 14th days. No change. There's no impact on conversion rate. This starts to raise questions. Why isn't this test still delivering? What caused the jump over the first 4 days? The more questions you have, the less confident you can be about the change.
Following the two general rules while testing helps you reduce the possibility of false positives and gives you a more accurate account of what the expected result will be.
If you had stopped the test and hardcoded the winning recipe at the 5-day point, you'd be expecting an astronomical impact on your overall sales as soon as the change was made. Not only that, but you'd then start to assume that your customers liked this change and make further adjustments to your site with that conclusion in mind. This is a classic false positive: the test data showed a positive response, but there really was no effect.
The reality is that this change may not have done anything for customers, and your follow-up tests would all be testing items that have no effect on your visitors' decision to buy. Now we have to figure out why there was an initial bump and which segments of your traffic this test did or didn't resonate with. Try to view the questions raised during the second week as an opportunity to learn more about your customers, not as a problem with the test or the data itself. The more you learn about your customers' behavior, the more targeted you can make your next tests.
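One way to see why stopping early produces false positives is to simulate it. The sketch below uses entirely made-up parameters (500 visitors per arm per day, a 2% true conversion rate for both recipes) and runs A/A tests — both arms are identical, so any declared "winner" is by definition a false positive. It then compares how often daily "peeking" declares a winner versus checking significance only once, at the fixed two-week horizon:

```python
import random
from math import sqrt, erf

def confidence(conv_a, n_a, conv_b, n_b):
    """Two-sided confidence from a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return erf(abs((p_b - p_a) / se) / sqrt(2))

random.seed(0)
DAYS, VISITORS_PER_DAY, RATE, TRIALS = 14, 500, 0.02, 200

def run_aa_test():
    """One A/A test: both arms share the same true conversion rate.
    Returns (peeked_winner, fixed_horizon_winner)."""
    conv_a = conv_b = n = 0
    peeked = False
    for day in range(DAYS):
        conv_a += sum(random.random() < RATE for _ in range(VISITORS_PER_DAY))
        conv_b += sum(random.random() < RATE for _ in range(VISITORS_PER_DAY))
        n += VISITORS_PER_DAY
        # Peeking: check significance every day and "stop" at >= 95%.
        if day >= 1 and confidence(conv_a, n, conv_b, n) >= 0.95:
            peeked = True
    # Fixed horizon: check only once, at the end of the two weeks.
    fixed = confidence(conv_a, n, conv_b, n) >= 0.95
    return peeked, fixed

peek_fp = fixed_fp = 0
for _ in range(TRIALS):
    p, f = run_aa_test()
    peek_fp += p
    fixed_fp += f

print(f"false-positive rate, peeking daily:   {peek_fp / TRIALS:.0%}")
print(f"false-positive rate, fixed two weeks: {fixed_fp / TRIALS:.0%}")
```

Because the daily check gives random noise many chances to cross the threshold instead of one, the peeking false-positive rate typically comes out several times higher than the nominal 5% — which is exactly why a test that looks like a huge winner on day 4 can flatten out by day 14.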