Control Group Dos and Don’ts
These are observations I have made during my years of working with controlled experiments. The list is not exhaustive, but it is a good place to start.
Myths about controlled experiments:
These things are often perceived as threats, but they are not. You should stop trying to “fix” them.
- Not believing in the power of random assignment. Some people don’t believe that random assignment of subjects into groups can make the groups representative. Even when random, unbiased assignment is entirely possible, analysts try to come up with complicated assignment algorithms, such as stratified sampling. For tests of decent sample sizes (1,000+ subjects in each group), this is completely unnecessary. Don’t believe me? Conduct a simple experiment: assign subjects randomly into two groups, and then check whether the groups match each other in both profile and future behavior.
- Thinking that all other marketing communications have to be halted for the duration of the test to obtain a valid measurement. So not true! This is the most common objection to the results of a test, and it is wrong precisely because the controlled experiment is designed to handle just that. The point of the control group is to control for everything else that is going on in the market. Everything. A hurricane, a new media campaign, another mail piece, a paid search ramp-up. Everything. As long as this “everything” applies to both test and control groups equally, your measurement is valid. Sometimes I see people getting one result from a test in one business environment, and a different result in another. They claim that in one of the cases the measurement is not valid because the environment was different. The correct explanation is that both measurements are valid, but the program delivers different results depending on the environment.
- Believing that you need special conditions to conduct the controlled test, like “resting” the groups before the test or having a universal control group. In most cases, you don’t. Since you are trying to measure the effectiveness of the treatment as it is implemented in the market under business-as-usual (BAU) conditions, your control should be subject to BAU conditions too.
- Thinking that you can’t perform two (or more) controlled tests on roughly the same population at the same time. This question often arises when control groups are implemented as business as usual on all campaigns on an ongoing basis. Often, calendar shifts or other changes land two campaigns in the same or a similar (i.e. overlapping) population at the same time. Now, all measurement is suddenly declared invalid. However, as long as you have random and representative group assignments, your results are fine. Granted, you are measuring your programs exactly in the conditions they were executed in, i.e. how effective is our program if we run it together with this other program, but it is still a valid measurement because both test and control groups are equally impacted by the other program. If you find that one of the programs was not working, but you strongly believe that it would have worked had the timing been different, that is not because the measurement is invalid. It is because overlapping your campaign with another campaign renders it ineffective.
- Thinking that customers who may be assigned into both groups simultaneously due to accessing the test through multiple devices will invalidate your test. This one is a little trickier, but the bottom line is that customers in both groups will be impacted by this design flaw, and the setup will produce a valid measurement for the original assignment, and possibly beyond. To make your test a bit more robust, my recommendation is to have a 50/50 split, so the proportion of “dual group” customers is the same for both test and control.
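The simple experiment suggested in the first myth above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated, and the names `prior_spend`, `future_spend`, `assign_randomly`, and `standardized_difference` are hypothetical, not part of any real system. With real data you would replace the simulated dictionaries with your own customer profile and outcome metrics.

```python
import random
import statistics


def assign_randomly(subject_ids, seed=0):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def standardized_difference(a, b):
    """Difference in group means, expressed in pooled standard deviations."""
    pooled_sd = ((statistics.pvariance(a) + statistics.pvariance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


# Assumption: a simulated population of 20,000 customers with a skewed
# "profile" metric (prior spend) and a "future behavior" metric that
# depends partly on the profile and partly on noise.
rng = random.Random(42)
prior_spend = {cid: rng.lognormvariate(4, 1) for cid in range(20_000)}
future_spend = {
    cid: 0.5 * spend + rng.gauss(0, 20) for cid, spend in prior_spend.items()
}

# Plain random assignment, no stratification.
test_ids, control_ids = assign_randomly(prior_spend.keys(), seed=7)

# Compare the two groups on both profile and future behavior.
balance = {}
for label, metric in [("prior spend", prior_spend), ("future spend", future_spend)]:
    balance[label] = standardized_difference(
        [metric[i] for i in test_ids],
        [metric[i] for i in control_ids],
    )
    print(f"{label}: standardized difference = {balance[label]:+.3f}")
```

Values close to zero indicate that the groups match; a commonly used rule of thumb treats an absolute standardized difference below 0.1 as balanced. Changing the seed and rerunning gives a feel for how little the groups drift apart at this sample size.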
Don’ts of controlled experiments:
Overview of the most common ways to screw up your controlled experiment design.
- Assigning customers eligible for the treatment into the test group, and those not eligible for the treatment into the control group. Why it is going to screw you up every time: being eligible for the treatment makes your test group different from the control group, and thus the groups are not representative of each other. Unless you have no operational ability or legal power to assign like customers to both groups, never use this method.
- Freaking out about the test, and sending more marketing specifically to the holdout groups. I have seen marketing people reach this conclusion once they hear about a long-term marketing suppression test. We can’t just leave these people alone! Not only can we, we should. That’s the whole point! In fact, it is in the marketers’ own interest to leave these people alone, so that their marketing program shows a positive difference against the no-marketing group.
- Biased measurement of the outcome of the test. In other words, measuring the groups in different ways. The most common mistake is to find a flaw in the execution of the treatment and try to “adjust” the outcome of the test group for it. For example, if the treatment has not reached every customer, some try to count only the “reachables”. The problem? The control group still consists of both “reachables” and “unreachables”! If your groups are representative to start with, measure them in exactly the same way.
- Creating a bunch of experiments where the control group is a different percentage of the population in each test, and then lumping them all together for the analysis. Here is the post with detailed analysis of this flaw.
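To see why lumping tests with different control percentages goes wrong, here is a minimal sketch with made-up numbers. Everything in it is an assumption for illustration: the two campaigns, their sizes, and their response rates are hypothetical, and the treatment is deliberately given zero true effect in both. Weighting per-campaign lifts by test group size is just one reasonable way to combine them, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    test_size: int
    test_responders: int
    control_size: int
    control_responders: int

    @property
    def lift(self):
        """Test response rate minus control response rate."""
        return (
            self.test_responders / self.test_size
            - self.control_responders / self.control_size
        )


# Hypothetical data: neither campaign has any true lift.
# Campaign A: 10% holdout, 10% response rate in both groups.
# Campaign B: 50% holdout, 2% response rate in both groups.
campaigns = [
    Campaign("A", test_size=9_000, test_responders=900,
             control_size=1_000, control_responders=100),
    Campaign("B", test_size=5_000, test_responders=100,
             control_size=5_000, control_responders=100),
]


def pooled_lift(campaigns):
    """The flawed approach: lump all test and all control customers together."""
    test_n = sum(c.test_size for c in campaigns)
    test_r = sum(c.test_responders for c in campaigns)
    control_n = sum(c.control_size for c in campaigns)
    control_r = sum(c.control_responders for c in campaigns)
    return test_r / test_n - control_r / control_n


def weighted_lift(campaigns):
    """Measure lift within each campaign, then average by test group size."""
    total_test = sum(c.test_size for c in campaigns)
    return sum(c.lift * c.test_size for c in campaigns) / total_test


naive = pooled_lift(campaigns)
combined = weighted_lift(campaigns)
print(f"pooled lift:   {naive:.4f}")
print(f"weighted lift: {combined:.4f}")
```

The pooled figure shows a sizable positive lift even though neither campaign has one. The lumped test group is dominated by the high-response campaign A, while the lumped control group is dominated by the low-response campaign B, so the two are no longer representative of each other. Measuring within each campaign first keeps every test group compared only against its own control.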
The controlled experiment design is a very reliable analytical method, and if implemented correctly from the very beginning, it is extremely resilient to all sorts of disturbances and interventions. In my practice, I have seen every kind of attack on the design, and when the experiment was repeated, the results always held up.