Control (Holdout) Group Sample Size

TL;DR: for most marketing experiments, I recommend having 10,000 targets in your smallest group.

So, you have decided to measure your marketing campaign against a control or holdout group. 

Congratulations! You are on the right path.

Now you need to figure out how many targets the experiment requires, specifically how many go into the test group and how many into the control group.

The goal is to produce statistically significant results. 

I am going to define “results” as the difference in a percent metric, such as purchase or conversion rate, between your test and control groups. This type of result implies that your response is binary, i.e. a customer either responds or does not, and that you are evaluating the difference in the percent response.

Terms

Test (treatment, marketed, campaign) group – a group of targets that is being exposed to a specific version of the marketing campaign.

Control group – a group of representative targets that is exposed to a different version of the marketing campaign. The control group is usually randomly chosen from all campaign-eligible targets.

Holdout group – a type of control group that is withheld from marketing treatment altogether.

Response rate – a collective term for the types of outcomes you will be measuring and comparing, such as purchase rate or conversion rate. The rate refers to the percentage of the group that completed the action. The metrics used to measure outcomes should be applicable to both groups.

Effect size (incremental lift, uplift) – the difference in the response rate between test and control groups, expressed as percentage points.

For example, if your control group had a 3% conversion rate and the treated group had a 5% conversion rate, then 3% and 5% are response rates, and the effect size was 2 percentage points (pp).
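Whether a 2 pp difference like this counts as statistically significant depends on the group sizes. Here is a minimal sketch of a pooled two-proportion z-test, one common way to check. The function name and the counts (1,000 targets per group) are hypothetical and chosen only to match the 3% and 5% rates above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(resp_test, n_test, resp_control, n_control):
    """Return (effect as a proportion, two-sided p-value) from a pooled z-test."""
    p_test = resp_test / n_test
    p_control = resp_control / n_control
    # Pooled response rate under the "no difference" hypothesis
    pooled = (resp_test + resp_control) / (n_test + n_control)
    se = sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_test - p_control, p_value

# Hypothetical counts: 50 of 1,000 treated and 30 of 1,000 control targets converted.
effect, p_value = two_proportion_z_test(50, 1000, 30, 1000)
print(f"effect = {effect * 100:.1f} pp, p-value = {p_value:.3f}")
```

A p-value below 0.05 corresponds to significance at the 95% confidence level.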

Let’s get into it.

Factors that impact your sample size

  1. The expected response rate in your groups. For most marketing campaigns, where response rates are well below 50%, a smaller response rate needs smaller groups to detect the same effect size.
  2. The effect size you need to justify the marketing expense. To determine this parameter, you need to figure out potential profit from additional sales, the cost of your marketing campaign (including measurement and creating additional versions), and the breakeven point in additional sales where your marketing campaign pays for itself. The larger the effect size you need, the easier it is to prove, thus requiring a smaller control group.
  3. Approximately what percent of the total targets is going to be your control group. I provide numbers for 50/50 and 90/10 splits, both of which are very common. A 50/50 split is often used when trying to minimize the total experiment sample size, and a 90/10 split is used when trying to minimize one of the groups.
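For the second factor, the breakeven arithmetic can be sketched in a few lines. This is a simplified illustration: the function name and the dollar figures are hypothetical, and it assumes every incremental sale carries the same profit.

```python
def breakeven_effect_pp(campaign_cost, profit_per_sale, n_treated):
    """Percentage-point lift at which incremental profit equals the campaign cost."""
    breakeven_sales = campaign_cost / profit_per_sale  # extra sales needed to pay for the campaign
    return 100 * breakeven_sales / n_treated

# Hypothetical inputs: $20,000 total cost, $40 profit per sale, 100,000 treated targets.
print(breakeven_effect_pp(20_000, 40, 100_000))
```

With these made-up inputs, the campaign needs 500 additional sales, which is a 0.5 pp lift across 100,000 treated targets.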

Have those ready? 

Put on your thinking hat; there are a lot of numbers involved.

Control Group Sample Size Calculation

There are many assumptions we can use to calculate group sizes, and I am going to group them into two approaches:

  • Conservative assumptions, resulting in larger group sizes.
  • Aggressive assumptions, when you need to minimize one of the groups.

Because the precision of the outcome is driven by the smallest group, I calculate the size of that smallest group below. The conservative estimate gives a larger number, and the aggressive estimate is the bare-bones minimum you need.

Conservative estimate

Assumptions:

  • Confidence level of 95%. That means that if there is truly no difference between the groups, we will still get a “significant” result in 5% of cases (a false positive).
  • Power is 80%. That means we have a 20% likelihood of not detecting the difference if it exists.
  • 50/50 test-control split.

This table shows the sample size you need in each of the two groups:

            Effect size
Response    0.10 pp  0.25 pp  0.50 pp  1.0 pp  1.5 pp  2.0 pp  3.0 pp
0.5%         85,863   15,600    4,674   1,554     861     580     342
1%          163,096   27,938    7,751   2,319   1,200     769     425
2%          315,207   52,238   13,810   3,826   1,866   1,142     589
3%          464,179   76,036   19,744   5,302   2,518   1,507     750
5%          752,704  122,125   31,235   8,159   3,781   2,213   1,060
10%       1,419,074  228,555   57,764  14,752   6,694   3,842   1,775
20%       2,516,347  403,742  101,404  25,583  11,473   6,511   2,944

50% control group sample size estimates

Interpretation. If we expect our control response rate to be 3%, and we need to achieve a 4% response rate in the treatment group to justify additional costs (1 pp effect), then we will need to have 5,302 targets in each group or 10,604 targets total.
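If you want to compute a conservative estimate for your own inputs, here is a minimal sketch using a textbook normal-approximation formula with a pooled response rate. The exact method behind the table is not stated in this article, so treat the formula as an assumption. It does match the table cells spot-checked, such as 5,302 for a 3% response rate and a 1 pp effect. The function name is hypothetical.

```python
from statistics import NormalDist

def sample_size_per_group(p_control, effect, confidence=0.95, power=0.80):
    """Targets needed in each group of a 50/50 split (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_test = p_control + effect
    p_bar = (p_control + p_test) / 2  # pooled response rate across both groups
    n = (z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar) / effect ** 2
    return round(n)

# 3% control response rate, 1 pp effect (rates and effects are proportions here)
n_each = sample_size_per_group(0.03, 0.01)
print(n_each, 2 * n_each)
```

Note that both the response rate and the effect are passed as proportions, so 1 pp is 0.01.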

Aggressive Estimate

Assumptions:

  • Confidence level of 95%. That means that if there is truly no difference between the groups, we will still get a “significant” result in 5% of cases (a false positive).
  • We accept a much higher risk of missing a real difference. In statistical terms, the power is 50%: if the true effect is exactly the size we planned for, we have a coin-flip chance of detecting it.
  • 90/10 test-control split.

This table shows the sample size you need in your smaller (10%) group:

            Effect size
Response    0.10 pp  0.25 pp  0.50 pp  1.0 pp  1.5 pp  2.0 pp  3.0 pp
0.5%         25,044    4,920    1,610     592     347     242     149
1%           46,036    8,272    2,445     799     438     293     171
2%           87,378   14,872    4,089   1,207     618     393     215
3%          127,867   21,336    5,699   1,606     794     492     258
5%          206,284   33,854    8,816   2,380   1,135     682     341
10%         387,388   62,759   16,013   4,164   1,922   1,121     533
20%         685,570  110,325   27,844   7,092   3,210   1,838     845

10% control group sample size estimates

Interpretation. If we expect our “no marketing” response rate to be 3%, and we need to achieve a 4% response rate in the marketed group to justify the costs (1 pp effect), then we will need to have 1,606 targets in a 10% control group, or 16,060 targets total.
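The same kind of normal-approximation formula can be extended to unequal splits. The sketch below is an approximation under assumptions: it pools the response rate weighted by group share, and the function name is hypothetical. It lands close to the aggressive table (within a couple of percent for the cells spot-checked) but does not reproduce it exactly, so the original calculator evidently differs in some detail.

```python
from statistics import NormalDist

def small_group_size(p_control, effect, control_share=0.10,
                     confidence=0.95, power=0.50):
    """Approximate size of the control group for an unequal test-control split."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)  # zero when power is 50%
    p_test = p_control + effect
    ratio = (1 - control_share) / control_share  # test targets per control target
    # Pooled response rate, weighted by each group's share of the targets
    p_bar = control_share * p_control + (1 - control_share) * p_test
    n = (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (1 + 1 / ratio) / effect ** 2
    return round(n)

# 3% control response rate, 1 pp effect, 90/10 split, 50% power
n_control = small_group_size(0.03, 0.01)
print(n_control, round(n_control / 0.10))
```

Setting control_share=0.5 and power=0.80 gives the conservative 50/50 case.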

But wait, that’s not all!

Important Real Life Considerations

Imagine you have run the campaign, calculated the results, created the report, and now you are presenting it.

What is the most likely question your management is going to ask?

I’ve been through this many times, so I can tell you. It’s “were there any segments where our campaign worked better?” This is a particularly popular question when the campaign did not do well enough to pay for itself.

How can you answer this question?

You have the profile variables associated with your targets to segment them on, e.g. region, product composition, tenure, demographics, etc.

Your test and control groups are representative of each other, and thus they have the same percentage of targets in each segment. (If yours do not, you must read this!)

So, you should segment both the test and control groups, and then compare their response rates by the segment. Easy enough.

Wait! 

Your segments are now smaller, and you don’t have enough sample size to tell if the differences are statistically significant or not!

Yes, these West region targets did really well, but there are only 350 control targets in the West, so your results are directional at best.
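To see why 350 control targets is too few, it helps to estimate the smallest effect a segment could reliably detect. Here is a rough sketch, assuming a 90/10 split (so 3,150 test targets in the West), a 3% baseline response rate, and a variance based on the baseline rate only. All of those are illustrative assumptions, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(p_baseline, n_test, n_control,
                              confidence=0.95, power=0.80):
    """Approximate smallest true effect (as a proportion) detectable at the given power."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Standard error of the difference, using the baseline rate for both groups
    se = sqrt(p_baseline * (1 - p_baseline) * (1 / n_test + 1 / n_control))
    return (z_alpha + z_beta) * se

# 350 control targets from the example; 3,150 test targets and a 3% baseline are assumed.
mde = minimum_detectable_effect(0.03, 3150, 350)
print(f"{mde * 100:.1f} pp")
```

Under these assumptions the segment can only detect an effect of roughly 2.7 pp, far more than the lift most campaigns need to break even.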

To avoid this sorry situation, you have to think it through upfront. What are the splits your organization will be interested to see when you get the results? What percentage of targets are they, roughly?

At the end of the day, you would need a larger sample size. If you can, run your calculations, and increase your sizing by 3-4x. Based on my experience, it usually works well.
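One way to arrive at a multiplier for your own case is to size the group so that the smallest segment you care about still reaches the calculated sample size. This is a sketch of that arithmetic, not the only way to do it. It assumes the segment has the same response rate and effect as the overall population, and the 30% segment share is a hypothetical input.

```python
from math import ceil

def group_size_for_segments(n_required, smallest_segment_share):
    """Group size at which the smallest segment alone reaches n_required targets."""
    return ceil(n_required / smallest_segment_share)

# 3,826 is the conservative table value for a 2% response rate and a 1 pp effect.
print(group_size_for_segments(3826, 0.30))
```

A segment that makes up 25% to 33% of your targets implies the 3-4x multiplier.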

This is where the 10,000 smallest group estimate comes from. For many marketing campaigns, the response rate is between 1%-4%, and the incremental lift needed is 0.5-1 pp. Based on my experience in B2C marketing, 10,000 group size works well for overall assessment and for reasonable segment splits.

What to Do When You Don’t Have Enough Targets

What if you have 3,000 eligible targets for the whole campaign, test and control combined? That’s not an ideal situation, and there are limited ways around it. You should not give up, though.

  • Split 50/50. 50/50 split gives you the highest precision for the total sample size. That’s the best you can do. 
  • Do a quick calculation of whether the split is even worth doing. Your time and energy are worth money, and if you expect to find 10 additional sales, each worth $10 in profit, it makes sense not to measure such small programs against a control.
  • However, if you are running a test that can be scaled to a much wider audience, then the math changes. Do a 50/50 test-control split, and if you need to, go with directional results.
  • See if you can find comparable targets outside of the campaign coverage. This is the last-ditch effort, so tread carefully. The control targets you gain might not be representative. Read this article on how to use synthetic controls.
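To illustrate why the 50/50 split is the best use of a small audience, the sketch below compares the standard error of the measured difference for several splits of the same 3,000 targets. The 3% response rate is an assumed input and the function name is hypothetical. A smaller standard error means a more precise estimate.

```python
from math import sqrt

def standard_error(p, n_total, control_share):
    """Standard error of the test-minus-control difference in response rate."""
    n_control = n_total * control_share
    n_test = n_total - n_control
    return sqrt(p * (1 - p) * (1 / n_test + 1 / n_control))

# Same 3,000 targets, different control shares; 3% response rate is assumed.
for share in (0.5, 0.3, 0.1):
    se_pp = 100 * standard_error(0.03, 3000, share)
    print(f"{share:.0%} control: standard error = {se_pp:.2f} pp")
```

The standard error grows as the split moves away from 50/50, so the same total audience yields a noisier result.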

Control Group Size for Ongoing Campaigns

You calculated your control group once, twice, thrice, but when it comes to ongoing business-as-usual campaigns, it makes little sense to create individualized control groups.

First, let me assure you that controls should be used to measure ongoing campaigns. After all, that’s where the most money is spent.

Second, if you have a business large enough to have 100K targets in each regular campaign, you should switch to a consistent percentage as a control group. 

A 10% control group is very popular. For a very large business, you may choose to go with a 5% control.

The advantages of having a consistent percentage are numerous:

  • It’s easy to implement. You hold out 10% across the board, and it becomes second nature. There is a lower likelihood of mistakes.
  • You can easily summarize the results of multiple campaigns by just summing their treatment and control groups. 
  • You always know what the control group size is going to be for each program, so it is easy to validate the data.
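The summing approach from the second point can be sketched as follows. The campaign counts are made up for illustration, and the function name is hypothetical. Summing works cleanly here because every campaign holds out the same 10%; with inconsistent control percentages, the campaigns would carry different weights in the test and control totals, which can distort the combined lift.

```python
# Hypothetical results per campaign:
# (test targets, test responders, control targets, control responders)
campaigns = [
    (90_000, 3_690, 10_000, 310),
    (180_000, 5_580, 20_000, 520),
    (45_000, 2_070, 5_000, 200),
]

def pooled_lift_pp(campaigns):
    """Overall lift in percentage points from summed test and control counts."""
    n_test = sum(c[0] for c in campaigns)
    resp_test = sum(c[1] for c in campaigns)
    n_control = sum(c[2] for c in campaigns)
    resp_control = sum(c[3] for c in campaigns)
    return 100 * (resp_test / n_test - resp_control / n_control)

print(f"{pooled_lift_pp(campaigns):.2f} pp")
```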

When I set up a direct mail measurement and reporting process for one of the largest US telecommunications providers, I used a 10% control group size for all business-as-usual campaigns with great success.

How to Use Internet Sample Size Calculators

If you want to tweak the assumptions and get your own sample size estimate, here is a decent sample size calculator I found online.

Our test vs control design is called a two independent study groups test, and the response rate is a dichotomous (binary) measure. The continuous endpoint test should be used to compare continuous outcomes such as average revenue.

You would need to put in your expected response rate for the control group and the response rate you need for the treatment group to justify the expense, e.g. 3% and 5%. Use an enrollment ratio of 1 for a 50/50 split and 9 for a 90/10 split.

Power of 80% can be used for a conservative estimate, and 50% for a more aggressive estimate.

Conclusion

Control groups are awesome, and you definitely should use them to measure marketing campaigns. However, determining control group size can be tricky. 

Factors such as expected response rates and breakeven incremental lift affect the size of the control group that is needed to measure the impact of the programs.

When designing the experiment, we should think about the likely uses of the analysis. Since segmenting the results is very common, increasing the calculated sample size by 3-4 fold is recommended.