Measuring Incrementality with Universal (Global) Control Groups

In marketing, control groups are used to determine the best marketing treatment. They consist of marketing targets that either receive a different version of the treatment or receive no treatment at all.

Holdout groups are marketing control groups that receive no treatment, usually for a single marketing campaign. This test-versus-control experiment design makes it possible to measure the incremental impact of marketing on sales. Throughout this article, I use the terms holdout and control group interchangeably, because the article focuses on how holding out multiple treatments can help measure marketing incrementality.

Universal or global control groups are control groups that are held out of multiple marketing campaigns. They are used to measure the cumulative impact of all of the advertising the group is excluded from. In compound experiments, universal control groups are used in combination with individual campaign control groups, which makes them very powerful tools for sales attribution.

Universal control groups are commonly used to achieve these goals:

Measure the cumulative impact of sequential marketing communications.
Measure the cumulative impact of concurrent marketing communications.
Measure multiple location-based tests in retail.

Measuring sequential marketing communications using a universal control group

Marketers believe that advertising has effects that outlive the duration of campaigns, which is often referred to as brand equity. Because marketing campaigns have a cumulative effect over time, past campaigns leave a “leftover” effect that adds up and becomes part of the base level of sales.

Global control groups can be used to determine the cumulative impact of marketing campaigns run sequentially. In this case, a universal control group is a control group that is held out of all marketing communications at the start of the marketing campaign sequence.

Comparing the performance of your treatment group against a universal control group tells you whether or not your communications have a cumulative effect.

If your marketing has a cumulative effect, the gap between the universal control group and the treatment group will grow as time passes.

When there is no cumulative effect, the sales level at the end of individual campaigns should return to the same baseline level as the universal control group.
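One way to check for a cumulative effect is to track the gap between the treatment group and the universal control group after each campaign ends and look at its trend. The sketch below is a minimal illustration, assuming you already have average sales per customer for both groups in each post-campaign window; the function names and numbers are hypothetical, and a real analysis would also need a significance test.

```python
from typing import List, Sequence


def post_campaign_gaps(treated: Sequence[float], universal_control: Sequence[float]) -> List[float]:
    """Gap (treated minus universal control) measured after each campaign ends."""
    if len(treated) != len(universal_control):
        raise ValueError("both series must cover the same campaign windows")
    return [t - c for t, c in zip(treated, universal_control)]


def gap_trend(gaps: Sequence[float]) -> float:
    """Ordinary least-squares slope of the gap against the campaign index.

    A clearly positive slope suggests a cumulative effect; gaps that stay near
    zero suggest sales return to the baseline after each campaign.
    """
    n = len(gaps)
    if n < 2:
        raise ValueError("need at least two campaign windows")
    x_mean = (n - 1) / 2
    y_mean = sum(gaps) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(gaps))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical post-campaign average sales for four sequential campaigns.
universal = [10.0, 10.0, 10.0, 10.0]
with_carryover = [10.5, 11.0, 11.5, 12.0]     # gap keeps widening
without_carryover = [10.0, 10.0, 10.0, 10.0]  # sales fall back to baseline

print(gap_trend(post_campaign_gaps(with_carryover, universal)))     # 0.5
print(gap_trend(post_campaign_gaps(without_carryover, universal)))  # 0.0
```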

Here is the most important thing about using universal control groups for measuring sequential communications:

If you think your marketing has a cumulative effect, do not measure the impact of a single marketing campaign in the sequence using the universal control group that has been held out from other marketing campaigns.

To measure the impact of an individual campaign in the sequence, you need to create a separate control group that is withheld only from that particular communication. Why? Because once your universal control group has been withheld from previous communications, it stops being representative of your treated group.

In other words, measure each individual campaign in the sequence against its own campaign-specific control group, not against the universal control group.

When an individual campaign is measured against universal control only, it is impossible to isolate the effect of the campaign from the leftover effects of other campaigns your treatment group has been exposed to.

The following example shows the measurement of Campaign Communication #3 against both the universal control group and a campaign-specific control group. The campaign-specific control group has been held out of Campaign Communication #3 only: it was subject to all prior communications and will be subject to all subsequent communications.
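The group assignment behind this design can be sketched as follows. This is a minimal illustration, not a prescribed procedure: it assumes customer-level randomization, and the function names and holdout shares are hypothetical. The universal control group is drawn once, before the first campaign; each campaign-specific control group is then drawn at random from the remaining customers, independently for each campaign. Because the draws are independent, a few customers may land in more than one campaign control group; this keeps each campaign's treated group and control group comparable in their exposure to the other campaigns.

```python
import random
from typing import Dict, Hashable, Iterable, Set, Tuple


def assign_holdouts(
    customer_ids: Iterable[Hashable],
    universal_share: float,
    campaign_shares: Dict[str, float],
    seed: int = 0,
) -> Tuple[Set[Hashable], Dict[str, Set[Hashable]]]:
    """Draw a universal control group once, then one random holdout per campaign."""
    rng = random.Random(seed)
    ids = list(customer_ids)
    rng.shuffle(ids)
    n_universal = round(len(ids) * universal_share)
    universal = set(ids[:n_universal])  # held out of every campaign
    eligible = ids[n_universal:]        # everyone else can receive campaigns
    campaign_controls: Dict[str, Set[Hashable]] = {}
    for campaign, share in campaign_shares.items():
        # Independent draw per campaign, from non-universal customers only.
        k = round(len(eligible) * share)
        campaign_controls[campaign] = set(rng.sample(eligible, k))
    return universal, campaign_controls


def receives(customer, campaign, universal, campaign_controls) -> bool:
    """A customer gets a campaign unless held out universally or for that campaign."""
    return customer not in universal and customer not in campaign_controls[campaign]


universal, controls = assign_holdouts(
    range(1000),
    universal_share=0.10,
    campaign_shares={"campaign_1": 0.10, "campaign_2": 0.10, "campaign_3": 0.10},
)
print(len(universal), {c: len(s) for c, s in controls.items()})
```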

When the results of the measurement are summarized, we can see that the respective control groups show the impact of the communications they have been held out of.

Below is the interpretation of the results produced by a compound experiment that has both a universal control group and a campaign control group.

(Treated Group) – (Campaign Control) = Performance of Campaign #3 Only

(Treated Group) – (Universal Control) = Cumulative Performance of Campaigns 1-3 during Campaign 3 window

(Campaign Control) – (Universal Control) = Leftover effect of Campaigns 1-2 on the baseline during Campaign 3
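The three comparisons above are simple differences of group means, so they fit in a short function. The sketch below assumes you already have average sales per customer for each group during the Campaign 3 window; the function name and numbers are hypothetical. By construction, the cumulative performance equals the Campaign 3 effect plus the leftover effect, which is a useful sanity check on the arithmetic.

```python
from typing import Dict


def decompose_campaign_3(
    treated: float, campaign_control: float, universal_control: float
) -> Dict[str, float]:
    """Split the Campaign 3 window results using both control groups."""
    return {
        # (Treated Group) - (Campaign Control)
        "campaign_3_only": treated - campaign_control,
        # (Treated Group) - (Universal Control)
        "cumulative_campaigns_1_to_3": treated - universal_control,
        # (Campaign Control) - (Universal Control)
        "leftover_campaigns_1_to_2": campaign_control - universal_control,
    }


# Hypothetical average sales per customer during the Campaign 3 window.
result = decompose_campaign_3(treated=15.0, campaign_control=13.0, universal_control=10.0)
print(result)
# {'campaign_3_only': 2.0, 'cumulative_campaigns_1_to_3': 5.0, 'leftover_campaigns_1_to_2': 3.0}
```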

This problem does not arise when multiple communications have no cumulative effect; however, the best practice is still not to use a universal control group to measure individual parts of a campaign sequence.

Measuring the cumulative impact of concurrent marketing communications using compound control groups

This is the classic case of sales channel attribution: you conduct an advanced experiment that correctly identifies the incremental effect of each communication both with and without the presence of other communications.

This powerful design helps companies not only measure the incremental impact of each channel on sales, but also optimize marketing spend across channels. A universal control group is held out of all marketing and is used to determine baseline sales.

In this sample compound experiment, we look at the effects of three marketing vehicles: Facebook ads, email, and catalog. The results are measured through several groups that act as treatment and control groups for one another. Note that while the treatment/control split for email and catalog can be randomized at the individual recipient level, Facebook ads are usually turned on and off on a geography or ISP basis.
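The groups in a design like this can be enumerated as every on/off combination of the three channels, which gives 2³ = 8 groups; the group with no channels switched on is the universal control group. The sketch below is a minimal illustration under stated assumptions: email and catalog are randomized per recipient with a hypothetical 50/50 split, while Facebook exposure is decided by region. The region names and function names are hypothetical, and a real geo test would use many regions rather than two.

```python
import random
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

CHANNELS = ("facebook", "email", "catalog")


def design_cells() -> List[FrozenSet[str]]:
    """Every on/off combination of the channels; frozenset() is the universal control."""
    return [
        frozenset(ch for ch, on in zip(CHANNELS, flags) if on)
        for flags in product((False, True), repeat=len(CHANNELS))
    ]


def assign_cells(
    customers: Iterable[Tuple[int, str]],
    facebook_regions: Set[str],
    seed: int = 0,
) -> Dict[int, FrozenSet[str]]:
    """Assign each (customer_id, region) pair to one design cell."""
    rng = random.Random(seed)
    cells: Dict[int, FrozenSet[str]] = {}
    for customer_id, region in customers:
        channels = set()
        if region in facebook_regions:  # geography-level switch for Facebook ads
            channels.add("facebook")
        if rng.random() < 0.5:          # individual-level randomization for email
            channels.add("email")
        if rng.random() < 0.5:          # individual-level randomization for catalog
            channels.add("catalog")
        cells[customer_id] = frozenset(channels)
    return cells


customers = [(i, "region_a" if i % 2 else "region_b") for i in range(1000)]
assignment = assign_cells(customers, facebook_regions={"region_a"})
print(len(design_cells()))  # 8
```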

Here are the sample results of average sales for each group.

Advanced attribution for multiple media with a universal control group

For each group, we can easily determine the sales lift over the baseline, that is, over the universal control group.
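The lift calculation is the same subtraction for every group. Below is a minimal sketch that keys each group by the set of channels it receives; the average sales figures are made up for illustration and are not the article's sample results.

```python
from typing import Dict, FrozenSet

# Hypothetical average sales per customer for each group (illustrative only).
avg_sales: Dict[FrozenSet[str], float] = {
    frozenset(): 10.0,  # universal control: no marketing at all
    frozenset({"facebook"}): 12.0,
    frozenset({"email"}): 13.0,
    frozenset({"catalog"}): 11.5,
    frozenset({"facebook", "email"}): 14.0,
    frozenset({"facebook", "catalog"}): 13.5,
    frozenset({"email", "catalog"}): 14.0,
    frozenset({"facebook", "email", "catalog"}): 15.0,
}


def lift_over_baseline(sales: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """Sales lift of each treated group over the universal control group."""
    baseline = sales[frozenset()]
    return {group: value - baseline for group, value in sales.items() if group}


for group, lift in lift_over_baseline(avg_sales).items():
    print(" + ".join(sorted(group)), lift)
```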

The real power of this design lies in comparing different treatment groups with each other, which allows us not only to conduct true lift attribution, but also to draw conclusions about the effectiveness of each channel in the presence of communications from other channels. This is the real gold mine for marketing optimization.

For example, this analysis of the effectiveness of Facebook ads compares sales for groups with and without Facebook ads in four contexts: email only, catalog only, both email and catalog, and no other communications.

From this analysis, we can conclude that Facebook ads produce an incremental increase of 2 units of sales in the absence of email communication, and an increase of 1 unit of sales in the presence of email communication. Note that the effect of Facebook ads in the presence of email is measured against a control group that also receives the email. In general, the control group for any given element should never be “cleaned” of the other communications present in the marketplace; it should differ from the treatment group only in the element being measured.
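This comparison can be automated. For each group that receives the channel of interest, the matching control is the group that receives exactly the same other channels but not that one, so the control is never “cleaned” of other communications. The sketch below uses hypothetical average sales chosen to mirror the pattern described (+2 units without email, +1 with email); they are not the article's data, and the function name is illustrative.

```python
from typing import Dict, FrozenSet

# Hypothetical average sales per customer, keyed by the channels each group receives.
avg_sales: Dict[FrozenSet[str], float] = {
    frozenset(): 10.0,
    frozenset({"facebook"}): 12.0,
    frozenset({"email"}): 13.0,
    frozenset({"catalog"}): 11.5,
    frozenset({"facebook", "email"}): 14.0,
    frozenset({"facebook", "catalog"}): 13.5,
    frozenset({"email", "catalog"}): 14.0,
    frozenset({"facebook", "email", "catalog"}): 15.0,
}


def effect_by_context(
    sales: Dict[FrozenSet[str], float], channel: str
) -> Dict[FrozenSet[str], float]:
    """Incremental effect of `channel` for each combination of other channels.

    The control for a treated group is the group with the same other channels,
    so it keeps every communication except the one being measured.
    """
    effects: Dict[FrozenSet[str], float] = {}
    for group, value in sales.items():
        if channel in group:
            context = group - {channel}  # same other channels, minus the measured one
            effects[context] = value - sales[context]
    return effects


for context, effect in effect_by_context(avg_sales, "facebook").items():
    label = " + ".join(sorted(context)) or "no other communications"
    print(f"Facebook ads with {label}: {effect:+.1f}")
```

With these made-up numbers, Facebook ads add 2 units when email is absent and 1 unit when email is present, regardless of whether a catalog is also sent.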

Measuring multiple location-based tests in retail

We have reviewed advanced attribution and program assessment using universal control groups, yet the most common use of universal control groups is in store-based retail.

Retailers love to test. They test merchandise assortment, store layouts, visual presentation, everything. Solutions like APT Test&Learn for stores automate and simplify testing in the retail environment. It is not uncommon for a large retailer to run up to 100 tests at the same time. Since the smallest unit for these tests is a store, it can be hard to create test and control groups in which test results do not interfere with each other.

For example, if the minimum sample size for a valid test is 20 stores, a retailer would need 800 stores to run 20 tests: each test needs 20 test stores and 20 control stores, and 40 × 20 = 800. However, if we believe all tests can be measured against the same control group, we can potentially have 20 × 20 = 400 test stores and only 20 control stores, reducing the number of stores in the program to 420. Furthermore, if we believe that we can test a women’s lingerie layout and a toy merchandise assortment in the same set of stores without the results interfering with each other, we can reduce the number of stores in the program even further, and thus conduct more tests within the company’s store footprint.
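The store arithmetic above generalizes to a small formula. In the sketch below, the function name and the `tests_per_group` parameter (how many non-interfering tests share one set of test stores) are hypothetical, and it assumes every test group and control group needs the same minimum sample size.

```python
import math


def stores_needed(
    n_tests: int, sample_size: int, tests_per_group: int = 1, shared_control: bool = True
) -> int:
    """Stores required for a test program.

    n_tests: number of tests running at the same time
    sample_size: minimum number of stores per test group and per control group
    tests_per_group: non-interfering tests that share the same test stores
    shared_control: True if all tests are measured against one universal control group
    """
    n_groups = math.ceil(n_tests / tests_per_group)
    test_stores = n_groups * sample_size
    control_stores = sample_size if shared_control else n_groups * sample_size
    return test_stores + control_stores


print(stores_needed(20, 20, shared_control=False))  # 800: separate control per test
print(stores_needed(20, 20))                        # 420: one universal control group
print(stores_needed(20, 20, tests_per_group=2))     # 220: pairs of tests share test stores
```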

This is why many retailers assign non-interfering tests to the same test groups, while holding out universal control groups.
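One simple way to assign non-interfering tests to the same test groups is a greedy pass: place each test in the first existing group where it interferes with nothing, otherwise open a new group. This is greedy graph coloring, a swapped-in technique rather than a method the article prescribes, and it does not always find the smallest possible number of groups. The interference rule below (tests in the same department interfere), the test names, and the function names are hypothetical.

```python
from typing import Callable, List


def group_tests(tests: List[str], interferes: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily pack tests into groups whose members do not interfere with each other."""
    groups: List[List[str]] = []
    for test in tests:
        for group in groups:
            if not any(interferes(test, other) for other in group):
                group.append(test)
                break
        else:  # no compatible group found: open a new one
            groups.append([test])
    return groups


def same_department(a: str, b: str) -> bool:
    """Hypothetical rule: tests in the same department interfere."""
    return a.split("_")[0] == b.split("_")[0]


tests = ["lingerie_layout", "toy_assortment", "lingerie_assortment", "toy_layout"]
groups = group_tests(tests, same_department)
print(groups)
# [['lingerie_layout', 'toy_assortment'], ['lingerie_assortment', 'toy_layout']]

sample_size = 20
print(len(groups) * sample_size + sample_size)  # 60: test stores plus one universal control group
```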