Experimental design in practice: adjustments to the measurement

This is a real story that I absolutely love to tell. It highlights the right and the wrong way to deal with experiment measurement. Let’s dive in.

A while ago I was working on a direct mail program. The underlying database still had some kinks in it, just like many other databases we have to work with. One of the shortcomings was a bunch of junk among the mailing addresses. Some regions had it worse than others, and one of them was battling a 15% junk rate – only 85% of their list was successfully getting through the mail house cleansing process. In other words, 15% of our target group was not even getting the mail!

The standard measurement for programs at the time was the randomized control group. The suggestion was made to adjust the mail group result by bouncing the original mail group against the cleansed list and measuring the response rate only from the addresses that were actually mailed. Makes sense, right?

No, it actually does not. It can be reasonably expected that the junk address targets are different from the rest of the mail group – for one, they are less likely to even exist. Cleansing the mail list is equivalent to dropping off the losers, but the control group you compare it against still has those losers in it! In other words, your control group is no longer representative of the mail group, which is a violation of the number one rule of experiment design.

There were two options: either both groups needed to be cleansed in exactly the same way, or neither of them. The first option is self-explanatory; the second needs a bit of exploring. Thanks to the large sample size and random selection, both groups will contain a very similar percentage of junk addresses, so comparing them as-is remains a fair, apples-to-apples measurement.
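A quick simulation makes the bias concrete. All the numbers below are hypothetical (a 2% baseline response rate, a 1% absolute lift from the mailing, junk addresses that never respond) – the point is only to show that cleansing the mail group while leaving the control group untouched inflates the measured lift, while comparing both groups as-is recovers it fairly:

```python
import random

random.seed(42)

N = 200_000          # people per group
JUNK_RATE = 0.15     # 15% junk addresses, as in the story
BASE, LIFT = 0.02, 0.01  # assumed baseline response and true mail lift

def make_group(n, treated):
    """Simulate one randomized group: (is_junk, responded) per person."""
    group = []
    for _ in range(n):
        junk = random.random() < JUNK_RATE
        # Junk addresses never respond; real ones respond at BASE,
        # plus LIFT if they actually received the mail piece.
        p = 0.0 if junk else BASE + (LIFT if treated else 0.0)
        group.append((junk, random.random() < p))
    return group

mail = make_group(N, treated=True)
control = make_group(N, treated=False)

rate = lambda g: sum(responded for _, responded in g) / len(g)

# Fair comparison: measure both groups as-is, junk included in both.
fair_lift = rate(mail) - rate(control)

# Biased comparison: cleanse only the mail group, keep control intact.
cleansed_mail = [person for person in mail if not person[0]]
biased_lift = rate(cleansed_mail) - rate(control)

print(f"fair lift:   {fair_lift:.4f}")
print(f"biased lift: {biased_lift:.4f}")
```

The fair comparison lands near 0.85 × LIFT (the true lift diluted by the 15% of pieces that never arrive), while the cleansed-mail-only comparison overstates it – exactly the apples-to-oranges problem above.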

While cleansing does change the number of mail pieces we are able to send, we can easily adjust for it using the cost of the program and the lift required to have a positive ROI. In other words, with the right experiment design, this cleansing issue is no big deal at all. And it is certainly less work than creating a cleansing feedback loop from the mail house back to our database just for the sake of getting the satisfaction of only measuring the perfectly cleansed groups.
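The break-even adjustment is simple arithmetic. With made-up numbers (cost per piece, margin per incremental response), the idea is: cost is only incurred for deliverable pieces, so spread that cost over the whole randomized group and solve for the lift – measured over the full group, junk included – that makes the ROI positive:

```python
cost_per_piece = 0.50       # assumed cost of one mail piece
margin_per_response = 40.0  # assumed profit per incremental response
deliverability = 0.85       # 15% junk rate from the story

# Only deliverable pieces cost money, so the expected cost per
# targeted person (junk included) is scaled by deliverability.
cost_per_target = cost_per_piece * deliverability

# The lift per targeted person must cover that cost to break even.
breakeven_lift = cost_per_target / margin_per_response
print(f"break-even lift per targeted person: {breakeven_lift:.4%}")
```

With these numbers the threshold comes out around 1.06% instead of the 1.25% a fully-clean list would need – one line of arithmetic, versus building a cleansing feedback loop.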

My vote is for less work!
