1 Introduction
This is a course in Modern Experimental Design. So what is experimental design, and why does it need to be modern?
Experimental design is often a neglected part of the curriculum for students in statistics and data science. The basics of the field were developed from the 1920s to the 1960s, so it is not new and exciting anymore; rather than being taught as an area of active statistical research, it is taught as a service course for psychologists, biologists, medical students, and others in the sciences who might actually conduct experiments of their own.
There the role of experimentation is clear. To determine whether this new treatment causes a reduction in tumor size, or whether providing cookies on the last day of class causes an increase in student ratings of professors, we need an experiment. After all, statisticians can repeat “correlation is not causation” better than anyone, and an experiment offers the best opportunity to prove a causal relationship.
That implies we’ll need to talk about causality to really understand experimental design, and so we will. We will apply causal principles to determine what an experiment can prove, and to choose the right experiment to test our research question.
Then consider the analysis of experimental data. To students who have taken a linear regression course (like mine), analyzing data from an experiment seems trivial: Just do a regression. Let \(Y\) be the response variable of interest and let \(X\) be the treatment variables we want to learn about, and interpret the coefficients as you would in any other regression. Indeed, many experimental design textbooks are full of this kind of analysis, except they usually call it ANOVA and present complicated tables of sums of squares, even though the model is ultimately ordinary linear regression.
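To make that equivalence concrete, here is a minimal sketch of a hypothetical two-group experiment, using only NumPy: the slope coefficient from an ordinary least squares fit on a treatment indicator is exactly the difference in group means, the same quantity a one-way ANOVA table summarizes. The data and effect size here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group experiment: 0 = control, 1 = treatment.
x = np.repeat([0, 1], 50)                     # treatment indicator
y = 10 + 2 * x + rng.normal(0, 1, size=100)   # response, true effect = 2

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The slope is exactly the difference in group means.
diff_in_means = y[x == 1].mean() - y[x == 0].mean()
print(beta[1], diff_in_means)
```

The sums of squares in an ANOVA table are just a bookkeeping device for this same linear model; the fitted values and tests are identical.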
As statisticians, we’re used to regression problems like this, and we know all the usual theory: ordinary least squares, its role as a minimum variance unbiased estimator, and so on. We know we can use different estimators, like penalized regression, if we need different statistical properties.
But this is experimental design. We get to design experiments. Rather than analyzing data that someone else has collected, we get to choose what data to collect and how to go about collecting it. And that is where the interesting work of experimental design lies. An experiment allows us to choose who gets which treatments and determine how to measure the response. Choosing who gets which treatments allows us to answer causal questions, yes. But less obviously, designing an experiment lets us choose its statistical properties before we collect the data, and that’s all about practical constraints, not just causality.
Traditional experimental design methods were developed from the 1920s to the 1960s or so, and focused on the kinds of experiments conducted in agriculture and industry. These experiments featured certain constraints:
- Expensive data. Each observation represents the yield of an entire field of grain, or the strength of a batch of steel, or the purity of a chemical plant’s output. Collecting the data may take months or thousands of dollars.
- Small to moderate sample sizes. Because the data is expensive, we want to get the most possible information from the fewest possible samples.
- Moderate effect sizes. Small effects (e.g. a 1% change in grain yield) are not important, so experiments looked for bigger effects.
- Control of some, but not all, variables. An agricultural experiment might let experimenters choose how much fertilizer and pesticide each field gets, but we cannot control the weather.
These constraints reward experiments chosen to extract the most information from the least data, working around restrictions imposed by the physical setup. As we will see, there is much we can do as designers to choose treatment assignments that maximize the information we can gain, and there are many standard designs. These experiments still exist, of course, and are widely done in industry.
On the other hand, there are also what I’ll call “modern” experiments. Often these are experiments conducted on online platforms at massive scale, featuring different constraints:
- Cheap data. Each observation might be one user on a website, and the website may have millions of users per day. Experimenting on them (e.g. to determine if their behavior changes with a different web page layout) is a matter of writing some code.
- Ridiculous sample sizes. A simple experiment may have millions of observations in just a few days of operation.
- Small effect sizes. On an online service with millions of users, a few percent change in behavior might represent many millions of dollars in revenue. A data scientist might earn their entire annual salary through one experiment that increases revenues 1%.
- Many simultaneous experiments. Companies like Google, Meta, and Amazon are constantly experimenting with their websites, and each user might simultaneously be in several different experiments (entirely without their knowledge).
- Learning effects. Exposing users to an experiment may change their behavior. Change how you display ads and, temporarily, users may click on more ads; but after a month, they will have learned to ignore the ads just as well as they used to.
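A quick back-of-the-envelope calculation shows why small effects force these huge sample sizes. This sketch uses the standard two-proportion sample-size formula under a normal approximation; the baseline conversion rate of 5% and the 1% relative lift are hypothetical numbers chosen for illustration.

```python
# Hypothetical online experiment: baseline conversion rate of 5%,
# and we want to detect a 1% *relative* lift (5.00% -> 5.05%).
p1, p2 = 0.050, 0.0505
alpha_z = 1.959964   # z quantile for two-sided alpha = 0.05
power_z = 0.841621   # z quantile for 80% power

# Two-proportion sample-size formula (normal approximation).
var_sum = p1 * (1 - p1) + p2 * (1 - p2)
n_per_arm = (alpha_z + power_z) ** 2 * var_sum / (p1 - p2) ** 2

print(f"{n_per_arm:,.0f} users per arm")
```

The answer is on the order of three million users per arm, which is trivial for a large website and unthinkable for a field of grain.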
Of course, there are many other kinds of experiments than industrial experiments and online experiments. There are experiments in medicine (clinical trials), in psychology, and in all branches of science; there are even policy experiments designed to inform public policy and legislation. They all have unique constraints, and sometimes very different statistical goals.
To do experimental design, then, is to understand the question to be answered well enough that you can choose the data to collect that will best answer it, and to know the appropriate statistical analysis that will connect the data back to the scientific question.