3 Experimental Setup

Before we study experimental design in detail, we need some basic setup and terminology that will help us discuss experiments, designs, and their goals.

An experiment, broadly speaking, allows us to assign treatments to experimental units, then observe their response to those treatments. This is in contrast to an observational study, in which treatments are not controlled by an experimenter—the data analyst gets whichever data happens to be available. Let’s consider the parts in turn.

3.1 Experimental unit

The basic component of any experiment is the experimental unit. The experimental unit is the smallest unit that can be assigned a distinct treatment. For example:

In a medical experiment that gives each patient a different treatment, the patients are the experimental units.
In a cooking experiment that tries different recipes to make chocolate chip cookies, each batch of cookies (made with a particular recipe) is an experimental unit.
When a tech company wants to know which color of button is more likely to make users click on it, each user who is shown a button is an experimental unit.

Notice that experimental units can each be assigned a different treatment. In the cookie experiment, for instance, we make a batch of cookie dough that might make several dozen cookies. The batch of dough is the unit, because it is produced with a single recipe—the cookies are not individual experimental units.

We usually number the experimental units from 1 to \(n\), the total number of experimental units. Often units are indexed using \(i \in \{1, 2, \dots, n\}\).

3.2 Treatments and treatment factors

In an experiment, we manipulate one or more treatments to observe their effect on a chosen response.

In the basic case, there are two treatments we want to compare:

In the medical experiment, we either give patients the new drug (treatment) or an old drug already used to treat this condition (the control).
In the cooking experiment, we either try recipe A or recipe B.
In the tech experiment, we either try a blue button or a red button.

This is called a binary treatment, because we can denote the two options as either 0 or 1. The options are called levels. In medical experiments, the levels are often called treatment arms, so an experiment with a treatment and a control group has two treatment arms.

In general, there may be more than two treatment levels: there might be four different drugs, ten different recipes, or 100 different shades of button. The treatments might also be continuous rather than discrete: we could give patients anywhere between 0 and 100 mg of the medication, or try baking the cookies for anywhere between 7 and 14 minutes.

In some experiments, we may be able to manipulate several different treatments at the same time, so each experimental unit gets a combination of treatments:

In the medical experiment, patients can get the new drug or the old drug, and each is offered at three different doses, making \(2 \times 3 = 6\) possible treatment combinations.
In the cooking experiment, we can try three different amounts of sugar and three different types of chocolate chip (semisweet, dark, or milk chocolate), making \(3 \times 3 = 9\) different treatment combinations.
In the tech experiment, we can change the color of the button and the color of the text within the button; the number of combinations is the product of the number of button colors and the number of text colors.
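To make the counting concrete, here is a minimal sketch that enumerates the treatment combinations for the cooking experiment. The specific sugar amounts are hypothetical, chosen only for illustration; the point is that the combinations are the Cartesian product of the factor levels.

```python
from itertools import product

# Hypothetical factor levels for the cooking experiment; the specific
# sugar amounts are made up for illustration.
sugar_cups = [0.5, 0.75, 1.0]
chip_types = ["semisweet", "dark", "milk"]

# Each treatment combination pairs one level of each factor.
combinations = list(product(sugar_cups, chip_types))

for sugar, chips in combinations:
    print(f"{sugar} cups sugar, {chips} chips")

print(len(combinations))  # 3 * 3 = 9
```

Adding a third factor (say, baking time) would simply add another argument to `product`, and the number of combinations would be multiplied by its number of levels.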

We often refer to these as treatment factors. Each experiment above has two factors—two different treatment parameters that can be changed by the experimenter. (Note that these are not factors in the sense of R’s factor(); they can be discrete or continuous.) Simple experiments have only one factor, but it’s common for experiments to have many factors at once.

Notation for treatments varies. In simple experiments with one binary factor, we may let \(Z_i\) represent the treatment assignment of unit \(i\), so \(Z_i \in \{0, 1\}\). In complex multifactor experiments, we may refer to \(X_i\) as being a vector of treatment assignments.
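As a rough sketch of how this notation might map onto data, the code below stores a binary assignment \(Z_i\) for each of \(n\) units, and a two-component treatment vector for a multifactor version of the medical experiment. The dose levels and the use of shuffling to decide assignments are assumptions made for illustration, not something the notation itself requires.

```python
import random

n = 8  # number of experimental units, indexed i = 1, ..., n

# One binary factor: Z[i - 1] holds the assignment Z_i in {0, 1}.
# Half the units get each level; shuffling is one illustrative way to
# decide which units get which (an assumption, not prescribed by the text).
rng = random.Random(0)
Z = [0] * (n // 2) + [1] * (n - n // 2)
rng.shuffle(Z)

# Two factors: each unit's assignment is a vector (drug, dose in mg).
drugs = ["old", "new"]
doses_mg = [10, 20, 40]  # hypothetical dose levels
X = [(rng.choice(drugs), rng.choice(doses_mg)) for _ in range(n)]

for i in range(1, n + 1):
    print(i, Z[i - 1], X[i - 1])
```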

3.3 Responses

After assigning treatment to each experimental unit, we observe a response. It’s up to the experimenter to determine what response variable to observe, and the response can in principle be any kind of data: a binary variable, a categorical variable, a scalar, a vector, or any combination. In most common experiments, the response is a scalar, denoted \(Y_i\) for unit \(i\):

In the medical experiment, we might measure each patient’s blood cholesterol level after the treatment, to see which treatment reduced it the most.
In the cooking experiment, we could have a panel of tasters rate each batch of cookies from 1 to 10.
In the tech experiment, we can record whether each user clicked the button they were shown, then compare the fraction of users who clicked each color.

Selecting the right response is crucial. Obviously the response should be something we care about, but it must also be something we can practically measure. For example, cholesterol-lowering drugs are prescribed because it’s believed that lowering cholesterol reduces the risk of heart disease, so the ideal response would be whether or not each patient develops heart disease—but waiting years for heart disease to emerge is much harder than performing a simple blood cholesterol test. So we settle for the cheaper, simpler measure, and hope that the drug that best lowers cholesterol will also best reduce the risk of heart disease.

Responses must also be measurable with precision. If our cookie raters are fickle and give widely varied scores to the same batch of cookies, it’ll be hard to tell which batch is truly better; whereas if they are very consistent about what they want in cookies, it will be much easier. This is reflected in the variance of \(Y_i\), and we will see in Chapter 8 that \(\operatorname{var}(Y_i)\) will determine the power of our experiments. Sometimes an experiment can be dramatically improved by finding a better way to measure \(Y_i\), even without changing the rest of the setup.
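To get a feel for why the variance matters, consider comparing two recipes by the difference in their mean ratings. A minimal sketch, assuming independent responses, a common standard deviation for both recipes, and the same number of batches per recipe (all assumptions made here for illustration): the standard error of the difference in means is then \(\sqrt{2\sigma^2/n}\), so doubling the rater standard deviation doubles the uncertainty in the comparison.

```python
import math

def se_difference_in_means(sd, n_per_group):
    """Standard error of (mean of group 1) - (mean of group 0),
    assuming independent responses with common standard deviation `sd`
    and `n_per_group` units in each group."""
    return math.sqrt(2 * sd**2 / n_per_group)

n_batches = 10  # hypothetical number of batches per recipe
for sd in (0.5, 1.0, 2.0):  # hypothetical rater standard deviations
    se = se_difference_in_means(sd, n_batches)
    print(f"sd = {sd}: standard error = {se:.3f}")
```

Under these assumptions, halving the standard deviation of the ratings buys the same precision as quadrupling the number of batches, which is why a better measurement can be worth more than a bigger experiment.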

3.4 Covariates

Besides the treatment, we may have other information about the experimental units:

In the medical experiment, each patient has an age, gender, blood pressure, and dozens of other test results and measurements.
In the tech experiment, each user might be a different age, use a different number of the company’s other products, and have a different disposable income.

We summarize this information with covariates. These covariates are outside our control: we cannot change the age of our patients or assign our users to be a different gender. But we can use the covariates when deciding how to assign treatments, and we can use them to aid in our analysis of the response. We might expect these covariates to be associated with the response in various ways.

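The covariates for unit \(i\) are usually denoted \(X_i\), just like covariates in regression. (This is the same symbol used above for a vector of treatment assignments in multifactor experiments, so which meaning is intended must be read from context.)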
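One way to keep the pieces straight is to think of each experimental unit as a record holding its covariates, its assigned treatment, and (eventually) its response. The sketch below is only illustrative: the `Unit` class, its field names, and all of the patient values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Unit:
    i: int                            # unit index, 1..n
    covariates: dict                  # X_i: observed, not controlled by us
    treatment: int                    # Z_i: assigned by the experimenter
    response: Optional[float] = None  # Y_i: observed after treatment

# Hypothetical patients; all values are made up for illustration.
patients = [
    Unit(1, {"age": 54, "systolic_bp": 131}, treatment=1),
    Unit(2, {"age": 61, "systolic_bp": 142}, treatment=0),
]

# Responses are recorded only after the treatment has been given.
patients[0].response = 182.0  # e.g. blood cholesterol in mg/dL
patients[1].response = 197.0

for p in patients:
    print(p)
```

The distinction the record makes visible is who sets each field: the covariates come with the unit, the treatment is chosen by the experimenter, and the response is observed afterward.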

3.5 Experiments

An experiment, then, consists of an experimenter gathering one or more experimental units, assigning them to treatments, and observing the responses. It is up to the experimenter to determine how many experimental units are necessary, how to assign them to treatments, and how to analyze the resulting data, and each of those steps will be a subject for discussion in this course.

Experiments may have different goals:

Estimating treatment effects. In many scientific experiments, we want to estimate the causal effect of the treatment on the response. In the medical experiment, for instance, we’d like to know the effect of the new treatment on the response, and how it compares to the effect of the old treatment.
Optimizing the response. In some cases, we may only want to find the optimal combination of treatments, where “optimal” means achieving the highest or lowest value of the response. In the cookie experiment, for instance, the response might be the rating of a panel of cookie judges, and we’d like to find the combination of treatments (recipe options) that produces the highest rating. Understanding the causal effect of, say, chocolate chip type on rating is secondary; getting the best possible cookie is primary.

Most standard experimental designs focus on estimating treatment effects, but as we will see later, there is plenty of work on optimizing responses.
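The two goals lead to different summaries of the same kind of data. As a rough sketch with made-up numbers: for the first goal we might compare mean responses between treatment groups (a simple difference in means, one of many possible estimators), while for the second we only need to find which treatment combination gives the best average response.

```python
from statistics import mean

# Goal 1: estimate a treatment effect from a binary treatment.
# Hypothetical cholesterol responses (mg/dL); values are made up.
Z = [1, 1, 1, 0, 0, 0]
Y = [180.0, 175.0, 185.0, 200.0, 195.0, 205.0]

treated = [y for z, y in zip(Z, Y) if z == 1]
control = [y for z, y in zip(Z, Y) if z == 0]
effect_estimate = mean(treated) - mean(control)
print(effect_estimate)

# Goal 2: find the treatment combination with the highest mean rating.
# Hypothetical taster ratings for each (sugar, chip type) combination.
ratings = {
    ("low sugar", "dark"): [6, 7, 6],
    ("low sugar", "milk"): [5, 6, 5],
    ("high sugar", "dark"): [8, 7, 9],
    ("high sugar", "milk"): [7, 7, 6],
}
best = max(ratings, key=lambda combo: mean(ratings[combo]))
print(best)
```

In the first case the number itself (how much lower is cholesterol under the new drug?) is the result; in the second, only the identity of the winning combination matters.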

3.6 Exercises

Exercise 3.1 (An educational experiment) A large school district wants to experiment with a new teaching method, Active Constructivist Instructional Design (ACID), by implementing it in high school math courses. The school district has 5 high schools with 52 math teachers in total, teaching 273 sections of math in total to 8,279 students.

To conduct the experiment, the school district trains its teachers to teach with ACID, then randomly selects 120 sections of math courses to be taught in the ACID way, with the rest being taught as usual. The teachers receive no extra pay to use ACID. The school district then records scores on standardized math tests at the end of the year and compares scores between the ACID sections and the rest.

What are the experimental units, and how many units are there?
What are the treatments, and how many are there?
What is the response variable?
Suggest three covariates the researchers may want to record.