3  Experimental Setup

Before we study experimental design in detail, we need some basic setup and terminology that will help us discuss experiments, designs, and their goals.

An experiment, broadly speaking, allows us to assign treatments to experimental units, then observe their response to those treatments. This is in contrast to an observational study, in which treatments are not controlled by an experimenter—the data analyst gets whichever data happens to be available. Let’s consider the parts in turn.

3.1 Experimental unit

The basic component of any experiment is the experimental unit. The experimental unit is the smallest unit that can be assigned a distinct treatment. For example:

  • In a medical experiment that gives each patient a different treatment, the patients are the experimental units.
  • In a cooking experiment that tries different recipes to make chocolate chip cookies, each batch of cookies (made with a particular recipe) is an experimental unit.
  • When a tech company wants to know which color of button is more likely to make users click on it, each user who is shown a button is an experimental unit.

Notice that experimental units can each be assigned a different treatment. In the cookie experiment, for instance, we make a batch of cookie dough that might make several dozen cookies. The batch of dough is the unit, because it is produced with a single recipe—the cookies are not individual experimental units.

We usually number the experimental units from 1 to \(n\), the total number of experimental units. Often units are indexed using \(i \in \{1, 2, \dots, n\}\).

3.2 Treatments and treatment factors

In an experiment, we manipulate one or more treatments to observe their effect on a chosen response.

In the basic case, there are two treatments we want to compare:

  • In the medical experiment, we either give patients the new drug (treatment) or an old drug already used to treat this condition (the control).
  • In the cooking experiment, we either try recipe A or recipe B.
  • In the tech experiment, we either try a blue button or a red button.

This is called a binary treatment, because we can denote the two options as either 0 or 1. The options are called levels. In medical experiments, the levels are often called treatment arms, so an experiment with a treatment and a control group has two treatment arms.

In general, there may be more than two treatment levels: there might be four different drugs, ten different recipes, or 100 different shades of button. The treatments might also be continuous rather than discrete: we could give patients anywhere between 0 and 100 mg of the medication, or try baking the cookies for anywhere between 7 and 14 minutes.

In some experiments, we may be able to manipulate several different treatments at the same time, so each experimental unit gets a combination of treatments:

  • In the medical experiment, patients can get the new drug or the old drug, and each is offered at three different doses, making \(2 \times 3 = 6\) possible treatment combinations.
  • In the cooking experiment, we can try three different amounts of sugar and three different types of chocolate chip (semisweet, dark, or milk chocolate), making \(3 \times 3 = 9\) different treatment combinations.
  • In the tech experiment, we can change the color of the button and the color of the text within the button; the number of combinations is the product of the number of button colors and the number of text colors.

We often refer to these as treatment factors. Each experiment above has two factors—two different treatment parameters that can be changed by the experimenter. (Note that these are not factors in the sense of R’s factor(); they can be discrete or continuous.) Simple experiments have only one factor, but it’s common for experiments to have many factors at once.
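To make the counting concrete, the following Python sketch enumerates the treatment combinations for the medical and cooking examples. The level labels (such as "low" or "10 mg") and the helper name treatment_combinations are made up for illustration; only the number of levels per factor comes from the examples above.

```python
from itertools import product


def treatment_combinations(factors):
    """Return every combination of levels, taking one level per factor.

    `factors` maps a factor name to its list of levels.
    (Hypothetical helper, not part of any library.)
    """
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Medical example: 2 drugs x 3 doses. The dose labels are invented.
medical = treatment_combinations({
    "drug": ["new", "old"],
    "dose": ["10 mg", "20 mg", "30 mg"],
})

# Cooking example: 3 sugar amounts x 3 chip types. The sugar labels are invented.
cooking = treatment_combinations({
    "sugar": ["low", "medium", "high"],
    "chips": ["semisweet", "dark", "milk"],
})

print(len(medical))  # 2 * 3 = 6
print(len(cooking))  # 3 * 3 = 9
```

Each element of the returned list is one treatment combination, for example {"drug": "new", "dose": "10 mg"}. The number of combinations is always the product of the number of levels of each factor, which is why it grows quickly as factors are added.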

Notation for treatments varies. In simple experiments with one binary factor, we may let \(Z_i\) represent the treatment assignment of unit \(i\), so \(Z_i \in \{0, 1\}\). In complex multifactor experiments, we may refer to \(X_i\) as being a vector of treatment assignments.

3.3 Responses

After assigning treatment to each experimental unit, we observe a response. It’s up to the experimenter to determine what response variable to observe, and the response can in principle be any kind of data: a binary variable, a categorical variable, a scalar, a vector, or any combination. In most common experiments, the response is a scalar, denoted \(Y_i\) for unit \(i\).

3.4 Covariates

Besides the treatment, we may have other information about the experimental units:

  • In the medical experiment, each patient has an age, gender, blood pressure, and dozens of other test results and measurements.
  • In the tech experiment, each user might be a different age, use a different number of your company’s other products, and have a different disposable income.

We summarize this information with covariates. These covariates are outside our control: we cannot change the age of our patients or assign our users to be a different gender. But we can use the covariates when deciding how to assign treatments, and we can use them to aid in our analysis of the response. We might expect these covariates to be associated with the response in various ways.

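The covariates for unit \(i\) are usually denoted \(X_i\), just like covariates in regression. Note that this is the same symbol used above for a vector of treatment assignments in multifactor experiments, so which meaning applies depends on the context.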
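The notation so far can be collected into one record per experimental unit. The sketch below is only an illustration: the class name UnitRecord, the covariate names, and every number are invented, and coding the control as 0 and the treatment as 1 is an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class UnitRecord:
    """One experimental unit (hypothetical layout for illustration)."""
    i: int      # unit index, running from 1 to n
    z: int      # binary treatment assignment Z_i (assumed: 0 = control, 1 = treatment)
    y: float    # scalar response Y_i
    x: dict = field(default_factory=dict)  # covariates X_i

# Made-up data for a two-arm medical experiment with n = 4 patients.
units = [
    UnitRecord(i=1, z=1, y=118.0, x={"age": 54, "baseline_bp": 135}),
    UnitRecord(i=2, z=0, y=131.0, x={"age": 61, "baseline_bp": 140}),
    UnitRecord(i=3, z=1, y=122.0, x={"age": 47, "baseline_bp": 128}),
    UnitRecord(i=4, z=0, y=127.0, x={"age": 39, "baseline_bp": 126}),
]

n = len(units)
treated = [u.i for u in units if u.z == 1]  # indices of units with Z_i = 1
control = [u.i for u in units if u.z == 0]  # indices of units with Z_i = 0

print(n, treated, control)
```

The experimenter chooses z, observes y, and can only record x; keeping the three in separate fields mirrors that distinction.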

3.5 Experiments

An experiment, then, consists of an experimenter gathering one or more experimental units, assigning them to treatments, and observing the responses. It is up to the experimenter to determine how many experimental units are necessary, how to assign them to treatments, and how to analyze the resulting data, and each of those steps will be a subject for discussion in this course.

Experiments may have different goals:

  1. Estimating treatment effects. In many scientific experiments, we want to estimate the causal effect of the treatment on the response. In the medical experiment, for instance, we’d like to know the effect of the new treatment on the response, and how it compares to the effect of the old treatment.
  2. Optimizing the response. In some cases, we may only want to find the optimal combination of treatments, where “optimal” means achieving the highest or lowest value of the response. In the cookie experiment, for instance, the response might be the rating of a panel of cookie judges, and we’d like to find the combination of treatments (recipe options) that produces the highest rating. Understanding the causal effect of, say, chocolate chip type on rating is secondary; getting the best possible cookie is primary.
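The two goals above lead to different computations on the same kind of data, as the following Python sketch suggests. All numbers are invented. The difference in mean response between the two arms is used here only as one simple way to summarize a treatment effect; it is an assumption of this sketch, not a method prescribed by the text. The function name difference_in_means is likewise hypothetical.

```python
from statistics import mean

# Goal 1: estimating a treatment effect (made-up two-arm data).
z = [1, 0, 1, 0, 1, 0]              # treatment assignments Z_i
y = [7.0, 5.0, 8.0, 6.0, 9.0, 4.0]  # responses Y_i


def difference_in_means(z, y):
    """Mean response of treated units minus mean response of control units."""
    treated = [yi for zi, yi in zip(z, y) if zi == 1]
    control = [yi for zi, yi in zip(z, y) if zi == 0]
    return mean(treated) - mean(control)


effect_estimate = difference_in_means(z, y)

# Goal 2: optimizing the response (made-up judge ratings per recipe combination).
ratings = {
    ("low", "dark"): [6.0, 7.0],
    ("medium", "milk"): [8.0, 9.0],
    ("high", "semisweet"): [5.0, 6.0],
}

# Pick the combination with the highest average rating.
best_combination = max(ratings, key=lambda combo: mean(ratings[combo]))

print(effect_estimate)
print(best_combination)
```

In the first case the output of interest is a number describing how much the treatment changes the response; in the second it is the treatment combination itself, and the size of any individual effect is secondary.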

Most standard experimental designs focus on estimating treatment effects, but as we will see later, there is plenty of work on optimizing responses.