Project

In this project, you are a statistical consultant for a scientist who wants to conduct an experiment. You have three tasks:

  1. Select an appropriate design for the experiment, considering blocking, the treatment effects and contrasts of interest, the sample size necessary to attain the goals, and the analysis you will conduct once you have the data.
  2. Write a short proposal stating your chosen design and justifying the decisions you made.
  3. Analyze (simulated) data from the experiment you designed. Write a short report stating your results.

This project will take the place of homework for the remainder of the course, and is in lieu of a final exam. Here is the timeline:

Part                       Date
Design & write proposal    Due Wednesday, April 24
Receive data               Friday, April 26
Write analysis report      Due Friday, May 3

Setting

Disclaimer

The situation, people, and universities portrayed in this project are fictitious. Any resemblance to real people and universities is purely coincidental.

Background

The administration at Cranberry Melon University is struggling to deal with its large enrollment. As incoming classes have expanded, required courses have gotten larger and larger—but there are not enough large lecture halls to fit them. To accommodate all the large classes, administrators are considering holding more classes at 8am, as that is the only time the large lecture halls are not being used.

However, any undergraduate student can tell you that nobody wants to be awake and in a required course at 8am. Students tend to stay up late, so if they get up in time for an 8am class, they may get only a few hours of sleep, and any sleep researcher can tell you that insufficient sleep affects cognitive abilities, making it harder to learn. The problem, then, is that holding 8am courses and expecting students to attend may harm their performance in all their classes, because the students are constantly sleep-deprived.

The administration has asked you to conduct an experiment to determine if 8am classes significantly harm student GPAs.

Experimental setup

The general education curriculum at Cranberry Melon University requires all students to take a course in professional communication, so four large sections of Introduction to Meme Production are taught every semester. The experiment will be conducted next spring, when there are two instructors available to teach the four sections. Two of the sections will be at 8am and two will be at 1:30pm.

The registrar is willing to recruit students to participate in the experiment. They will pay each participating student $200. Participants can be assigned to the sections—that assignment is up to you—and will provide the following information before the experiment begins:

  • year: Their year in the program. Since these are gen-ed courses, they are taken by freshman and sophomore students only.
  • demo_race and demo_gender: Defined as in the pilot data provided below.
  • cum_gpa: Their cumulative GPA up to (but not including) the semester of the experiment, on a scale of 0 to 4.0.

The registrar, citing federal privacy laws, is not willing to give you other information about the students. After the experiment, you will receive two response variables:

  • term_gpa: Their GPA for their spring semester courses on a 0-4.0 scale.
  • TotalSleepTime: The participants will wear a FitBit sleep tracker during the study, and this variable reports the average time they spent in bed per night, in minutes.

To help you design the experiment, the registrar has provided data from an earlier observational study of sleep habits and GPA. That study did not specifically assign students to 8am classes, but it did measure student GPAs, demographics, and sleep habits, giving you a sense of the distribution of these variables.

Goals

The Senior Vice Provost for Academic Success has asked you to address the following questions:

  1. Do 8am courses cause students to get less total sleep?
  2. Do 8am courses cause students to have lower semester GPAs?
  3. Do these effects differ by race or gender? It would not be acceptable to disproportionately harm one group over another.

You have the following constraints:

  • There are only four course sections.
  • There are two instructors available to teach them. Each section can have only one instructor; you can’t have them co-teach.
  • Each section can fit at most 100 students.
  • Each participant costs $200 plus the price of a FitBit, so the Senior Vice Provost would like the size kept to a minimum. (Non-participants will still take the courses, but won’t have FitBits and we won’t get their data.)

You have the power to determine:

  • Which students will be recruited. You can recruit students from each year, race, and gender; you cannot recruit students with specific GPAs.
  • Which sections will be taught by which instructors.
  • Which students will be assigned to which sections.

You cannot assign their prior cumulative GPAs, but you will receive that data with the response data.

Phase 1: Design

In this phase, you will design the experiment and write a short proposal explaining your design.

Design file

You must provide a CSV file defining your experimental design. It must contain one row per participant, with the following columns:

  • year: Values either freshman or sophomore
  • demo_race: Values either 0 or 1, as defined in the sample dataset
  • demo_gender: Values either 0 or 1, as defined in the sample dataset
  • section: Values either 8am-1, 8am-2, 130pm-1, or 130pm-2, indicating the course section this student is assigned to.
  • instructor: Values either A or B (for instructor A or instructor B). This must match up with section, since you need to decide which instructors teach which sections.

Because each section may have at most 100 students, this design file may contain at most 400 rows.

You may generate the design from code or manually create portions of it, but any parts of the design that are randomly assigned must be truly randomly assigned.

Save your file as [yourandrewid]-design.csv.

Before you submit your design, use check-project-design.R to verify that your design file is valid. The check_design() function loads your design CSV and checks that the values provided in your design are correct. This function must run without any warnings or errors before you can submit your design. If you provide an invalid design, I will not be able to provide you with response data.
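As a rough illustration of generating a design file from code, here is a minimal Python sketch. It is one possible approach under stated assumptions, not a required or recommended design. The names build_design, write_design, and SECTION_INSTRUCTOR are hypothetical. The instructor-to-section mapping, the equal cell sizes, and per_cell=48 are all assumptions; the actual sample size should come from your power analysis. The same idea carries over directly to R, and it does not replace running check_design() on the result.

```python
import csv
import random
from itertools import product

# Assumption (not from the project description): each instructor teaches one
# 8am and one 1:30pm section, so instructor is not confounded with start time.
SECTION_INSTRUCTOR = {"8am-1": "A", "8am-2": "B", "130pm-1": "A", "130pm-2": "B"}
COLUMNS = ["year", "demo_race", "demo_gender", "section", "instructor"]
MAX_PER_SECTION = 100  # capacity constraint from the project description


def build_design(per_cell, seed):
    """Return one row per participant, with `per_cell` students in every
    year x race x gender cell, randomized to sections within each cell."""
    rng = random.Random(seed)  # seeded so the file can be regenerated exactly
    sections = list(SECTION_INSTRUCTOR)
    rows = []
    for year, race, gender in product(["freshman", "sophomore"], [0, 1], [0, 1]):
        # Tile a randomly ordered list of sections to fill the cell, then
        # shuffle, so section sizes stay balanced within the cell.
        order = rng.sample(sections, k=len(sections))
        assigned = [order[i % len(order)] for i in range(per_cell)]
        rng.shuffle(assigned)
        for section in assigned:
            rows.append({
                "year": year,
                "demo_race": race,
                "demo_gender": gender,
                "section": section,
                "instructor": SECTION_INSTRUCTOR[section],
            })
    # Refuse to return a design that overfills any section.
    for section in sections:
        size = sum(r["section"] == section for r in rows)
        if size > MAX_PER_SECTION:
            raise ValueError(f"{section} has {size} students (max {MAX_PER_SECTION})")
    return rows


def write_design(rows, path):
    """Write the design rows to a CSV file with the required columns."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Example usage (hypothetical cell size and seed):
# rows = build_design(per_cell=48, seed=2024)   # 8 cells x 48 = 384 students
# write_design(rows, "yourandrewid-design.csv")
```

Randomizing within each year, race, and gender cell treats those variables as blocks, which is only one of several reasonable choices. Equal cell sizes also assume that enough students can be recruited in every cell, which the pilot data may or may not support.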

Proposal

Your proposal should contain the following sections:

  • Executive Summary. In 2-3 paragraphs, describe the type of experiment you’re conducting, how many students it will require, and the anticipated power of the experiment to detect the desired effects. If you believe some effects will not be detectable, or anticipate other limitations, state that here. Very briefly state the type of analysis you will conduct. This summary is for the Senior Vice Provost, so it must not be technical. They are interested in cost, size, and success, not statistical detail.
  • Design. In half a page or so, describe the experimental design in more detail. State any blocking variables, how instructors and students will be assigned to sections, and give any useful summary tables, such as the number of students per section or the number of students in different treatment combinations.
  • Power Analysis. In half a page or so, describe the results of a power analysis you have conducted. Briefly describe how you did the power analysis and state the power you anticipate to detect effects.
  • Data Analysis. In half a page or so, describe the analysis you will conduct once the response data is available: the type of model you’ll use, the blocking and treatment factors you’ll use, and the hypothesis tests or confidence intervals you’ll use to answer the research questions. “I will use ANOVA” is not sufficient; you must identify specific contrasts or tests.

You may plan to use any analysis method as long as it answers the Senior Vice Provost’s research questions.
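One common way to carry out a power analysis like the one described above is by simulation: generate many fake datasets under an assumed effect, run the planned test on each, and count how often it rejects. The Python sketch below illustrates the idea for a simple comparison of 8am and 1:30pm means. The function name simulated_power is hypothetical, and the effect size (30 minutes), standard deviation (60 minutes), and sample sizes are placeholder assumptions that should be replaced with values informed by the pilot data. It uses a two-sided z-test as a large-sample approximation and ignores section, instructor, and blocking effects, so it is a starting point rather than a complete analysis.

```python
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulated_power(n_per_arm, effect, sd, alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo estimate of the power of a two-sided two-sample z-test.

    `effect` is the assumed reduction in the 8am mean relative to 1:30pm,
    and `sd` is the assumed within-group standard deviation.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Simulate one experiment: the 8am arm is shifted down by `effect`.
        early = [rng.gauss(-effect, sd) for _ in range(n_per_arm)]
        late = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        m_early, v_early = mean_and_var(early)
        m_late, v_late = mean_and_var(late)
        se = (v_early / n_per_arm + v_late / n_per_arm) ** 0.5
        if abs((m_early - m_late) / se) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Placeholder inputs: 30-minute effect on sleep, SD of 60 minutes.
    for n in (25, 50, 100, 200):
        print(n, simulated_power(n, effect=30, sd=60))
```

The same loop can be adapted to any planned analysis by replacing the z-test with the test you intend to run, including tests of interactions with race or gender, which typically need considerably more data than main effects.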

Your overall proposal should only be 4–5 pages. It should be in PDF format, and it should not contain any code, only text and (if necessary) mathematics.

Phase 2: Analyze

Data format

I will use your provided design to generate a CSV of the experimental results. It will contain all the same columns as your design, plus three more: cum_gpa, term_gpa, and TotalSleepTime, giving each participant's prior cumulative GPA, spring term GPA, and average sleep time. The filename will be [yourandrewid]-results.csv.

Analysis

Conduct your data analysis following the plan in your proposal. Generate your results and any tables or figures that will illustrate them. Include checks of the model assumptions.
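To make the idea of a specific contrast concrete, the Python sketch below estimates the 8am minus 1:30pm difference in a response, and the difference of that effect between two demographic groups (an interaction contrast), each with a normal-approximation confidence interval. The function names are hypothetical and the results filename follows the pattern given above. It treats students as independent and ignores section, instructor, year, and cum_gpa, all of which a fuller analysis would likely model (for example with a linear model), so treat it as a first look rather than the planned analysis.

```python
import csv
from statistics import NormalDist, mean, variance


def load_results(path):
    """Read the results CSV into a list of dicts (all values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def early_effect(rows, response):
    """Return (estimate, standard error) of the 8am minus 1:30pm mean difference."""
    early = [float(r[response]) for r in rows if r["section"].startswith("8am")]
    late = [float(r[response]) for r in rows if not r["section"].startswith("8am")]
    estimate = mean(early) - mean(late)
    se = (variance(early) / len(early) + variance(late) / len(late)) ** 0.5
    return estimate, se


def interaction(rows, response, factor):
    """Return (estimate, standard error) of the 8am effect in group "1"
    minus the 8am effect in group "0" of `factor` (e.g. "demo_gender")."""
    est1, se1 = early_effect([r for r in rows if r[factor] == "1"], response)
    est0, se0 = early_effect([r for r in rows if r[factor] == "0"], response)
    return est1 - est0, (se1 ** 2 + se0 ** 2) ** 0.5


def normal_ci(estimate, se, level=0.95):
    """Large-sample confidence interval for an estimate with standard error `se`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


# Example usage (file name follows the pattern in the Data format section):
# rows = load_results("yourandrewid-results.csv")
# est, se = early_effect(rows, "TotalSleepTime")
# print(est, normal_ci(est, se))
# est, se = interaction(rows, "term_gpa", "demo_gender")
# print(est, normal_ci(est, se))
```

A confidence interval for the interaction contrast that excludes zero would suggest that the effect of 8am classes differs between the two groups. Whichever approach you use, the model assumptions still need to be checked separately.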

Report

You will submit an analysis report with your results. This report will be distributed to the Senior Vice Provost, the Registrar, the Board of Trustees, and the department heads, so it is meant to describe the experiment and your recommendations at a high level. The technical detail is reserved for the Methods section only.

  • Introduction. Briefly state the problem to be solved, the goal, what kind of experiment was conducted, and a summary of the results.
  • Methods. In technical detail, describe the experimental design, the analysis you used, and any checks you did for model assumptions.
  • Results. Present the results of your analysis. State any hypothesis test results or confidence intervals, give figures or tables illustrating any effects you found, and so on.
  • Discussion. Interpret the results in terms of the problem. What do you recommend Cranberry Melon University should do? What evidence have you found? If you found effects, how big are they? Who is affected?

The report should only be about 5 pages; this should be a straightforward data analysis without lots of complicated detail.

You can read my notes on writing statistical reports for recommendations on what each section should contain, how to present results, and how to write about statistics.

Your report should be in PDF format and include relevant figures and tables. It should not contain any code. You may typeset it using any program, though I recommend using Quarto or R Markdown so the text and code are in the same file. (But again, the report PDF should not contain the code, so set echo=FALSE in your code chunks.)