See also Pedagogy.
Student course evaluations (also known as Student Evaluations of Teaching, SETs) are a standard feature of college instruction. Faculty know they suffer from various biases – particularly selection bias, where only the particularly angry or amazingly happy students fill out their evaluations. The usual advice is to take evaluations with some skepticism, and to understand that they are only useful to identify broad trends, not rank individual instructors.
However, there turn out to be other biases and issues beyond selection bias.
Uttl, B. (2023). Student Evaluation of Teaching (SET): Why the emperor has no clothes and what we should do about it. Human Arenas. doi:10.1007/s42087-023-00361-7
Hot takes! Uttl has been involved in many papers about course evaluations, and thinks they suck:
It concludes that SET are harmful to professors, to students, to quality of higher education, to society at large, and are higher education’s proverbial highway to hell. I will argue that accumulated evidence makes it clear that SET are not reliable, not valid, measure teaching effectiveness-irrelevant factors (TEIFs), measure student preference factors (SPFs), and are easily manipulated by giving students incentives. SET are irredeemable. They have not been fixed in 100 years of research and will not be fixed in the next 100 years of research because they are fundamentally flawed and cannot measure professors’ teaching effectiveness with any degree of accuracy, and definitely not with a degree of accuracy required for high stakes decisions about individuals such as terminating professors’ career.
The salient points are that SET are not reliable, not valid, measure factors irrelevant to teaching effectiveness (like student preferences), and are easily manipulated by giving students incentives.
There’s some discussion of alternative options. Peer evaluations are hamstrung by the lack of a definition of effective teaching, so peers give contradictory advice; one better option might be student surveys that only ask about objectively measurable behaviors, like “were test grades returned within 5 days?”, to detect outlier faculty who have serious issues.
Kreitzer, R. J., & Sweet-Cushman, J. (2022). Evaluating student evaluations of teaching: A review of measurement and equity bias in SETs and recommendations for ethical reform. Journal of Academic Ethics, 20, 73–84. doi:10.1007/s10805-021-09400-w
A review of the literature on bias in course evaluations. Finds two categories:
Measurement bias. For instance, positive evaluations are correlated with lighter workloads and higher grades, and evaluations are higher in small discussion courses than in large lecture courses.
Equity bias. Men “are perceived as more accurate in their teaching, have higher levels of education, are less sexist, more enthusiastic, competent, organized, professional, effective, easier to understand, prompt in providing feedback, and are less-harshly penalized for being tough graders”, while students are “more likely to expect special favors from female professors and react badly when those expectations aren’t met or fail to follow directions when they are offered by a woman professor”. In particular, evaluations seem to follow gender roles: women are rated highly for “exhibiting traditionally-feminine traits”, while “men are evaluated positively on gendered perceptions of their intellectual and teaching prowess”. There has been less research on bias from race and ethnicity, but there are plenty of reasons to suspect it exists too.
Recommends labeling evaluations as “perceptions of student learning”, not as a measure of teaching, and not relying upon them to evaluate instructors.
Mitchell, K. M. W., & Martin, J. (2018). Gender bias in student evaluations. PS: Political Science & Politics, 51(3), 648–652. doi:10.1017/s104909651800001x
Compares course evaluations between two sections of an online course, one taught by a man and the other by a woman. The “lectures, assignments, and content were exactly the same in all sections”; the only difference was grading and interaction with the instructor. Student evaluations rated the female instructor worse, but also rated the course itself and its technology lower, even though they were identical. Free-form comments on RateMyProfessors also showed gender bias, focusing more on the female instructor’s personality and appearance than on the male instructor’s.
The paper opens with a truly horrifying student email to the female instructor: “I want you personally to know I have hated every day in your course, and if I wasn’t forced to take this, I never would have. Anytime you mention this course to anyone who has ever taken it, they automatically know that you are a horrific teacher, and that they will hate every day in your class.”
MacNell, L., Driscoll, A., & Hunt, A. N. (2014). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303. doi:10.1007/s10755-014-9313-4
A clever experiment, also with an online course. The course was split into discussion groups, each taught by either a male or female assistant instructor. “Each instructor was responsible for grading the work of students in their group and interacting with those students on course discussion boards. Each assistant instructor taught one of their groups under their own identity and the second group under the other assistant instructor’s identity.” They found “there is a significant difference in how students rated the perceived male and female instructors, but not the actual male and female instructors.” But comparing by perceived identity, “the male identity received significantly higher scores on professionalism, promptness, fairness, respectfulness, enthusiasm, giving praise, and the student ratings index.” (These differences were about half a point on a 0-5 Likert scale.) They “contend that female instructors are expected to exhibit such traits and therefore are not rewarded when they do so, while male instructors are perceived as going above and beyond expectations when they exhibit these traits.”
Note, however, that the course sections were small and the lower scores for the female-presenting instructor were caused by a few students giving the lowest possible score—so perhaps the bias is driven by a subset of students, rather than being uniform across all students.
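To see how sensitive a small section’s mean rating is to a handful of floor scores, here is a minimal sketch (my own illustration, not from the paper); the section size and rating distribution are invented, and the 0-5 scale matches the one described above.

```python
# Toy illustration (not from MacNell et al.): in a small section, a few
# students giving the floor score can shift the section mean by roughly
# half a point even if everyone else rates both instructors identically.
# Section size and the rating distribution are invented for the example.
import numpy as np

rng = np.random.default_rng(0)

section_size = 23  # hypothetical small online discussion section
ratings = rng.choice([3, 4, 5], size=section_size, p=[0.2, 0.4, 0.4])

# Same ratings, except three students give the lowest possible score (0).
with_floor = ratings.copy()
with_floor[:3] = 0

print("mean without floor scores:", ratings.mean().round(2))
print("mean with three floor scores:", with_floor.mean().round(2))
print("difference:", (ratings.mean() - with_floor.mean()).round(2))
```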
From pedagogy we learn that students find it difficult to evaluate their own learning and often do not develop expert-like thinking in a course. Are students able to give course evaluations which accurately reflect how well the course taught them?
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42. doi:10.1016/j.stueduc.2016.08.007
“Our up-to-date meta-analysis of all multisection studies revealed no significant correlations between the SET ratings and learning.”
This contrasts against previous studies and meta-analyses, which did find correlations. The authors suggest that “small-to-moderate SET/learning correlations may be an artifact of small sample sizes of most of the primary studies and small sample bias.” (See Statistical power and underpowered research.)
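As a rough illustration of the small-sample-bias argument (my own sketch, not the paper’s analysis): with the true SET/learning correlation set to zero, small multisection studies produce highly variable correlation estimates, and if only studies finding a “notable” positive correlation make it into the literature, the pooled estimate looks small-to-moderate anyway. The per-study size and reporting cutoff here are arbitrary assumptions.

```python
# Rough sketch (not from Uttl et al.) of small-sample bias: with a true
# correlation of zero, small studies yield noisy estimates, and selectively
# reporting the positive ones produces a spurious pooled correlation.
# The per-study size and the reporting cutoff are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(1)

n_studies = 2000
n_sections = 15  # a hypothetical small multisection study

estimates = []
for _ in range(n_studies):
    set_ratings = rng.standard_normal(n_sections)  # section-level SET ratings
    learning = rng.standard_normal(n_sections)     # learning, independent of SET
    estimates.append(np.corrcoef(set_ratings, learning)[0, 1])
estimates = np.array(estimates)

print("SD of per-study correlation estimates:", estimates.std().round(2))

# Suppose only studies finding r > 0.2 get reported or emphasized:
reported = estimates[estimates > 0.2]
print("fraction of studies 'reported':", round(len(reported) / n_studies, 2))
print("mean correlation among reported studies:", reported.mean().round(2))
```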
Uttl, B., White, C. A., & Morin, A. (2013). The numbers tell it all: Students don’t like numbers! PLoS ONE, 8(12), e83443. doi:10.1371/journal.pone.0083443
Surveying freshmen in intro psychology classes before they took any quantitative courses, “the mean interest in statistics courses was nearly 6 SDs below the mean interest in non quantitative courses. Moreover, women were less interested in taking quantitative courses than men.” Ouch. Suggests that judging faculty teaching quantitative courses by the same student evaluation standards is a bad idea, since student interest in the courses is so dramatically different. Also, “the lack of interest in quantitative and research methods courses among undergraduate students also threatens the very existence of psychology as well as other fields as a science.”