See also Pedagogy.
Student course evaluations (also known as Student Evaluations of Teaching, SETs) are a standard feature of college instruction. Faculty know they suffer from various biases – particularly selection bias, where only the particularly angry or amazingly happy students fill out their evaluations. The usual advice is to take evaluations with some skepticism, and to understand that they are only useful to identify broad trends, not rank individual instructors.
However, there turn out to be other biases and issues beyond selection bias.
Uttl, B. (2023). Student Evaluation of Teaching (SET): Why the emperor has no clothes and what we should do about it. Human Arenas. doi:10.1007/s42087-023-00361-7
Hot takes! Uttl has been involved in many papers about course evaluations, and thinks they suck:
It concludes that SET are harmful to professors, to students, to quality of higher education, to society at large, and are higher education’s proverbial highway to hell. I will argue that accumulated evidence makes it clear that SET are not reliable, not valid, measure teaching effectiveness-irrelevant factors (TEIFs), measure student preference factors (SPFs), and are easily manipulated by giving students incentives. SET are irredeemable. They have not been fixed in 100 years of research and will not be fixed in the next 100 years of research because they are fundamentally flawed and cannot measure professors’ teaching effectiveness with any degree of accuracy, and definitely not with a degree of accuracy required for high stakes decisions about individuals such as terminating professors’ career.
The salient points are that SET are not reliable, not valid, measure factors irrelevant to teaching effectiveness (like student preferences), and are easily manipulated by giving students incentives.
There’s some discussion of alternative options. Peer evaluations are hamstrung by the lack of a definition of effective teaching, so peers give contradictory advice; one better option might be student surveys that only ask about objectively measurable behaviors, like “were test grades returned within 5 days?”, to detect outlier faculty who have serious issues.
Kreitzer, R. J., & Sweet-Cushman, J. (2022). Evaluating student evaluations of teaching: A review of measurement and equity bias in SETs and recommendations for ethical reform. Journal of Academic Ethics, 20, 73–84. doi:10.1007/s10805-021-09400-w
A review of the literature on bias in course evaluations. Finds two categories:
Measurement bias. For instance, positive evaluations are correlated with lighter workloads and higher grades, and evaluations are higher in small discussion courses than in large lecture courses.
Equity bias. Men “are perceived as more accurate in their teaching, have higher levels of education, are less sexist, more enthusiastic, competent, organized, professional, effective, easier to understand, prompt in providing feedback, and are less-harshly penalized for being tough graders”, while students are “more likely to expect special favors from female professors and react badly when those expectations aren’t met or fail to follow directions when they are offered by a woman professor”. In particular, evaluations seem to follow gender roles: women are rated highly for “exhibiting traditionally-feminine traits”, while “men are evaluated positively on gendered perceptions of their intellectual and teaching prowess”. There has been less research on bias from race and ethnicity, but there are plenty of reasons to suspect it exists too.
Recommends labeling evaluations as “perceptions of student learning”, not as a measure of teaching, and not relying upon them to evaluate instructors.
Mitchell, K. M. W., & Martin, J. (2018). Gender bias in student evaluations. PS: Political Science & Politics, 51(3), 648–652. doi:10.1017/s104909651800001x
Compares course evaluations between two sections of an online course, one taught by a man and the other by a woman. The “lectures, assignments, and content were exactly the same in all sections”; the only difference was grading and interaction with the instructor. Student evaluations rated the female instructor worse, but also rated the course itself and its technology lower, even though they were identical. Free-form comments on RateMyProfessors also showed gender bias, focusing more on the female instructor’s personality and appearance than on the male instructor’s.
The paper opens with a truly horrifying student email to the female instructor: “I want you personally to know I have hated every day in your course, and if I wasn’t forced to take this, I never would have. Anytime you mention this course to anyone who has ever taken it, they automatically know that you are a horrific teacher, and that they will hate every day in your class.”
MacNell, L., Driscoll, A., & Hunt, A. N. (2014). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303. doi:10.1007/s10755-014-9313-4
A clever experiment, also with an online course. The course was split into discussion groups, each taught by either a male or female assistant instructor. “Each instructor was responsible for grading the work of students in their group and interacting with those students on course discussion boards. Each assistant instructor taught one of their groups under their own identity and the second group under the other assistant instructor’s identity.” They found “there is a significant difference in how students rated the perceived male and female instructors, but not the actual male and female instructors.” But comparing by perceived identity, “the male identity received significantly higher scores on professionalism, promptness, fairness, respectfulness, enthusiasm, giving praise, and the student ratings index.” (These differences were about half a point on a 0-5 Likert scale.) They “contend that female instructors are expected to exhibit such traits and therefore are not rewarded when they do so, while male instructors are perceived as going above and beyond expectations when they exhibit these traits.”
Note, however, that the course sections were small and the lower scores for the female-presenting instructor were caused by a few students giving the lowest possible score—so perhaps the bias is driven by a subset of students, rather than being uniform across all students.
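To see how sensitive a small section’s mean rating is to a handful of floor scores, here is a minimal sketch (my own illustration, not from the paper); the section size and rating distribution are invented, and the 0-5 scale matches the one described above.

```python
# Toy illustration (not from MacNell et al.): in a small section, a few
# students giving the floor score can shift the section mean by roughly
# half a point even if everyone else rates both instructors identically.
# Section size and the rating distribution are invented for the example.
import numpy as np

rng = np.random.default_rng(0)

section_size = 23  # hypothetical small online discussion section
ratings = rng.choice([3, 4, 5], size=section_size, p=[0.2, 0.4, 0.4])

# Same ratings, except three students give the lowest possible score (0).
with_floor = ratings.copy()
with_floor[:3] = 0

print("mean without floor scores:", ratings.mean().round(2))
print("mean with three floor scores:", with_floor.mean().round(2))
print("difference:", (ratings.mean() - with_floor.mean()).round(2))
```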
From pedagogy we learn that students find it difficult to evaluate their own learning and often do not develop expert-like thinking in a course. Are students able to give course evaluations which accurately reflect how well the course taught them?
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42. doi:10.1016/j.stueduc.2016.08.007
“Our up-to-date meta-analysis of all multisection studies revealed no significant correlations between the SET ratings and learning.”
This contrasts against previous studies and meta-analyses, which did find correlations. The authors suggest that “small-to-moderate SET/learning correlations may be an artifact of small sample sizes of most of the primary studies and small sample bias.” (See Statistical power and underpowered research.)
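As a rough illustration of the small-sample-bias argument (my own sketch, not the paper’s analysis): with the true SET/learning correlation set to zero, small multisection studies produce highly variable correlation estimates, and if only studies finding a “notable” positive correlation make it into the literature, the pooled estimate looks small-to-moderate anyway. The per-study size and reporting cutoff here are arbitrary assumptions.

```python
# Rough sketch (not from Uttl et al.) of small-sample bias: with a true
# correlation of zero, small studies yield noisy estimates, and selectively
# reporting the positive ones produces a spurious pooled correlation.
# The per-study size and the reporting cutoff are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(1)

n_studies = 2000
n_sections = 15  # a hypothetical small multisection study

estimates = []
for _ in range(n_studies):
    set_ratings = rng.standard_normal(n_sections)  # section-level SET ratings
    learning = rng.standard_normal(n_sections)     # learning, independent of SET
    estimates.append(np.corrcoef(set_ratings, learning)[0, 1])
estimates = np.array(estimates)

print("SD of per-study correlation estimates:", estimates.std().round(2))

# Suppose only studies finding r > 0.2 get reported or emphasized:
reported = estimates[estimates > 0.2]
print("fraction of studies 'reported':", round(len(reported) / n_studies, 2))
print("mean correlation among reported studies:", reported.mean().round(2))
```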
Uttl, B., White, C. A., & Morin, A. (2013). The numbers tell it all: Students don’t like numbers! PLoS ONE, 8(12), e83443. doi:10.1371/journal.pone.0083443
Surveying freshmen in intro psychology classes before they took any quantitative courses, “the mean interest in statistics courses was nearly 6 SDs below the mean interest in non quantitative courses. Moreover, women were less interested in taking quantitative courses than men.” Ouch. Suggests that judging faculty teaching quantitative courses by the same student evaluation standards is a bad idea, since student interest in the courses is so dramatically different. Also, “the lack of interest in quantitative and research methods courses among undergraduate students also threatens the very existence of psychology as well as other fields as a science.”