One size fits none: rethinking student course evaluation surveys
Hosted Dialogue Session
St. Thomas University Learning and Teaching Development Office, 4 April 2014
[NOTE: If you'd like more information about, or want to comment on, any of this, please email me. This is part of what I'm hoping will be an article about alternatives in "Course Evaluations"]
Most people have problems with the current form of course evaluations at St. Thomas, as at most universities. Many of the questions seem irrelevant to how you actually teach; the numbers are a crude measure at best; students' written comments, when they exist at all, are short and cursory; and the way the results are usually used -- as fundamental components of promotion and tenure decisions -- gives them disproportionate weight.
Russ Hunt has been thinking differently about them for some years, and he'll introduce some unconventional ways of imagining, creating, and conducting these surveys -- and invite discussion.
Among other things: why not do them online? Why not ask the questions you really need answers to? Why not find some other way to do them than by taking up time during the last meetings of the class? Why not use them as the basis for discussion, and as a way of engaging students in the process? Why not persuade the university to entertain more meaningful alternatives?
*The whole process for first term of English 1006, 2013-14
When doing research on them, it's useful to remember the range of names they go by, each of which needs to be used as a search term: SETs (Student Evaluations of Teaching); SRIs (Student Ratings of Instruction); SRTs (Student Ratings of Teaching); Course Evaluations; Student Course Evaluations; Course Evaluation Surveys . . . etc. There are disputes about terminology -- for example, "evaluations" suggests more authority than "ratings."
Nothing about teaching is more frequently discussed, in print or online. A discussion of one topic -- "SETs under attack again" -- generated well over a hundred postings on the POD list in a matter of days last summer. A discussion of a Wall Street Journal attack on the practice preoccupied the STLHE-L list last November, and was dismissed on the POD list by Mike Theall, one of the major figures in the literature on the practice (see the quotation from him below).
Questions that come up regularly:
And as well:
From the POD network:
Mike Theall, 9 November 2013: "I agree with the last sentence in the abstract . . . but with the change of the wording to '. . . SETs should not be the only data used to evaluate faculty.' Of course, 'SETs' is an incorrect and misleading acronym that itself adds to the resistance to ratings. Maybe that's why Asher resents having the opinions of 'teenagers' registered, and why it's the same old, unsubstantiated complaint we have heard for decades. Many on this list (e.g., Arreola, Berk, Hativa, me) have recommended the use of 'student ratings' since ratings are only one source of data used by the real evaluators of faculty performance (peers & administrators).
One other comment about Raoul's book & his eight-part "comprehensive" evaluation process. Among other things, that process requires dialogue and agreement on: 1) defining the roles and requirements for faculty work . . . e.g. teaching, scholarship, service, administrative responsibilities; 2) definition of the specific and measurable components for each domain of work; 3) assigning weight to each component; 4) identifying useful sources of data; and 5) finding appropriate measurement devices. I would add that providing faculty and administrators (even student raters) with help/training in providing/understanding/using the data can also improve the overall evaluation system. The mass of the eval/ratings literature far outweighs the few studies that attempt to show bias and/or lack of validity & reliability. If properly handled, reviewing the evidence for these constituencies can also make the evaluation process more fair and accurate. When Raoul & I began developing the "meta-profession" model, his 8-part process was expanded to include evidence-based characteristics of performance related to the 4 roles noted above. Having such a framework can be used for both formative and summative purposes (as described in the Theall-Arreola-Mullinix citations provided in my recent post) and it helps to avoid committee-generated lists of individuals' "favorite" items/issues."
Ed Nuhfer, 9 June 2013, responding to “Why not use tests of learning instead of ratings?”: “Student ratings of professors also have high reliability, and attempts to correlate them with tests and grades of unknown reliability will not produce any better results. When one correlates a knowledge survey with a separate measure of learning performance, expect to obtain a correlation that is numerically at about the reliability of the LEAST reliable instrument less 0.2 to 0.3. Many grades and tests have reliabilities less than 0.5 -- which means one will often get a reliability of about zip by mindlessly correlating any kind of student ratings with tests and grades of unknown reliability.
So, if your interest is doing student ratings forms that address content, skills and knowledge, use a knowledge survey that focuses on content, skills and knowledge. If you are interested in rating professors, use the conventional student ratings form. But we do need to quit pretending that student ratings have merit as some kind of assessment measure of student learning. They do not.”
Nuhfer, 10 June 2013: “What is really important to remember is that SRIs began in the 1920s and were created by students for student use. They were co-opted by various administrations from 1930 (the 'Improvement of Teaching' movement) through the 1970s (when POD began). Frankly, I think it is absolutely pointless to argue that we should do away with SRIs. We have them because we want to control how they are constructed, and especially because we wanted them to be valid and reliable. A lot of work has gone into that. Throwing them out because a lot of people have forgotten how to use them just means the students will be in charge again. Hands up everyone who likes RatemyProf.com! Because that’s where we’ll be going if we throw out the baby with the bathwater. So maybe the conversation should go back to 'Ok, we are stuck with these. How can we use them well?' and also, 'Ok, how CAN we measure student learning?'”
Angela Linse, 10 June 2013: “We need to focus on helping faculty take ownership of their ratings rather than leaving it entirely to someone else to interpret the numbers, not faceless review committees or local administrators. We also need to be talking to administrators about taking some control of rampant misuse of the data, e.g. the tendency to want everyone to be 'above average,' over-interpreting differences of a couple of 10ths or 100ths of a point when looking at a faculty member's average ratings, and making summative decisions based on rankings of or comparisons between faculty (who no doubt differ along many other dimensions). ARGH! Most of the problems with student ratings seem to me to lie with how the data are used.”
Stephen L. Benton and William E. Cashin. (2012). “Student Ratings of Teaching: A Summary of Research and Literature.” IDEA Paper #50. The IDEA Center. Online: theideacenter.org
“There are probably more studies of student ratings than of all of the other data used to evaluate college teaching combined. Although one can find individual studies that support almost any conclusion, for many variables there are enough studies to discern trends. In general, student ratings tend to be statistically reliable, valid, and relatively free from bias or the need for control, perhaps more so than any other data used for faculty evaluation.”
Nira Hativa. (2013). Student Ratings of Instruction: A Practical Approach to Designing, Operating and Reporting. Oron Publications.
[ten pages of useful bibliography]
Pia Marks. (2012). Silent Partners: Student Course Evaluations and the Construction of Pedagogical Worlds. Canadian Journal for Studies in Discourse and Writing, 24(1).
“Results indicate that the genre projects an institutionally dominant ideology about teaching and learning in the Faculty of Arts which is at odds with emerging practices. Qualitative analysis suggests that the instrument acts [as] a silent partner for students, mediating pedagogical meaning for them, as well as for instructors, seeking to impose institutionally dominant pedagogies and to influence their pedagogical decisions.”
The April 2 Chronicle of Higher Education carries a piece suggesting a way to run an interactive course feedback session. I'll try it, I think, next time I teach.
Peter Filene has a suggestion for dealing with a course that does not seem to be going well. If discussions falter, responses to the readings are cursory at best, and you do not seem to be getting through to students, Filene suggests involving the students in diagnosing the problems. Distribute index cards to the students and ask them to evaluate the course by responding to questions. These can be very general, as in "What is going well?" and "What do you think could be improved or changed?" Alternatively, the questions could focus on specific issues: "How does the difficulty of the readings compare to your other courses?", "What holds you back from participating in discussions?", etc.
At the end of the exercise, you can collect the index cards, take them home and think about your students’ answers. Or you can shuffle the cards, return them to the students, and have each student read a card aloud, beginning a class-wide discussion of what’s wrong with the class and how the problems can be fixed.
David Gooblar. (2014). “It's Time for a Course Correction.” chroniclevitae.com: https://chroniclevitae.com/news/420-it-s-time-for-a-course-correction?cid=at&utm_source=at&utm_medium=en. Drawn from Peter Filene. (2005). The Joy of Teaching: A Practical Guide for New College Instructors. Chapel Hill: U of North Carolina Press, 71-73.