Monday, February 18, 2008

The flip side of grade inflation?

Over at Cliopatria, Hugo Schwyzer has a surprising post about instructor evaluation inflation. Schwyzer (a historian, for the record) recently got a batch of his teaching evaluations back. The results:

A summary, prepared by the division dean, was attached to the front. Five of my seven classes were evaluated; a total of 211 students participated by turning in evaluations. Students were allowed to rate their profs as "Outstanding", "Good", "Average", "Poor", or "Failing." My ratings (and you'll just have to take my word for it) were

Outstanding: 84%
Good: 15%
Average: 1%
Poor: 0%
Failing: 0%

Now, lest you think I write only to brag, note what else was in my summary: the details of the college and departmental averages for all full-time faculty. The college reports the following ratings for some four hundred professors evaluated campus-wide last fall:

Outstanding: 65%
Good: 28%
Average: 6%
Poor: 1%
Failing: <1%

Clearly, grade inflation works both ways! 93% of the faculty ranks above average. 65% of us are outstanding, which raises an obvious question about what it is that so many of us can be standing out from! What on earth does "average" mean when only 6% of full-time faculty fall into that category?

Is Schwyzer right? Are we the beneficiaries of evaluation inflation as pernicious as the grade inflation so many of us deride? To the extent that student evaluations of teaching are valid (and to show my hand here, my view is "more valid than people think, but limited and not perfectly reliable"), does this phenomenon, if real, render them useless?

I gather that the compression of results toward the high end makes distinguishing among the good and competent instructors challenging, but maybe the upshot is that the utility of these evaluations derives from the bottom end: If 'outstanding' is the norm, then how terrible an instructor do you have to be to get dubbed 'poor' or 'failing'?

I have to say that Schwyzer's post motivates me to go back and look at my own evaluation data and at the disciplinary norms at my institution. How are matters where you teach? Has the Lake Wobegon effect, as Schwyzer dubs it, kicked in?


  1. One thing to note (although it's not clear what, if anything, this says about evaluation inflation) is that the options given to Schwyzer's students might be sending out mixed messages. Describing someone's teaching as 'outstanding' or 'average' does seem to require comparing them to others. But this isn't the case with 'good,' 'poor' or 'failing.'

    Faced with options like these, students presumably take the evaluation to be asking for either a relative or an absolute assessment of teaching quality; it's just not clear which is intended. If they plump for an absolute assessment, taking 'outstanding' to mean 'very good' and 'average' to mean 'not very good' (I've certainly heard students use the term this way in ordinary conversation), then 65% of teaching being outstanding isn't problematic.

  2. I think Jonathan's right about this. The forms that my students fill out (and the ones I filled out as a student) have a series of bubbles numbered 1–7. It may say somewhere at the top that 7 means "outstanding," but I'll bet that relatively few students notice or think about that. I think that 'outstanding', in this case, just means 'really good'.

  3. I remember reading an article about evaluations of employees. Unfortunately, I can't remember the source.

    Basically, it said that to maintain high morale, organizations should not officially admit that the average employee is average. If someone thinks they are average, that hurts their morale; better that they are told that they are good, or very good.

    Of course, you want your evaluations to indicate which employees are not meeting the required standards, so that they can be warned to either improve or lose their jobs. Also, there are usually a few who really stand out from the pack, and that should be noted as well so they can be tipped for promotion.

    So, ideally, a few employees will get the top evaluation, most will be bunched together, and a few will have evaluations that are clearly not good enough. You do want to let people know if they are on the borderline as well. Those who are bunched together should be told that they are doing very well, and that most of their peers are also doing very well. Otherwise you create resentment.

    I've noticed, not just in academia, that people who have to fill in evaluations on employees understand this pretty well. You don't rate someone as less than average unless you want to give them the kiss of death, and even 'average' should be used as a warning shot. The top category should also be used sparingly, as a reward. It looks as though students have cottoned on to this as well.


