(or “Why your Teachers/Professors are Full of Shit”)
I bet we’ve all heard this one:
Student: “Why did you give me a C?”
Teacher: “I didn’t give you a C, that’s the grade you earned.”
This argument is based on the idea that grading is objective. Supposedly, your grades reflect your performance and are not assigned arbitrarily by the grader. I call bullshit. Objective grading is a myth, a dangerous myth high school and college instructors have been hiding behind for years. Here are 9 reasons grading is subjective, if not entirely arbitrary.
1. Rubrics can’t create the objective from the subjective
Clearly, the grading of research papers, essays, presentations, etc. is predominantly subjective. We can’t even agree what makes for a good presentation, for instance. Some professors will require a slide show while others bemoan DBPPT (Death By PowerPoinT). That said, some idiots actually claim that subjective grading can be made objective through the use of rubrics. A rubric is a set of criteria and standards used to structure the grading process. If you’ve ever had a grade broken down into 20 points for content, 5 points for style, 5 points for bibliography, etc., that was a rubric.
I hate to rain on the educational parade here, but dividing one made-up mark into 5 made-up marks and adding them up does not make this process objective. How does the grader decide what gets a 4 and what gets a 5? Furthermore, the process of developing the rubric is completely subjective. Why does content get 20 points instead of 19 or 21? Why does bibliography get 5? Why doesn’t originality get points? Why don’t I get points for being poetic? I agree that rubrics help structure grading and might even facilitate discussion of the grades, but anybody who thinks that combining a bunch of subjective grades in a subjective way will magically create an objective grade is delusional.
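To make the point about weights concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two essays, their criterion scores, and both weightings are invented for illustration, and `rubric_total` is just a name I picked. It shows that the same criterion scores can rank two essays in opposite orders depending on how the points are split.

```python
# Hypothetical criterion scores (fractions of full credit) for two essays.
# These numbers are invented for illustration only.
essay_a = {"content": 0.90, "style": 0.60, "bibliography": 0.60}
essay_b = {"content": 0.70, "style": 0.90, "bibliography": 1.00}


def rubric_total(scores, weights):
    """Weighted sum: each criterion's score times the points allotted to it."""
    return sum(scores[name] * weights[name] for name in weights)


# Two ways to split the same 30 points (both are assumptions).
content_heavy = {"content": 20, "style": 5, "bibliography": 5}
balanced = {"content": 10, "style": 10, "bibliography": 10}

for label, weights in [("content-heavy", content_heavy), ("balanced", balanced)]:
    a = rubric_total(essay_a, weights)
    b = rubric_total(essay_b, weights)
    winner = "A" if a > b else "B"
    print(f"{label}: A={a:.1f}, B={b:.1f}, higher mark: essay {winner}")
```

With the content-heavy split essay A comes out ahead; with the balanced split essay B does. Nothing about the essays changed, only the grader's choice of weights.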
2. Many test makers write bad “objective questions”
“Objective” questions like true/false, fill-ins, matching, and multiple choice are supposed to have one clear, correct answer because having a question with multiple answers undermines reliability and consistency in test scores. Any student assessment textbook will tell you that (see, for instance, Gronlund (2006) Assessment of Student Achievement, 8th ed. Allyn & Bacon). That means no “all of the above,” no “check all that apply,” no decoy answers that are sort of right, no questions worded in the negative and none of that “I and II, I and III or II and III” bullshit. If a scientist tries to measure something, like attitude toward online shopping, with confusing questions, the paper gets rejected, because confusing questions bias the results.
3. The test creation process is subjective
But suppose we have a well-trained educator who makes none of these textbook mistakes. Most tests are still subjective because they are created through an entirely subjective, ad hoc process. Why 30 questions instead of 29 or 31? Why 50 minutes? Why these thirty questions? Why not different ones? What makes you think this test is better than another? What is the measure of quality for the test? The truth is, most people just make up their exams without anything beyond a superficial rationale, and sometimes not even that. They damn well don’t pretest them to make sure they’re valid (I’ll come back to validity below) and getting the TA to write it doesn’t count because the TA is not representative of the students.
4. Grading scams
Educators have many, many ways to deceive students. Here are two I’ve experienced firsthand.
Professor K was quite the piece of work. He would intentionally give exams that no mortal could finish in the time allotted. After the evaluation, marks would be dismally low. Then he would graciously scale the marks so we didn’t all fail. Except he scaled everyone’s mark differently. I started out with a 72% and ended up with 78%. One of my friends went from 71% to 80%. Another guy started with 70% and was scaled to… 70%. How is that possible? He just made up the marks of course. Since everyone started at least 20% below what they deserved, whatever he made up felt like compassion. I call bullshit. It was a scam.
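If you want to check that these three adjustments can’t come from any consistent rule, here is a small Python sketch. It only uses the three before/after marks quoted above; the function names `preserves_order` and `fits_one_line` are my own, and the two checks (rank preservation and a single linear rule) are just the most obvious definitions of “consistent scaling” I could think of, not anything Professor K ever stated.

```python
# (raw mark, scaled mark) for the three students mentioned above.
marks = [(72, 78), (71, 80), (70, 70)]


def preserves_order(pairs):
    """True if a higher raw mark never ends up with a lower scaled mark."""
    ordered = sorted(pairs)  # sort by raw mark, ascending
    scaled = [s for _, s in ordered]
    return all(lo <= hi for lo, hi in zip(scaled, scaled[1:]))


def fits_one_line(pairs):
    """True if one rule scaled = a * raw + b reproduces every pair."""
    (x0, y0), (x1, y1) = pairs[0], pairs[1]
    # Compare cross-products instead of slopes to stay in exact integers.
    return all((y - y0) * (x1 - x0) == (y1 - y0) * (x - x0) for x, y in pairs)


print(preserves_order(marks))  # False: the 71 overtakes the 72
print(fits_one_line(marks))    # False: no single linear rule fits all three
```

Both checks fail. The student with the lower raw mark (71) finished above the student with the higher one (72), which no order-preserving adjustment can do.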
Scam number two is quite the classic. As I was in a business program, I had to take a lot of math-oriented classes. However, business math is pretty easy compared to stats or pure math, so if the instructors gave sensible exams, students would get very high marks. As this is considered unacceptable (what is wrong with these assholes anyway???) they took action. Action 1 was to write trick questions that test the students’ paranoid alertness rather than their knowledge (e.g., wording questions in the negative). Action 2 was to give a test so long no one could finish it. Action 3 was to take off 4 (FOUR!!) marks for every arithmetic error. Thus, marks reflected speed, paranoia and persnicketiness rather than, I don’t know, understanding of the material perhaps?
5. The bull (bell) curve
Some instructors mark on the bell curve, meaning that marks are statistically adjusted to fit a distribution called the normal curve. This does not objectify grades for two reasons. First, the process requires the instructor to provide a mean and standard deviation. If these are determined subjectively (and you can bet your ass that most instructors just make them up or use something prescribed by their department, which was just made up by someone else) then the whole thing is still subjective. More fundamentally, this whole process is based on a deep misunderstanding of statistics. Many measurements can be approximated by the normal distribution, not the other way around. A class’s marks are not an imperfect observation of what should be the normal distribution. The marks are what they are. The normal distribution might or might not approximate that. If it does not, mathematically transforming the grades to match it is meaningless.
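Here is a rough Python sketch of the first problem. I’m assuming the simplest form of curving, a linear z-score rescale to a target mean and standard deviation; real instructors may use other methods, and the raw marks and both sets of targets below are invented. The function name `curve` is mine.

```python
from statistics import mean, pstdev


def curve(marks, target_mean, target_sd):
    """Rescale marks via z-scores to a chosen mean and standard deviation."""
    m, sd = mean(marks), pstdev(marks)
    return [target_mean + target_sd * (x - m) / sd for x in marks]


raw = [40, 50, 60, 70, 80]  # hypothetical class marks

# Two arbitrary (assumed) choices of target mean and standard deviation.
for target_mean, target_sd in [(70, 10), (65, 5)]:
    curved = curve(raw, target_mean, target_sd)
    print(target_mean, target_sd, [round(x, 1) for x in curved])
```

The same raw marks produce different final marks for every student depending on which targets the instructor picks, and nothing in the data says which pair is right. Note also that a linear rescale like this only moves the mean and spread; it does not make a non-normal set of marks normal.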
6. Grading against a standard
The alternative to grading against the bell curve is grading against “the standard.” I bet you know where this is going. Since there is no objective standard (at least none I’m aware of), “the standard” is just a figment of the instructor’s imagination; thus, not objective.
7. Invalid measures (Lack of construct validity and reliability)
This brings me to my most technical criticism. Scientists have developed a whole body of knowledge surrounding the theory of measurement. Without getting too technical, grades are a kind of measure. They’re supposed to reflect student performance, which is unobservable, in the sense that you can’t just look at it, count it, weigh it, etc. For a measure to be considered acceptable by scientists (i.e., objective), it must satisfy a bunch of criteria, including construct validity and reliability. The problem with student assessments is, the reliable ones aren’t valid and the valid ones aren’t reliable.
Update (2009-09-01): this research, conducted at the University of Texas, provides convincing evidence that grades are only weakly related to subsequent achievements.
A reliable measurement is one that produces consistent results in different situations. Multiple choice tests are pretty reliable. It doesn’t matter who grades the test, you usually get the same grade. Essays and projects and presentations are not. If you give the same essay to different markers, you can get wildly different grades.
A measurement exhibits high construct validity when it measures what it’s supposed to. If you’re trying to measure a student’s ability to read Japanese, and you give the student a passage of simple Japanese and ask what it says, that has high construct validity. Presentations, exams, papers, etc. tend to have low validity because they measure too many different things. For instance, a multiple choice trigonometry exam measures not only understanding of trigonometry, but also test anxiety, stress, alertness, ability to guess, ability to use a calculator (or ability to do arithmetic), time management skills, etc. The situation is far worse when you try to measure something like understanding of investment strategies, or, dare I say it, critical thinking.
The bottom line is, a measure must be tested to confirm that it is valid and reliable. Without said test, we have to assume the measures are junk.
8. Too many levels
One of the reasons grades have no reliability is that there are too many levels. If an assignment is graded out of 100%, good luck finding two graders who’ll give it exactly the same mark. And if two graders don’t agree on the grades, the grades are bullshit. Some of these crackheads have even started tacking on decimals (67.24% and the like). If you replace that 100-point scale with, say, three levels (fail, pass, and pass with distinction), you might get a reliability approaching .80 (that is, two graders will agree on 80% of evaluations). Eighty percent is considered the minimum acceptable “inter-rater reliability” in many social sciences.
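A quick Python sketch of the idea, under stated assumptions: the two lists of marks are invented, the cutoffs (below 50 fails, 80 and up earns distinction) are my own choice, and “agreement” here is plain percent agreement as described above, with no correction for chance.

```python
# Hypothetical marks (out of 100) from two graders for the same ten essays.
grader_1 = [45, 52, 58, 63, 67, 71, 74, 78, 85, 92]
grader_2 = [42, 55, 58, 60, 70, 69, 77, 81, 88, 90]


def level(mark):
    """Collapse a percentage into three levels; the cutoffs are assumptions."""
    if mark < 50:
        return "fail"
    if mark < 80:
        return "pass"
    return "pass with distinction"


def agreement(a, b):
    """Fraction of items on which the two graders give the identical result."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


fine = agreement(grader_1, grader_2)
coarse = agreement([level(m) for m in grader_1], [level(m) for m in grader_2])
print(f"100-point scale: {fine:.0%}, three levels: {coarse:.0%}")
# prints "100-point scale: 10%, three levels: 90%"
```

The graders never differ by more than three points, yet they match exactly on only one essay out of ten. Collapse the same marks into three levels and they agree on nine.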
9. Testing the wrong things
Now, even if we could get past all of the above problems, grades are still bogus because most assignments and tests measure all the wrong things. Ever wonder why those brainiacs from college don’t turn out to be the happiest, richest, most successful and powerful people? Ever wonder why Walt Disney, Steve Jobs, Bill Gates and so many more never finished school, and it never stopped them? It’s because what matters in this world is not just reading comprehension but independent critical thinking; not IQ but emotional intelligence; not memorizing facts you could just look up on Wikipedia, but creativity.
What really matters is not the meticulousness to avoid mistakes, but the courage to try and the tenacity to keep trying, when lesser people would give up.
Grades don’t measure that.
UPDATE: Why do so many people who read this assume it means I got bad grades? First, this is not about me, it’s about the education system. Second, you’re committing a logical fallacy (ad hominem) in rejecting the argument based on its source rather than its validity. Third, if you must know, I graduated top of my class in high school and undergrad, had a 4.0 GPA in grad school, and have a PhD from a top-50 university. I’ve been the student and the instructor, and grades look like bullshit from both sides.