I was fortunate enough to work in my district’s assessment office for over two years. I worked with teachers and other district administrators to create all of the district-wide exams for English, its electives, and some CTE (Career and Technical Education) courses. As I work for one of the ten largest districts in the country, this was a major undertaking.
In this position, I gained a unique perspective on how the scaling process works, not only at the district level but also at large testing companies like the College Board and ACT. It changed my thinking about curves and scales, both for individual assessments and for overall course grades.
This experience came after a more negative experience early in my teaching career, where I wound up losing a position because of, among other issues, my course grades. I find that the teachers I work with face a major obstacle with grades. A large number of students don’t do what they’re expected to do. They don’t do homework. They don’t do class work. They don’t turn in work they actually take the time to do. They don’t study for tests. Or, they study for tests but because they didn’t do the other important work, they fail anyway. If you take a look at many teachers’ raw grade book scores for any individual assignments, there are an alarming number of failing grades.
Now, teachers can lament and rend their garments all they like about how lazy students (and/or parents, etc.) are and how, “If (insert name here) would just do what s/he’s supposed to, they would be passing!” But at some point, there is a reality game to play. The reality is that in any given class, there is a spectrum of students. There always has been, and there always will be. Even in honors or AP classes, there is a spectrum. Some of it has to do with student effort, but some of it is related to innate ability/talent/intellect, what-have-you.
Let me tell you, I worked my behind off in my college computer programming class (why I had to take it to get certified for middle school math is beyond me, but that is neither here nor there). I did every assignment. I went to every class. I went to office hours. I asked questions. I worked HARD. And I barely – BARELY – made a “C.” There were assignments and assessments I failed. Not because I was lazy, but because my brain just did not work that way. It wasn’t something for which I had a natural affinity.
Anyway, the point is that in any class, there is a spectrum of students. And I can’t (nor should I, in my opinion) make ten different ability versions of a quiz or test so that I can accurately assess every single student where they are in that moment. Every student takes the same test. Maybe I have 2 or 3 versions, but they are versions of the same test – same level, rigor, etc. And I know there will be failures. Heck, I’ll even say I “expect” them.
That’s because my tests are well made. I follow a 30/40/30 or 25/50/25 rule for my tests in terms of easy/medium/difficult questions. I need to have enough questions to differentiate my “D” students from my “F” students. I need to have enough questions to differentiate my “B” students from my “A” students. I know student Q over there isn’t going to get an “A.” I know that because he’s on the lower end of my spectrum. But I’m not entirely sure if he’s a “D” or an “F” student, so that’s where my 25-30% of easy questions come in. I know student S is going to ace the easy and medium questions, but I’m not sure if student S is an “A” or a “B” student, so that’s where my 25-30% of hard questions come in.
The problem is, when you have a class that has a really wide and heavily skewed spectrum, you end up doing a lot of “D”/“F” differentiation and not a lot of “A”/“B” differentiation. Which makes your test scores look like crap. And if that’s the case for all of your assignments and assessments, then your class averages look like crap.
So instead of grading in a CRT (criterion-referenced test) system, which looks at raw scores as the basis for grades, I went to a more NRT (norm-referenced test) system, which scales – or in some cases curves – scores to assign grades. This is what large districts like my own do, as well as major national testing companies like the College Board and ACT. There’s a reason they convert raw scores to scale scores. There’s a reason their national score distributions work out to a bell curve. They recognize and base their scoring on the fact that there is a spectrum of students who take their tests.
Now, I’m not suggesting that we don’t let students fail. I’m not suggesting what some schools or even entire districts mandate (not mine, thankfully…yet), which are things like having minimum grades of 50% or even 55% rather than zeros in a gradebook. I see the argument behind that, but…frankly, that’s a topic for a different entry. I’m not suggesting that a student who answers 2 questions right out of 25 should end up with an A or a B. However, I am saying that scaling scores is a reasonable way to norm assessment, assignment, and entire class grades so that a teacher doesn’t find him/herself at the end of the quarter with 15 F’s and 10 B’s and C’s and no A’s. I’ve been on the receiving end of that discussion with admin and believe me, it wasn’t pretty. Scaling grades isn’t grade inflation when it’s done appropriately. Here is how a scaling process might work.
Let’s say, for time’s sake, that my class has 10 students. I know, right? Heaven. But again, this is for time’s sake, not reality. I give a test to a low-performing class and this is my raw score data:
| Student | Raw % Score | Raw letter grade |
Something really important to remember in the scaling process is that I do NOT discard my raw data. This class needs a LOT of remediation and further instruction on this topic. I would NOT want to move on to the next concept before doing some serious re-teaching. The average for this assessment is only a 44%. That is TERRIBLE. Something went horribly wrong.
Here is where I have a decision to make. I can choose to curve my scores OR I can choose to scale them. Curving is different from scaling. Curving is adding points to raw scores, which results in a change to the overall average score. This is probably more common than scaling, and lots of teachers already do it. “Oh, man, everyone missed question 7. Gee, now that I look at it, I can see that question 7 really was kind of vague. It really wasn’t a great question. I’m going to throw it out.” When you throw out a question that everyone missed and give credit for it, you’re simply adding the same number of points to everyone’s raw score. That’s a curve. Or, when you say, “Hey, this average was only a 44%. I want the average to be 70%. That means I have to add 26 percentage points to everyone’s score.” Or, you look at the highest score and, if it isn’t an A, you add to everyone’s score however many points that person needs to move up to an A. That’s a curve. That’s how we get the expression “blow the curve.” When everyone else does terribly but one person scores an A, it can negate the need for a curve. Using curves is pretty easy and accomplishes something similar to a scale, which is that it saves your grade book and students’ overall averages.
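The two flat curves described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 90% cutoff for an A, and the ten raw scores are my assumptions, not the class data from the table above (the scores were simply picked so that their average is 44%).

```python
def curve_to_target_mean(scores, target_mean, cap=100.0):
    """Flat curve: add the same number of points to every score so the
    class average lands on target_mean. Scores are capped at `cap`, so
    the final average can fall short if anyone hits the cap."""
    shift = target_mean - sum(scores) / len(scores)
    return [min(cap, s + shift) for s in scores]


def curve_to_top_score(scores, a_cutoff=90.0, cap=100.0):
    """Flat curve: add whatever the top scorer needs to reach an A.
    If someone already has an A, the shift is zero, which is the
    "blow the curve" situation."""
    shift = max(0.0, a_cutoff - max(scores))
    return [min(cap, s + shift) for s in scores]


# Hypothetical raw percentages (assumed for illustration, average = 44).
raw = [20, 28, 32, 36, 40, 44, 48, 56, 64, 72]

to_seventy = curve_to_target_mean(raw, 70)  # every score goes up 26 points
to_top_a = curve_to_top_score(raw)          # every score goes up 18 points
print(to_seventy)
print(to_top_a)
```

Either way, every student receives the same number of points, so the order of students and the gaps between them stay exactly as they were in the raw data.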
Now, if I wanted to scale these scores, I would first decide what I wanted my normed curve to look like. Do I want a bell curve? Do I want a skewed curve? If so, do I want to skew low or high? I am pretty anal about this and really love the beauty and symmetry of the bell curve, so I usually scale my scores to resemble a bell curve. You can also choose to scale the scores to some other norm. For example, in my district, the semester exam scales were created to match the norm curve of student course grades. If 15% of students earned an A, 20% earned a B, 45% earned a C, 5% earned a D, and 15% earned an F, then that is what the scale would reflect. Or try to come close to, anyway.
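To make the "come close to" part concrete, here is a rough Python sketch of turning a norm like the 15/20/45/5/15 example into whole-student counts for one class. The function name and the rounding rule (largest remainder) are my own assumptions for illustration; I am not claiming this is how the district built its scales.

```python
def grade_counts(norm_percent, class_size):
    """Convert norm percentages into a number of students per letter.
    Uses largest-remainder rounding (an assumed rule) so the counts
    always add up to the class size."""
    if sum(norm_percent.values()) != 100:
        raise ValueError("norm percentages must add up to 100")
    counts, remainders = {}, {}
    for grade, pct in norm_percent.items():
        # Integer math: whole students, plus the leftover fraction in hundredths.
        counts[grade], remainders[grade] = divmod(pct * class_size, 100)
    leftover = class_size - sum(counts.values())
    # Hand the unassigned seats to the grades with the largest remainders.
    for grade in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[grade] += 1
    return counts


# The course-grade norm from the example above, in percent.
DISTRICT_NORM = {"A": 15, "B": 20, "C": 45, "D": 5, "F": 15}

print(grade_counts(DISTRICT_NORM, 20))
# {'A': 3, 'B': 4, 'C': 9, 'D': 1, 'F': 3}
```

With 20 students the percentages divide evenly. With other class sizes the counts can only approximate the norm, which is why the scale just tries to come close.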
So let’s scale my quiz scores to match a bell curve. Without going into higher-level statistics, the basic breakdown of the bell curve says the majority of scores fall in the middle (C), some of the scores fall on either side of that – an equal number – as B’s and D’s, and the smallest number of scores fall outside of those – again, an equal number – as A’s and F’s.
With only 10 students, our bell curve might look something like this:
This is what the scale scores look like from the original table:
| Student | Raw % Score | Raw letter grade | Scale score |
Their scale score has little to do with the absolute value of their raw score and everything to do with the normed curve and their performance compared to their peers. A student’s raw score matters only for where it places him or her in the class ranking. This preserves the spectrum of students in the class without inflating grades or hurting teachers’ grade books.
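For anyone who wants to see the mechanics, here is a minimal Python sketch of scaling by rank. Everything specific in it is assumed for illustration: the student labels, the raw percentages, and the 1/2/4/2/1 split I use as a bell-shaped target for a class of 10. The scale score here is a letter; a real scale could just as easily map to numbers.

```python
def scale_by_rank(raw_scores, counts, order="ABCDF"):
    """Assign scale grades by class rank.
    raw_scores: dict of student -> raw percentage.
    counts: dict of letter -> how many students get that letter.
    Note: tied raw scores are not handled specially in this sketch,
    so two tied students could land on either side of a cut."""
    if sum(counts.values()) != len(raw_scores):
        raise ValueError("counts must add up to the number of students")
    # Highest raw score first.
    ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
    # Expand the counts into one letter per seat: A, B, B, C, C, C, C, ...
    letters = [g for g in order for _ in range(counts.get(g, 0))]
    return dict(zip(ranked, letters))


# Hypothetical class of 10 (assumed data, raw average = 44%).
raw = {"S1": 20, "S2": 28, "S3": 32, "S4": 36, "S5": 40,
       "S6": 44, "S7": 48, "S8": 56, "S9": 64, "S10": 72}

# Assumed bell-shaped target for 10 students.
bell = {"A": 1, "B": 2, "C": 4, "D": 2, "F": 1}

scaled = scale_by_rank(raw, bell)
for student, score in raw.items():
    # The raw score is kept next to the scale grade, never replaced by it.
    print(student, score, scaled[student])
```

In this made-up class, the top raw score of 72% becomes the one A and the 20% stays an F, while the raw numbers remain on record for planning the reteaching.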
When scaling is used in conjunction with curving, it can be effective in maintaining a “normed” or “balanced” grade book. If teachers discount raw score information, though, or curve blindly just to get to a “nice average” for an assessment, this isn’t best practice. Without looking at the assessments, teachers aren’t able to become aware of flaws or bias in individual items or student weaknesses and areas in need of intervention. Again, I wouldn’t intend to just move on from this assessment. I would intend to reteach and reassess it until those raw “F” students were able to score raw “C’s”. Am I going to move Student A? Maybe from a raw “F” to a raw “D,” but probably not much more than that. Could I get my raw “C” to a raw “B” and/or my raw “B” to a raw “A”? Probably. I would hope so. But, I don’t want the original assessment to kill my grade book or students’ overall averages, so using a scale is helpful and appropriate. Am I going to achieve my goals every time? Nope. That’s not reality. But the point is that would be my intention, and that’s what matters. I could throw the whole assessment out, but what message am I sending to students then? Did it even matter in their eyes if the grade doesn’t count? Students struggle (often, at least in my experience) to see the value in an assignment that doesn’t get a grade, and then you run into motivation issues. But again, another topic for another entry.
Once I started using curves and scales responsibly, my grade book looked great. I don’t have tons of A’s and I don’t have kids who clearly either didn’t understand a concept or were ridiculously lazy getting A’s, both of which would indicate indiscriminate grade inflation. What I do have, however, is evidence for my administrators that I reflect thoroughly on my grading process and the assessments themselves.