Why I Scale (and Curve) My Grades

I was fortunate enough to work in my district's assessment office for over two years. I worked with teachers and other district administrators to create all of the district-wide exams for English, English electives, and some CTE (Career and Technical Education) courses. As I work for one of the ten largest districts in the country, this was a major undertaking.

In this position, I was privy to a unique perspective on how the scaling process works, not only at the district level but also at large testing companies like the College Board and ACT. It changed my thinking about curves and scales, both for individual assessments and for overall course grades.

This experience came after a more negative one early in my teaching career, when I wound up losing a position because of, among other issues, my course grades. I find that the teachers I work with face a major obstacle with grades. A large number of students don't do what they're expected to do. They don't do homework. They don't do class work. They don't turn in work they actually take the time to do. They don't study for tests. Or, they study for tests, but because they didn't do the other important work, they fail anyway. If you take a look at many teachers' raw grade book scores for individual assignments, there are an alarming number of failing grades.


Now, teachers can lament and rend their garments all they like about how lazy students (and/or parents, etc.) are and how, “If (insert name here) would just do what s/he’s supposed to, they would be passing!” But at some point, there is a reality game to play. The reality is that in any given class, there is a spectrum of students. There always has been, and there always will be. Even in honors or AP classes, there is a spectrum. Some of it has to do with student effort, but some of it is related to innate ability/talent/intellect, what-have-you.

Let me tell you, I worked my behind off in my college computer programming class (why I had to take it to get certified for middle school math is beyond me, but that is neither here nor there). I did every assignment. I went to every class. I went to office hours. I asked questions. I worked HARD. And I barely – BARELY – made a “C.” There were assignments and assessments I failed. Not because I was lazy, but because my brain just did not work that way. It wasn’t something for which I had a natural affinity.

Anyway, the point is that in any class, there is a spectrum of students. And I can’t (nor should I, in my opinion) make ten different ability versions of a quiz or test so that I can accurately assess every single student where they are in that moment. Every student takes the same test. Maybe I have 2 or 3 versions, but it’s of the same test – same level, rigor, etc.

And I know – heck, I’ll even use the word “expect” – there to be failures. Because my tests are well-made. I follow a 30/40/30 or 25/50/25 rule for my tests in terms of easy/medium/difficult questions. I need to have enough questions to differentiate my “D” students from my “F” students. I need to have enough questions to differentiate my “B” students from my “A” students. I know student Q over there isn’t going to get an “A.” I know that because he’s on the lower end of my spectrum. But I’m not entirely sure if he’s a “D” or an “F” student, so that’s where my 25-30% of easy questions come in. I know student S is going to ace the easy and medium questions, but I’m not sure if student S is an “A” or a “B” student, so that’s where my 25-30% of hard questions come in.


The problem is that when you have a class with a really wide, heavily skewed spectrum, you end up doing a lot of "D"/"F" differentiation and not much "A"/"B" differentiation, which makes your test scores look like crap. And if that's the case for all of your assignments and assessments, then your class averages look like crap.

So instead of grading in a CRT (criterion-referenced test) system, which looks at raw scores as the basis for grades, I went to a more NRT (norm-referenced test) system, which scales – or in some cases curves – scores to assign grades. This is what large districts like my own do, as well as major national testing companies like the College Board and ACT. There’s a reason they convert raw scores to scale scores. There’s a reason their national averages work out to a bell curve. They recognize and base their scoring on the fact that there is a spectrum of students who take their tests.


Now, I'm not suggesting that we don't let students fail. I'm not suggesting what some schools or even entire districts mandate (not mine, thankfully…yet), such as minimum grades of 50% or even 55% rather than zeros in the gradebook. I see the argument behind that, but…frankly, that's a topic for a different entry. I'm not suggesting that a student who answers 2 questions right out of 25 should end up with an A or a B. However, I am saying that scaling scores is a reasonable way to norm assessment, assignment, and entire class grades so that a teacher doesn't find him/herself at the end of the quarter with 15 F's, 10 B's and C's, and no A's. I've been on the receiving end of that discussion with admin, and believe me, it wasn't pretty. Scaling grades isn't grade inflation when it's done appropriately. Here is how a scaling process might work.

Let's say, for time's sake, that my class has 10 students. I know, right? Heaven. But again, this is for simplicity, not reality. I give a test to a low-performing class and this is my raw score data:

Student | Raw % Score | Raw Letter Grade
A | 15 | F
B | 20 | F
C | 30 | F
D | 60 | D
E | 75 | C
F | 25 | F
G | 80 | B
H | 45 | F
I | 55 | F
J | 35 | F

Something really important to remember in the scaling process is that I do NOT discard my raw data. This class needs a LOT of remediation and further instruction on this topic. I would NOT want to move on to the next concept before doing some serious re-teaching. The average for this assessment is only a 44%. That is TERRIBLE. Something went horribly wrong.


Here is where I have a decision to make. I can choose to curve my scores OR I can choose to scale them. Curving is different from scaling. Curving is adding points to raw scores, which results in a change to the overall average score. This is probably more common than scaling because lots of teachers already do it. "Oh, man, everyone missed question 7. Gee, now that I look at it, I can see that question 7 really was kind of vague. It really wasn't a great question. I'm going to throw it out." When you throw out/don't count a question, you're simply adding the same number of points to everyone's raw score. That's a curve. Or, you say, "Hey, this average was only a 44%. I want the average to be 70%. That means I have to add 26 percentage points to everyone's score." Or, you look at the highest score and, if it isn't an A, you add however many points that person needs to move up to an A to everyone's score. That's a curve. That's how we get the expression "blow the curve": when everyone else does terribly but one person scores an A, that one high score can negate the need for a curve. Using curves is pretty easy and accomplishes something similar to a scale, which is that it saves your grade book and students' overall averages.
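
For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the two flat-curve approaches described above: curving to a target class average, and curving so the top score reaches a chosen mark. The function names and the 10-student score dictionary are just my illustrations of the idea, not part of any official process.

```python
# Raw percentage scores for the hypothetical 10-student class from the example above.
raw_scores = {"A": 15, "B": 20, "C": 30, "D": 60, "E": 75,
              "F": 25, "G": 80, "H": 45, "I": 55, "J": 35}

def curve_to_target_average(scores, target_avg):
    """Add the same number of points to every score so the class average reaches target_avg."""
    bump = max(0, target_avg - sum(scores.values()) / len(scores))
    return {name: score + bump for name, score in scores.items()}

def curve_to_top_score(scores, top_target=100):
    """Add however many points the highest scorer needs to reach top_target."""
    bump = max(0, top_target - max(scores.values()))
    return {name: score + bump for name, score in scores.items()}

print(curve_to_target_average(raw_scores, 70))  # adds 26 points to every score (44% average -> 70%)
print(curve_to_top_score(raw_scores))           # adds 20 points, since the top raw score is 80
```

One design note: a flat curve can push the top scores past 100% unless you cap them, and capping in turn pulls the new average slightly below the target.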

Now, if I wanted to scale these scores, I would first decide what I wanted my normed curve to look like: do I want a bell curve? Do I want a skewed curve? If so, do I want to skew low or high? I am pretty anal about this and really love the beauty and symmetry of the bell curve, so I usually scale my scores to resemble a bell curve. You can also choose to scale the scores to some other norm. For example, in my district, the semester exam scales were created to match the norm curve of student course grades. If 15% of students earned an A, 20% earned a B, 45% earned a C, 5% earned a D, and 15% earned an F, then that is what the scale would reflect. Or try to come close to, anyway.

So let’s scale my quiz scores to match a bell curve. Without going into some higher-level statistics, the basic breakdown of the bell curve says the majority of scores fall in the middle (C), some of the scores fall on either side of that – an equal amount – as B’s and D’s, and the smallest amount of scores fall outside of those – again, an equal amount – as A’s and F’s.

With only 10 students, our bell curve might look something like this: one A, two B's, four C's, two D's, and one F.

This is what the scale scores look like from the original table:

Student | Raw % Score | Raw Letter Grade | Scale Score
A | 15 | F | F
B | 20 | F | D
C | 30 | F | C
D | 60 | D | B
E | 75 | C | B
F | 25 | F | D
G | 80 | B | A
H | 45 | F | C
I | 55 | F | C
J | 35 | F | C

Their scale score has nothing (or very little) to do with their raw score and everything to do with the normed curve and their performance compared to their peers. This preserves the spectrum of students in the class without inflating grades or hurting teachers’ grade books.
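
If you want to see the mechanics, here is a minimal Python sketch of that rank-based scaling, assuming the bell-curve letter counts for 10 students used in the example (1 A, 2 B's, 4 C's, 2 D's, 1 F). The profile list and function name are my own illustration, and a real grade book would need a rule for breaking ties.

```python
# Raw percentage scores for the 10-student example.
raw_scores = {"A": 15, "B": 20, "C": 30, "D": 60, "E": 75,
              "F": 25, "G": 80, "H": 45, "I": 55, "J": 35}

# Bell-curve letter counts for a class of 10: 1 A, 2 Bs, 4 Cs, 2 Ds, 1 F.
profile = [("A", 1), ("B", 2), ("C", 4), ("D", 2), ("F", 1)]

def scale_to_profile(scores, profile):
    """Assign letter grades by rank so the grade distribution matches the chosen profile."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest raw score first
    scaled, start = {}, 0
    for letter, count in profile:
        for student in ranked[start:start + count]:
            scaled[student] = letter
        start += count
    return scaled

print(scale_to_profile(raw_scores, profile))
# Student G gets the A; D and E get B's; C, H, I, and J get C's; B and F get D's; A gets the F --
# the same letters as the scale-score column above.
```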

When scaling is used in conjunction with curving, it can be effective in maintaining a "normed" or "balanced" grade book. If teachers discount raw score information, though, or curve blindly just to get to a "nice average" for an assessment, that isn't best practice. Without looking at the assessments, teachers can't spot flaws or bias in individual items, or student weaknesses and areas in need of intervention.

Again, I wouldn't intend to just move on from this assessment. I would intend to reteach and reassess it until those raw "F" students were able to score raw "C's". Am I going to move Student A? Maybe from a raw "F" to a raw "D," but probably not much more than that. Could I get my raw "C" to a raw "B" and/or my raw "B" to a raw "A"? Probably. I would hope so. But I don't want the original assessment to kill my grade book or students' overall averages, so using a scale is helpful and appropriate. Am I going to achieve my goals every time? Nope. That's not reality. But the point is that would be my intention, and that's what matters.

I could throw the whole assessment out, but what message am I sending to students then? Did it even matter in their eyes if the grade doesn't count? Students struggle (often, at least in my experience) to see the value in an assignment that doesn't get a grade, and then you run into motivation issues. But again, another topic for another entry.

Once I started using curves and scales responsibly, my grade book looked great. I don't have tons of A's, and I don't have kids getting A's who clearly either didn't understand a concept or were ridiculously lazy – either of which would indicate indiscriminate grade inflation. What I do have, however, is evidence for my administrators that I reflect thoroughly on my grading process and the assessments themselves.

Common Core Practice for RL.8.4, RL.8.5, and RL.8.6

For those of you who read regularly, you’ll remember that I’m working on my 8th grade line of Continuous Improvement Model mini-lesson resources. I’m making good progress and I have recently finished and posted these resources:

CCSS.ELA.RL.8.4


CCSS.ELA.RL.8.5


and

CCSS.ELA.RL.8.6


I’ve also bundled these so you can save over 10% if you purchase them together.


If you've never heard about or used my CIM resources, they use the research-based "model – teach – assess" technique. They are quick (10-15 minute) mini-lessons that target specific standards. The only Common Core practice I've been able to find elsewhere is general, mixed-standards practice; mine is the only line I know of that offers individual-standard, targeted instruction and practice. It's low-prep and easy to implement.

If you're looking for quick, targeted, and easy resources for these standards, come check them out!

 

The Continuous Improvement Model

When I started teaching English, I was terrified. I had few ideas about how to effectively communicate what I knew about English and reading comprehension to my students. See, I was (am) really good at English. It was my strongest subject in school. I love to read. I love to write. It comes naturally to me. I rarely struggled with it. So when I encountered students who weren’t good at reading or writing, I didn’t know what to do with them. I had no strategies for how to help them learn and grow. My process for teaching writing evolved into something very successful, but that’s a topic for another entry. My process for teaching reading, however, improved as a result of my being assigned to teach a semester course of Advanced Reading, in which the curriculum embedded something called “FCIM,” which stands for “Florida’s Continuous Improvement Model.” The Continuous Improvement Model – or CIM, for short – operates under the best-practice assumption that education is a cycle of teaching and assessing, and is best communicated through some variation of the research-based strategy “model-guide-practice” or “I do, we do, you do.”


I wasn't overly impressed with the quality of the curriculum – as is true for most large curriculum companies, there were errors, a lack of explanations for answers, and virtually no guidance for teaching metacognition (another research-based instructional component). But I did like, and find effective, the overarching principles of the instructional model. It was on those principles that I based my own CIM resources.


These resources are intended to be used in conjunction with other curriculum, instruction, and assessment. They should be a component of the Continuous Improvement Model within any classroom, used to help the teacher gather data on student comprehension and achievement – data that should drive further instruction.

This particular CIM targets the Common Core Reading: Literature Standard 1 for grades 9-10. It has a complete teacher script for lesson 1, which models metacognition for students and walks through how to determine the correct answer for the correct reason. It has a guided lesson script for lesson 2, which helps take students through the same process before they try it completely on their own in lesson 3. Lesson 3 also includes thorough explanations of the correct and incorrect answers so that it, too, can be a part of the Continuous Improvement Model process.

 

Additionally, it offers differentiation options by having two complete sets of lessons: one using multiple-choice questions and one using open-response questions. This resource combines 2 essential features of quality education: effective use of time and best-practices teaching methods. Supplementary materials that specifically target a single standard are difficult to come by. You either have to take a resource and pick it apart to use only the questions that apply to the standard you're targeting, or you have to make it yourself. And few teachers have the time (or inclination, for that matter) to pick passages and write standards-based questions for them. This resource takes that prep-work piece away and does the labor for you. However, you, as the teacher, get to decide how to use the resource, how to differentiate and scaffold it, and even whether to use pairs or small groups – all while knowing you are engaging your students in the model-guide-practice research-based teaching strategy and gathering information and data you can use as a part of your classroom's Continuous Improvement Model.


Although this is my first CIM for this level, I have a complete RL/RI (all 17 standards) CIM set for 6th grade and I’m almost halfway done with the complete 8th grade set as well. Eventually, the intention is to have sets for grades 5, 6, 7, 8, and 9-10.

How can you use this resource effectively? Many teachers who have purchased from my CIM line use this resource as bell work. Others use it as a part of a focus workshop. The product itself has suggestions for differentiation as well as collaborative work.

This line is also highly rated by teachers who have purchased the resource(s). Here are some of the things they’ve had to say about the different products in the CIM line:

On RL.9-10.1:

  • On July 16, 2016, Catherine R. said: This will be a valuable resource to walk students through the process of reading and using evidence to respond to questions. Thanks for sharing.

On RL.8.2:

  • On July 13, 2016, Wacky Apple (TpT Seller) said: Exactly what I needed.

On RL.6 (various):

  • On March 29, 2016, Megan F. said: One of the best resources I have seen in a while
  • On June 7, 2016, Rebecca Harris (TpT Seller) said: Excellent resource! I look forward to using these with my students this year!
  • On April 21, 2016, Janine L. said: Looking forward to using these to review the standards my students performed poorly on for their district assessments.
  • On January 24, 2016, Crystal V. said: I own the RL bundle and the RI bundle. Yes, they are a little bit pricey but I think it is absolutely fair considering the amount of content. You definitely get what you pay for! I love using these as Bell Ringer Activities, and to remediate standards since testing is coming up. Thanks!
  • On November 20, 2015, Shakera W. said: Thank you! this resource is so helpful for intervention! Thank you so much! Very thorough resources!
  • On January 20, 2016, Emily S. said: Great resource! Such a time-saver! Thanks 🙂
  • On October 14, 2015, Shakera W. said: Great intervention tool! Thank you!

If this sounds intriguing, head on over and check it out!

How I Use Gains (or Growth) Grading to Foster Confidence in Low-Performing Students

I don’t know about you, but every year I’ve taught, I have students who come to me so defeated that they think the year’s over before it even starts. They’re so far behind in whatever subject it is (Math, English, whatever) that they don’t even want to try because they are convinced they’ll fail. I can’t blame them. If I knew that no matter what I did I was going to see “F” after “F” on every paper I got back, I wouldn’t want to put forth much effort, either.

These were the grades they were getting (and would continue to get) on assignments if I graded on accuracy. The problem was that I was teaching and assessing on grade level, even though so many students were woefully below it. My hands were tied somewhat in terms of curriculum. I could do some remediation, but there just wasn’t ever going to be enough time in the day (or year) to make up for year upon year of missing skills.

But I had to find a way to motivate them, so I came up with (okay, I’m sure I didn’t come up with it, but I decided to use) growth grading. I gave my students benchmark tests throughout the year (usually once a quarter – sometimes more, sometimes less, depending) and explained how they would be graded.

Baseline test: Graded on effort. I would watch the students as they worked and they would earn an “A” if they were working at 100% the entire time. Grades would go down if they weren’t trying. I would record the raw score of each student, but that wouldn’t factor into the grade itself.

Benchmark test #1: This test was graded on growth. I compared the student’s baseline raw score to this benchmark test’s raw score, and if they improved they would earn an “A” or “B” – the amount of improvement would determine which grade they earned. If their raw scores stayed the same (or within a certain percentage of each other) they would earn a “C”. If their score went down or it was obvious they just weren’t trying, they would earn a “D” or “F,” depending on the severity of the decrease/lack of effort.

Subsequent benchmark tests: These were graded the same way. However, if I had students who were earning raw scores in the “A” range, then if they stayed there, they continued to earn an “A”. I wasn’t going to penalize a student for going from a 95% to a 92%.
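
For anyone who wants to automate this in a spreadsheet or script, here is a minimal Python sketch of the growth-grading rules above. The specific thresholds (a 10-point gain for an "A," a 3-point band for "roughly the same," a 90% floor for students already in the A range) are my own illustrative choices; the rules above don't prescribe exact numbers.

```python
def growth_grade(baseline, benchmark, big_gain=10, flat_band=3, a_floor=90):
    """Turn a baseline-to-benchmark change in raw percentage into a letter grade.
    Thresholds are illustrative, not prescribed."""
    if benchmark >= a_floor:          # already scoring in the A range: keep the A
        return "A"
    change = benchmark - baseline
    if change >= big_gain:            # big improvement
        return "A"
    if change > flat_band:            # modest improvement
        return "B"
    if abs(change) <= flat_band:      # roughly the same score
        return "C"
    if change >= -big_gain:           # modest decline
        return "D"
    return "F"                        # large decline (or obvious lack of effort)

print(growth_grade(baseline=10, benchmark=20))  # A: a 10-point gain
print(growth_grade(baseline=95, benchmark=92))  # A: no penalty for going from 95% to 92%
print(growth_grade(baseline=50, benchmark=48))  # C: roughly flat
```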

Students were so motivated once they learned this! The excitement in their eyes and body language when they saw an “A” on their test was inspiring. Even if they went from a 10% to a 20%, they had earned an “A” because they were making progress. It changed their outlook on their work and education.

Of course, I couldn’t grade every assignment like this, but because of this I also began writing raw scores on student work instead of percentages or letter grades. I found that students who were habitually getting “F” grades would get less discouraged if they saw 3/6 instead of “F” or 50%. For some reason, knowing they got half right was less discouraging than seeing that “F” or percentage. Sometimes, if I noticed a student was making progress, I would note it on their test. Just the other day, I had a student who had worked harder than he had the whole year for a full week. He took his notes, did his class work, paid attention…really tried. On his quiz, he only got 2.5/6, but I went back through his other scores for the year and it was the highest grade he’d earned the entire school year. So rather than put “F” on his paper, I wrote him a note: “This is the highest quiz grade you’ve earned all year! Why? Because you did your work and put in the effort! So impressed!” During the next quiz, he seemed discouraged, so I reminded him how much his hard work had paid off and that perhaps his goal this time should just be to get a 3/6 to show improvement. I could tell that changed his attitude and he worked harder on it than he had been.

Give growth grading a try and see if it changes your students’ motivation and self-concept.

Common Core Practice with Non-fiction in the Middle School Classroom

It’s hard to get kids on board with reading non-fiction. Fortunately, there are websites out there with kid-friendly articles that are topical and approachable. It was these websites I went to when I needed to find non-fiction, informational text to create my Continuous Improvement Model mini-lessons for CCSS.ELA.RI.6.7.


This CIM has 3 different mini-lessons designed to take about 10 minutes each. Every lesson has 2 questions to help students practice this standard. I was able to find appropriate non-fiction, informational text on www.timeforkids.com and also on http://kids.nationalgeographic.com.


What's great about this resource is that it ONLY targets standard RI 7 (Reading Informational Text 7). All the other practice out there that I have seen covers multiple standards, which is great if you're doing summative work or have lots of time to go over every single standard with the students, but if you want practice that pinpoints a single standard, mine is the only thing I've found like it. Mine also has scaffolded practice, which I haven't seen in any other resource. Other practice out there tells which answers are right and why, but mine has 3 separate lessons that follow the "I do," "we do," "you do" model so students are scaffolded on their journey to mastering the standard. It even includes a script for teachers for the first and second lessons, along with detailed explanations of the answers for the third lesson, in case the teacher needs to re-teach/explain.


This CIM is targeted for RI6.7, but because of how Florida redid their standards, it is also applicable for LAFS.6.RI.3.7.

If this sounds like something you could use with your 6th graders, you can check out the free version (RL.6.1) and then go from there.

Otherwise, you can go straight to the CIM for RI.6.7 here.


Common Core Practice RI.6.2

When I started working with teachers two years ago upon CCSS roll-out in our state (which was slapped, tickled, and relabeled LAFS – Language Arts Florida Standards), it became clear there was going to be a void. Teachers I worked with and spoke to became nervous that there weren't enough practice materials for students to master the new and more rigorous standards. I began creating resources for them, such as question stems, and they gave great feedback.

As time went on, though, and the new state tests loomed on the horizon, I began to see that students in my district weren't going to get enough test preparation. This was a function of the curriculum our district uses in English/Language Arts. I won't elaborate beyond saying that it is completely devoid of any real, consistent, useful traditional assessments and of any clear connection to what students will see and be expected to demonstrate mastery of on the state assessment. I figured that if this was the case in my district, it was probably true in others.

Based on this assumption, I began creating my CIM line. CIM stands for Continuous Improvement Model. It is based on the "I do," "we do," "you do" model of instruction. Students get three rounds of practice – once at the teacher level, seeing the metacognition that goes on during the problem-solving process; then in a guided setting, where the teacher can begin to see the areas needing focus and re-teaching; and finally, independently, demonstrating mastery or lack thereof.

For those of you not familiar with the term CIM, I didn't invent it. It comes from the reading curriculum our district uses, except we call them FCIMs – the "F" standing for (no, not that F word) Florida. Since the LAFS correlate pretty much identically with the ELA Common Core Standards, I just dropped the "F," and my CIMs are designed to help any student in a state with either Common Core or LAFS – or a state that did the same thing Florida did and just put on a brand new coat of paint and called it a horse of a different color.


I've gotten good feedback from these lessons, so I've continued my quest to fill the void. I've done almost all of the 6th grade Reading Literature standards CIMs (I'm putting the finishing touches on RL.6.7…it should be ready by next Sunday) and have started on the 6th grade Reading Informational Text standards CIMs. That's what this post highlights: RI.6.2 (central idea and summarizing).

This resource has three lessons. Lesson 1 is a scripted teacher lesson that presents questions and a passage, along with commentary and reasoning, so students hear the process aloud and see how the teacher arrived at the answers. Lesson 2 is a guided practice lesson where the teacher helps students reason and analyze their way to the correct answers with a little help here and there. Lesson 3 is an independent lesson where students must demonstrate that they can come to the correct answers on their own. The results of Lesson 3 dictate either re-teaching or moving on to the next concept.

These resources aren't units, and they aren't meant to be stand-alone products. They're designed to be more like bell work or class starters. They're only supposed to take about 7-10 minutes each. My plan is to have a full line of RL and RI CIMs for grades 3-11 (I'm not sure 12th grade would have much of a demand, since most states stop testing in either 10th or 11th grade), so you'll periodically see blog posts from me about the newest CIM I've added to the line.


I have a free version you can test out if you think this might be something your students could use. The good news is that if you're at all familiar with Common Core or LAFS, you'll know the secondary standards are remarkably similar from grade to grade, so if you have some students who aren't quite reading on grade level, the 6th grade series might be a good place to start to build some confidence. Of course, if you use it and you like it, leave some feedback and rate the product. It helps me reach my goals and improve my products. Thanks!


Summer BOGO #3!

Have you been waiting to see what my newest summer BOGO will be? Well, the wait is over! For this week (from today through Saturday, 6/20/15), I have decided to offer a project BOGO. If you purchase my best-selling and highly-rated Probability Project, you will receive my Scale Model of the Solar System Project for free!


Both these projects are fabulous ways to assess your students’ understanding in a hands-on, authentic way. In the Probability Project (which is designed to take place at the END of a probability unit of study), students create their own carnival-style games, predict outcomes, play the games, record data, and analyze the data they’ve collected.


In the Solar System Project (which is designed to take place at the END of a unit teaching scale and scientific notation), students research the planet sizes and distances from the sun in our solar system. Then, they create a scale model of the solar system and discuss (through writing) their processes.


My students LOVED both of these projects, and I’m sure yours will, too. They are aligned to CCSS (Math), but would apply to any state’s standards regarding probability, scale, and scientific notation.

Depending on your standards and curriculum, these projects would be appropriate for students in grades 6-9.

I hope you’re having a great summer!

Follow this link to the BOGO offer!

Are You Testing Me? Part 3

As a classroom teacher, I was always looking for ways to effectively assess my students’ learning. I came up with some great ways to differentiate through product, but sometimes, I just had to use a traditional assessment. I always thought I was pretty good at creating those assessments, but once I got my current job working for Assessment and Accountability, I realized I’d been doing lots of things that are not best practices when it comes to traditional assessments. I’ve decided to share some biggies with you in the hopes that your classroom assessments can be more valid, effective, and help you inform your instruction.

Be cognizant of item complexity, difficulty, and the distribution. We want assessments to tell us about our students. If a test is so easy that everyone gets 90%+, it doesn't discriminate well at all. We don't learn anything about what needs to be retaught or extended. If a test is so difficult that the average is 30%, that doesn't discriminate well, either. We still don't learn anything. It's important that there is a variety of items in terms of complexity and difficulty.

Think about it this way: you want to have some items that will tell you the difference between your D and F students. Yes, every A/B student will get those items right, but that's okay; you've put them on the assessment to tell you about the struggling students' mastery. Then, you have items that will tell you the difference between your A and B students. Yes, every D/F student will get those items wrong, but that's okay; you've put them on the assessment to tell you about the high-performing students' mastery.

Now, difficulty and complexity are NOT the same thing. Difficulty is how many students answer the question correctly (high correct % = easy question; low correct % = challenging/hard question). Complexity is based on Depth of Knowledge (Webb) or taxonomy (Bloom). You can have a high-complexity question that is easy, and you can have a low-complexity question that is hard. You won't know the official difficulty level of a question until students take the assessment. You can speculate on the difficulty level, but difficulty is performance-based. Complexity, however, is NOT based on performance. A simple recall question is low complexity, regardless of how students perform on the item. You need to have a balance of both.

When you write your questions, try to keep these ratios in mind: 10-15% easy, 15-20% difficult, and the rest average; 10-15% low complexity, 15-20% high complexity, and the rest moderate. So, for a 50-item test, the breakdown might look like this: 7 low-complexity questions, 10 high-complexity questions, and 33 moderate-complexity questions. When planning/writing, I might anticipate a difficulty breakdown of 5 easy questions, 10 difficult questions, and 35 average questions. Now, when my students take the test, I might find that the actual difficulty breakdown looks like this: 2 easy questions, 40 difficult questions, and 8 average questions. I know now that my test was too difficult. It's not telling me anything useful. I really should make a new test. Or it might look like this: 30 easy questions, 5 difficult questions, and 15 average questions. I know that my test was too easy. It's not useful. I really should make a new test. The more questions you write, the more information you get. That being said, sometimes quizzes will only have 5-10 questions, so do the best you can with what you have.
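
If you want to check the difficulty distribution on an assessment you've already given, a quick tally of per-item correct rates (the item "p-values") is enough. Below is a minimal Python sketch; the sample response data and the 85%/40% cutoffs for "easy" and "hard" are my own illustrative assumptions, not official thresholds.

```python
# responses[student] = list of 1/0 flags (correct/incorrect), one per item, in test order.
responses = {
    "student_1": [1, 1, 0, 1, 0],
    "student_2": [1, 0, 0, 1, 0],
    "student_3": [1, 1, 0, 0, 0],
    "student_4": [1, 1, 1, 1, 0],
}

def item_difficulty(responses):
    """Fraction of students answering each item correctly (the item's p-value)."""
    students = list(responses.values())
    n_items = len(students[0])
    return [sum(s[i] for s in students) / len(students) for i in range(n_items)]

def difficulty_label(p, easy_cut=0.85, hard_cut=0.40):
    """Label an item by observed difficulty; the cutoffs here are illustrative."""
    return "easy" if p >= easy_cut else ("hard" if p <= hard_cut else "average")

for number, p in enumerate(item_difficulty(responses), start=1):
    print(f"item {number}: {p:.0%} correct -> {difficulty_label(p)}")
```

Comparing those observed labels against the easy/average/hard mix you planned for tells you quickly whether the test came out too hard, too easy, or about right.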

Sometimes, you won’t have the time or ability to make an entirely new assessment if your first one didn’t perform as you’d hoped. In that case, it may be time to employ a scale (or curve). Here, there are some important things to take into consideration. The first is the overall grade distribution of your class. If you are a consistent grader, then your students’ performance on any given assessment should be similar to their performance in your class in general. Essentially, you’d expect an A/B student to get an A/B score on any given assessment. In general. So you can use that distribution to scale your assessment score. If your class breakdown has 4 A’s, 7 B’s, 11 C’s, 6 D’s, and 3 F’s, then you can scale the assessment to be close to that breakdown. That DOESN’T mean that you give the 4 A kids an A on the assessment. They might have performed really poorly and end up with one of the lower grades. What matters is the distribution. Don’t look at names. Here’s what your scaling process might look like:

Grade breakdown in class | Grade breakdown on assessment (traditional 90-80-70 scale) | Raw score range on assessment (out of 50) | Scale score on assessment
13% A (4 students) | 3% A (1 student: 45) | 45-50 | raw 40-50 = A (4 students = 13%)
23% B (7 students) | 10% B (3 students: 40, 41, 44) | 40-44 | raw 34-39 = B (6 students = 19%)
36% C (11 students) | 16% C (5 students: 35, 35, 36, 38, 39) | 35-39 | raw 29-33 = C (11 students = 36%)
19% D (6 students) | 32% D (10 students: 30, 30, 31, 31, 32, 32, 33, 33, 33, 34) | 30-34 | raw 20-28 = D (5 students = 16%)
10% F (3 students) | 39% F (12 students: 29, 29, 28, 28, 27, 26, 20, 19, 19, 18, 17, 15) | <30 | raw <20 = F (5 students = 16%)

As you can see, the distribution of grades on the assessment is now similar to that of the class as a whole. You could also do this by looking at the overall distribution of grades for ALL your classes (for that course). Now, the original average for the exam was about 61%. It is difficult to calculate a new assessment percentage based on scale grades, because there are no raw scores to work from. As the teacher, you'd have to decide what percentage you want to assign for an "A," "B," "C," and so on. You could also break the scale down even further to include "+" and "-" grades if you wanted to get more specific.
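
Here is one way that distribution-matching scale could be scripted (a Python sketch, using the 31 raw scores out of 50 from the table above). One small difference from the table: this rank-based version matches the class counts (4-7-11-6-3) exactly but can split students with the same raw score across two grades, whereas the cut scores in the table keep ties together and only approximate the class percentages.

```python
# The 31 raw scores (out of 50) from the example above, in descending order.
raw_scores = [45, 44, 41, 40, 39, 38, 36, 35, 35, 34, 33, 33, 33, 32, 32,
              31, 31, 30, 30, 29, 29, 28, 28, 27, 26, 20, 19, 19, 18, 17, 15]

# The class's overall grade distribution: 4 As, 7 Bs, 11 Cs, 6 Ds, 3 Fs.
class_profile = [("A", 4), ("B", 7), ("C", 11), ("D", 6), ("F", 3)]

def scale_to_class_profile(scores, profile):
    """Assign assessment letter grades by rank so the distribution mirrors the class grades."""
    ranked = sorted(scores, reverse=True)
    letters = [letter for letter, count in profile for _ in range(count)]
    return list(zip(ranked, letters))

for raw, letter in scale_to_class_profile(raw_scores, class_profile):
    print(raw, letter)
```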

Another option is to go through the assessment and look at items that performed below a certain threshold (perhaps 25%, since that’s the guessing rate) and throw them out. Recalculate the grades and then go from there.

Another option is a flat curve, where you add a set number of points to everyone's score. This does NOT work well when one or two students performed very well but everyone else tanked. One way to do this is to look at your highest score and see how many points it would take to bring that score up to a certain threshold. For example, on this assessment, only 1 student earned an A (>90%), and that raw score was only a 45/50. So as the teacher, I might say, "I'd like the highest score on the assessment to be a 98%. That would be a 49/50. So I'm going to a) add 4 raw score points to everyone's score, or b) add 8 percentage points to everyone's score." In this scenario, the distribution of assessment scores won't mirror your class grades, but it will raise the average. The original class average was about 61%; after the curve, it's about 69% (8 points higher), much closer to that "C" average.

Here’s how that would look:

Grade breakdown in class | Grade breakdown on assessment (traditional 90-80-70 scale) | Raw score range on assessment (out of 50) | Flat-curve (+4 points) grade breakdown on assessment
13% A (4 students) | 3% A (1 student: 45) | 45-50 | 10% A (3 students)
23% B (7 students) | 10% B (3 students: 40, 41, 44) | 40-44 | 13% B (4 students)
36% C (11 students) | 16% C (5 students: 35, 35, 36, 38, 39) | 35-39 | 32% C (10 students)
19% D (6 students) | 32% D (10 students: 30, 30, 31, 31, 32, 32, 33, 33, 33, 34) | 30-34 | 26% D (8 students)
10% F (3 students) | 39% F (12 students: 29, 29, 28, 28, 27, 26, 20, 19, 19, 18, 17, 15) | <30 | 19% F (6 students)

Yet another option is to scale the assessment so that it follows a normal distribution curve. In this case, you would want to end up with a roughly equal (but small) number of A's and F's, a roughly equal (but slightly larger) number of B's and D's, and the majority of scores as C's. For this particular class of 31 students, I would anticipate my normal distribution curve to look something like this: 2 A's, 4 B's, 19 C's, 4 D's, 2 F's. There's a little room for playing around; I might want it to be 3 A's, 5 B's, 17 C's, 4 D's, 2 F's. That's pretty close, too. I then adjust the grades accordingly: I list the students' scores from highest to lowest, and the top 3 become A's, the next 5 B's, the next 17 C's, the next 4 D's, and the last 2 F's.
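
A quick way to get those target letter counts from a class size, before doing the rank-order assignment, is sketched below in Python. The proportions are my own illustrative choices that happen to reproduce the 2-4-19-4-2 breakdown mentioned above; they are not true normal-curve areas, and you can obviously tweak them.

```python
def bell_counts(n, proportions=(("A", 0.07), ("B", 0.14), ("C", 0.58),
                                ("D", 0.14), ("F", 0.07))):
    """Turn a class size into letter-grade counts for a rough bell curve."""
    counts = {letter: round(n * p) for letter, p in proportions}
    counts["C"] += n - sum(counts.values())  # put any rounding slack in the middle
    return counts

print(bell_counts(31))  # {'A': 2, 'B': 4, 'C': 19, 'D': 4, 'F': 2}
```

From there, sort the scores from highest to lowest and deal out the letters in order, just as described above.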

Keep in mind that these methods of scaling/curving/norming individual assessments can be applied to overall class grades, too. This is useful if you have a particularly low-achieving class and know that 20 failures at report card time won't be looked upon favorably.

It's important to realize that curving/scaling/norming your assessments doesn't make you a bad teacher. You can still get information about your students' mastery and use it to inform your instruction without punishing the students for faulty questions or a test that you simply made too difficult. The important thing is that if you realize your tests are too difficult, you make an effort to change that: either change what/how you teach or change the anticipated difficulty of your tests. Think about why you are assessing students.

Final Thoughts: Think about the purpose of assessment. Any assignment, really. We plan, we teach, we assess, and we continue the cycle until our students master what we're responsible for teaching them (or the end of the year gets here…whichever comes first…usually the end of the year). Assigning something or testing students with the sole goal of "teaching them a lesson" or intentionally promoting failure doesn't fit into that plan/teach/assess cycle. Assessment sometimes gets a bad rap, but if it truly fits into the cycle, it shouldn't. Assessment (testing) is a part of the education cycle. If we don't figure out what kids know, how can we teach them appropriately? And keep in mind that not all assessment has to be summative or cumulative. We can design, give, and use interim (ongoing, formative, whatever you want to call it) assessments to make micro-cycles of the plan/teach/assess loop. Teachers do it all the time without realizing it. Every question asked is an assessment, whether it's during a discussion or on a paper/pencil test. Making traditional assessment work for you and your students is just one piece of the puzzle. I hope this series has helped with that.

Are You Testing Me? Part 2

As a classroom teacher, I was always looking for ways to effectively assess my students’ learning. I came up with some great ways to differentiate through product, but sometimes, I just had to use a traditional assessment. I always thought I was pretty good at creating those assessments, but once I got my current job working for Assessment and Accountability, I realized I’d been doing lots of things that are not best practices when it comes to traditional assessments. I’ve decided to share some biggies with you in the hopes that your classroom assessments can be more valid, effective, and help you inform your instruction.

  1. Only ask one question per item (for multiple-choice items; obviously, extended response items can ask multiple questions). If you need to break it apart into two items, then do that. For example, “Who was the first President of the United States and when was he elected?” is inappropriate because it asks two things in one item.
  2. Do NOT use “all of the above” as an answer choice. This decreases the discrimination value of the item because it essentially turns the question into a 3-choice question instead of a 4 or 5-choice question. You can use “none of the above,” but use it sparingly. Additionally, use questions with “not” and “except” sparingly. These tend to confuse students and you don’t really learn if they misunderstand the concept or your wording of the question. And when you do use “not” or “except” in a question, you should capitalize and bold it so it stands out to students. (Side note: you can capitalize and bold words like “most,” “least,” and “best” if you like.)
  3. Keep in mind that your students take roughly double or triple the amount of time to answer a question as you do. For example, if you create a quiz and take it yourself and it took you 10 minutes, it will take your students between 20-30 minutes. Obviously, if you have students with special needs (ESE, ESOL, etc.), this will increase the time even more. Have reasonable, developmentally appropriate expectations for your students.
  4. Be prepared to throw out poorly performing items. If more than half your students get a question wrong, chances are, there is an issue with the item. That being said, if more than half your items end up being missed by more than half your students, that’s a pretty stellar indication that a) they did not comprehend the material at mastery level and/or b) the questions had major problems in construction. Your options: throw the test out or curve it (and by “curve” I mean use a scale to norm the distribution of grades). Creating a bad test doesn’t make you a bad teacher, it just means you need more practice and guidance in test creation.
  5. Base your items on instructional objectives and/or standards. Don’t test trivial knowledge. Make sure you are assessing things that matter in terms of the scope and sequence of your course. The BEST option is to create the assessment BEFORE you teach the lesson/unit/concept. Begin with the end in mind. Your assessment is like a blueprint for your lesson. It’s the objectives for what you’re teaching. You want the students to master X, Y, Z, so you create an assessment to measure their mastery of X, Y, Z, and then you teach X, Y, Z. Finally, you assess the students with the assessment. It’s not “teaching to the test,” it’s called “backwards design” and it’s a best practice.

Making Poetry Meaningful

I am a poetry snob. Perhaps one could call me uncultured, but I am very picky about the poetry I choose to read. To be perfectly honest, I really only enjoy two poets: Edgar Allan Poe and myself. Well, in general. There are isolated poems from various poets that I enjoy: "The Charge of the Light Brigade" (Tennyson), "Because I Could Not Stop for Death" (Dickinson), and a few others. So suffice it to say that I empathized with my students when it came time for our poetry unit each year. Writing poetry comes easily to me, but that's not the case for most students. In fact, in my years of teaching English, I found that the predominant reason my students disliked poetry was that they struggled with writing it. Also, they'd had countless other teachers who had made them memorize and recite poems (I won't editorialize on that practice…the fact that I NEVER made my students memorize and recite poetry in my class should speak for itself), which mostly served to turn them off of poetry completely.

I found that they also struggled with identifying different poetic devices within poems and analyzing poetry for any sort of meaning. Often, this was due to the fact that the poets they were forced to read were dry and confusing. What if my students were just like me? What if the poet they would enjoy reading the most were themselves? How could I get my students to write meaningful poems when they didn’t know how to write poetry?

First, I started small. I had to disabuse my students of the notion that all poetry had to rhyme or have some set rhythm and meter. That was difficult. Once I got that through their heads, though, I would show them examples of acrostic poems. I always had my students begin with an acrostic poem of their name. Eventually, the majority of my students were able to construct a reasonable attempt at an acrostic poem.

To help them understand rhyme and rhythm (and meter) I would use songs/song lyrics. This is also how I taught the poetic device: refrain. The jump from song to poem was easy with the “Battle Hymn of the Republic.” I found (in my biased opinion, of course) that the best poems for teaching meter/rhythm were Poe’s poems. My two favorites are “Annabel Lee” and “The Raven.” Incidentally, when my own middle school teacher required that we choose a poem to memorize and recite, I chose “The Raven” simply to be obnoxious. It was the longest one I thought I could actually memorize. I got about 3 stanzas in before she forced me to stop and sit down. Yeah, I was “that kid.”

Occasionally, I would give a traditional quiz on the various poetic devices, but I never gave an end-of-unit test. Instead, I wanted what they'd learned to be meaningful to them. From the outset of the unit, I let them know that they would be creating their very own poetry product: a calendar. I would usually have it cover the full next calendar year, January-December, although that might change based on the time of year during which the poetry unit was taught.

I picked 11 of the most important (to me, as the teacher; the most prolific, common, whatever you want to call them) poetic devices we studied during the unit. Then, I told students that they would be responsible for writing their own poems for each month: one poem per month; one device per poem. The 12th month/poem was a free choice for them. I encouraged them to relate the poem to the month, season, or event that occurred during the month. I strongly suggested they think about personal connections to months: birthdays, family trips/vacations, etc. Of course I gave them the major holidays (New Year’s Day, Valentine’s Day, St. Patrick’s Day, Easter, Mother’s Day, Memorial Day, Father’s Day, 4th of July, Labor Day, Halloween, Thanksgiving, and Christmas), but also suggested things like winter break, spring break, exams, summer break, back to school, etc.

I printed out the calendars for them and gave them free rein to decorate them according to their poems (if a poem focused on a birthday, then they would label that day on the calendar and illustrate it appropriately; if it was Valentine's Day, same thing). They left ample space for their poems on each month.

I found that the vast majority of my students responded well to this authentic task. I had lots of students who would decide to give the calendar as a gift to a parent or relative (depending on the time of year you can suggest it as a Mother’s Day gift or a holiday gift for a parent/relative). Many of them expressed their excitement that the calendar would be hanging somewhere (on the fridge, in a parent/relative’s office, etc.), and they put in quite a bit of effort. It became meaningful to them. They suddenly were able to write poetry because they had a purpose and context. Over the years I got some very poignant, touching work (Veteran’s Day poems, poems to grandparents or others who had died, etc.). Not all of the poetry was going to win awards, but the students took the assignment seriously and did a reasonably good job. I was easily able to tell who had mastered the various poetic devices and other literary concepts.

The best part about this assignment was that it could be tailored to any level. I started with it in 7th grade and used it all the way up through my high school students. I had poems as simple as limericks and haikus all the way up to full-blown Shakespearean sonnets. And not only did the kids (many of them, anyway) actually enjoy writing the poems, but they (dare I say “all”?) enjoyed coloring and illustrating the months in the calendar. And I was able to photocopy the best ones to keep and display in my room that year and use as examples in subsequent years.

Creating the calendar (in its entirety – writing the poems, illustrating the pages, etc.) usually took about 2 weeks. Sometimes less, depending on the students, but never more. With my honors kids, I would often assign a certain amount as homework so it didn’t stretch too far into class time. I found that the kids needed my help, though, in many cases, so I shied away from assigning the whole (or majority of) thing as homework.

If this sounds like something you’d like to do in your classroom, you are in luck. I’ve already created the templates and rubrics for the project. All you have to do is decide which poetic devices you want to assess.

Click here for the Poetry Calendar Project.