I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Tuesday, March 12, 2013

Why SGPs CAN'T Evaluate Teachers

If you're a teacher or a parent in New Jersey, get used to this acronym: SGP.
One prevailing question both during and after the meeting had to do with how the process would work, especially the test scores and a complex formula called “student growth percentiles” (SGP) that will measure students’ progress against that of comparable peers.
The state is expected to release its first SGP scores for overall school performance this month, based on 2011-2012 tests, but has only started to share them for individual teachers. [emphasis mine]
Here's the key to understanding SGPs: as their "inventor," Damian Betebenner says, SGPs are a descriptive measure. They measure student "growth," relative to other students, but SGPs make no attempt to determine why a student "grows" a particular amount.

For this reason, SGPs are an inappropriate tool to use for teacher evaluation: they make no attempt to find the cause of a student's growth, placing all of the responsibility on the teacher.

Let's look at a hypothetical example to see how this works. Everyone, I'd like you to meet Jenny from New Jersey:

Jenny lives in Clark, a town that's not the richest in the state, but not the poorest either; that's why the town gets an "FG" for its District Factor Group (DFG), just about in the middle.

Every spring, Jenny and her classmates miss a week of actually learning stuff and instead take the NJASK, a bubble test that no one outside of the NJ Department of Education's orbit is allowed to vet. Jenny has noticed that her teacher and her principal are extremely worried this year about the results of the test. What she doesn't understand - yet - is that 2013 will be the first year that her performance on the NJASK will affect her teacher's and her principal's evaluations. Every score that Jenny gets on the NJASK will have consequences for the adults who have made the commitment to educate her.

Let's see how. Here's how Jenny did on her NJASK-3 math section (we could use language arts in this example, or a combination of the two, but let's keep it simple):

Jenny got a 225, which means she is deemed "Proficient"; good job, Jenny!

But this year, her score takes on a new meaning. Jenny will now be matched with her "academic peers": the students throughout the state who received the same score (or close to it) that she did. Let's pick out a few of her "peers" to see who they are.

Nicky is from Irvington, an "A" district, which means there is a lot of poverty at Nicky's school. Connor, however, is in a "J" district, the most socio-economically elite classification. Susie lives in a town a lot like Jenny's; so does Parth. But, as we'll see in a minute, the children have very different lives - differences that will affect their test scores.

However, all five kids got a "225" on the NJASK-3, so they are "academic peers" going into Fourth Grade - even though their districts, their schools, their families, their communities, and their lives are quite different.

But Governor Christie, Education Commissioner Cerf, and the NJ Department of Education do not care! All that matters to them is that these children got the same score once on the same test.

You see, the children's Fourth Grade teachers will be judged based on their students' Student Growth Percentiles, or SGPs. The NJDOE recommends that 35% of a teacher's evaluation be based on the median SGP for a class (which is problematic on its own - more on that later). The difference between these children's Third and Fourth Grade NJASK scores will have profound implications for their teachers.
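To make that "35%" a little more concrete, here's a rough Python sketch of how a class full of individual SGPs collapses into one median, and then into one weighted slice of a teacher's rating. Big caveat: nothing here tells us how the state turns a median SGP into rating points, so the function `hypothetical_msgp_to_scale` and its straight-line mapping from 1-99 onto a 1-4 scale are my own made-up assumption, not the NJDOE's conversion. The 50% observation and 15% "other measures" weights are the split one of the commenters describes below, and the SGPs are invented for illustration.

```python
from statistics import median


def class_median_sgp(sgps):
    """Median of the individual students' SGPs in one teacher's class."""
    return median(sgps)


def hypothetical_msgp_to_scale(msgp):
    """HYPOTHETICAL: map a median SGP (1-99) linearly onto a 1.0-4.0 scale.
    This is an illustrative assumption, NOT the state's actual conversion."""
    return 1.0 + 3.0 * (msgp - 1) / 98


def composite_rating(msgp_rating, observation_rating, other_rating,
                     weights=(0.35, 0.50, 0.15)):
    """Weighted average of three component ratings, each on a 1-4 scale.
    Default weights follow the 35/50/15 split a commenter cites."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_sgp, w_obs, w_other = weights
    return w_sgp * msgp_rating + w_obs * observation_rating + w_other * other_rating


# Invented SGPs for a five-student class (not taken from this post).
msgp = class_median_sgp([10, 30, 50, 70, 90])
sgp_part = hypothetical_msgp_to_scale(msgp)
print(msgp, sgp_part, composite_rating(sgp_part, 3.0, 2.0))
```

Notice that once the median is taken, nothing about the individual kids in the class, or why they scored the way they did, survives into the teacher's rating.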

So all five of these kids - and all of their "academic peers" - enter Fourth Grade. In the spring, they take the NJASK-4. Here are their results:

Hmm... it looks like some of the kids made more "growth" than others. Let's order them by their "growth," going left to right:

The differences between the kids are clear: Connor made the most "growth," and Susie made the least. Most of us would wonder why that is; what happened in the lives of these children that leads to differing scores? Was it the teacher? The classroom peers? Home life?

Christie, Cerf, and the NJDOE, however, do not care! All that matters to them is that, relative to kids like Jenny, Connor got a big score gain, while Susie went backwards.

Had they bothered to investigate, they might have learned that Connor's parents had been concerned that their son was underperforming. They enrolled him in a pricey enrichment camp the summer after Third Grade, where Connor got extra help in academics. They also hired a tutor, for a hefty fee, to work with Connor twice a week; living in Bernards, a town in the highest DFG, they had plenty of well-off neighbors to ask for recommendations. Connor had a fine Fourth Grade teacher, but much of his progress had nothing to do with that teacher.

Parth's score worries his teacher, because he is one of the strongest math students she's ever had; he flies by the other kids, and aces her own assessments. But Parth gets nervous very easily, and hasn't done well on the NJASK, relative to his abilities, for two years. He is underperforming on standardized tests - but that has nothing to do with his teacher.

Nicky lives in poverty, has a very unstable home life, and has a Specific Learning Disability (SLD). Fortunately, she has a great teacher, who demanded that Nicky be classified and receive an Individualized Education Program (IEP). It took most of the year to get that together, but Nicky is now getting special instruction to help her with her dyslexia. She has made great progress, thanks to her teacher, but you can't see that in her SGP.

Susie, on the other hand, is still struggling to learn English. Her family speaks Mandarin at home, and even though she is very bright, she has followed the typical pattern of second language acquisition for young children: she is able to have simple, direct conversations in English, but she hasn't yet learned to think in higher-order ways in her second language. Since the NJASK-4 math section asks more complex word problems, she took a tumble in her scores. She's making good progress; her ESL teacher is confident she will bounce back next year. But her struggles have nothing to do with her teacher.

So every one of these children has a story; every child's growth is due to a combination of factors. But Christie, Cerf, and the NJDOE do not care!

Susie took a dive in her scores, relative to Jenny, and that's all that NJDOE cares about. Her teacher and her principal will pay a price, even if her lack of "growth" has nothing to do with the quality of her school, her principal, or her teacher.

Our reformy overlords at NJDOE, however, are not without their mercies: they understand that it wouldn't be fair to judge Jenny's "growth" against her old Third Grade "peers." It's time to bring in a new batch of "peers": students who have the same two-year history as Jenny:
This is one of the core features of SGPs: Jenny's "peers" are based solely on their test score histories. So now Jenny has a new group of "peers" from very different districts, very different family backgrounds, and very different communities. 
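Here's a toy version of that peer-matching logic in Python. It's a simplification: Betebenner's actual SGP model estimates conditional percentiles with quantile regression on students' prior scores, rather than grouping kids with exactly identical score histories, so treat this as an illustration of the idea, not the NJDOE's calculation. The function name, the "count ties as half" percentile rule, and the scores are all made up for the example. The thing to notice is what the input doesn't contain: there's no field for a tutor, a learning disability, a second language, a rough week at home, or a teacher.

```python
from collections import defaultdict


def simplified_growth_percentiles(students):
    """students: list of (name, history, current_score), where history is a
    tuple of prior-year scores. Simplification: "peers" are students with
    exactly the same history. Returns {name: percentile clamped to 1-99}."""
    # Group current-year scores by identical score history.
    peers = defaultdict(list)
    for _, history, current in students:
        peers[tuple(history)].append(current)

    percentiles = {}
    for name, history, current in students:
        group = peers[tuple(history)]
        below = sum(1 for s in group if s < current)
        tied = sum(1 for s in group if s == current)  # includes this student
        # Mid-rank percentile: peers below count fully, ties count as half.
        pct = round(100 * (below + 0.5 * tied) / len(group))
        percentiles[name] = max(1, min(99, pct))
    return percentiles


# Invented scores, not the figures from this post.
students = [
    ("A", (225,), 230),
    ("B", (225,), 250),
    ("C", (225,), 210),
    ("D", (225, 240), 245),
    ("E", (225, 240), 235),
]
print(simplified_growth_percentiles(students))
```

Two kids with the same scores get the same "growth" number, whatever the reason behind those scores; the calculation has no way to tell Brad's camp from Julio's breakthrough or Steve's bad week.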

But Christie, Cerf, and the NJDOE do not care! These children are test score "peers," which means they will be judged against each other on their performance on one test during one week out of the entire year. So how did they do in Fifth Grade? Who's got the best "growth"?

Uh-oh - looks like some kids "grew" more than others again (golly, what are the odds)! Let's rank order this group:

Here's a funny coincidence: Brad's and Connor's fathers work at the same brokerage firm! And when Brad's dad found out how well Connor did after a summer at that pricey enrichment camp, he knew he had to sign his son up right away!

(Yes, I'm being a little facetious - but I think you get the point, don't you? OK, let's keep going...)

Brad's teacher is so-so, but that's OK, because Brad's growth was really due to many factors other than his teacher.

Julio, like Susie, doesn't speak English at home. But this was a breakthrough year for Julio, and his language skills have blossomed impressively. His ESL teacher knew it was a matter of time - you can't rush these things - and now Julio looks to be moving on to full mastery of the English language. Julio goes to a great school with a great team of teachers, but his growth was as much about his personal characteristics as it was about the staff at his school.

Steve, believe it or not, is exceptionally bright; that he has done as well as he's done, even though he's been bounced from one extended family member to another, is a testament to his "grit." But he had just moved back into his grandmother's house and didn't get a good night's sleep all week during the test. He's bright, he's a hard worker, and if he can just get a stable home life, he's sure to go far. This was just a bad patch that had very little to do with his teacher, who is supportive and nurturing and a great role model for Steve.

Angela, it turns out, had a terrible teacher this year!* He's a 35-year veteran who is burned out and needs to be removed - and the principal knows it. Both the principal and the superintendent are working together to make sure this teacher moves on, and since they have good relations with their local's union president, they are confident they will be able to counsel this man out. But if they can't, they'll use the new tenure law - proposed and supported by the NJEA - which will make it quick and efficient to build a case for this teacher's removal.

So Angela's test score drop, unlike Steve's, is largely attributable to her teacher. But student SGPs treat teachers with different levels of "effectiveness" as the same! There is no attempt to tease out how much of either child's drop in scores is due to teacher effect.

So there are reasons for all of these scores - and some of the reasons are, in fact, teacher effectiveness. But you know what? Christie, Cerf, and the NJDOE do not care! All that matters is that Steve and Angela fell backward, relative to Jenny, while Julio and Brad forged ahead.

If the principals of these children's teachers were to get together, they'd agree that Brad's and Angela's teachers were not "effective." But the SGPs don't show that; nor do they show that Steve has a great teacher, or that Julio has a great team of teachers (and his principal has no idea which member of that team he should attribute Julio's success to - but that's yet another matter...).

So now we see the problem with SGPs: they are merely descriptive measures that cannot account for teacher "effectiveness". They don't even make an attempt to ascertain why a child "grows" or not; they merely say whether a child "grew" relative to other children.

Which brings us back to Jenny. What has any of this testing and SGP-based teacher evaluation done for her? She's making progress year-to-year. She's consistently "growing" as measured by the NJASK, but those scores tell us nothing about how her teacher, her principal, or her school affects her learning. Yet we are preparing to radically change teacher evaluation, using SGPs - an inappropriate tool - in a method that has never been properly tested. Why? What's the point? What is going to be gained for Jenny, or any of these children?

Do Christie, Cerf, and the NJDOE care enough to give us an answer?

* Let me give this shout-out to all the great teachers in Cranford, especially the vets! I've been in your schools, I've met your kids, and let me tell you, you have a great district with great teachers. I'm just trying to make a point; this is a purely hypothetical example. Go Cougars!


Unknown said...

And to take it to the next (il-) logical step, how does the breakdown of teacher effectiveness rating work? SGP and observation of classroom practice are apples and orangutans.

How do we calculate 20 students' SGP scores as 35% (35% of...what?), classroom observations as 50%, and other testing measures as 15% of...what, a "4", which is a "Highly Effective" rating? What's the math that calculates the SGP scores of 20 students to represent 35% of a "4", and the 1 through 4 rating in each of four Domains as 50% of a "4"? How is that calculated? How DO those scores become 35%, and 35% of what???!!

This seems more and more like "what's the difference between a duck," "is it closer to New York or by car," or the sound of one hand clapping.

I take great pride in my job and I take it seriously. I understand its importance and the impact it has on my students' lives. I have no argument with being held accountable for that impact. However, this seriously flawed and rushed evaluation model does not--and cannot--measure that impact with any accuracy whatsoever. It has no ability to inform my professional growth and therefore, no chance of improving my students' achievement.

It might be a somewhat interesting thought experiment, funny, or sad... if my career, livelihood, and my home didn't depend on it. Since they do, it's terrifying and dangerous.


edlharris said...

Thanks for the great explanation. People from Jersey are great (my mom grew up in Somerville).
Down in Maryland, it's a little hard to tell what we will be using. The LEAs are to develop their own, using state guidance, with the test score component having to be at least 20%. Why 20%? The fellow from the state could not provide me with any research supporting 20% as the optimal determinant of teacher quality.
The state's explanation is here (http://msde.state.md.us/tpe/TPEG/Chapter12.pdf) and this is what the LEAs will use if they can't come to agreement within the individual counties.