Jersey Jazzman: Teacher Evaluation Without Research Backing: SGOs

Thursday, November 7, 2013

Teacher Evaluation Without Research Backing: SGOs

Tomorrow, NJ Education Commissioner Chris Cerf will address the membership at the NJEA's teachers convention. He is supposed to be taking questions.

Perhaps someone will ask him why we NJ teachers are wasting our time with these ridiculous Student Growth Objectives or SGOs (also known as Student Learning Objectives or SLOs). Because there is no evidence that they work:

There is little evidence on the statistical properties of SLOs. Most of the limited evidence on the statistical properties of SLOs simply reports the proportion of teachers achieving SLO objectives (see table C1). Across various sites the results consistently show that most teachers achieve some or all of their SLO targets. In Denver, early in the implementation of the district’s professional development and compensation program, ProComp, 89–98 percent of participating teachers met at least one SLO (Community Training and Assistance Center, 2004). A few years later, a large majority of teachers continued to meet SLOs, with 70–85 percent of participating teachers earning a financial incentive tied to meeting them (Goldhaber & Walch, 2011; Proctor, Walters, Reichardt, Goldhaber, & Walch, 2011). Studies in Tennessee (Tennessee Department of Education, 2012) and Austin, Texas (Terry, 2008), found that about two-thirds of teachers met all of their SLO targets (involving one goal in Tennessee and two in Austin). In Charlotte-Mecklenburg attainment rates ranged from 55 percent to 81 percent over three years (Community Training and Assistance Center, 2013). While in Charlotte-Mecklenburg teachers’ success rates rose in the second year of implementation and fell in the third year (Community Training and Assistance Center, 2013), success rates in Denver gradually increased over 2007–10 (Goldhaber & Walch, 2011).

These results suggest that SLOs may better discriminate among teachers than do traditional evaluation metrics (in which nearly all teachers are deemed “satisfactory”). Still, more than half the teachers met their targets in all the locations studied. And considering that the two Community Training and Assistance Center studies (2004, 2013) found that success rates increase the longer teachers participate in an SLO program, districts could find a large majority of teachers rated in the highest category several years after SLO implementation begins.

Whether the ability of SLOs to distinguish among teachers represents true differences in teacher performance or random statistical noise remains to be determined. No studies have attempted to measure the reliability of SLO ratings. [emphasis mine]

I'd like to thank the distinguished arts educator Laura H. Chapman for pointing this report from Mathematica out to me. Understand that the report itself is, in my opinion, seriously flawed (in more ways than one*). As in some of Mathematica's other work, the authors do a decent job of presenting evidence, but then they take huge leaps to reach completely unwarranted conclusions:

Many districts and states cannot wait until the research base fills out before implementing some kind of alternative growth measure, particularly for teachers in grades and subjects not covered in state assessments. The evidence on the application of value-added methods to alternative student tests is encouraging, suggesting that states and districts might want to begin by seeking out commercially available assessments that align with their curricula and standards or by seeking (or even developing) end-of-course assessments that are centrally scored and align with curricula and standards. The same value-added statistical methods used on state assessments can be used on other kinds of systematically scored student tests.

I'm sorry, but there's just no reason to make the case that districts and states "cannot wait." Of course they can wait, and they should wait; otherwise, they could set themselves up for lawsuits that will have profound impacts on schools and taxpayers.

What's remarkable here, however, is that in spite of their shoot-first-ask-questions-later policy predilections, Mathematica has confirmed what I found when I looked at the NJDOE's own evidence on SGOs:

There is no research base to support the widespread use of Student Growth Objectives. New Jersey school districts are spending unmeasured amounts of time and money on the implementation of SGOs, but there is no evidence they will improve teacher effectiveness, student achievement, or educator evaluations.

Hopefully, tomorrow, someone will ask Chris Cerf about this. I'll bet he's polishing up his tap shoes just in case.

Tea for two, and two for tea...

* The conclusions in the alternative assessment section, for example, are simply absurd:

Models based on widely used, commercially available assessments generally produce measures of teacher performance that correlate positively with other performance measures, such as teacher observations and student surveys. All the reviewed studies found positive relationships, with correlations up to 0.5.

First of all, showing a correlation between a couple of equally crappy teacher effectiveness assessments is not really a great test of validity. Second, that "up to" phrase really puts a crimp on the findings; it would be just as accurate to say "correlations as low as 0.15" (p. 7). Third, it wouldn't matter so much if these measures weren't being used in top-down, mandatory, high-stakes ways. But they are, which is really the crux of the problem. Use this data to inform decisions? Sure, that's fine. But force decisions based on this data? That's insane.
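To put that 0.15-to-0.5 range in perspective, here is a back-of-the-envelope sketch. It is not taken from the Mathematica report. It assumes a simple linear relationship between two measures, in which case the squared correlation gives the share of variance the two measures have in common. The function name `shared_variance` is made up for this illustration.

```python
def shared_variance(r: float) -> float:
    """Share of variance two measures have in common, assuming a linear
    relationship: the squared correlation coefficient (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# The low and high ends of the correlation range quoted from the report.
for r in (0.15, 0.5):
    print(f"r = {r:.2f} -> shared variance = {shared_variance(r):.2%}")
```

Under that assumption, even the best case of 0.5 means the two measures share only about a quarter of their variance, and at 0.15 they share roughly 2 percent.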

But what really bothers me is that correlations of only 0.5 at best show just how unreliable these measures are; that's the conclusion Mathematica should take away from this report. And yet, on page 14, the authors state:

Assessing teachers based on student growth on these assessments typically yields results that distinguish teachers from each other, that are comparably reliable to value-added estimates based on state assessments, and that correlate positively both with value-added on state assessments and with other performance measures, such as classroom observations. Districts and states looking for additional student growth measures could consider applying value-added statistical methods to commercially available assessments that align with their curriculum and standards.

Yeah, they could, just like I could jump out of an airplane without a parachute. Doesn't mean it's the smart thing to do.

3 comments:

George said...: SGOs are just as effective as APAs are for students and teachers of students with severe disabilities. Effective at wasting time. Teachers' time and students' time. Linking these assessments to funding for schools and job security of teachers totally negates the assessments.

In the first place, the APA is based on curriculum that is taught to students of the same age, not the same ability level. For example, a teacher is expected to give a 16-year-old student a 10th- or 11th-grade curriculum-based APA, even if the student can't talk and can barely move or communicate (if at all). If the test were given correctly, the state would get a test paper that might have some drool on it, and that student would fail. Because administration requires that these kids pass, they miraculously do pass the test! How do they do it? The teacher does it for them, because teachers are under pressure from administration and are afraid that if the students fail they will lose their jobs (and they should be afraid). The other problem is that if a student passes an APA, the bar is now set and must be exceeded the next time.

I'm sure the same thing will happen with SGOs. The SGOs also encourage teachers to set low standards for their students. All students will need to pass, so the goal must be based on the lowest student's abilities. Most likely it will be something they have already mastered or almost mastered.

I'm not sure how the higher-ups in the state education system can be oblivious to what is going on. I think they must know and are acting stupid. As long as the scores are good they will get funding and get to keep THEIR jobs, and that seems like all they care about. I'm not sure I can say I really blame them... I did what I had to do to keep my job too. Sometimes it's hard to look in the mirror though.; November 9, 2013 at 10:53:00 AM PST
technokat said...: I heard that Cerf did not take questions. Not surprised.; November 12, 2013 at 5:27:00 PM PST
Duke said...: T, he did take questions at NJEA, and he stayed and talked with some teachers afterward.

Credit where it's due. Thx for the post.; November 12, 2013 at 7:36:00 PM PST