Tuesday, September 20, 2011

"But It's Just a PILOT!"

More on the "pilot" program to evaluate teachers in New Jersey:
Q:  How much weight do standardized test scores get in the evaluations?
A:  Standardized test scores are not available for every subject or grade. For those that exist (Math and English Language Arts teachers of grades 4-8), Student Growth Percentages (SGPs), which require pre- and post-assessments, will be used. The SGPs should account for 35%-45% of evaluations.  The NJDOE will work with pilot districts to determine how student achievement will be measured in non-tested subjects and grades.
Hey, are those SGPs actually any good at measuring teachers? We could ask the people who designed them... but leave it to Bruce Baker to point out that these SGP cheerleaders don't like to think much about how their concoctions could be misused:
 The authors of the response make one more point, that I find objectionable (because it’s a cop out!):
To be clear about our own opinions on the subject: The results of large-scale assessments should never be used as the sole determinant of education/educator quality.
What the authors accomplish with this point, is permitting policymakers to still assume (pointing to this quote as their basis) that they can actually use this kind of information, for example, for a fixed 90% share of high stakes decision making, regarding school or teacher performance, and  certainly that a fixed 40% or 50% weight would be reasonable. Just not 100%. Sure, they didn’t mean that. But it’s an easy stretch for a policymaker.
Lo and behold, look at the NJ pilot: "Hey, it's only 45%! No biggie! It's not the sole determinant, so why're you complaining?"

If the measures aren’t meant to isolate system, school or teacher effectiveness, or if they were meant to but simply can’t, they should NOT be used for any fixed, defined, inflexible share of any high stakes decision making.  In fact, even better, more useful measures shouldn’t be used so rigidly.
[Also, as I've pointed out in the past, when a rigid indicator is included as a large share (even 40% or more) in a system of otherwise subjective judgments, the rigid indicator might constitute 40% of the weight but drive 100% of the decision.]
Simply handing off the tool to the end user and then walking away in the face of misuse and abuse would be irresponsible.
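To make Baker's bracketed point concrete, here's a rough sketch of how a 40% component can end up deciding the whole ranking. Everything in it is an assumption for illustration: the five teachers, their scores, the 0-100 scale, and the 60/40 split are all made up, and this is not the formula the NJ pilot actually uses. The only idea it borrows from the quote is that subjective ratings tend to bunch together while a test-based number spreads people out.

```python
# Hypothetical scores on a 0-100 scale: (observation rating, test-based score).
# These numbers are invented for illustration; they are not New Jersey data.
teachers = {
    "A": (84, 35),
    "B": (80, 72),
    "C": (86, 50),
    "D": (82, 61),
    "E": (85, 28),
}

OBSERVATION_WEIGHT = 0.6  # assumed share for the subjective measures
TEST_WEIGHT = 0.4         # assumed share for the test-based measure


def composite(observation, test):
    """Weighted evaluation score under the assumed 60/40 split."""
    return OBSERVATION_WEIGHT * observation + TEST_WEIGHT * test


def ranking(score_of):
    """Return teacher names ordered from highest to lowest score."""
    return sorted(teachers, key=score_of, reverse=True)


composite_rank = ranking(lambda name: composite(*teachers[name]))
test_rank = ranking(lambda name: teachers[name][1])
observation_rank = ranking(lambda name: teachers[name][0])

print("by composite:  ", composite_rank)
print("by test score: ", test_rank)
print("by observation:", observation_rank)
```

In this toy data the observation ratings span only 6 points while the test-based scores span 44, so the composite ranking comes out identical to the test-only ranking, and the 60% "subjective" share never changes anyone's position. Spread the observation ratings out more and that stops being true, which is exactly why a fixed weight tells you so little about what's really driving the decision.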
These guys remind me of WWII German rocket scientists...

