I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Tuesday, March 19, 2013

NJ Teacher Evaluation: Math Fail #3

Let's go back to this slide from the NJDOE's presentation on their proposed new teacher evaluation system, AchieveNJ:

The yellow annotation is mine from a previous post on Student Growth Percentiles. Apparently, even though SGPs are on a scale from 1 to 99, the NJDOE thought it would be fine to give an example of a teacher's median SGP as "2.0," which makes no sense at all; read the post for all the details.

Alas, this is not the only problem here...

Let's talk a bit about the "Teacher Practice Evaluation Instrument." The hypothetical teacher here got a score of "3.0," which is already a problem. I know of no observation system that categorizes teachers into any more than four categories. The Danielson framework, for example - which NJDOE says is in use, in one form or another, in over half of NJ's schools - classifies teachers into one of four categories: "Unsatisfactory," "Basic," "Proficient," and "Distinguished." But here's the important thing:

These categories are what are called ordinal measures. They are named this because they order the things being measured: a teacher, for example, is "ordered" into one of the four Danielson categories, ranked from lowest to highest. What ordinal measures do not do, by definition, is say how much "worse" or "better" one category is than another.

But look at the score under "Teacher Practice Evaluation Instrument": "3.0." Not "3," but "3.0," implying that there are steps of "effectiveness" in between "3" and "4." Sorry, but no: these are ordinal measures and, as such, there is no wiggle room between them.

Almost a year ago, I wrote about this with regard to the IMPACT teacher evaluation system used in Washington, D.C. Look at this example provided by DCPS:

The implication here is that ordinal measures can be averaged. But that's a huge no-no: you aren't allowed to average ordinal measures, because the average would imply that there are steps in between the ordinal categories when, by definition, there are not.
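For anyone who would rather see this in code, here is a minimal Python sketch of why averaging goes wrong. Everything in it is made up for illustration: the two numeric codings, the two teachers, and their ratings are hypothetical and come from neither DCPS nor NJDOE. Both codings keep the four Danielson categories in the same order, which is all an ordinal scale promises, yet the average picks a different "better" teacher depending on which coding you happen to choose. The median category, which uses only the ordering, does not budge.

```python
from statistics import mean

# The four Danielson categories, lowest to highest.
CATEGORIES = ["Unsatisfactory", "Basic", "Proficient", "Distinguished"]

# Two hypothetical numeric codings. Both preserve the order of the categories,
# so an ordinal scale gives no reason to prefer one over the other.
coding_a = {"Unsatisfactory": 1, "Basic": 2, "Proficient": 3, "Distinguished": 4}
coding_b = {"Unsatisfactory": 1, "Basic": 2, "Proficient": 3, "Distinguished": 10}

# Hypothetical ratings from three observations of each teacher.
teacher_x = ["Basic", "Basic", "Distinguished"]
teacher_y = ["Proficient", "Proficient", "Proficient"]


def mean_score(ratings, coding):
    """Average the ratings as if the codes were real quantities."""
    return mean(coding[r] for r in ratings)


def median_category(ratings):
    """Middle rating by rank; relies only on the ordering, not on any spacing."""
    ranks = sorted(CATEGORIES.index(r) for r in ratings)
    return CATEGORIES[ranks[(len(ranks) - 1) // 2]]


for name, coding in [("coding A", coding_a), ("coding B", coding_b)]:
    x = mean_score(teacher_x, coding)
    y = mean_score(teacher_y, coding)
    winner = "X" if x > y else "Y"
    print(f"{name}: X={x:.2f}  Y={y:.2f}  higher average: teacher {winner}")

print("median category, X:", median_category(teacher_x))
print("median category, Y:", median_category(teacher_y))
```

Under the first coding teacher Y has the higher average; under the second, teacher X does. Nothing about either teacher changed, only the arbitrary spacing between the labels.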

Does NJDOE intend to introduce this phony precision into AchieveNJ? Well, the Educator Effectiveness Task Force - the genesis for this entire thing - approvingly cites IMPACT multiple times in their report. And that "3.0" in the NJDOE's example implies they would approve of a "3.1" if given, wrong as that would be. So there is every reason to believe NJDOE would happily accept phony precision in teacher evaluations.

Worse, look at the final score: "2.725." The ordinal scale for the TPEI has been further corrupted by "weighting" it at 50% against other measures that have different, more variable scales. Notice, for example, that the Student Growth Objective is "3.5" (is there a "3.4"? NJDOE ain't sayin'!). That implies it's more "variable" than the TPEI; consequently, its variability makes it the deciding factor in a "tie" between two teachers who, in other areas, are judged as "equal" simply because those areas aren't as "variable."
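Here is a rough Python sketch of that arithmetic. The only weight taken from the slide as described here is the 50% on the TPEI; the 35% for the SGP score and 15% for the SGO are assumptions, chosen because they add up to 100% and happen to reproduce the "2.725." The second teacher and the "3.4" are hypothetical.

```python
# Weights: only the 50% for the TPEI comes from the post.
# The 35% (SGP) and 15% (SGO) are assumed values that reproduce 2.725.
WEIGHTS = {"tpei": 0.50, "sgp": 0.35, "sgo": 0.15}


def composite(scores):
    """Weighted sum of the components, treating every scale as a real quantity."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Component scores from the slide's example teacher.
slide_teacher = {"tpei": 3.0, "sgp": 2.0, "sgo": 3.5}

# Hypothetical second teacher: identical except for a 0.1 difference on the SGO.
other_teacher = {"tpei": 3.0, "sgp": 2.0, "sgo": 3.4}

print("slide teacher:", round(composite(slide_teacher), 3))
print("other teacher:", round(composite(other_teacher), 3))
```

The two teachers sit in the same category on the observation instrument and have the same SGP score, so the entire gap between their final scores comes from the one component that is allowed to move in tenths.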

Yeah, I know - this is really knotty. This is, by far, the most difficult thing I've tried to explain on this blog. But it's really, really important. If you have no idea what I'm saying, try this post. And then say this with me:

Some of the evaluation; all of the decision.

"Oh, Jazzman, stop splitting hairs! So what if NJDOE allows some mathematical rule-bending! It's not like this actually matters!"

That, my friend, is where you are dead wrong; it matters a great deal when the decision-making process does not allow principals to do their jobs. More on this next.

Ordinal measure? What's that?
