The more I looked at this, the more absurd it seemed. If NJDOE is going to convert a number from 1 to 99 into a number on a 1.0-to-4.0 scale, how will they do it? Will a "2.1" be allowed? If so, don't they understand that by making the SGP measure more variable than the other measures, they are giving it more weight in a high-stakes decision? Again: some of the evaluation, all of the decision.

Slide 14 (annotations mine):
Slide 20:

OK, wait a minute... Slide 14 says my students' Student Growth Percentiles, or SGPs, will be calculated on a scale of 1 to 99. We've already established that my evaluation will use the median SGP for my class, even though that is potentially a hugely distorted metric (see NJDOE Math Fail #1 for more). But Slide 20 has an example where a teacher gets a "raw score" of 2.0 for their mSGP.

How in the hell did the NJDOE calculate this number?
Well, it looks like someone at NJDOE is reading the Jazzman, because a new raft of slides has appeared:
"Guidance is forthcoming..." Nice use of the passive voice; avoids having to take ownership of the decision...
So NJDOE has at least acknowledged what was always a central problem, spelled out here by Bruce Baker:
First, the standard evaluation model proposed in legislation requires that objective measures of student achievement growth necessarily be considered in a weighting system of parallel components. Student achievement growth measures are assigned, for example, a 40 or 50% weight alongside observation and other evaluation measures. Placing the measures alongside one another in a weighting scheme assumes all measures in the scheme to be of equal validity and reliability but of varied importance (utility) – varied weight. Each measure must be included, and must be assigned the prescribed weight – with no opportunity to question the validity of any measure. Such a system also assumes that the various measures included in the system are each scaled such that they can vary to similar degrees. That is, that the observational evaluations will be scaled to produce similar variation to the student growth measures, and that the variance in both measures is equally valid – not compromised by random error or bias.

In fact, however, it remains highly likely that some components of the teacher evaluation model will vary far more than others if by no other reasons than that some measures contain more random noise than others or that some of the variation is attributable to factors beyond the teachers’ control. Regardless of the assigned weights and regardless of the cause of the variation (true or false measure) the measure that varies more will carry more weight in the final classification of the teacher as effective or not. In a system that places differential weight, but assumes equal validity across measures, even if the student achievement growth component is only a minority share of the weight, it may easily become the primary tipping point in most high stakes personnel decisions. [emphasis mine]

Basically, NJDOE reads this and says: "OK, Professor, we understand the SGPs are more variable than the observations, and that's going to be a problem.
But we'll solve that by converting the SGPs into a measure that only varies as much as the other parts of the evaluation! See, problem solved!"
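Baker's variance point is easy to demonstrate with a quick simulation. All the numbers below are invented for illustration: two components get equal 50/50 weights, but the growth-based score swings far more than the tightly clustered observation score, so the growth score ends up deciding who falls below the cut.

```python
import random

random.seed(1)

# Invented numbers for illustration only: observation scores cluster
# tightly around 3.0, while the SGP-based score swings widely.
teachers = []
for _ in range(1000):
    observation = random.gauss(3.0, 0.2)
    sgp_score = random.gauss(2.5, 1.0)
    composite = 0.5 * observation + 0.5 * sgp_score  # equal weights
    teachers.append((observation, sgp_score, composite))

# Who falls below a composite cut score of 2.5, and which
# component put them there?
flagged = [t for t in teachers if t[2] < 2.5]
low_obs = sum(1 for o, s, c in flagged if o < 3.0)
low_sgp = sum(1 for o, s, c in flagged if s < 2.5)
print(f"{len(flagged)} flagged; {low_sgp} had a low SGP score, "
      f"{low_obs} a low observation score")
```

Even though both components count for exactly half the composite, nearly every teacher who lands below the cut got there because of the noisier SGP score, not the observation. That's Baker's tipping point in action.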
Except the problem isn't solved at all: it's made far, far worse, because the only way to make the conversion is to assign cut points in the SGPs!
Let's say, for example, that NJDOE comes up with a conversion table that looks like this:
|"Raw" mSGP||"Converted" mSGP|
What's that? Your mSGP is a "49"? Oh, so close, but too bad! Enjoy your next career...
It really doesn't matter where you put the cut points: a difference of one point in mSGP is enough to tip the measure. What's worse, the mSGP is the median of a class's SGP scores, not the average. Remember when I showed that two classes with very different average growth could still have the same median SGP?
Here are two classes with very different average SGPs, but the same median SGP (Ms. Jones's blue diamond is hidden by Ms. Smith's red box). But let's suppose that the student with the median SGP in Ms. Jones's class misses just one question on the NJASK. His SGP dips ever so slightly, and so does Ms. Jones's mSGP. If that dip occurs right at the cut point, guess what?
You can barely see it, but that middle student in Ms. Jones's class is right below the cut point; her entire evaluation changed because of a tiny adjustment in one student's test scores! Yes, Ms. Jones, we know you have higher average "growth" in your class than Ms. Smith, but you still got a "2" on your evaluation, and Ms. Smith got a "3." And budget cuts mean we have to RIF someone; start working on your resume...
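To make the Jones/Smith scenario concrete, here's a sketch with made-up SGP lists (not actual NJASK data) and the same kind of hypothetical cut points as the example table above:

```python
from statistics import mean, median

# Made-up SGPs: Ms. Jones's class shows much higher average growth,
# but both classes have the same median.
jones = [10, 35, 50, 90, 95]
smith = [40, 45, 50, 55, 60]
print(median(jones), median(smith))  # both 50
print(mean(jones), mean(smith))      # Jones: 56, Smith: 50

def convert_msgp(raw):
    # hypothetical cut points, mine, not NJDOE's
    if raw >= 80:
        return 4.0
    if raw >= 50:
        return 3.0
    if raw >= 20:
        return 2.0
    return 1.0

# The median student in Ms. Jones's class misses one more question,
# and his SGP dips from 50 to 49...
jones_after = [10, 35, 49, 90, 95]
print(convert_msgp(median(jones)))        # 3.0
print(convert_msgp(median(jones_after)))  # 2.0
```

One question on one student's test moves Ms. Jones from a "3" to a "2," while Ms. Smith's lower-growth class keeps its "3." The median ignores everything happening away from the middle of the distribution, and the cut point turns a one-point wobble into a full rating category.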
There is an easy way to solve all of this: don't force a principal to act on the data using a top-down system dictated by the NJDOE. More in a bit...