I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Sunday, November 11, 2012

The Deeply Flawed TNTP Report on DC Schools

UPDATE: Matt DiCarlo has, as usual, an excellent analysis. I especially like what he says about the survey TNTP conducted:
Unless I’m missing something, TNTP does not present any basic characteristics of their survey sample, to say nothing of trying to compare them with those of the larger DCPS group (the same was true of the districts in the “Irreplaceables” report). This is always required (even with random samples), but it’s especially important given that TNTP is an advocacy organization, one which is very well-known in DCPS (for example, it was founded by former Chancellor Michelle Rhee), and places many teachers in the district. Attitudes toward them can be very contentious, and it’s certain that at least some people decided for or against completing the survey based in part on their views of TNTP and/or their policy stances.
As far as I'm concerned, the survey in this report is useless. But I still stand by my point linking "Evaluation System" and "Compensation"; read more below.

I was going to write a long takedown of TNTP's report on Washington D.C.'s schools, but Gary Rubinstein thankfully beat me to the punch (emphasis mine):

Before getting into the actual paper, note that TNTP used to be run by Rhee and now they are writing research papers that justify the decisions Rhee made and suggest that they ramp up these efforts.  If the only way to get something positive to say is to write it yourself, essentially, then the results do need to be eyed critically.

The main conclusion that D.C. is doing something good because they are retaining their ‘good’ teachers and losing their ‘bad’ is based on their controversial IMPACT evaluation model.  This is the one that was, for some teachers, based 50% on value-added.  When a teacher gets a low evaluation, he or she can get fired so of course when they fire people with low ratings, their retention rate for those people will drop.  That’s just common sense.  Whether or not those ‘bad’ teachers were really ‘bad’ is another story.  If a good teacher is rated ‘bad,’ he may quit even if he isn’t fired since he will be frustrated by the system that rated him inaccurately.  This will make the retention rate for low performers go down even more.

This statistic gets even less relevant when we consider the potential bias, which the paper admits several times, in the rating system.  According to the paper, only 11% of teachers in high poverty schools were ‘high performing’ compared to 42% of teachers in low poverty schools.  On the flip side, only 3% of teachers in low poverty schools were ‘low performing’ compared with 36% of teachers in high poverty schools.  On page 2, they speculate this could reveal a flaw in the IMPACT model on which this entire study is based:

Irreplaceables appear less likely to teach in the schools that need them most.
In DCPS, highly rated teachers are much less likely to teach in schools with high concentrations of poverty than in other schools, and that disparity is greater than what we found in other districts.[3] We believe there are two possible explanations: either the district’s best teachers are simply distributed unequally, or a flaw in the design or implementation of the IMPACT evaluation system is making it easier for teachers in low-need schools to earn high ratings. More analysis is necessary to find and address the underlying problem, and DCPS should work quickly to do both.
The first explanation in that last paragraph is contradicted by one of the points reformyists never want to discuss: an "effective" teacher isn't always "effective" with all students. Some teachers may be great with gifted kids; some may be great with special needs kids. Some may be terrific with Hispanic children because they understand the culture and language; some may only shine with affluent kids in the suburbs. It's basically a manifestation of the psychological concept of "goodness of fit": we perform better or worse depending on whether our environment suits us.

Reformyists tell us, however, that if we measure student "growth," and not absolute performance, we remove any bias in measuring a teacher's effectiveness caused by assigning that teacher low-performing students. Even leaving aside that there's no proof this is the case, it still doesn't address the possibility that the "high-performing" teachers in D.C.'s schools would not be as great if they were teaching a different set of students.

And, if that's true, what would lead us to believe they really are "irreplaceable"?

A few more things: look at this chart from page 13 of the report:

The chart purports to show that D.C.'s compensation system, which features merit pay bonuses, isn't a big factor in why teachers leave. Leave aside the dubious practice of using rankings to make that case (and let's also leave aside the really awful visual design used to present the data). Notice the rankings for "Evaluation System": D.C.'s "high performers" cite it as a reason to leave more than high performers in any other district do. Well, if evaluation is tied to compensation, isn't one just a proxy for the other? When the two are linked, isn't ranking "evaluations" high as a reason to leave pretty much the same as ranking "compensation" high?

And shouldn't we be worried that the evaluation system ranks so high as a reason to leave teaching in D.C.?

Another concern: it's been well documented here and elsewhere how faulty teacher evaluation systems that rely on Value-Added Modeling are. But the other big part of D.C.'s system is the use of teacher observations. And the way those observations are used is fatally flawed, because the scores imply a phony sense of precision.

To go over this quickly once again, here's an example from the DCPS website of how a teacher obtains her observation score in D.C.'s IMPACT system:

Notice how each observation score is significant to only one digit (1, 2, 3, or 4), yet each is expressed with two digits by appending a ".0". This is absolutely phony: it implies two-digit precision where none exists. Each observation carries only one significant figure; therefore, the overall averaged score should be reported to only one significant figure as well.
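To make the significant-figures point concrete, here is a minimal Python sketch. It assumes each observation produces a whole-number rating from 1 to 4 and that the observation component is a simple unweighted average; IMPACT's actual weighting may differ. The function names and the two teachers' scores are hypothetical, made up for illustration, and are not DCPS data.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean

# Assumption: each observation yields a whole-number rating from 1 to 4.
VALID_RATINGS = {1, 2, 3, 4}


def observation_average(scores):
    """Unweighted mean of whole-number observation ratings.

    Hypothetical helper: IMPACT's real formula may weight observations differently.
    """
    if not scores:
        raise ValueError("need at least one observation score")
    bad = [s for s in scores if s not in VALID_RATINGS]
    if bad:
        raise ValueError(f"ratings must be whole numbers 1-4, got {bad}")
    return mean(scores)


def to_one_significant_figure(value):
    """Round an average in the range 1-4 to one significant figure.

    For values in that range, one significant figure is the nearest whole
    number; halves are rounded up (Python's built-in round() would round
    2.5 down to 2).
    """
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Two hypothetical teachers with five observations each (made-up numbers).
teacher_a = [3, 3, 3, 3, 4]
teacher_b = [3, 3, 3, 4, 4]

for name, scores in [("A", teacher_a), ("B", teacher_b)]:
    avg = observation_average(scores)
    print(f"Teacher {name}: reported as {avg:.1f}, "
          f"one significant figure: {to_one_significant_figure(avg)}")
```

Reported to one decimal place, the two hypothetical teachers look different (3.2 versus 3.4); at the one significant figure the underlying ratings support, both are a 3.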

This is an abuse of mathematics designed to meet a policy end, because IMPACT does not work unless it can make fine distinctions between teachers. But the observation scores can't support distinctions that fine, and that throws the validity of TNTP's entire study into question: we can't even be sure that the "high performers" who were rated solely on their observations are really "high performers" at all.

Last thought: the Washington Post rushed in to embrace this study and tout Michelle Rhee's "reforms." But nowhere did they - or, for that matter, TNTP - show that these practices are improving student achievement. I thought that was what this was supposed to be all about, right?

Rubinstein points to the work of G.F. Brandenburg, the truth-telling prophet of D.C.'s schools:
I spent some tedious hours looking at publicly-available data on DC-CAS standardized test scores that all public school students in grades 3-8 and 10 must take every year, in particular looking at the ‘pass’ rates for various sub-groups like whites, blacks, hispanics, poor and non-poor kids (as measured by whether they are eligible for free or reduced-price lunch), and so on.
In every single case, when I compared the scores of the historically higher-achieving group with the scores of the lower-achieving group, I find that in DCPS, all of the schemes of Rhee, Kamras, Fenty, Henderson, Gray and a few powerful billionaires have been a complete and utter failure. Even on their own terms.
Click through and view the ugly truth. You can be sure the Washington Post didn't.
