Let's get back to Jonah Rockoff's testimony at the NJ State Board of Education.
Rockoff's most famous piece of research is a paper he coauthored with Raj Chetty and John Friedman that was released last year: "THE LONG-TERM IMPACTS OF TEACHERS: TEACHER VALUE-ADDED AND STUDENT OUTCOMES IN ADULTHOOD." It is admittedly a fascinating piece of work, and I certainly wouldn't argue with its conclusions: teaching quality does matter and yields small but real and (somewhat) lasting effects in a student's life.
No, the problem with Chetty, Friedman, Rockoff isn't the paper itself; it's how the paper's conclusions were abused. As both Matt DiCarlo and Bruce Baker pointed out at the time - and as the paper itself cautioned - there was no reason, based on this study, for policy makers to start firing teachers based on their value-added scores. The study simply left too many questions open, chief among which was the remaining problem of identifying individual teachers' "value" with enough precision. And, as Diane Ravitch points out, the benefits of the study's conclusions have been way oversold.
But that didn't stop pundits and politicians from jumping on the test-based teacher evaluation express. Nick Kristof at the NY Times couldn't wait to start the culling; neither could Tom Moran at the Star-Ledger, who has still refused to explain this part of the paper to me:
Get back to me when you can, Tom...
Newark's State Superintendent, Cami Anderson, has used the study to justify her policy of school closings. Michelle Rhee, naturally, considered the research an endorsement of her own agenda. Republican governors rushed to embrace the paper to justify their reformy policies. Even President Obama got in on the act. All were willing to reach unwarranted conclusions based on this paper - conclusions that, again, the researchers themselves cautioned against.
But here's the funny thing: it appeared that Chetty, Friedman, and Rockoff weren't very much concerned that their research was, in effect, being abused. If anything, they actually seemed to embrace this misuse:
Policymakers (and newspapers) want research with immediate and obvious policy implications. They want the silver bullet. They want the breakthrough that negates all previous understanding – that tells us why everything we’ve done to date is wrong and paints a clear path forward. Unfortunately, too many researchers feel compelled to play along.
Consider the great Chetty, Friedman and Rockoff one great teacher can earn a classroom of kids and extra quarter million dollars study, from this past winter. Many policymakers leaped to use that study as an immediate call to use value-added data for teacher de-selection policies. That call was endorsed by one of the authors own media quotes in which he asserted that we should fire sooner than later! (and that assertion was built on an overly bold if not absurd extrapolation of the earnings effect based on the single age at which the earnings effect was largest). [emphasis mine]

I bring all this up to make a point about Dr. Rockoff's testimony last week before the NJBOE that we must understand before we continue: while Rockoff is a respected and prolific researcher, he has an unfortunate habit of either not understanding or not caring about the effects of his research on real world policy making.
What Jonah Rockoff says in a forum like this matters. Policy makers are going to look at his expertise, his resume, and his accomplishments, and consequently give his opinions great weight. He has an obligation to be crystal clear in his assertions; he must consider his words carefully, because those words are going to steer policy in one direction or the other.
Which is why this passage, close to the beginning of his presentation, was so very disappointing [my transcription, slightly edited for clarity; all emphases mine]:
(2:30) The New Jersey measures of SGP [Student Growth Percentiles] are using prior test scores. There are other places that have also added in demographics, and I'll get back to that point in a bit. But I think that's the key thing here [inaudible] that distinguishes New Jersey from, let's say, New York State, which is slightly different.
What's the basis of growth analysis? I'm sure you've heard this before so I'm going to be very quick and go through it. But the idea is that we'll compare performance of a child on a state test relative to some benchmark for how much they should have grown. We look at a student... we can see in the Excel spreadsheet how much they grew on the test from last year to this year, but the big question is: was that good enough? How do I know how many points they were supposed to grow? So we compare that to some predicted benchmark for that student.
Once we have the prediction or the benchmark, then it's just arithmetic, right? We want to know, did the student beat the goal? Did they do better than we expected them to do? Than we predicted they would have done, the benchmark or goal we set for that student, and that teacher? What we do is we take the actual growth and we subtract out the benchmark, and that leaves us with a positive number or a negative number. Positive means you've exceeded expectations, negative means you fell below expectations.
In SGP, we express that in percentile terms. We don't have to, but we can; that gets us away from the scaling. And then the third is we take an average or a median. In practice, I've looked at this, it doesn't really matter if we take an average or a median, as long as the teacher has at least 10 or 15 kids, there's no big deal difference between the average and the median. But people for who knows what reasons, talk about the median or the average. But you get a measure that says, for this teacher, on average - I like to say "on average," it's hard to say "on median" - their students outperformed or underperformed expectations.
The key is setting the benchmark. We can all do the arithmetic, right? The key is setting the benchmark; that's the hard part in all of this. That's the hard part. What is a fair goal or standard for each child? What's the fair goal; what's the fair standard? This is a very deep and difficult problem. There is no one right answer that all researchers point to and say: "Aha - this is the magic formula for setting the benchmark for each kid."
But what researchers have done over the last ten years, I'd say, is when a lot of this literature comes about, is we've come up with a lot of work that says: "These are reasonable things one does." And the variance between these little... these differences between several reasonable choices are very small, relative to the difference between these reasonable choices and something that is completely unreasonable like No Child Left Behind, which says that every student gets the same goal: proficiency.

I'm sorry, but that's just wrong; it is a mischaracterization of how AchieveNJ treats test scores. Dr. Rockoff needs to be corrected for the record.
NJDOE, to their credit, has published several resources on the proposed use of SGPs in teacher evaluations, including Damian Betebenner's technical overview of SGPs. And in every piece of media published that I have reviewed, there is never a mention of setting a "predicted benchmark" for any student as part of the proposed median SGP (mSGP) score that goes into a teacher's evaluation.
To be fair: in Betebenner's overview, he proposes a system of tracking a student over time to see if they are on their way toward a particular benchmark over a set number of years. In this system, student growth toward a set goal is described in percentiles; that's fine, even if the growth itself is described in relative terms. But this is not what NJDOE has described as part of the teacher, principal, or school evaluation models. As this video shows, and as NJDOE's presentations make clear, a teacher's mSGP is calculated solely on the basis of how his or her students do relative to their academic peers, and not by whether the student is on a trajectory toward a particular goal.
As Betebenner himself has made clear: SGPs are descriptive measures, relative to other students. They do not attempt to tell us how much a student grew in absolute terms (let alone why); they simply tell us how a student's growth compares to his or her "peers." Let me go back to an earlier post on SGPs and pull up a few illustrations:
Here's a distribution with one student in the 25th percentile and another in the 75th. The number above the bell curve is their change in NJASK scores over a year. Now let's imagine a much harder test, where every student did worse:
The SGPs remained the same, because SGPs only measure student performance relative to other students. Benchmarks are not relevant to a student's SGP score. Here's another distribution where every student in the peer group showed at least some growth:
Again, the SGPs don't change, because SGPs are relative. In fact, the distribution of growth could be uneven, as it is above, but the SGPs still won't change. All that matters in a student's SGP - and a teacher's mSGP - is where they place relative to other students.
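For readers who would rather see this in numbers than in bell curves, here is a minimal sketch of the same idea. It is a toy, not the actual SGP calculation: I am assuming a single peer group of ten students, made-up score changes, and a simple "share of peers who grew less" percentile rank. The function name `percentile_ranks` and every number in it are my own invention for illustration.

```python
def percentile_ranks(gains):
    """Toy percentile rank: percent of the peer group with a smaller gain.

    This is a simplification for illustration, not the real SGP method.
    """
    n = len(gains)
    return [100 * sum(1 for other in gains if other < g) // n for g in gains]


# Hypothetical one-year score changes for ten academic peers.
original = [-12, -5, 0, 4, 9, 15, 22, 30, 41, 55]

# A much harder test: every student's change drops by 20 points.
harder_test = [g - 20 for g in original]

# A year in which every student shows at least some growth.
all_grew = [g + 13 for g in original]

# An uneven distribution: gaps between students are stretched,
# but nobody changes places.
uneven = [g ** 3 for g in original]

for label, gains in [
    ("original", original),
    ("harder test", harder_test),
    ("all grew", all_grew),
    ("uneven", uneven),
]:
    print(label, percentile_ranks(gains))
```

Each scenario prints the same list of ranks, and in each one some student sits at the very bottom, including the scenario where every single student grew.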
So we are not measuring the students against a benchmark - unless we're willing to say that the other students themselves are the benchmark. But, if that's true, someone has got to lose. Someone has to be at the bottom, even if everyone has shown growth. Not all of the children can be above average.
Does Dr. Rockoff know this? Either he does or he doesn't. If he doesn't, why is he speaking so authoritatively on the subject? If he does, why is he being so careless with his words? Because this stuff matters in a very real and practical way.
If students are being judged by their growth toward a benchmark, then AchieveNJ is a system where all teachers have the chance to succeed. Everyone can attempt to reach the goal of getting their students to show a certain amount of growth on the NJASK, and everyone has at least the theoretical possibility of attaining that goal (whether the goal is practically possible, let alone desirable, is a question for another day).
If, however, teachers are judged in a normative way, someone has to lose. Someone - a lot of someones, actually - has to be below average. As a working teacher, I will tell you that there is all the difference in the world between these two scenarios.
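Here is a rough sketch of how far apart those two scenarios can land. Everything in it is hypothetical: the four teachers, their students' score gains, the 5-point growth benchmark, and the cut-off of 50 for a "low" median percentile are assumptions I made up for illustration, not AchieveNJ's actual rules, and the percentile rank is the same toy simplification rather than the real SGP method.

```python
from statistics import median

# Hypothetical score gains for four teachers' students. Every student grew.
teacher_gains = {
    "A": [6, 7, 8, 9, 10],
    "B": [11, 12, 13, 14, 15],
    "C": [16, 17, 18, 19, 20],
    "D": [21, 22, 23, 24, 25],
}

BENCHMARK = 5  # assumed growth goal, in scale-score points
CUTOFF = 50    # assumed line for a "low" median percentile

# Scenario 1: judge each teacher against the fixed benchmark.
met_benchmark = {
    name: median(gains) > BENCHMARK for name, gains in teacher_gains.items()
}

# Scenario 2: judge each teacher normatively. Rank every student against
# all other students, then take the median rank of each teacher's class.
all_gains = [g for gains in teacher_gains.values() for g in gains]


def percentile_rank(gain):
    """Toy rank: percent of all students with a smaller gain."""
    return 100 * sum(1 for other in all_gains if other < gain) // len(all_gains)


median_rank = {
    name: median(percentile_rank(g) for g in gains)
    for name, gains in teacher_gains.items()
}
below_cutoff = [name for name, rank in median_rank.items() if rank < CUTOFF]

print("met benchmark:", met_benchmark)
print("median percentile:", median_rank)
print("below cutoff:", below_cutoff)
```

In this toy data every teacher clears the benchmark, yet teachers A and B still land below the cut-off once they are ranked against their colleagues.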
Rockoff's testimony could easily be mistaken for suggesting that if a teacher helps her students meet a particular goal, she will be judged fairly. Imagine that teacher's surprise when she finds she has a low mSGP, even when her students have all shown growth, and all are proficient. Imagine the outcry to the NJBOE, who listened to Rockoff's testimony and may approve AchieveNJ's implementation. Imagine the uproar at NJDOE when they are deluged with calls demanding to know why teachers who are doing their jobs are being given "Partially Proficient" ratings.
Will Jonah Rockoff, Ph.D., be around to help deal with the mess?
More in a bit...