Jersey Jazzman: April 2017

Saturday, April 29, 2017

Desperately Searching For the Merit Pay Fairy

It's been a while since we've talked about the Merit Pay Fairy.

Yo, it's me -- da Merit Pay Fairy, makin' all your reformy dreams come true!

The Merit Pay Fairy lives in the dreams and desires of a great many reform-types, who desperately want to believe that "performance incentives" for teachers will somehow magically improve efforts and, consequently, results in America's classrooms. Because, as we all know, too many teachers are just phoning it in -- which explains why a system of schooling that ranks and orders students continually fails to make all kids perform above average...

One of the arguments you'll hear from believers in the Merit Pay Fairy is that teaching needs to be made more like other jobs in the "real world." But pay tied directly to performance measures is actually quite rare in the private sector (p. 6). It's even more rare in professions where you are judged by the performance of others -- in this case, students, whose test scores vary widely based on factors having nothing to do with their teachers.

But that doesn't matter if you believe in the Merit Pay Fairy; all that counts is that some quick, cheap fix be brought in to show that we're doing all we can to improve public education without actually spending more money. And, yes, merit pay as conceived by many (if not most) in the "reform" world, is cheap -- because it involves not raising the overall compensation of the teaching corps, but taking money away from some teachers and giving it to others, using a noisy evaluation system incapable of making fine distinctions in teacher effectiveness.

Which brings us to the latest merit pay study, which has been getting a lot of press:

Student test scores have a modest but statistically significant improvement when an incentive pay plan is in place for their teachers, say researchers who analyzed findings from 44 primary studies between 1997 and 2016.

“Approximately 74 percent of the effect sizes recorded in our review were positive. The influence was relatively similar across the two subject areas, mathematics and English language arts,” said Matthew Springer, assistant professor of public policy and education at Vanderbilt’s Peabody College of Education and Human Development.

The academic increase is roughly equivalent to adding three weeks of learning to the school year, based on studies conducted in U.S. schools, and four weeks based on studies across the globe.

Let's start with the last paragraph first: the notion that you can translate this study's effects into "weeks of learning" is completely without... well, merit. Like so much other research in this field, the authors make the translation based on a paper by Hill et al. (2008). I'll save getting into the weeds for later (and in a more formal setting than this blog), but for now:

Hill et al. make their translation of effect sizes into time periods based on what are called vertically-scaled tests. These are tests that let at least some students attempt to answer at least some common items between adjacent grade levels, allowing for a limited comparison between grades (see p.17 here).

There is no indication, however, that any of the tests used in any of the 44 studies are vertically scaled -- which makes a conversion into "x weeks of learning" an unvalidated use of test scores. In other words: the authors in no way show that their study can use the methods of Hill et al., because the tests are likely scaled differently.

Furthermore: do we have any idea if the tests used in international contexts are at all educationally equivalent to the tests here in the US? For that matter, what are the contexts for the teaching profession, and how it might be affected by merit pay, in other countries? So far as I'm concerned, the effect size we care about is the one found in studies conducted in this country.

That US effect size is reported in Table 3 (p. 44) as 0.035 standard deviations. How can we interpret this? Plugging into a standard deviation-to-percentiles calculator (here's one), we find this effect moves students at the 50th percentile to 51.4.* It's a very tough haul to argue that this is an educationally meaningful effect.
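For anyone who wants to check the arithmetic without an online calculator, here is a minimal sketch of the conversion. It assumes test scores are normally distributed, which is the same assumption those calculators make; the function name and its parameters are mine, not anything from the study.

```python
from statistics import NormalDist


def shifted_percentile(effect_size_sd, start_percentile=50.0):
    """Percentile a student lands at after a shift of `effect_size_sd`
    standard deviations, assuming normally distributed scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Convert the starting percentile to a z-score...
    z_start = std_normal.inv_cdf(start_percentile / 100.0)
    # ...add the effect size, then convert back to a percentile.
    return 100.0 * std_normal.cdf(z_start + effect_size_sd)


# US effect size reported in Table 3 of the meta-analysis
print(round(shifted_percentile(0.035), 1))  # 51.4
```

The same function works for other starting points, but under a normal curve the shift is largest in the middle of the distribution, so the 50th percentile is the most generous case for the study.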

Which brings us to the next limitation of this meta-analysis: the treatment is not well defined. To their credit, the authors attempt to divide up the different studies by their characteristics, but they only do so in the international aggregate. In other words: they report the differences between a merit pay plan that uses group incentives versus a "rank order tournament" (p. 45, Table 4), but they don't divide these studies up between the US and the rest of the world.

Interestingly, group incentives have a greater effect than individual competitions. But there is obviously huge variation within this category in how a merit pay plan will be implemented. For example: where did the funds for merit pay come from?

In Newark, merit pay was implemented using funds dedicated by Mark Zuckerberg. Teachers were promised that up to $20 million would be available; of course, it turned out to be far less (and it's worth noting that there's scant little evidence Newark's outcomes have improved). Would this program have different effects if the money had not come from an outside source?** What if the money came, instead, from other teachers' salaries (which may, in fact, be the case in Newark)?

Any large-scale merit pay plan will be subject to all sorts of variations that may (or may not) impact how teachers do their jobs. Look at the descriptions in Table 6 (p. 47), which recounts how various merit pay plans affect teacher recruitment and retention, to see just how diverse these schemes are.

I think it's safe to say that "merit pay" in the current conversation is not really about giving bonuses for working in hard-to-staff assignments, or for taking on extra responsibilities, or even for working in a group that meets a particular goal. I'm not suggesting we shouldn't be looking at the effects of programs like this, but I don't think it's helpful to put them into the same category as "merit pay."

I think, instead, that "merit pay" is commonly understood as being a system of compensation that differs from how we currently pay teachers: one where pay raises are based on individual performance instead of experience or credentials. The Chalkbeat article certainly implies this by making this comparison:

Teacher pay is significant because salaries account for nearly 60 percent of school expenses nationwide, and research is clear that teachers matter more to student achievement than any other aspect of schooling (although out-of-school factors matter more). About 95 percent of public school districts set teacher pay based on years of experience and highest degree earned, but merit pay advocates argue that the approach needs to change. [emphasis mine]

Take a look at a sample of articles on teacher merit pay -- here, here, here, here, and here for example -- and you'll see merit pay contrasted with step guides that increase pay for more years of experience or higher degrees. You'll also notice none of the proponents of merit pay are suggesting that the overall amount spent on our teaching corps should increase.

I can understand the point of writers like Matt Barnum who argue that merit pay can come in all sorts of flavors. But I contend we're not talking about things like hard-to-staff bonuses or group incentives: When America debates merit pay, it's really discussing whether we should take pay from some teachers and give it to others.

Unfortunately, by analyzing all of these different types of studies together, the Vanderbilt meta-analysis isn't answering the central question: should we ditch step guides and move to a performance-based system? That said, the study may still be giving us a clue: the payoff will likely be, at best, a meager increase in test scores.

Of course, we have to weigh that against the cost -- or, more precisely, the risk. Radically changing how teachers are paid would create huge upheavals throughout the profession. Would teachers who were in their current assignments stay on their guides, or would they potentially take huge hits in pay? If they were grandfathered out of a merit pay scheme, how would they work with new teachers who were being compensated differently?

Would merit pay be doled out on the basis of test scores? How much would VAMs or SGPs be weighted? How would teachers of non-tested subjects be eligible? Would the recipients of merit pay be publicly announced? In New Jersey and many other states, teacher salaries are public information. Would that continue? And how, then, would students be assigned to the teachers who receive merit pay? Would parents get to appeal if their child is assigned to a "merit-less" teacher?

The chaos that would result from implementing an actual merit pay plan is a very high cost for a potential 0.035 standard deviation improvement in test scores.

I know believers in the Merit Pay Fairy would like to think otherwise, but clapping harder just isn't going to make these very real issues go away.

Don't listen to dat Jazzman guy! Just clap harder, ya bums!

ADDING: More from Peter Greene:

Researchers' fondness for describing learning in units of years, weeks, or days is great example of how far removed this stuff is from the actual experience of actual live humans in actual classrooms, where learning is not a featureless tofu-like slab from which we slice an equal, qualitatively-identical serving every day. In short, measuring "learning" in days, weeks, or months is absurd. As absurd as applying the same measure to researchers and claiming, for instance, that I can see that Springer's paper represents three more weeks of research than less-accomplished research papers.

Heh.

* Some folks don't much care for making this kind of conversion. In my view, it's much more defensible than converting to "x weeks of learning," which, even setting aside the problems of converting from vertically scaled tests, suffers from unjustified precision. In addition, the implications behind the translation are subject to wild misinterpretation.

Converting to percentiles might be a bit problematic. But it's not nearly as bad as using "x weeks of learning."

** We'll never know because no one has bothered to find out if the Newark merit pay program actually worked. Think about it: $100 million in Facebook money, and no one ever considered that maybe reserving a few thousand for a program evaluation was a good idea.

If I were cynical, I might even think folks didn't want to study the results, because they were afraid of what they might find. Good thing I'm not cynical...

Monday, April 10, 2017

Teacher Tenure and Seniority Lawsuits: A Failure of Logic

New Jersey's teacher tenure and seniority lawsuit continues to grind away. Part of a trio of suits here and in New York and Minnesota, these lawsuits are all being brought to the various state courts by the Partnership for Educational Justice, Campbell Brown's secretly funded organization.

Their Minnesota lawsuit was thrown out of court last fall; in New Jersey, however, we had to wait for a state Supreme Court ruling on a Christie administration motion to tie tenure and seniority laws to school funding. The Court ruled it wasn't going to opine on these laws until a lower court takes up the PEJ's case. So now we wait for that ruling -- and the PEJ continues its public relations campaign against tenure and last-in, first-out (LIFO) seniority rules.

To their credit, PEJ has posted all of the filings in the case. But it's clear after reviewing them that PEJ doesn't have a leg to stand on. Not to say they won't prevail: bad legal reasoning didn't stop Judge Rolf Treu in California from issuing a terrible ruling in Vergara, which was inevitably overturned on appeal. Similarly, the only way PEJ can win here in New Jersey is if the lower court hearing the case sets aside all logic and reason...

Because the PEJ's case simply makes no sense.

When a group like the PEJ goes before the courts to get a statute overturned as unconstitutional -- by which I mean in violation of the state's constitution, not the federal Constitution -- the burden of proof is on them. They may have a problem with the NJ tenure and LIFO statutes, or any other law on the books, but getting the court to overturn a law isn't simply a case of arguing against the law's merits: they have to show how it violates the state's constitution.

The constitution states (Article VIII, Section IV): "The Legislature shall provide for the maintenance and support of a thorough and efficient system of free public schools for the instruction of all the children in the State between the ages of five and eighteen years." Unless and until the PEJ can demonstrate to the courts that tenure and LIFO laws violate this clause, the courts cannot act.

In a long-running series of cases involving school funding in New Jersey, the NJ Supreme Court found the systemically inadequate and inequitable funding of schools was in direct violation of the education clause. Although the litigation has a long and complex history, the basic premise of the lawsuits is comparatively simple: at-risk children need more funding to equalize educational opportunities, the state's system of school funding makes it impossible for those children's communities to raise adequate funds on their own and, therefore, the state needs to intervene.

In contrast, the challenge for the PEJ is to show how tenure and LIFO laws similarly violate the education clause; even the PEJ's own filing concedes this point. The problem is that right after stating what their legal argument should be, they completely ignore the task.

Yes, they make the case districts with larger proportions of at-risk children show fewer gains in academic outcomes; no one disputes this. Yes, they make the case teachers matter; no one disputes this (although the canard of teachers being "the most important in-school factor" for student achievement is wrong: the student is the most important "in-school factor," not the teacher). Yes, they make the case ineffective teachers should be dismissed; no one disputes this.

They even go further and argue that the quality of teachers suffers in districts that serve many at-risk students. Certainly, there's strong evidence students in these districts are more likely to have less qualified teachers, as judged by their credentials, experience, or scores on knowledge tests (teachers' scores, not students').

But none of this speaks to the central argument PEJ is trying to make:

Hill: The Newark Teachers Union says — about the comment like that — these folks who are tenured, they’ve been through a certain process and if the process determines that they’re no longer an effective teacher, the process has a way of dealing with them. You say?

[Ralia] Polechronis [PEJ Executive Director]: So, that’s not entirely what we are talking about here. What we’re talking about in LIFO are terminations and layoffs that have to happen only during budget cuts. So that process, that dismissal process, isn’t really at play. We’re talking about a situation when the district is in such dire financial constraints and is having such a problem figuring out its budget that they have to go to teachers, they have to go to laying them off and they have to make that decision, according to the law, by the level of seniority instead of thinking about the great teachers that are in the classroom and that should stay there.

Think about what Polechronis is assuming: that a district like Newark has the ability to differentiate at a very fine level the effectiveness of individual teachers, and then act accordingly in high-stakes decisions.

Let's be very clear: There is no evidence -- none -- that teacher effectiveness can be measured reliably and validly at a level that allows for high-stakes decisions to be made regarding teachers who have already been found to meet a minimal level of effectiveness.

What PEJ argues implicitly is that the Newark Public Schools can simply use its observation rubrics and Student Growth Percentiles and Student Growth Objectives to calculate an overall measure of teacher effectiveness, and then apply that measure to fairly determine who gets the boot when budget cuts "must" be made. But this contradicts everything we know about measuring teacher effectiveness.

Yes, principals can identify their very worst teachers; they are incapable, however, of differentiating the effectiveness of the vast bulk of teachers in the middle. The phony precision of observation protocols like the Danielson Model has led some to think we can validly use the resulting scores to accurately rank and order teachers; that is a mistaken belief grounded in innumeracy. In the same way, the error that is an inherent part of standardized tests makes the use of SGPs in decisions like this invalid (among many other reasons). And SGOs are, to be blunt, a joke.

The plain truth is that even if PEJ got its way and teachers could be dismissed without regard to seniority, there is no reliable and valid way to evaluate the majority of teachers who are dismissed in reductions-in-force. Yes, we can identify the worst performers; we can and should either get them remediation or remove them from their classrooms. But there's simply no reason to believe Newark, or any district, can accurately rank all teachers by their effectiveness.

But that's not the only failure of logic in PEJ's case. Because even if districts could make accurate decisions based on effectiveness -- again, they can't, but play along -- they would still have to show that districts like Newark were disproportionately affected by LIFO laws.

Unlike school funding -- which, despite all of the lawsuits, is still inequitably distributed across the state -- tenure and LIFO laws apply to every district equally. Newark and more affluent Millburn both have to operate under tenure laws; Camden and more affluent Haddonfield both have LIFO. Yes, the cities have had to make cuts in staff, in large part because charter schools, imposed by the state, have gobbled up more students and more resources. But that's not a function of tenure or LIFO laws; how could it be?

Reading through the PEJ's filings, it's clear they are unable to make a case that urban students have suffered disproportionately from tenure; in fact, as NJEA points out in one of its briefs, there isn't even evidence that any of the plaintiffs' children suffered from having a bad teacher who was spared dismissal by the LIFO laws, calling into question the plaintiffs' standing.

What is clear is that Newark's schools have suffered from inadequate and inequitable funding; even the plaintiffs acknowledge students have suffered from losses of staff like librarians and guidance counselors (p.9-10). But they put forward no argument that removing LIFO laws would have saved those jobs; again, how could they?

Some have argued that dismissing senior, higher-paid employees frees up more funds for lower-paid, less senior staff, thus leading to fewer reductions. This assumes that teacher effectiveness is evenly distributed across experience, which we know is not true -- when you cut experienced teachers, you're more likely to cut effective teachers (and again: we're setting aside the problem that you can't rank and order the vast majority of teachers by effectiveness anyway).

It also assumes that there is so much inefficiency within urban schools that they can cut staff and retain programming and class size. Empirically, however, we know that NJ's urban schools are not systemically its most inefficient ones. We also know that funding adequacy correlates with staff per student in various educational programs, which means the problems of cutting staff and programming have much more to do with inadequate funding than they do with tenure and LIFO -- policies, again, which are enforced in all districts.

Finally, it's important to remember that teachers value tenure and LIFO. If the state gets rid of them, that decreases the overall compensation, monetary and otherwise, of teachers. Are the taxpayers of New Jersey willing to fork over more money to make up for this loss in incentives? Or do they want to see a less qualified pool of prospective teachers enter the profession?

The backers of these lawsuits will make occasional concessions to the idea that schools need adequate and equitable funding to attract qualified people into teaching. But they never seem to be interested in underwriting lawsuits that would get districts like Newark the funds they need to improve both the compensation and the working conditions of teachers.

Instead, they waste their time with lawsuits like this -- suits that fail on legal, empirical, and logical grounds. Suits that do nothing to help deliver the resources all students need to equalize educational opportunities. Suits that do nothing to improve the effectiveness of New Jersey's teaching corps, or the efficiency of its school system. Suits that only serve to further dishearten the people who go to work in public schools every day on behalf of the taxpayers and students of this state.

Maybe one day Campbell Brown and the PEJ will stop trying to take away the hard-fought rights of teachers, and take up the real fight for our state's deserving children.

Jeff Parker

ADDING: As if on cue:

ATLANTIC CITY — The school district advertised three times for a certified chemistry teacher last summer and fall, and three times they failed to get a candidate to accept the job.

So they turned to Edmentum, a provider of online courses, to fill the gap. This year, four classes at the high school are being taught via the online course, with backup support from a teacher.

[...]

The statewide shortage makes the position competitive. At least three area school districts are looking for chemistry teachers next year.

Ralph Aiello, principal at Cumberland Regional High School, said he’s looking for a combined chemistry/physics teacher for next year. So far, he has had just two applications.

Linda Smith, president of the New Jersey Science Teachers Association, said she is working with colleges to develop programs that recruit former or retired scientists into teaching as a second career.

“People can just make more money as scientists than they can as science teachers,” she said. “Some do want to teach. But they need training and mentoring. People who are good at science are not always good at explaining it.” [emphasis mine]

Terry Moe, hardly a friend of teachers unions, states: "...most teachers see the security of tenure as being worth tens of thousands of dollars a year.” So please, PEJ: Explain to us how eliminating tenure and LIFO will help recruit better candidates into a profession that is already suffering from serious shortages.

(This should be good...)