I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Friday, August 23, 2019

Clapping Harder For the Merit Pay Fairy


Earlier this week, I wrote about the death of the Merit Pay Fairy in Newark, New Jersey.

Hey, Jazzman, you bum -- I ain't dead yet!

Back in 2012, Newark began an experiment in teacher merit pay, fueled by funds from Facebook's Mark Zuckerberg. Teachers were promised up to $20 million over three years in extra incentive pay -- but in the first year, only $1.4 million was disbursed, and most of that appears to have come from other teachers, who had their pay docked because they were deemed "ineffective."

Merit pay, in other words, was little more than a broken promise to the teachers of Newark right from the start. A survey of Newark teachers in the first year found a large majority did not see the compensation system as "reasonable, fair, and appropriate." (p. 24) It's not a surprise, therefore, that this past month both the teachers union in Newark, the NTU, and the district's administration decided that the program was not worth continuing. 

But some reformy folks believe in merit pay the same way some children believe in fairies: they don't want to acknowledge the evidence that shows, even in the most generous reading, that the benefits of merit pay are very small and likely are not indicative of true increases in student learning. Like Peter Pan, these true believers hope against hope that fairies can be brought back to life simply by clapping harder:
In 2012, Newark Public Schools did something remarkable. The district reached an agreement with the Newark Teachers Union that would fundamentally shift how teachers are not just evaluated but paid. 
Then-Gov. Chris Christie and Randi Weingarten, the President of the American Federation of Teachers, announced the groundbreaking deal together on national television. At the time, my organization, the National Council on Teacher Quality, called the contract “a model to which other districts should aspire.”
That's from Kate Walsh, president of NCTQ, an organization that has previously held up Newark Public Schools (NPS) as an exemplar of teacher evaluation, claiming the district was "getting results" from its system:
The district gave the evaluation system a chance to work. While Newark saw students’ achievement decline initially after the new evaluation system was implemented, the district persevered and student achievement rose to the level it had been before and, in English, exceeded previous levels. (p.12)
It's hard to be more vague than that -- how was student achievement measured? What was the improvement? Most importantly: how do we know if the teacher evaluation system was affecting results?

As Bruce Baker and I pointed out in our review of NPS "reforms," plenty of other districts with similar demographics were showing similar growth in student achievement, without things like merit pay. In addition, Newark has not seen the same demographic shifts over time many comparison districts have.

You must account for this stuff if you're going to make a causal claim about merit pay in Newark. Alas, Walsh still seems wholly uninterested in digging into these details; for example:
The district would also start to use their dollars in the same way that other employers do. Pay would become a strategic tool to attract the best teachers to where they were most needed. Teachers who were ineffective would no longer receive an annual raise. Teachers who were rated as highly effective earned a healthy $5,000 bonus. Even better, high performing teachers who were either able to teach subjects that were hard for the district to staff, particularly in the lowest performing schools, could earn even more, up to $12,500 a year.
In the first year of the contract, Newark had about 3,200 teachers. How many qualified for the highest bonus, $12,500? Only eleven. Is Walsh really trying to make the case this small disbursal made a significant difference in teacher quality in Newark? She continues:
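To put those figures in proportion, here is a back-of-the-envelope sketch in Python. It uses only numbers quoted in this post (about 3,200 teachers, eleven top bonuses of up to $12,500, and $1.4 million disbursed out of a promised $20 million over three years). The variable names are mine, and the bonus total is an upper bound, since the top award was "up to" $12,500.

```python
# Figures quoted in this post; variable names are illustrative.
teachers = 3_200                 # approximate first-year headcount
top_recipients = 11              # teachers who qualified for the highest bonus
top_bonus = 12_500               # dollars, the maximum ("up to") amount
promised = 20_000_000            # dollars promised over three years
disbursed_year_one = 1_400_000   # dollars actually paid in the first year

share_top = top_recipients / teachers
max_top_payout = top_recipients * top_bonus   # upper bound on the cost
share_disbursed = disbursed_year_one / promised

print(f"Share of teachers earning the top bonus: {share_top:.2%}")
print(f"Most the top bonuses could have cost: ${max_top_payout:,}")
print(f"First-year disbursement as a share of the promise: {share_disbursed:.0%}")
```

By this arithmetic, roughly one teacher in 290 received the top bonus, at a total cost of no more than $137,500.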
The results spoke for themselves. After five years of implementation, 96 percent of highly effective teachers chose to stay in Newark and 49 percent of ineffective teachers were voluntarily leaving the district—exactly the sort of pattern schools need to see but rarely do. Accordingly, the district has higher student enrollment now than at any other time in recent history, suggesting parents gained a renewed confidence in the district.
First of all, we have no way of knowing whether these teacher attrition and retention rates are significantly better than they would be in the absence of the merit pay scheme. We don't know how they compare to similar districts' rates. We don't even know how they compare to rates before merit pay in Newark. Again, it's completely unwarranted to make any sort of causal claim without at least some attempt to compare these rates to a counterfactual.

Second, the notion that there's any evidence that shows student enrollment has increased because of "renewed confidence" due to merit pay is absurd on its face: "I was going to move my family, but now that Newark has teacher merit pay, we're staying!" Maybe the city's child population is simply increasing. Is Walsh so enamored with merit pay she's willing to make wild stretches like this?

Apparently, she is:
Where once compensation was used as a strategic deployment of resources to ensure the district can fill its vacancies, keep its best teachers, and ensure the most vulnerable students have access to them, soon there will be nothing but raises based on years of experience, and requiring teachers to spend precious time and money earning another degree they more than likely do not need. 
Research shows over and over again that advanced degrees do not make teachers more effective with the exception of math and science. 

First of all, there is a large body of evidence that shows teacher experience correlates with effectiveness. And while gains in effectiveness are strongest in the first few years, gains do persist up through the third decade of a teacher's career. Tying compensation to experience is hardly a policy without evidence to support it.

Next: "the exception of math and science" is a very big exception. Is Walsh prepared to offer a bonus only to math and science teachers with advanced degrees? And is she really sure French teachers don't benefit from degrees in French, or music teachers don't benefit from degrees in conducting, or that teachers in many other subjects don't benefit from gaining expertise through earning an advanced degree in that subject?

I've been looking at the research on advanced degrees and teaching for some time now, and the conclusion I've come to is that it is highly limited. Most studies don't account for alignment of teaching subject and degree concentration; in other words, the results are likely skewed because they don't separate getting a degree in what you teach from getting any degree.

These studies also usually don't account for variations in the quality of the degree-granting programs: crappy on-line programs are lumped in with rigorous degrees from research universities. In addition, the student outcomes are almost always measured by test scores, which limits the teachers studied to tested grades (3-8) and only two subjects (math and English).

Walsh's sweeping statement is simply not justified. Further, she ignores the reality that NPS must compete with other districts that offer master's pay bumps to attract qualified teaching candidates. Is the district supposed to ignore this reality? Especially because there is no evidence whatsoever that NPS has attracted better candidates to its teaching staff than other districts?

As I said in the last post: there was supposed to be an ongoing study of merit pay in NPS. But that study ended after a single year. We have no evidence whatsoever that Newark attracted better teaching candidates, improved student outcomes, or raised teacher effectiveness by using merit pay.

But this lack of evidence isn't stopping Walsh from clapping harder:
In 2012, Newark Public Schools took bold steps to create a compensation system that would help to attract and keep the best teachers. The district used resources strategically to ensure the most vulnerable students had access to the best teachers, an accomplishment that many districts struggle to achieve. With this new contract, instead of being a leader in strategic compensation, Newark becomes a district that takes a one-size-fits-all approach to its teachers, to the detriment of its students and teachers alike.
Even by current reformy standards, this statement is way over the top. We have no evidence the best teachers in Newark went to the neediest students. We have no evidence Newark was better at teacher allocation than districts that didn't implement merit pay. We have no evidence Newark is now a "one-size-fits-all" district. We have no evidence merit pay was a benefit to Newark's teachers and students.

What we do know is that the majority of Newark teachers didn't think the system was fair. That, by itself, is enough to declare Newark's merit pay experiment dead -- even if some folks keep clapping for it.

UPDATE: After I posted this blog, I went back and looked at the teacher survey again, which is part of a report that NPS commissioned from the American Institutes for Research. AIR is an excellent research organization, and they produce high-quality work. That said, there are a few oddities in their Newark report:
In response to a set of questions about their knowledge of the current evaluation process, 83 percent of teachers and 99 percent of school leaders reported that they have a clear understanding of the evaluation process. In addition, in response to a set of questions about the fairness of the evaluation process, 72 percent of teachers and 92 percent of school leaders reported that the evaluation process is fair, which is larger than the 30 percent reported fairness by teachers in an evaluation of 25 districts in New Jersey (Firestone, Nordin, Shcherbakov, Kirova, & Blitz, 2014) and the 39 percent reported fairness by teachers in 10 districts in Arizona (Ruffini, Makkonen, Tejwani, & Diaz, 2014). [p.20]
Let's set aside the Arizona survey, which I haven't yet read, and just focus on the New Jersey one. That report, which I'm well-acquainted with as it came out of Rutgers (where I got my PhD and currently teach part-time), did not have a sample of representative districts. It was, instead, an evaluation of a pilot program of teacher evaluation conducted in 25 districts across the state. It had a low response rate (39 percent), but more important, it was conducting a survey after the state had imposed a new law on districts, TEACHNJ, forcing them to rework their evaluation systems.

I can tell you as a teacher who lived through that time: TEACHNJ was not popular with many working teachers in the state. So it shouldn't be surprising the popularity of the new system was so low. Unlike Newark in its first year, there wasn't a whole bunch of money promised from an outside source going to these districts.

My point here is that the comparison is, at best, strained. And AIR really should have spelled out more clearly the limits of the comparison of the two reports.

Here's another oddity from the AIR report:
Figure 2 shows that the retention rates among teachers rated “effective” and “highly effective” exceed 90 percent, whereas retention rates among teachers rated “partially effective” and “ineffective” are 72 percent and 63 percent, respectively. In contrast, the most recent results from the national 2012–13 Teacher Follow-Up Survey indicate that 84 percent of public school teachers are retained, on average (Goldring, Taie, & Riddles, 2014).
Goldring et al. does find 84.3 percent of teachers stayed in their positions -- but that's all teachers, not "effective" ones. There's simply no way to know, based on this report, whether Newark did any better in retaining its better teachers thanks to merit pay.

Again, it's fine to include this data point, but a little more context is probably in order.

Tuesday, August 20, 2019

The Merit Pay Fairy Dies in Newark

One of the long-running characters on this blog is the Merit Pay Fairy.

Hey, youse bums -- get back to the teachin' already!

The Merit Pay Fairy lives in the dreams of right-wing think tanks and labor economists, who are absolutely convinced that our current teacher pay system -- based on seniority and educational attainment -- is keeping teachers from achieving their fullest potential. It matters little that even the most generous readings of the research find practically small effects* of switching to pay-for-performance systems, or that merit pay in other professions is quite rare (especially when it is based on the performance of others; teacher merit pay is, in many contexts, based on student, and not teacher, performance). 

Merit pay advocates also rarely acknowledge that adult developmental theory suggests that rewards later in life, such as higher pay, fulfill a need for older workers, or that messing with pay distributions has the potential to screw up the pool of potential teacher candidates, or that shifting pay from the bottom of the teacher "quality" distribution to the top -- and, really, that's what merit pay does -- still leaves policymakers with the problem of deciding which students get which teachers.

Issues like these, however, are at the core of any merit pay policy. Sure, pay-for-performance sounds great; it comports nicely with key concepts in economic theory. But when it comes time to implement it in an actual, real-world situation, you've got to confront a whole host of realities that theory doesn't address.

Which is what seems to have happened in Newark:
In 2012, Newark teachers agreed to a controversial new contract that linked their pay to student achievement — a stark departure from the way most teachers across the country are paid. 
The idea was to reward teachers for excellent performance, rather than how many years they spent in the district or degrees they attained. Under the new contract, teachers could earn bonuses and raises only if they received satisfactory or better ratings, and advanced degrees would no longer elevate teachers to a higher pay scale. 
The changes were considered a major victory for the so-called “education reform” movement, which sought to inject corporate-style accountability and compensation practices into public education. And they were championed by an unlikely trio: New Jersey’s Republican governor, the Democratic-aligned leader of the nation’s second-largest teachers union, and Facebook founder Mark Zuckerberg, who had allocated half of his $100 million gift to Newark’s schools to fund a new teachers contract. 
“In my heart, this is what I was hoping for: that Newark would lead a transformational change in education in America,” then-Gov. Chris Christie said in Nov. 2012 after the contract was ratified. 
Seven years later, those changes have been erased. 
Last week, negotiators for the Newark Teachers Union and the district struck a deal for a new contract that scraps the bonuses for top-rated teachers, allows low-rated teachers to earn raises, and gives teachers with advanced degrees more pay. It also eliminates other provisions of the 2012 contract, which were continued in a follow-up agreement in 2017, including longer hours for low-performing schools. [emphasis mine]
I blogged about that 2012 contract many times as it was being negotiated. It was never popular; only 37 percent of the membership approved it, thanks to low turnout in the voting. The teachers were promised up to $20 million in extra funding, over three years, dedicated to merit pay, all coming from Mark Zuckerberg's famous donation to Newark's schools. But the actual disbursement was far less: only $1.4 million in the first year. And much of that appears to have come from teachers who were denied regular annual raises due to "poor performance."

More senior teachers with advanced degrees had the option of not participating in merit pay; only about 20% chose to enter the system, putting to rest any idea merit pay was popular among teachers who had a choice. In addition, the percentage of "highly effective" teachers was much higher in the pool of teachers who opted out of merit pay than those who were in the system.

This pretty much destroys the notion that "better" teachers are clamoring for merit pay; in Newark, many doubted the system would work to their benefit. 

This trepidation can be found in a survey of Newark teachers by the American Institutes for Research, which was conducted in the first year of the contract. Over 40 percent of teachers both in and out of the merit pay system believed merit pay would hurt collaboration (p. 25); 60 percent believed the system ignored important aspects of their teaching (p. 24). Nevertheless, over 70 percent believed their pay system was fair and appropriate at their school. My reading of these results is that there was concern over the system, but teachers were willing to give it a shot.

Those of us who follow Newark's schools know what happened next: a mass revolt against the district's administration, which reported directly to the governor at the time, Chris Christie. A mayoral election where state control of schools was the key issue. The return of local control of schools after two decades. And now, the end of Newark's merit pay experiment.

One curious thing about the AIR report: it was labeled as "Year One":
NPS [Newark Public Schools] commissioned American Institutes for Research (AIR) to conduct an evaluation of the implementation and impact of the NPS/NTU contract and associated initiatives. The three-year evaluation focuses on a variety of outcomes (e.g., educator perceptions, teacher retention, teacher effectiveness, and student achievement) associated with the four contract components. In the first year of the evaluation, the period to which this report corresponds, the evaluation team used qualitative and quantitative techniques to assess the implementation of the contract components and to examine the association between the new evaluation and compensation systems (i.e., Components 1 and 2) and teacher retention. This report presents findings related to educator perceptions, as captured by teacher and school leader surveys administered in spring 2015, after two years of contract implementation (i.e., as of the 2014–15 school year) and teacher retention after one year of contract implementation (i.e., through the 2013–14 school year). The AIR evaluation team plans to examine the contract’s impact on teacher effectiveness and student achievement in 2016 and 2017, respectively. [emphasis mine]
Guess what? There was never a follow-up study of Newark's merit pay system. For whatever reason, AIR never published any further reports on whether merit pay helped improve student learning, improved effective employee retention, or improved the quality of new teaching candidates in NPS.

You would think, given all the hoopla over this contract at the time, that a thorough study of how merit pay played out in Newark would have been a top priority for the state, the district, and all of the folks who were connected to the Zuckerberg donation. Alas, we'll never know how this hyped contract affected the district. We'll never know if merit pay in Newark -- perhaps the most high-profile implementation of teacher pay-for-performance in the United States -- actually worked.

Were I cynical, I'd think that the folks behind this system didn't really want to know whether merit pay works. I'd think they want to keep merit pay a policy based on theory, not evidence. Because if Newark turned out like all the other merit pay experiments, it would show, at best, practically small improvements in student outcomes. And the price for that tiny improvement would be a chaotic and impractical system of teacher evaluation and compensation that was always doomed to fail.

Good thing I'm not cynical...

For those of you who weren't with me during the early, snarkier days of this blog: the idea for the Merit Pay Fairy came from a scene in a play by Christopher Durang:
“You remember how in the second act Tinkerbell drinks some poison that Peter is about to drink in order to save him? And then Peter turns to the audience and he says that ‘Tinkerbell is going to die because not enough people believe in fairies. But if all of you clap your hands real hard to show that you do believe in fairies, maybe she won’t die.’ So, we all started to clap. I clapped so long and so hard that my palms hurt and they even started to bleed I clapped so hard. Then suddenly the actress playing Peter Pan turned to the audience and she said, ‘That wasn’t enough. You did not clap hard enough. Tinkerbell is dead.’ And then we all started to cry.”
It really doesn't matter how many deaths the Merit Pay Fairy dies -- some folks just keep clapping harder:
Shavar Jeffries, who led the Newark school board in 2012 and is now president of the national advocacy group Democrats for Education Reform said he is happy to see teachers get more money under the new agreement. But he said it is disappointing that teachers’ performance will no longer automatically influence their pay — a disconnect, he argued, that many families do not support.
“There’s almost no parent in the city of Newark,” he said, “who thinks that there shouldn’t be a relationship between pay and whether you’re actually doing a good job for babies each and every day in the classroom.”
I don't live in Newark, but I think I'm safe in making this statement: there is definitely no parent in Newark who doesn't want a good teacher for their own kid. The question, then, is what are we doing to ensure that every child in Newark -- and, for that matter, every community -- has a well-trained, competent, effective teacher in their classroom.

Taking away money from "bad" teachers and giving it to "good" ones does little to improve the overall effectiveness of the teaching corps. What would help is raising the base pay for teachers so as to close the compensation gap they suffer compared to other professions; that way, we could attract the best possible candidates into the profession and increase the chances that every child has an effective teacher.

What won't help is imposing unfeasible pay-for-performance systems that, time after time, fail to deliver meaningful improvements. Clap as hard as you want -- the Merit Pay Fairy is dead in Newark, and the prognosis in other communities is not good.

* I'm adding this note in anticipation of a counter-argument for merit pay I've seen before: if you include studies in other countries, the effect size of merit pay grows. But these other countries have such different contexts for teaching and worker pay that applying their results to the U.S. is a highly dubious proposition. Further, the effect size is still very small -- again, even under the most generous interpretation of the results. When effects are so small, we are justified in questioning whether the variation is related to the construct; in other words, are there real improvements in learning, or are teachers just slightly pumping up scores through test prep?

No, I'm not going to debate this on Twitter. Write an article or blog post and I'll respond.

Thursday, July 25, 2019

What's Really Happening In Camden's Schools?

This latest series on Camden's schools is in three parts:

Part I

Part II

Part III (this post)

I want to wrap up this series of posts about Camden's schools with a look at the latest CREDO report, which the supporters of recent "reforms" keep citing as proof of those reforms' success.

Longtime readers know the CREDO reports, issued by the Center for Research on Education Outcomes at Stanford University, have been perhaps the best known of all research studies on the effectiveness of charter schools. The reports, which are not peer-reviewed, look at the differences in growth in test scores between charter schools and public district schools, or between different school operators within the charter sector. CREDO often issues reports for a particular city's or state's charter sector; they last produced a statewide report for New Jersey in 2013.

I and others have written a great deal over the years about the inherent limitations and flaws in CREDO's methodology. A quick summary:

-- The CREDO reports rely on data that is too crude to do the job properly. At the heart of CREDO's methodology is their supposed ability to virtually "match" students who do and don't attend charter schools, and compare their progress. The match is made on two factors: first, student characteristics, including whether students qualify for free lunch, whether they are classified as English language learners (in New Jersey, the designation is "LEP," or "limited English proficient"), whether they have a special education disability, race/ethnicity, and gender.

The problem is that these classifications are not fine-grained enough to make a useful match. There is, for example, a huge difference between a student who is emotionally disturbed and one who has a speech impairment; yet both would be "matched" as having a special education need. In a city like Camden, where childhood poverty is extremely high, nearly all children qualify for free or reduced-price lunch (FRPL), which requires a family income below 185 percent of the poverty line. Yet there is a world of difference between a child just below that line and a child who is homeless. If charter schools enroll more students at the upper end of this range -- and there is evidence that in at least some instances they do -- the estimates of the effect of charter schools on student learning growth very likely will be overstated.
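A toy calculation shows why a coarse flag can mislead. The sketch below is purely hypothetical: it assumes test-score growth depends only on family income, that schools add nothing, and that charter students cluster nearer the 185 percent cutoff. The `growth` function, its slope, and both income lists are invented for illustration; they are not estimates from Camden or from CREDO's data.

```python
from statistics import mean

FRPL_CUTOFF = 185  # percent of the poverty line, as cited above


def growth(income_pct: float) -> float:
    """Hypothetical growth that depends on income alone (no school effect)."""
    return 0.01 * income_pct  # illustrative slope, not an estimate


# Invented incomes, as a percent of the poverty line. Every student here
# falls below the cutoff, so every one carries the same FRPL flag.
district_incomes = list(range(0, FRPL_CUTOFF, 5))    # spread over the whole range
charter_incomes = list(range(100, FRPL_CUTOFF, 5))   # clustered near the cutoff

# A match on the FRPL flag alone treats all of these students as equivalent.
all_flagged = all(i < FRPL_CUTOFF for i in district_incomes + charter_incomes)

# Difference in average growth between the two "matched" groups.
apparent_effect = (
    mean(growth(i) for i in charter_incomes)
    - mean(growth(i) for i in district_incomes)
)

print(f"All students flagged FRPL: {all_flagged}")
print(f"Apparent charter effect: {apparent_effect:.2f}")
```

Under these made-up assumptions the comparison reports a positive "effect" of 0.50 even though the model gives schools no effect at all; the whole gap comes from income differences hidden inside the FRPL category.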

-- CREDO's use of test scores to match students and measure outcome differences is inherently problematic. The second factor on which CREDO makes student matches is previous student test scores. Using these as a match is always problematic, as test outcomes are prone to statistical noise. For now, I'll set aside some of the more technical issues with CREDO's methods and simply note that all tests are subject to construct-irrelevant variance, a fancy way of saying that scores can rise not because students are better readers or mathematicians, but simply because they are better test takers. If a charter school focuses heavily on test prep -- and we know many of the best-known ones do -- they can pump up effect sizes without increasing student learning in a way that we would consider meaningful.

-- The CREDO reports translate charter school effects into a "days of learning" measure that is wholly unvalidated. I've been going on about this for years: there is simply no credible evidence to support CREDO when they make a claim about a charter school's students showing "x number of days more learning" than a public district school's students. When you follow CREDO's citations back to their original sources, you find they are making this translation based on nothing. It's no wonder laypersons with little knowledge of testing often misinterpret CREDO's results.

Again, I and others have been writing about these limits of the CREDO studies for years. But the Camden "study" has some additional problems:

First, it's not really a "study" -- it's a PowerPoint slideshow that is missing some essential elements that should be included in any credible piece of research. Foremost of these is a description of the variables. In previous reports, CREDO at least told its readers how the percentages of FRPL, special education, and LEP students varied between charter and public district schools. But they don't even bother with this basic analysis here. And it's important: if the charter sector is taking proportionally fewer of the students who are more challenging to educate, they may be creating a peer effect that can't be scaled up.

Second, free and reduced-price lunch (FRPL) is an even less accurate measure of student socio-economic status when a district has universal enrollment in the school meal program. If a student's family knows she will automatically receive a free meal, they will have less incentive to fill out an application. We've seen significant declines in the percentage of FRPL students in some districts that have moved to universal enrollment, indicating this is a real phenomenon.

In contrast, New Jersey charter schools get more funding when they enroll a FRPL-eligible student. They have an incentive to get a student's family to fill out an application that the district does not. Did CREDO account for this? They don't say.

Third, the switch in tests in 2015 complicates any test-based analysis. There are at least two reasons for this: first, the previous test scores of students in both the charter and public district groups go back to 2014, when the NJASK was the test. As Bruce Baker and I showed in our analysis of Newark's schools, there is evidence that schools made a sudden shift in their relative standing on test outcomes when the switch in tests occurred. Shifts this fast are almost certainly not due to changes in student instruction; instead, they occur because some students were more familiar with the new format of the test than others. This, again, puts the matching process in doubt.

The second reason is related to the first: some schools likely took longer to acclimate to the new test than others. Their relative growth in outcomes, therefore, will probably shift in later years. Did that happen in Camden? Let's look at some of the CREDO report's results:

As I explained in earlier posts, there are three types of publicly-funded schools in Camden: Camden City Public Schools, denoted here as "TPS" for "Traditional Public Schools"*; independent charter schools; and renaissance schools, which are operated by charter management organizations (CMOs), and are supposed to take all students in a neighborhood catchment zone (but don't -- hang on...).

Note here that CCPS schools made a leap in relative growth between 2015 and 2016. Do you think it's because the schools got so much better in one year? Or is it more logical to believe something else is going on? A more likely explanation is that Camden students were not well prepared for the change in test format in 2015, but then became more familiar with it in 2016. It's also possible the charter students were better prepared for the new test in 2015, but lost that advantage in 2016, when their relative growth went down.

This is all speculation... but that's the point. Sudden shifts in test score outcomes are likely due to factors other than better instruction. Making the claim, based on sudden shifts in outcomes, that any particular sector of Camden's school system is getting better results due to their practices is a huge leap -- especially when we now know something important about the renaissance schools...

Because renaissance schools are not enrolling all students in their neighborhoods, their students are different from CCPS students in ways that can't be captured by the data. Here, again, is the State Auditor in his renaissance school report:
  • The current enrollment process has limited the participation of neighborhood students in renaissance schools. Per N.J.S.A. 18A:36C-8, renaissance schools shall automatically enroll all students residing in the neighborhood of a renaissance school. Instead, the district implemented a centralized enrollment system in which families must opt in if they prefer to attend a renaissance school. This process has left the district with fewer than half of neighborhood students being enrolled in their neighborhood renaissance school. 
  • [...]
  • The current policy could result in a higher concentration of students with actively involved parents or guardians being enrolled in renaissance schools. Their involvement is generally regarded as a key indicator of a student’s academic success, therefore differences in academic outcomes between district and renaissance students may not be a fair comparison.
As I said in the last post, it is very frustrating that the State Auditor gets this, but people who claim expertise in education policy do not. Let me state this as simply as I can:

A "study" like the Camden CREDO report attempts to compare similar students in charters and public district schools by matching students based on crude variables. Again, these variables aren't up to the job -- but just as important, students can't be matched on unmeasured characteristics like parental involvement. Which means the results of the Camden CREDO report must be taken with great caution.
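To make the point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the score scale, the size of the "involvement" effect, and the opt-in rule are made up, and none of it is an estimate for Camden. Every simulated student is identical on the measured variables, and the true school effect is set to zero. The only difference is an unmeasured trait that nudges both the decision to opt in and the test score.

```python
import random
import statistics

def simulate_matched_comparison(n=20_000, seed=1):
    """Gap in mean scores between sectors when the true school effect is zero.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    renaissance, district = [], []
    for _ in range(n):
        # Unmeasured trait (say, parental involvement); not in the matching data.
        involvement = rng.gauss(0, 1)
        # More involved families are more likely to opt in to a renaissance school.
        opts_in = involvement + rng.gauss(0, 1) > 0.5
        # The score depends on involvement and noise, NOT on which school is attended.
        score = 50 + 5 * involvement + rng.gauss(0, 10)
        (renaissance if opts_in else district).append(score)
    return statistics.fmean(renaissance) - statistics.fmean(district)

gap = simulate_matched_comparison()
print(f"Apparent renaissance 'advantage': {gap:.1f} points (true effect: 0)")
```

Under these assumptions the renaissance group comes out several points ahead even though the schools, by construction, do nothing different. A matching method that can't see the unmeasured trait would credit that gap to the schools.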

And again: when outcomes suddenly shift from year to year, there's even more reason to suspect the effects of charter and renaissance schools are not due to factors such as better instruction.

One more thing: any positive effects found in the CREDO study are a fraction of what is needed to close the opportunity gap with students in more affluent communities. There is simply no basis to believe that anything the charter or renaissance schools are doing will make up for the effects of chronic poverty, segregation, and institutional racism from which Camden students suffer.

Now, there are some very powerful political forces in New Jersey that do not want to acknowledge what I am saying here. They want the state's residents and lawmakers to believe that the state takeover of Camden's schools, and subsequent privatization of many of those schools, has led to demonstrably better student outcomes -- so much better that upending democratic, local control of Camden's schools was worth it.

Remember: the takeover and privatization of Camden's schools was planned without any meaningful local input. From 2012:
CAMDEN — A secret Department of Education proposal called for the state to intervene in the city’s school district by July 1, closing up to 13 city and charter schools. 
The intervention proposal, which was obtained by the Courier-Post, was written by Department of Education employee Bing Howell. 
He did not respond to a phone call and email seeking comment. 
Howell serves as a liaison to Camden for the creation of four Urban Hope Act charter schools. He reports directly to the deputy commissioner of education, Andy Smerick.
Howell’s proposal suggests that he oversee the intervention through portfolio management — providing a range of school options with the state, not the district, overseeing the options. He would be assisted by Rochelle Sinclair, another DOE employee. Both Howell and Sinclair are fellows of the Los Angeles-based Broad Foundation. [emphasis mine]
A California billionaire paid for the development of a "secret" (that's the local newspaper's word, not mine) proposal to wrest democratic, local control of schools away from Camden and develop a "portfolio" of charter, renaissance, private, and public schools. This proposal fit in nicely with the plans for Camden's redevelopment, which, as we are now learning, included a series of massive tax breaks for corporations with ties to the South Jersey Democratic machine.

The forces now trying to justify this tax giveaway are the same ones that pushed forward a radical transformation of Camden's schools. They would have us all believe that this transformation is as "successful" as their tax schemes.

But in both cases they are relying on the flimsiest of evidence, badly interpreted and devoid of any meaningful context. The case for educational "reform" in Camden is as weak as the case for corporate tax incentives in Camden.

Camden's families deserve what so many suburban families in New Jersey have: adequately funded and democratically, locally controlled schools. Small, dubious bumps in student growth found in incomplete "studies" are not an acceptable substitute.

That's all, for now, about Camden. We'll move on to another state next...

* I don't use "TPS" because I think it's a loaded term: the word "traditional" can carry all sorts of unwarranted negative connotations. CCPS schools are properly defined as "public district schools."

Tuesday, July 23, 2019

Camden, Charter Schools, and a Very Big Lie

This latest series on Camden's schools is in three parts:

Part I

Part II (this post)

Part III 

Let's get back to the deeply flawed editorial from this week's Star-Ledger that I wrote about yesterday. In that post, I explained how "creaming" -- the practice of taking only those students who are likely to score high on standardized tests -- is likely a major contributor to the "success" of certain charter schools.

Charter school advocates do not like discussing this issue. The charter brand is based on the notion that certain operators have discovered some special method for getting better educational outcomes from students -- particularly students who are in disadvantaged communities -- than public district schools. But if they are creaming the higher-performing kids, there's probably nothing all that special about charters after all.

It's important to understand this debate about charters and creaming if you want to understand what's happening now in Camden's schools.

Because Camden was going to be the proof point that finally showed the creaming naysayers were wrong with a new hybrid model of schooling: the renaissance school. These schools would be run by the same organizations that managed charter schools in Newark and Philadelphia. The district would turn over dilapidated school properties to charter management organizations (CMOs); they would, in turn, renovate the facilities, using funds the district claimed it didn't have and would never get.

But most importantly: these schools would be required to take all of the children within the school's neighborhood (formally defined as its "catchment"). Creaming couldn't occur, because everyone from the neighborhood would be admitted to the school. Charter schools would finally prove that they did, indeed, have a formula for success that could be replicated for all children.

Well, guess what?
CITY OF CAMDEN SCHOOL DISTRICT July 1, 2015 to February 28, 2018 
  • The current enrollment process has limited the participation of neighborhood students in renaissance schools. Per N.J.S.A. 18A:36C-8, renaissance schools shall automatically enroll all students residing in the neighborhood of a renaissance school. Instead, the district implemented a centralized enrollment system in which families must opt in if they prefer to attend a renaissance school. This process has left the district with fewer than half of neighborhood students being enrolled in their neighborhood renaissance school.
That's from a report from the State Auditor that was released earlier this year -- a report ignored by many in the NJ press, including the Star-Ledger (the Courier-Post and NJ Spotlight ran stories on the lack of oversight for renaissance schools, but didn't address the problems with the neighborhood enrollments).

Understand, the SL played a pivotal role in spreading the news that renaissance schools would enroll every student within their catchments. Here, for example, is an editorial from 2012 [all emphases mine]:
The campus will grow one grade level at a time, serving every kid in the neighborhood — including those learning English, or with special needs.
In real time, only snarky teacher-bloggers expressed any skepticism. But the SL continued to assure Camden's families that the renaissance schools would accept all students in the neighborhood; here's a piece from 2014:
District officials said the renaissance schools serve specific neighborhoods, where all students within that neighborhood are guaranteed enrollment.
According to the district, renaissance schools differ from conventional charter schools in that they guarantee a seat to every student living in its local neighborhood, and that they contract with the local school district.
This exact phrasing was used in an SL piece just a month later; apparently, the newspaper couldn't come up with new ways to assure residents every local student would have a seat. Here's yet another piece from 2017, where the SL gave South Jersey political boss George Norcross space to assure Camden's parents that every neighborhood child would get a seat at their renaissance school:
Renaissance schools are neighborhood schools that serve students in a defined catchment area, guaranteeing enrollment for any student living in that neighborhood. In other words, a child's fate is not left to a lottery.
Now, if anyone at the SL had read the Urban Hope Act, which created renaissance schools, they'd know what Norcross wrote in the pages of their newspaper simply wasn't true:
  1. If there are more students in the attendance area than seats in the renaissance school, the renaissance school shall determine enrollment by a lottery for students residing in the attendance area. In developing and executing its selection process, the nonprofit entity shall not discriminate on the basis of intellectual or athletic ability, measures of achievement or aptitude, status as a handicapped person, proficiency in the English language, or any other basis that would be illegal if used by a school district.
This directly contradicts Norcross, and the repeated reports in the SL. But, hey, what did the newspaper know back in 2017, before the Auditor's report? Maybe they thought it was a good idea to give a powerful political figure the benefit of the doubt; maybe every neighborhood kid really was getting into a renaissance school, no matter what the actual law said.

But then, in 2019, the Auditor's report was released, and all doubt was erased: the renaissance schools were not enrolling all neighborhood students. The previous reporting was false. How embarrassing...

Surely, from now on when the SL writes about renaissance schools, they will acknowledge the promise of a guaranteed seat for all students within those schools' catchments was broken. Surely, they will admit their previous reporting was inaccurate, and apologize for getting the story wrong. If not that, at least they will demand to know why the promises the district and the state made to Camden's families were now being broken.

Won't they?
South Jersey political boss George Norcross also deserves credit for using his political weight to push these reforms in Camden. Just because he’s defending a corrupt tax incentives program doesn’t mean he’s not doing good elsewhere. He helped push through a new law that allowed nonprofit charter operators to run neighborhood schools, but also forced them serve every student who walks through the door.
Technically, that's true -- the problem is that not every neighborhood student, all of whom were promised a seat, is allowed to walk through the door of their local renaissance school.

Again, from the Auditor's report:
In the 2016–17 enrollment lottery, 461 students were accepted to renaissance schools. Of these students, 247 (54 percent) resided in the neighborhood of their renaissance school. In the 2017–18 enrollment lottery, 838 students were accepted to renaissance schools. Of these students, 387 (46 percent) resided in the neighborhood of their renaissance school. Overall, less than half of students accepted to renaissance schools (49 percent) through the enrollment lottery process for the 2016–17 and 2017–18 school years were from the renaissance school’s neighborhood.
All neighborhood students who submitted applications by the deadline for the 2016–17 lottery were accepted in their neighborhood renaissance school; however, 47 students who applied by the deadline for the 2017–18 lottery had to be placed on their neighborhood renaissance school’s wait list. As of October 2017, there were 195 students on the wait list for their neighborhood renaissance school. [emphasis mine]
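For anyone who wants to check the Auditor's arithmetic, here is a short sketch. The counts come straight from the passage quoted above; the variable and function names are mine.

```python
# (neighborhood students accepted, all students accepted), per the Auditor's report
lottery = {
    "2016-17": (247, 461),
    "2017-18": (387, 838),
}

def percent(part: int, whole: int) -> int:
    """Share of `whole` made up by `part`, rounded to a whole percent."""
    return round(100 * part / whole)

by_year = {year: percent(n, total) for year, (n, total) in lottery.items()}

# Pool both lotteries to get the overall share of neighborhood students.
neighborhood = sum(n for n, _ in lottery.values())
accepted = sum(total for _, total in lottery.values())
overall = percent(neighborhood, accepted)

print(by_year)   # {'2016-17': 54, '2017-18': 46}
print(overall)   # 49
```

The figures match the report: 634 of the 1,299 students accepted across the two lotteries, or 49 percent, lived in their renaissance school's neighborhood.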
 The Auditor also explains why this matters:
The current policy could result in a higher concentration of students with actively involved parents or guardians being enrolled in renaissance schools. Their involvement is generally regarded as a key indicator of a student’s academic success, therefore differences in academic outcomes between district and renaissance students may not be a fair comparison.
This is a reality some of us have been trying to explain to outlets like the Star-Ledger for years. But for whatever reason, it appears the paper would rather use the weaseliest of words than admit they've been wrong all along. From last week's editorial:
This addressed a common knock on charters: that they self-select their students, by keeping out the poorest kids or those with special needs.
Those typical criticisms don’t apply in Camden. The so-called “renaissance schools” under charter management take the same, or more of the poorest and special ed kids as the district schools. 
See how they've moved the goalposts? Before, every kid in the neighborhood got a seat; now, the kids are the same...

Except, as the Auditor points out, it's likely they aren't. The very act of enrolling your child in a renaissance school is likely a marker that you are a more "actively involved parent." We know, thanks to a great deal of high-quality research (see the lit review here) that parents rely on their social networks to help them make decisions in school "choice" systems, and that different parents have different networks. It's not at all a stretch to think the students in renaissance schools differ from other students on characteristics that can't be shown in the data.

In other words: the renaissance schools may very well be creaming. Why is the State Auditor capable of getting this simple point, but the Star-Ledger editorial board isn't?

I'll talk more about these "unobserved" student differences and why they matter in my next post. For now, we need to understand this:

When the people of Camden were told that every child in a renaissance school's catchment would be enrolled, they were lied to. I'm using the passive voice deliberately here because who exactly did the lying -- and who simply transmitted this very big lie -- is open to debate.

But I would think that journalists -- whose primary function is to deliver the truth to their readers -- would, of all people, not want to perpetuate falsehoods when confronted with the facts. How sad that New Jersey's largest newspaper has such low standards, and such little regard for their readers.

We'll talk about the latest "study" on Camden schools' effectiveness next.

 Star-Ledger Editorial Board

ADDING: Over the years, the Star-Ledger opinion section has been remarkably inept when it comes to writing about education:

  • They blamed teacher seniority when an award-winning teacher in Camden was fired -- except she never was.
  • They tried to show the failure of Camden's schools by pointing to the low proficiency rate at Camden Street School -- except that school was in Newark, and hosted programs for that district's most cognitively impaired students.
  • They said a group of Newark teachers told "lies" about a contract negotiation -- except what those teachers actually said was, in fact, accurate.
  • They gave an anti-tenure superintendent space to tell stories about her staff -- except her own board said they weren't true (she was later terminated by that same board).
  • They misrepresented the views of union leaders -- even when those leaders were quite clear in their answers to direct questions.
  • They engaged in some particularly nasty language when describing the grassroots opposition to school leadership in Newark -- including making the accusation that local activists "don't seem to give a damn about the children."
  • They made fun of a union official's weight. Yes, they did.

Let me be clear about something: over the years, the Star-Ledger has had some excellent reporters on the education beat, including Jessica Calefati, Peggy McGlone, and Adam Clark. And, of course, the great Bob Braun worked there for years.

But the opinion section has been, and remains, a mess. If you're a public school teacher and you pay to read this dreck, you should really ask yourself: "Why?"

Monday, July 22, 2019

How Student "Creaming" Works

This latest series on Camden's schools is in three parts:

Part I (this post)

Part II

Part III

There is, as usual, so much wrong in this Star-Ledger editorial on Camden's schools that it will probably take several posts for me to correct all of its mistakes. But there's one assertion, right at the very top, that folks have been making recently about Newark's schools that needs to be corrected immediately:
Last year, for the first time ever, the low-income, mostly minority kids in Newark charter schools beat the state’s average scores in reading and math in grades 3-8 – incredible, given the far more affluent pool of kids they were competing against.
This is yet another example, like previous ones, of a talking point that is factually correct but utterly meaningless for evaluating the effectiveness of education policies like charter schooling. It betrays a fundamental misunderstanding of test scores and student characteristics, which keeps the people who make statements like this from having to answer the questions that really matter.

The question in this case is: Do "successful" urban charter schools get their higher test scores, at least in part, by "creaming" students?

Creaming has become a central issue in the whole debate about the effectiveness of charters. A school "creams" when it enrolls students who are more likely to get higher scores on tests due to their personal characteristics and/or their backgrounds. The fact that Newark's charter schools enroll, as a group, fewer students with special education needs -- particularly high-cost needs -- and many fewer students who are English language learners is an indication that creaming may be in play.

The quote above, however, doesn't address this possibility. The SL's editors argue instead that these schools' practices have caused the disadvantaged children in Newark's charters to "beat" the scores of children who aren't disadvantaged. And because the students in Newark's charters are "beating the state's average scores," they must be "incredible."

Last month, I wrote about some very important context specific to Newark that has to be addressed when making such a claim. But let's set that aside and get to a more fundamental question: given the concerns about creaming, is the SL's argument -- that charter students "beat" the state average -- a valid way to assess these schools' effectiveness?

No. It is not.

Let's go through this diagram one step at a time. The first point we have to acknowledge is that test scores, by design, yield a distribution of scores. That distribution is usually a "bell curve": a few students score high, a few score low, and most score in the middle.

This is the distribution of all test takers. But you could also pull out a subpopulation of students, based on any number of characteristics: race, gender, socio-economic status, and so on. Unless you delineate the subpopulation specifically on test scores, you're almost certainly going to get another distribution of scores.

Think of a school in a relatively affluent suburb, where none of the students qualify for free-lunch (the standard measure of socio-economic status in educational research). Think of all the students in that school. Their test scores will vary considerably -- even if the school scores high, on average, compared to less-affluent schools. Some of the kids will have a natural affinity for doing well on tests; some won't. Some will have parents who place a high value on scoring well on tests; some parents will place less value on scoring well. The students will have variations in their backgrounds and personal characteristics that we can't see in the crude variables collected in the data; consequently, their scores will vary.

The important point is that there will be a range of scores in this school. Intuitively, most people will understand this. But can they make the next leap? Can they understand that there will also be a range of scores in a lower-performing school?

There is, in my opinion, a tendency for pundits who opine on education to sometimes see children in disadvantaged communities as an undifferentiated mass. They seem not to understand that the variation in unmeasured student characteristics can be just as great in a school located in a disadvantaged community as it is in an affluent community; consequently, the test scores in less-affluent schools will also vary.

The children enrolled in Newark's schools will have backgrounds and personal characteristics that vary widely. Some will be more comfortable with tests than others. Some will have parents who value scoring well on tests more than others. It is certainly possible that the variation in a disadvantaged school -- the shape of the bell curve -- will differ from the variation in affluent schools, but there will be variation.

In my graph above (which is simply for illustrative purposes) I show that the scores of disadvantaged and not-disadvantaged students vary. On average, the disadvantaged students will score lower -- but their scores will still vary. And because the not-disadvantaged students' scores will also vary, it is very likely that there will be some overlap between the two groups. In other words: there will be some relatively high-scoring students who are disadvantaged who will "beat" some relatively low-scoring students who are not disadvantaged.

And here's where the opportunity for creaming arises. If a charter school can find a way to get the kids at the top of the disadvantaged students distribution to enroll -- while leaving the kids in the middle and the bottom of the distribution in the public district schools -- they will likely be able to "beat" the average of all test takers.
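The logic above can be sketched with a small simulation. This is an illustration under made-up assumptions, not a model of Newark: the score scale, the gap between the two groups, and the idea that a school enrolls exactly the top quarter of disadvantaged scorers are all hypothetical. No school in the simulation teaches anyone anything; the only thing that changes is who is enrolled where.

```python
import random
import statistics

def simulate_creaming(n_per_group=10_000, top_share=0.25, seed=0):
    """Compare group averages when one school enrolls only top scorers.

    All parameters are hypothetical and for illustration only.
    """
    rng = random.Random(seed)
    # Two overlapping bell curves: same spread, different averages.
    disadvantaged = [rng.gauss(45, 10) for _ in range(n_per_group)]
    not_disadvantaged = [rng.gauss(55, 10) for _ in range(n_per_group)]
    everyone = disadvantaged + not_disadvantaged

    # The "creaming" school enrolls the top slice of the disadvantaged distribution.
    ranked = sorted(disadvantaged, reverse=True)
    cutoff = int(len(ranked) * top_share)
    creamed, left_behind = ranked[:cutoff], ranked[cutoff:]

    return {
        "all_test_takers": statistics.fmean(everyone),
        "all_disadvantaged": statistics.fmean(disadvantaged),
        "creamed_school": statistics.fmean(creamed),
        "left_behind": statistics.fmean(left_behind),
    }

for group, mean in simulate_creaming().items():
    print(f"{group:>18}: {mean:.1f}")
```

With these assumed numbers, the creamed school's average lands above the average for all test takers, while the students left behind score below the disadvantaged group's original average. Nothing about instruction differs anywhere in the simulation.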

Is that what's happening in Newark? Again, the differences in the special education and English language learner rates suggest there is a meaningful difference in the characteristics of the student populations between charters and public district schools. But further opportunities for creaming come from separating students based on unmeasured characteristics.

For example: charter schools require that families apply for admission. It is reasonable to assume that there is a difference between a family that actively seeks to enroll their child in a charter, and a family that does not. Some of the "high-performing" charters in Newark have high suspension and attrition rates; this may send a signal to families that only a certain type of child is a good "fit" for a charter (some charter operators are quite honest about this). These schools also tend to have much longer school days and years; again, this may signal that only students who have the personal characteristics to spend the extra time in class should apply.

There is a very real possibility that these practices have led to creaming -- again, in a way that won't show up in the data. If the creaming is extensive enough -- and is coupled with test-prep instruction and curriculum, more resources, and a longer school day/year -- it wouldn't be too hard for a charter to "beat the state's average scores."

Is this a bad thing? That's an entirely different question. Given the very real segregation in New Jersey's schools, and the regressive slide away from adequate and equitable funding in the last decade, it's hard to find fault with Newark and Camden parents who want to get their children into a "better" school if they can. On the other hand, the fiscal pressures of chartering are real and can affect the entire system of schooling. Further, concentrating certain types of students into certain schools can have unexpected consequences.

A serious discussion of these issues is sorely needed in this state (and elsewhere). Unfortunately, because they refuse to acknowledge some simple realities, the Star-Ledger's editorial board once again fails to live up to that task. I'll get to some other mistakes they make in this piece in a bit.

Star-Ledger Editorial Board

Monday, July 8, 2019

Who Put the "Stakes" In "High-Stakes Testing"?

Peter Green has a smart piece (as usual) about Elizabeth Warren's position on accountability testing. Nancy Flanagan had some smart things to say about it (as usual) on Twitter. Peter's piece and the back-and-forth on social media have got me thinking about testing again -- and when that happens these days, I find myself running back to the testing bible: Standards for Educational and Psychological Testing:
"Evidence of validity, reliability, and fairness for each purpose for which a test is used in a program evaluation, policy study, or accountability system should be collected and made available." (Standard 13.4, p. 210, emphasis mine)
This statement is well worth unpacking, because it dwells right in the heart of the ongoing debate about "high-stakes testing" and, therefore, influences even the current presidential race.

A core principle of psychometrics is that the evaluation of tests can't be separated from the evaluation of how their outcomes will be used. As Samuel Messick, one of the key figures in the field, put it:
"Hence, what is to be validated is not the test or observation device as such but the inferences derived from test scores or other indicators -- inferences about score meaning or interpretation and about the implications for action that the interpretation entails." [1] (emphasis mine)
He continues:
"Validity always refers to the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores." [1] (emphasis mine)
I'm highlighting "actions" here because my point is this: You can't fully judge a test without considering what will be done with the results.

To be clear: I'm not saying items on tests, test forms, grading rubrics, scaling procedures, and other aspects of test construction can't and don't vary in quality. Some test questions are bad; test scoring procedures are often highly questionable. But assessing these things is just the start: how we're going to use the results has to be part of the evaluation.

Michael Kane calls on test makers and test users to make an argument to support their proposed uses of test results:
"To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the test scores, and therefore, validation requires a clear statement of the claims inherent in the proposed interpretations and uses of the test scores. Public claims require public justification.
"The argument-based approach to validation (Cronbach, 1988; House, 1980; Kane, 1992, 2006; Shepard, 1993) provides a framework for the evaluation of the claims based on the test scores. The core idea is to state the proposed interpretation and use explicitly, and in some detail, and then to evaluate the plausibility of these proposals." [2]  (emphasis mine)
As I've stated here before: standardized tests, by design, yield a normal or "bell-curve" distribution of scores. Test designers prize variability in scores: they don't want most test takers at the high or low end of the score distribution, because that tells us little about the relative position of those takers. So items are selected, forms are constructed, and scores are scaled such that a few test takers score low, a few score high, and most score in the middle. In a sense, the results are determined first -- then the test is made.

The arguments some folks make about how certain tests are "better" than others often fail to acknowledge this reality. Here in New Jersey, a lot of hoopla surrounded the move from the NJASK to the PARCC; and then later, the change from the PARCC to the NJSLA. But the results of these tests really don't change much.

If you scored high on the old test, you scored high on the new one. So the issue isn't the test itself, because different tests are yielding the same outcomes. What really matters is what you do with these results after you get them. The central issue with "high-stakes testing" isn't the "testing"; it's the "high-stakes." 

So how are we using test scores these days? And how good are the validity arguments for each use?

- Determining an individual student's proficiency. I know I've posted this graphic on the blog dozens of times before, but people seem to respond to it, so...

"Proficiency" is not by any means an objective standard; those in power can set the bar for it pretty much wherever they want. Education officials who operate in good faith will try to bring some reason and order to the process, but it will always be, at its core, subjective.

In the last few years, policymakers decided that schools needed "higher standards"; otherwise, we'd be plagued by "white suburban moms" who were lying to themselves. This stance betrayed a fundamental misunderstanding of what tests are and how they are constructed. Again, test makers like variation in outcomes, which means someone has got to be at the bottom of the distribution. That isn't the same as not being "proficient," because the definition of "proficiency" is fluid. If it isn't, why can policymakers change it on a whim?
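Here is a minimal sketch of why the cut score matters so much. It assumes a hypothetical test whose scores follow a bell curve with a mean of 500 and a standard deviation of 100; those numbers are mine and don't describe the NJASK, PARCC, or NJSLA. The students and their scores never change; only the line drawn through the distribution does.

```python
from statistics import NormalDist

# Hypothetical score scale, assumed for illustration: mean 500, SD 100.
SCORES = NormalDist(mu=500, sigma=100)

def share_proficient(cut_score: float) -> float:
    """Share of test takers scoring at or above the cut score."""
    return 1.0 - SCORES.cdf(cut_score)

# Same distribution of scores, three different definitions of "proficient".
for cut in (450, 500, 550):
    print(f"cut score {cut}: {share_proficient(cut):.0%} proficient")
```

On this assumed scale, moving the cut score from 450 to 550 drops the "proficient" share from roughly 69 percent to roughly 31 percent without a single student learning more or less.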

I'll admit I've not dug in on this as hard as I could, but I haven't seen a lot of evidence that telling a kid and her family that she is not proficient -- especially after previous tests said she was -- does much to help that kid improve her math or reading skills by itself. If the test spurs some sort of intervention that can yield positive results, that's good. 

But those of us who work with younger kids know that giving feedback to a child about their abilities is tricky business. A test that labels a student as "not proficient" may have unintended negative consequences for that student. A good validity argument for using tests this way should include an exploration of how students themselves will benefit from knowing whether they clear some arbitrary proficiency cut score. Unfortunately, many of the arguments I hear endorsing this use of tests are watery at best.

Still, as far as stakes go, this one isn't nearly as high as...

- Making student promotion or graduation decisions. When a test score determines a student's progression through or exit from the K-12 system, the stakes are much higher; consequently, the validity argument has to be a lot stronger. Whether grade retention based on test scores "works" is constantly debated; we'll save discussion for another time (I do find this evidence to be interesting).

It's the graduation test that I'm more concerned with, especially as I'm from New Jersey and the issue has been a key education policy debate over the past year. Proponents of graduation testing never want to come right out and say this in unambiguous terms, but what they're proposing is withholding a high school diploma -- a critical credential for entry into the workforce -- from high school students who did all their work and passed their courses, yet can't pass a test.

I don't see any validity argument that could possibly justify this action. Again, the tests are set up so someone has to be at the bottom of the distribution; is it fair to deny someone a diploma based on a test that must have low-scoring test takers? And no one has put forward a convincing argument that failing to show proficiency on the Algebra I exam is somehow a justification for withholding a diploma. A decision this consequential should never be made based on a single test score.

- Employment consequences for teachers. Even if you can make a convincing argument that standardized tests are valid and reliable measures of student achievement, you haven't made the argument that they're measures of teacher effectiveness. A teacher's contribution to a student's test score only explains a small part of the variation in those scores. Teasing out that contribution is a process rife with the potential for error and bias.

If you want to use value-added models or student growth percentiles as signals to alert administrators to check on particular teachers... well, that's one thing. Mandating employment consequences is another. I've yet to see a convincing argument that firing staff solely or largely on the basis of error-prone measures will help much in improving school effectiveness.

- Closing and/or reconstituting schools. It's safe to say that the research on the effects of school closure is, at best, mixed. That said, it's undeniable that students and communities can suffer real damage when their school is closed. Given the potential for harm, the criteria for targeting a school for closure should be based on highly reliable and valid evidence.

Test scores are inevitably part of the evidence -- in fact, many times they're most of the evidence -- deployed in these decisions... and yet their validity as measures of a student's probability of seeing their educational environment improve if a school is closed is almost never questioned by those policymakers who think school closure is a good idea.

Closing a school or converting it to a charter is a radical step. It should only be attempted if it's clear there is no other option. There's just no way test outcomes, by themselves, give enough information to make that decision. It may well be that a school that is "failing" by one measure is actually educating students who started out well behind their peers in other schools. It may be the school is providing valuable supports that can't be measured by standardized tests.

- Evaluating policy interventions. Using test scores to determine the efficacy of particular policy interventions is the bread-and-butter of labor economists and other quant researchers who work in the education field. I rarely see, however, well-considered, fully-formed arguments for the use of test outcomes in this research. More often, there is a simple assumption that the test score is measuring something that can be affected by the intervention; therefore, its use must be valid.

In other words: the argument for using test scores in research is often that they are measuring something: there is signal amid the noise. I don't dispute that, but I also know that the signal is not necessarily indicative of what we really want to measure. Test scores are full of construct-irrelevant variance: they vary because of factors other than the ones test-makers are trying to assess. Put another way: a kid may score higher than another not because she is a better reader or mathematician after a particular intervention, but because she is now a better test-taker.

This is particularly relevant when the effect sizes measured in research are relatively small. We see this all the time, for example, in charter school research: effect sizes of 0.2 or less are commonly referred to as "large" and "meaningful." But when you teach to the test -- often to the exclusion of other parts of the curriculum -- it's not that hard to pump up your test scores a bit relative to those who don't. Daniel Koretz has written extensively on this.
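To get a feel for what an effect size of 0.2 means, here is a rough back-of-the-envelope sketch. It assumes (and this is my assumption, not something any particular study guarantees) that the effect is expressed in student-level standard deviations and that scores are roughly normally distributed. The function name `percentile_after_shift` is just something I made up for illustration. Under those assumptions, a 0.2 effect moves a student who started at the median to roughly the 58th percentile.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size, start_percentile=50.0):
    """Percentile rank after shifting a score by `effect_size` standard deviations.

    Assumes scores follow a normal distribution and that the effect size
    is expressed in student-level standard deviation units.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Convert the starting percentile to a z-score, apply the shift,
    # then convert back to a percentile.
    z_start = std_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * std_normal.cdf(z_start + effect_size)


for effect in (0.05, 0.1, 0.2):
    new_rank = percentile_after_shift(effect)
    print(f"effect size {effect:.2f}: 50th percentile -> {new_rank:.1f}")
```

That shift is not nothing, but it is modest enough that some extra test prep could plausibly produce a gain of similar size on its own, which is exactly why the validity question matters here.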

These are only five proposed uses for test scores; there are others. But the initial reason for instituting a high-stakes standardized testing regime was "accountability." Documents from the early days of No Child Left Behind make clear that schools and school districts were the entities being held accountable. Arguably, so were states -- but primarily to monitor schools and districts.

I don't think anyone seriously thinks schools and districts -- and the staffs within them -- shouldn't be held accountable for their work. Certainly, taxpayers deserve to know whether their money is being used efficiently and effectively, and parents deserve to know whether their children's schools are doing their job. The question that we seem to have skipped over, however, is whether using standardized tests to dictate high-stakes actions is a valid use of those tests' outcomes.

Yes, there are people who would do away with standardized testing altogether. But most folks don't seem to have a problem with some level of standardized testing, nor with using test scores as part of an accountability system (although many, like me, would question why it's only the schools and districts that are held accountable, and not the legislators and executives at the state and federal level who consistently fail to provide schools the resources they need for success). 

What they also understand, however -- and on this, the public seems to be ahead of many policymakers and researchers -- is that these are limited measures of school effectiveness, and that we are using them in ways that introduce corrupting pressures, which makes schools worse. That, more than any problem with the tests themselves, seems to be driving the backlash against high-stakes testing.

As Kane says: "Public claims require public justification." The burden of proof, then, is on those who would use tests to take all sorts of highly consequential actions. Their arguments need to be made clearly and publicly, and they have an obligation to not only demonstrate that the tests themselves are good measures of student learning; they also have to argue convincingly that the results should be used for each separate purpose for which policymakers would use them.

I would suggest to those who question the growing skepticism of high-stakes testing: go back and look at your arguments for their use in undertaking specific actions. Are those arguments as strong as they should be? If they aren't, perhaps you should reconsider the stakes, and not just the test.

[1] Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–100). Washington, DC: American Council on Education.

[2] Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.