Jersey Jazzman: July 2019

Thursday, July 25, 2019

What's Really Happening In Camden's Schools?

This latest series on Camden's schools is in three parts:

Part I

Part II

Part III (this post)

I want to wrap up this series of posts about Camden's schools with a look at the latest CREDO report, which the supporters of recent "reforms" keep citing as proof of those reforms' success.

Longtime readers know the CREDO reports, issued by the Center for Research on Education Outcomes at Stanford University, have been perhaps the best known of all research studies on the effectiveness of charter schools. The reports, which are not peer-reviewed, look at the differences in growth in test scores between charter schools and public district schools, or between different school operators within the charter sector. CREDO often issues reports for a particular city's or state's charter sector; they last produced a statewide report for New Jersey in 2013.

I and others have written a great deal over the years about the inherent limitations and flaws in CREDO's methodology. A quick summary:

-- The CREDO reports rely on data that is too crude to do the job properly. At the heart of CREDO's methodology is their supposed ability to virtually "match" students who do and don't attend charter schools, and compare their progress. The match is made on two factors: first, student characteristics, including whether students qualify for free lunch, whether they are classified as English language learners (in New Jersey, the designation is "LEP," or "limited English proficient"), whether they have a special education disability, race/ethnicity, and gender.

The problem is that these classifications are not finely-grained enough to make a useful match. There is, for example, a huge difference between a student who is emotionally disturbed and one who has a speech impairment; yet both would be "matched" as having a special education need. In a city like Camden, where childhood poverty is extremely high, nearly all children qualify for free or reduced-price lunch (FRPL), which requires a family income below 185 percent of the poverty line. Yet there is a world of difference between a child just below that line and a child who is homeless. If charter schools enroll more students at the upper end of this range -- and there is evidence that in at least some instances they do -- the estimates of the effect of charter schools on student learning growth very likely will be overstated.

-- CREDO's use of test scores to match students and measure outcome differences is inherently problematic. The second factor on which CREDO makes student matches is previous student test scores. Using these as a match is always problematic, as test outcomes are prone to statistical noise. For now, I'll set aside some of the more technical issues with CREDO's methods and simply note that all tests are subject to construct-irrelevant variance, a fancy way of saying that scores can rise not because students are better readers or mathematicians, but simply because they are better test takers. If a charter school focuses heavily on test prep -- and we know many of the best-known ones do -- they can pump up effect sizes without increasing student learning in a way that we would consider meaningful.

-- The CREDO reports translate charter school effects into a "days of learning" measure that is wholly unvalidated. I've been going on about this for years: there is simply no credible evidence to support CREDO when they make a claim about a charter school's students showing "x number of days more learning" than a public district school's students. When you follow CREDO's citations back to their original sources, you find they are making this translation based on nothing. It's no wonder laypersons with little knowledge of testing often misinterpret CREDO's results.

Again, I and others have been writing about these limits of the CREDO studies for years. But the Camden "study" has some additional problems:

First, it's not really a "study" -- it's a PowerPoint slideshow that is missing some essential elements that should be included in any credible piece of research. Foremost of these is a description of the variables. In previous reports, CREDO at least told its readers how the percentages of FRPL, special education, and LEP students varied between charter and public district schools. But they don't even bother with this basic analysis here. And it's important: if the charter sector is taking proportionally fewer of the students who are more challenging to educate, they may be creating a peer effect that can't be scaled up.

Second, free and reduced-price lunch (FRPL) is an even less accurate measure of student socio-economic status when a district has universal enrollment in the school meal program. If a student's family knows she will automatically receive a free meal, they will have less incentive to fill out an application. We've seen significant declines in the percentage of FRPL students in some districts that have moved to universal enrollment, indicating this is a real phenomenon.

In contrast, New Jersey charter schools get more funding when they enroll a FRPL-eligible student. They have an incentive to get a student's family to fill out an application that the district does not. Did CREDO account for this? They don't say.

Third, the switch in tests in 2015 complicates any test-based analysis. There are at least two reasons for this: first, the previous test scores of students in both the charter and public district groups go back to 2014, when the NJASK was the test. As Bruce Baker and I showed in our analysis of Newark's schools, there is evidence that schools made a sudden shift in their relative standing on test outcomes when the switch in tests occurred. Shifts this fast are almost certainly not due to changes in student instruction; instead, they occur because some students were more familiar with the new format of the test than others. This, again, puts the matching process in doubt.

The second reason is related to the first: some schools likely took longer to acclimate to the new test than others. Their relative growth in outcomes, therefore, will probably shift in later years. Did that happen in Camden? Let's look at some of the CREDO report's results:

As I explained in earlier posts, there are three types of publicly-funded schools in Camden: Camden City Public Schools (CCPS), denoted here as "TPS" for "Traditional Public Schools"*; independent charter schools; and renaissance schools, which are operated by charter management organizations (CMOs), and are supposed to take all students in a neighborhood catchment zone (but don't -- hang on...).

Note here that CCPS schools made a leap in relative growth between 2015 and 2016. Do you think it's because the schools got so much better in one year? Or is it more logical to believe something else is going on? A more likely explanation is that Camden students were not well prepared for the change in test format in 2015, but then became more familiar with it in 2016. It's also possible the charter students were better prepared for the new test in 2015, but lost that advantage in 2016, when their relative growth went down.

This is all speculation... but that's the point. Sudden shifts in test score outcomes are likely due to factors other than better instruction. Making the claim, based on sudden shifts in outcomes, that any particular sector of Camden's school system is getting better results due to their practices is a huge leap -- especially when we now know something important about the renaissance schools...

Because renaissance schools are not enrolling all students in their neighborhoods, their students are different from CCPS students in ways that can't be captured by the data. Here, again, is the State Auditor in his renaissance school report:

The current enrollment process has limited the participation of neighborhood students in renaissance schools. Per N.J.S.A. 18A:36C-8, renaissance schools shall automatically enroll all students residing in the neighborhood of a renaissance school. Instead, the district implemented a centralized enrollment system in which families must opt in if they prefer to attend a renaissance school. This process has left the district with fewer than half of neighborhood students being enrolled in their neighborhood renaissance school.

[...]

The current policy could result in a higher concentration of students with actively involved parents or guardians being enrolled in renaissance schools. Their involvement is generally regarded as a key indicator of a student’s academic success, therefore differences in academic outcomes between district and renaissance students may not be a fair comparison.

As I said in the last post, it is very frustrating that the State Auditor gets this, but people who claim to have expertise in education policy do not. Let me state this as simply as I can:

A "study" like the Camden CREDO report attempts to compare similar students in charters and public district schools by matching students based on crude variables. Again, these variables aren't up to the job -- but just as important, students can't be matched on unmeasured characteristics like parental involvement. Which means the results of the Camden CREDO report must be taken with great caution.

And again: when outcomes suddenly shift from year-to-year, there's even more reason to suspect the effects of charter and renaissance schools are not due to factors such as better instruction.

One more thing: any positive effects found in the CREDO study are a fraction of what is needed to close the opportunity gap with students in more affluent communities. There is simply no basis to believe that anything the charter or renaissance schools are doing will make up for the effects of chronic poverty, segregation, and institutional racism from which Camden students suffer.

Now, there are some very powerful political forces in New Jersey that do not want to acknowledge what I am saying here. They want the state's residents and lawmakers to believe that the state takeover of Camden's schools, and subsequent privatization of many of those schools, has led to demonstrably better student outcomes -- so much better that upending democratic, local control of Camden's schools was worth it.

Remember: the takeover and privatization of Camden's schools was planned without any meaningful local input. From 2012:

CAMDEN — A secret Department of Education proposal called for the state to intervene in the city’s school district by July 1, closing up to 13 city and charter schools.

[...]

The intervention proposal, which was obtained by the Courier-Post, was written by Department of Education employee Bing Howell.

He did not respond to a phone call and email seeking comment.

Howell serves as a liaison to Camden for the creation of four Urban Hope Act charter schools. He reports directly to the deputy commissioner of education, Andy Smerick.

Howell’s proposal suggests that he oversee the intervention through portfolio management — providing a range of school options with the state, not the district, overseeing the options. He would be assisted by Rochelle Sinclair, another DOE employee. Both Howell and Sinclair are fellows of the Los Angeles-based Broad Foundation. [emphasis mine]

A California billionaire paid for the development of a "secret" (that's the local newspaper's word, not mine) proposal to wrest democratic, local control of schools away from Camden and develop a "portfolio" of charter, renaissance, private, and public schools. This proposal fit in nicely with the plans for Camden's redevelopment, which, as we are now learning, included a series of massive tax breaks for corporations with ties to the South Jersey Democratic machine.

The same forces that are now trying to justify this tax giveaway are the same forces that pushed forward a radical transformation of Camden's schools. They would have us all believe that this transformation is as "successful" as their tax schemes.

But in both cases they are relying on the flimsiest of evidence, badly interpreted and devoid of any meaningful context. The case for educational "reform" in Camden is as weak as the case for corporate tax incentives in Camden.

Camden's families deserve what so many suburban families in New Jersey have: adequately funded and democratically, locally controlled schools. Small, dubious bumps in student growth found in incomplete "studies" are not an acceptable substitute.

That's all, for now, about Camden. We'll move on to another state next...

* I don't use "TPS" because I think it's a loaded term: the word "traditional" can carry all sorts of unwarranted negative connotations. CCPS schools are properly defined as "public district schools."

Tuesday, July 23, 2019

Camden, Charter Schools, and a Very Big Lie

This latest series on Camden's schools is in three parts:

Part I

Part II (this post)

Part III

Let's get back to the deeply flawed editorial from this week's Star-Ledger that I wrote about yesterday. In that post, I explained how "creaming" -- the practice of taking only those students who are likely to score high on standardized tests -- is likely a major contributor to the "success" of certain charter schools.

Charter school advocates do not like discussing this issue. The charter brand is based on the notion that certain operators have discovered some special method for getting better educational outcomes from students -- particularly students who are in disadvantaged communities -- than public district schools. But if they are creaming the higher-performing kids, there's probably nothing all that special about charters after all.

It's important to understand this debate about charters and creaming if you want to understand what's happening now in Camden's schools.

Because Camden was going to be the proof point that finally showed the creaming naysayers were wrong with a new hybrid model of schooling: the renaissance school. These schools would be run by the same organizations that managed charter schools in Newark and Philadelphia. The district would turn over dilapidated school properties to charter management organizations (CMOs); they would, in turn, renovate the facilities, using funds the district claimed it didn't have and would never get.

But most importantly: these schools would be required to take all of the children within the school's neighborhood (formally defined as its "catchment"). Creaming couldn't occur, because everyone from the neighborhood would be admitted to the school. Charter schools would finally prove that they did, indeed, have a formula for success that could be replicated for all children.

Well, guess what?

EXECUTIVE SUMMARY
CITY OF CAMDEN SCHOOL DISTRICT July 1, 2015 to February 28, 2018

[...]

The current enrollment process has limited the participation of neighborhood students in renaissance schools. Per N.J.S.A. 18A:36C-8, renaissance schools shall automatically enroll all students residing in the neighborhood of a renaissance school. Instead, the district implemented a centralized enrollment system in which families must opt in if they prefer to attend a renaissance school. This process has left the district with fewer than half of neighborhood students being enrolled in their neighborhood renaissance school.

That's from a report from the State Auditor that was released earlier this year -- a report ignored by many in the NJ press, including the Star-Ledger (the Courier-Post and NJ Spotlight ran stories on the lack of oversight for renaissance schools, but didn't address the problems with the neighborhood enrollments).

Understand, the SL played a pivotal role in spreading the news that renaissance schools would enroll every student within their catchments. Here, for example, is an editorial from 2012 [all emphases mine]:

The campus will grow one grade level at a time, serving every kid in the neighborhood — including those learning English, or with special needs.

In real time, only snarky teacher-bloggers expressed any skepticism. But the SL continued to assure Camden's families that the renaissance schools would accept all students in the neighborhood; here's a piece from 2014:

District officials said the renaissance schools serve specific neighborhoods, where all students within that neighborhood are guaranteed enrollment.

2015:

According to the district, renaissance schools differ from conventional charter schools in that they guarantee a seat to every student living in its local neighborhood, and that they contract with the local school district.

This exact phrasing was used in an SL piece just a month later; apparently, the newspaper couldn't come up with new ways to assure residents every local student would have a seat. Here's yet another piece from 2017, where the SL gave South Jersey political boss George Norcross space to assure Camden's parents that every neighborhood child would get a seat at their renaissance school:

Renaissance schools are neighborhood schools that serve students in a defined catchment area, guaranteeing enrollment for any student living in that neighborhood. In other words, a child's fate is not left to a lottery.

Now, if anyone at the SL had read the Urban Hope Act, which created renaissance schools, they'd know what Norcross wrote in the pages of their newspaper simply wasn't true:

If there are more students in the attendance area than seats in the renaissance school, the renaissance school shall determine enrollment by a lottery for students residing in the attendance area. In developing and executing its selection process, the nonprofit entity shall not discriminate on the basis of intellectual or athletic ability, measures of achievement or aptitude, status as a handicapped person, proficiency in the English language, or any other basis that would be illegal if used by a school district.

This directly contradicts Norcross, and the repeated reports in the SL. But, hey, what did the newspaper know back in 2017, before the Auditor's report? Maybe they thought it was a good idea to give a powerful political figure the benefit of the doubt; maybe every neighborhood kid really was getting into a renaissance school, no matter what the actual law said.

But then, in 2019, the Auditor's report was released, and all doubt was erased: the renaissance schools were not enrolling all neighborhood students. The previous reporting was false. How embarrassing...

Surely, from now on when the SL writes about renaissance schools, they will acknowledge the promise of a guaranteed seat for all students within those schools' catchments was broken. Surely, they will admit their previous reporting was inaccurate, and apologize for getting the story wrong. If not that, at least they will demand to know why the promises the district and the state made to Camden's families were now being broken.

Won't they?

South Jersey political boss George Norcross also deserves credit for using his political weight to push these reforms in Camden. Just because he’s defending a corrupt tax incentives program doesn’t mean he’s not doing good elsewhere. He helped push through a new law that allowed nonprofit charter operators to run neighborhood schools, but also forced them serve every student who walks through the door.

Technically, that's true -- the problem is that not every student from the neighborhood -- who were all promised a seat -- is allowed to walk through the door of their local renaissance school.

Again, from the Auditor's report:

In the 2016–17 enrollment lottery, 461 students were accepted to renaissance schools. Of these students, 247 (54 percent) resided in the neighborhood of their renaissance school. In the 2017–18 enrollment lottery, 838 students were accepted to renaissance schools. Of these students, 387 (46 percent) resided in the neighborhood of their renaissance school. Overall, less than half of students accepted to renaissance schools (49 percent) through the enrollment lottery process for the 2016–17 and 2017–18 school years were from the renaissance school’s neighborhood.

All neighborhood students who submitted applications by the deadline for the 2016–17 lottery were accepted in their neighborhood renaissance school; however, 47 students who applied by the deadline for the 2017–18 lottery had to be placed on their neighborhood renaissance school’s wait list. As of October 2017, there were 195 students on the wait list for their neighborhood renaissance school. [emphasis mine]
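The Auditor's percentages can be checked directly from the counts quoted above. This is only a quick arithmetic sketch in Python; the counts are the Auditor's, while the variable and function names are made up for the example.

```python
# Counts quoted from the State Auditor's report.
accepted = {"2016-17": 461, "2017-18": 838}
from_neighborhood = {"2016-17": 247, "2017-18": 387}

def pct(part, whole):
    """Share of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)

# Share of accepted students who lived in the school's neighborhood, by year.
shares = {year: pct(from_neighborhood[year], accepted[year]) for year in accepted}

# Combined share across both lottery years.
overall = pct(sum(from_neighborhood.values()), sum(accepted.values()))

print(shares, overall)
```

The yearly shares come out to 54 and 46 percent, and the combined share is 634 of 1,299 accepted students, or 49 percent, which matches the report.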

The Auditor also explains why this matters:

The current policy could result in a higher concentration of students with actively involved parents or guardians being enrolled in renaissance schools. Their involvement is generally regarded as a key indicator of a student’s academic success, therefore differences in academic outcomes between district and renaissance students may not be a fair comparison.

This is a reality some of us have been trying to explain to outlets like the Star-Ledger for years. But for whatever reason, it appears the paper would rather use the weaseliest of words than admit they've been wrong all along. From last week's editorial:

This addressed a common knock on charters: that they self-select their students, by keeping out the poorest kids or those with special needs.

Those typical criticisms don’t apply in Camden. The so-called “renaissance schools” under charter management take the same, or more of the poorest and special ed kids as the district schools.

See how they've moved the goalposts? Before, every kid in the neighborhood got a seat; now, the kids are the same...

Except, as the Auditor points out, it's likely they aren't. The very act of enrolling your child in a renaissance school is likely a marker that you are a more "actively involved parent." We know, thanks to a great deal of high-quality research (see the lit review here) that parents rely on their social networks to help them make decisions in school "choice" systems, and that different parents have different networks. It's not at all a stretch to think the students in renaissance schools differ from other students on characteristics that can't be shown in the data.

In other words: the renaissance schools may very well be creaming. Why is the State Auditor capable of getting this simple point, but the Star-Ledger editorial board isn't?

I'll talk more about these "unobserved" student differences and why they matter in my next post. For now, we need to understand this:

When the people of Camden were told that every child in a renaissance school's catchment would be enrolled, they were lied to. I'm using the passive voice deliberately here because who exactly did the lying -- and who simply transmitted this very big lie -- is open to debate.

But I would think that journalists -- whose primary function is to deliver the truth to their readers -- would, of all people, not want to perpetuate falsehoods when confronted with the facts. How sad that New Jersey's largest newspaper has such low standards, and such little regard for their readers.

We'll talk about the latest "study" on Camden schools' effectiveness next.

Star-Ledger Editorial Board

ADDING: Over the years, the Star-Ledger opinion section has been remarkably inept when it comes to writing about education:

They blamed teacher seniority when an award-winning teacher in Camden was fired -- except she never was.
They tried to show the failure of Camden's schools by pointing to the low proficiency rate at Camden Street School -- except that school was in Newark, and hosted programs for that district's most cognitively impaired students.
They said a group of Newark teachers told "lies" about a contract negotiation -- except what those teachers actually said was, in fact, accurate.
They gave an anti-tenure superintendent space to tell stories about her staff -- except her own board said they weren't true (she was later terminated by that same board).
They misrepresented the views of union leaders -- even when those leaders were quite clear in their answers to direct questions.
They engaged in some particularly nasty language when describing the grassroots opposition to school leadership in Newark -- including making the accusation that local activists "don't seem to give a damn about the children."
They made fun of a union official's weight. Yes, they did.

Let me be clear about something: over the years, the Star-Ledger has had some excellent reporters on the education beat, including Jessica Calefati, Peggy McGlone, and Adam Clark. And, of course, the great Bob Braun worked there for years.

But the opinion section has been, and remains, a mess. If you're a public school teacher and you pay to read this dreck, you should really ask yourself: "Why?"

Monday, July 22, 2019

How Student "Creaming" Works

This latest series on Camden's schools is in three parts:

Part I (this post)

Part II

Part III

There is, as usual, so much wrong in this Star-Ledger editorial on Camden's schools that it will probably take several posts for me to correct all of its mistakes. But there's one assertion, right at the very top, that folks have been making recently about Newark's schools that needs to be corrected immediately:

Last year, for the first time ever, the low-income, mostly minority kids in Newark charter schools beat the state’s average scores in reading and math in grades 3-8 – incredible, given the far more affluent pool of kids they were competing against.

This is yet another example, like previous ones, of a talking point that is factually correct but utterly meaningless for evaluating the effectiveness of education policies like charter schooling. It betrays a fundamental misunderstanding of test scores and student characteristics, which keeps the people who make statements like this from having to answer the questions that really matter.

The question in this case is: Do "successful" urban charter schools get their higher test scores, at least in part, by "creaming" students?

Creaming has become a central issue in the whole debate about the effectiveness of charters. A school "creams" when it enrolls students who are more likely to get higher scores on tests due to their personal characteristics and/or their backgrounds. The fact that Newark's charter schools enroll, as a group, fewer students with special education needs -- particularly high-cost needs -- and many fewer students who are English language learners is an indication that creaming may be in play.

The quote above, however, doesn't address this possibility. The SL's editors argue instead that these schools' practices have caused the disadvantaged children in Newark's charters to "beat" the scores of children who aren't disadvantaged. And because the students in Newark's charters are "beating the state's average scores," they must be "incredible."

Last month, I wrote about some very important context specific to Newark that has to be addressed when making such a claim. But let's set that aside and get to a more fundamental question: given the concerns about creaming, is the SL's argument -- that charter students "beat" the state average -- a valid way to assess these schools' effectiveness?

No. It is not.

Let's go through this diagram one step at a time. The first point we have to acknowledge is that test scores, by design, yield a distribution of scores. That distribution is usually a "bell curve": a few students score high, a few score low, and most score in the middle.

This is the distribution of all test takers. But you could also pull out a subpopulation of students, based on any number of characteristics: race, gender, socio-economic status, and so on. Unless you delineate the subpopulation specifically on test scores, you're almost certainly going to get another distribution of scores.

Think of a school in a relatively affluent suburb, where none of the students qualify for free lunch (the standard measure of socio-economic status in educational research). Think of all the students in that school. Their test scores will vary considerably -- even if the school scores high, on average, compared to less-affluent schools. Some of the kids will have a natural affinity for doing well on tests; some won't. Some will have parents who place a high value on scoring well on tests; some parents will place less value on scoring well. The students will have variations in their backgrounds and personal characteristics that we can't see in the crude variables collected in the data; consequently, their scores will vary.

The important point is that there will be a range of scores in this school. Intuitively, most people will understand this. But can they make the next leap? Can they understand that there will also be a range of scores in a lower-performing school?

There is, in my opinion, a tendency for pundits who opine on education to sometimes see children in disadvantaged communities as an undifferentiated mass. They seem not to understand that the variation in unmeasured student characteristics can be just as great in a school located in a disadvantaged community as it is in an affluent community; consequently, the test scores in less-affluent schools will also vary.

The children enrolled in Newark's schools will have backgrounds and personal characteristics that vary widely. Some will be more comfortable with tests than others. Some will have parents who value scoring well on tests more than others. It is certainly possible that the variation in a disadvantaged school -- the shape of the bell curve -- will differ from the variation in affluent schools, but there will be variation.

In my graph above (which is simply for illustrative purposes) I show that the scores of disadvantaged and not-disadvantaged students vary. On average, the disadvantaged students will score lower -- but their scores will still vary. And because the not-disadvantaged students' scores will also vary, it is very likely that there will be some overlap between the two groups. In other words: there will be some relatively high-scoring students who are disadvantaged who will "beat" some relatively low-scoring students who are not disadvantaged.

And here's where the opportunity for creaming arises. If a charter school can find a way to get the kids at the top of the disadvantaged students' distribution to enroll -- while leaving the kids in the middle and the bottom of the distribution in the public district schools -- they will likely be able to "beat" the average of all test takers.
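The arithmetic behind this can be sketched in a few lines of Python. The numbers are purely illustrative assumptions, not estimates for Newark or anywhere else: scores are standardized, the two groups are the same size, their averages sit half a standard deviation on either side of the overall average, and the hypothetical charter enrolls the top 30 percent of the disadvantaged distribution.

```python
from statistics import NormalDist

# Illustrative assumptions only (not real data): standardized scores,
# two equal-sized groups, each with a bell-curve distribution.
disadvantaged = NormalDist(mu=-0.5, sigma=1.0)
not_disadvantaged = NormalDist(mu=0.5, sigma=1.0)

# With equal-sized groups, the average of all test takers is the midpoint.
overall_mean = (disadvantaged.mean + not_disadvantaged.mean) / 2

def mean_of_top_share(dist, share):
    """Average score of the top `share` (0 < share < 1) of a normal distribution."""
    cutoff = dist.inv_cdf(1 - share)
    # Mean of a normal truncated from below at `cutoff`:
    #   mu + sigma^2 * pdf(cutoff) / P(score > cutoff)
    return dist.mean + dist.variance * dist.pdf(cutoff) / share

# Hypothetical charter that enrolls only the top 30% of disadvantaged students.
creamed_mean = mean_of_top_share(disadvantaged, 0.30)

print(f"All disadvantaged students: {disadvantaged.mean:+.2f}")
print(f"All test takers:            {overall_mean:+.2f}")
print(f"Top 30% of disadvantaged:   {creamed_mean:+.2f}")
```

Under these assumptions, the creamed group's average lands well above the average of all test takers, even though the disadvantaged group as a whole sits below it and no school in the sketch teaches any differently. How large the gap is depends entirely on the assumed numbers; the direction is what matters.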

Is that what's happening in Newark? Again, the differences in the special education and English language learner rates suggest there is a meaningful difference in the characteristics of the student populations between charters and public district schools. But further opportunities for creaming come from separating students based on unmeasured characteristics.

For example: charter schools require that families apply for admission. It is reasonable to assume that there is a difference between a family that actively seeks to enroll their child in a charter, and a family that does not. Some of the "high-performing" charters in Newark have high suspension and attrition rates; this may send a signal to families that only a certain type of child is a good "fit" for a charter (some charter operators are quite honest about this). These schools also tend to have much longer school days and years; again, this may signal that only students who have the personal characteristics to spend the extra time in class should apply.

There is a very real possibility that these practices have led to creaming -- again, in a way that won't show up in the data. If the creaming is extensive enough -- and is coupled with test-prep instruction and curriculum, more resources, and a longer school day/year -- it wouldn't be too hard for a charter to "beat the state's average scores."

Is this a bad thing? That's an entirely different question. Given the very real segregation in New Jersey's schools, and the regressive slide away from adequate and equitable funding in the last decade, it's hard to find fault with Newark and Camden parents who want to get their children into a "better" school if they can. On the other hand, the fiscal pressures of chartering are real and can affect the entire system of schooling. Further, concentrating certain types of students into certain schools can have unexpected consequences.

A serious discussion of these issues is sorely needed in this state (and elsewhere). Unfortunately, because they refuse to acknowledge some simple realities, the Star-Ledger's editorial board once again fails to live up to that task. I'll get to some other mistakes they make in this piece in a bit.


Monday, July 8, 2019

Who Put the "Stakes" In "High-Stakes Testing"?

Peter Greene has a smart piece (as usual) about Elizabeth Warren's position on accountability testing. Nancy Flanagan had some smart things to say about it (as usual) on Twitter. Peter's piece and the back-and-forth on social media have got me thinking about testing again -- and when that happens these days, I find myself running back to the testing bible: Standards for Educational and Psychological Testing:

"Evidence of validity, reliability, and fairness for each purpose for which a test is used in a program evaluation, policy study, or accountability system should be collected and made available." (Standard 13.4, p. 210, emphasis mine)

This statement is well worth unpacking, because it dwells right in the heart of the ongoing debate about "high-stakes testing" and, therefore, influences even the current presidential race.

A core principle of psychometrics is that the evaluation of tests can't be separated from the evaluation of how their outcomes will be used. As Samuel Messick, one of the key figures in the field, put it:

"Hence, what is to be validated is not the test or observation device as such but the inferences derived from test scores or other indicators -- inferences about score meaning or interpretation and about the implications for action that the interpretation entails." [1] (emphasis mine)

He continues:

"Validity always refers to the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores." [1] (emphasis mine)

I'm highlighting "actions" here because my point is this: You can't fully judge a test without considering what will be done with the results.

To be clear: I'm not saying items on tests, test forms, grading rubrics, scaling procedures, and other aspects of test construction can't and don't vary in quality. Some test questions are bad; test scoring procedures are often highly questionable. But assessing these things is just the start: how we're going to use the results has to be part of the evaluation.

Michael Kane calls on test makers and test users to make an argument to support their proposed uses of test results:

"To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the test scores, and therefore, validation requires a clear statement of the claims inherent in the proposed interpretations and uses of the test scores. Public claims require public justification.

"The argument-based approach to validation (Cronbach, 1988; House, 1980; Kane, 1992, 2006; Shepard, 1993) provides a framework for the evaluation of the claims based on the test scores. The core idea is to state the proposed interpretation and use explicitly, and in some detail, and then to evaluate the plausibility of these proposals." [2] (emphasis mine)

As I've stated here before: standardized tests, by design, yield a normal or "bell-curve" distribution of scores. Test designers prize variability in scores: they don't want most test takers at the high or low end of the score distribution, because that tells us little about the relative position of those takers. So items are selected, forms are constructed, and scores are scaled such that a few test takers score low, a few score high, and most score in the middle. In a sense, the results are determined first -- then the test is made.

The arguments some folks make about how certain tests are "better" than others often fail to acknowledge this reality. Here in New Jersey, a lot of hoopla surrounded the move from the NJASK to the PARCC; and then later, the change from the PARCC to the NJSLA. But the results of these tests really don't change much.

If you scored high on the old test, you scored high on the new one. So the issue isn't the test itself, because different tests are yielding the same outcomes. What really matters is what you do with these results after you get them. The central issue with "high-stakes testing" isn't the "testing"; it's the "high-stakes."

So how are we using test scores these days? And how good are the validity arguments for each use?

- Determining an individual student's proficiency. I know I've posted this graphic on the blog dozens of times before, but people seem to respond to it, so...

"Proficiency" is not by any means an objective standard; those in power can set the bar for it pretty much wherever they want. Education officials who operate in good faith will try to bring some reason and order to the process, but it will always be, at its core, subjective.

In the last few years, policymakers decided that schools needed "higher standards"; otherwise, we'd be plagued by "white suburban moms" who were lying to themselves. This stance betrayed a fundamental misunderstanding of what tests are and how they are constructed. Again, test makers like variation in outcomes, which means someone has got to be at the bottom of the distribution. That isn't the same as not being "proficient," because the definition of "proficiency" is fluid. If it isn't, why can policymakers change it on a whim?
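
A short calculation shows how much the "proficiency" rate depends on where the bar is set. This is a sketch under assumed numbers: a hypothetical test whose scores are normally distributed with an average of 50 and a standard deviation of 10, and three hypothetical cut scores. The scores never change; only the cut does.

```python
from statistics import NormalDist

# Hypothetical score scale: mean 50, standard deviation 10.
scores = NormalDist(mu=50, sigma=10)


def share_proficient(cut_score):
    """Fraction of test takers scoring at or above the cut score."""
    return 1 - scores.cdf(cut_score)


for cut in (40, 50, 60):  # hypothetical cut scores
    print(f"cut score {cut}: {share_proficient(cut):.1%} proficient")
```

With these assumptions, moving the cut from 40 to 60 takes the proficiency rate from roughly 84 percent to roughly 16 percent, with no change in what any student knows.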

I'll admit I've not dug in on this as hard as I could, but I haven't seen a lot of evidence that telling a kid and her family that she is not proficient -- especially after previous tests said she was -- does much to help that kid improve her math or reading skills by itself. If the test spurs some sort of intervention that can yield positive results, that's good.

But those of us who work with younger kids know that giving feedback to a child about their abilities is tricky business. A test that labels a student as "not proficient" may have unintended negative consequences for that student. A good validity argument for using tests this way should include an exploration of how students themselves will benefit from knowing whether they clear some arbitrary proficiency cut score. Unfortunately, many of the arguments I hear endorsing this use of tests are watery at best.

Still, as far as stakes go, this one isn't nearly as high as...

- Making student promotion or graduation decisions. When a test score determines a student's progression through or exit from the K-12 system, the stakes are much higher; consequently, the validity argument has to be a lot stronger. Whether grade retention based on test scores "works" is constantly debated; we'll save discussion for another time (I do find this evidence to be interesting).

It's the graduation test that I'm more concerned with, especially as I'm from New Jersey and the issue has been a key education policy debate over the past year. Proponents of graduation testing never want to come right out and say this in unambiguous terms, but what they're proposing is withholding a high school diploma -- a critical credential for entry into the workforce -- from high school students who did all their work and passed their courses, yet can't pass a test.

I don't see any validity argument that could possibly justify this action. Again, the tests are set up so someone has to be at the bottom of the distribution; is it fair to deny someone a diploma based on a test that must have low-scoring test takers? And no one has put forward a convincing argument that not showing proficiency on the Algebra I exam is somehow a justification for withholding a diploma. A decision this consequential should never be made based on a single test score.

- Employment consequences for teachers. Even if you can make a convincing argument that standardized tests are valid and reliable measures of student achievement, you haven't made the argument that they're measures of teacher effectiveness. A teacher's contribution to a student's test score only explains a small part of the variation in those scores. Teasing out that contribution is a process rife with the potential for error and bias.

If you want to use value-added models or student growth percentiles as signals to alert administrators to check on particular teachers... well, that's one thing. Mandating employment consequences is another. I've yet to see a convincing argument that firing staff solely or largely on the basis of error-prone measures will help much in improving school effectiveness.

- Closing and/or reconstituting schools. It's safe to say that the research on the effects of school closure is, at best, mixed. That said, it's undeniable that students and communities can suffer real damage when their school is closed. Given the potential for harm, the criteria for targeting a school for closure should be based on highly reliable and valid evidence.

Test scores are inevitably part of the evidence -- in fact, many times they're most of the evidence -- deployed in these decisions... and yet their validity as measures of a student's probability of seeing their educational environment improve if a school is closed is almost never questioned by those policymakers who think school closure is a good idea.

Closing a school or converting it to a charter is a radical step. It should only be attempted if it's clear there is no other option. There's just no way test outcomes, by themselves, give enough information to make that decision. It may well be that a school that is "failing" by one measure is actually educating students who started out well behind their peers in other schools. It may be that the school is providing valuable supports that can't be measured by standardized tests.

- Evaluating policy interventions. Using test scores to determine the efficacy of particular policy interventions is the bread-and-butter of labor economists and other quant researchers who work in the education field. I rarely see, however, well-considered, fully-formed arguments for the use of test outcomes in this research. More often, there is a simple assumption that the test score is measuring something that can be affected by the intervention; therefore, its use must be valid.

In other words: the argument for using test scores in research is often that they are measuring something: there is signal amid the noise. I don't dispute that, but I also know that the signal is not necessarily indicative of what we really want to measure. Test scores are full of construct-irrelevant variance: they vary because of factors other than the ones test-makers are trying to assess. Put another way: a kid may score higher than another not because she is a better reader or mathematician after a particular intervention, but because she is now a better test-taker.

This is particularly relevant when the effect sizes measured in research are relatively small. We see this all the time, for example, in charter school research: effect sizes of 0.2 or less are commonly referred to as "large" and "meaningful." But when you teach to the test -- often to the exclusion of other parts of the curriculum -- it's not that hard to pump up your test scores a bit relative to those who don't. Daniel Koretz has written extensively on this.
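
To see how a small effect size could arise without any difference in underlying skill, consider a rough simulation. It assumes a hypothetical score made up of true skill (standard deviation 8) plus unrelated noise (standard deviation 6), and a hypothetical 2-point bump from test-taking practice for one group. These values are invented for illustration and do not describe any actual study.

```python
import random
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled SD (equal group sizes)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def simulate_scores(n, prep_bump, rng):
    """Observed score = true skill + noise + any bump from test-taking practice."""
    return [rng.gauss(50, 8) + rng.gauss(0, 6) + prep_bump for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_students = 20_000

# Both groups are drawn from the same hypothetical skill distribution.
control = simulate_scores(n_students, prep_bump=0.0, rng=rng)
prepped = simulate_scores(n_students, prep_bump=2.0, rng=rng)

effect_size = cohens_d(prepped, control)
print(f"effect size (Cohen's d): {effect_size:.2f}")
```

The two groups have the same skill by construction, yet the measured effect size comes out near 0.2, the range often described as "large" in this literature. This does not show that any particular charter effect is test prep; it only shows that a bump of this size would be enough to produce one.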

These are only five proposed uses for test scores; there are others. But the initial reason for instituting a high-stakes standardized testing regime was "accountability." Documents from the early days of No Child Left Behind make clear that schools and school districts were the entities being held accountable. Arguably, so were states -- but primarily to monitor schools and districts.

I don't think anyone seriously thinks schools and districts -- and the staffs within them -- shouldn't be held accountable for their work. Certainly, taxpayers deserve to know whether their money is being used efficiently and effectively, and parents deserve to know whether their children's schools are doing their job. The question that we seem to have skipped over, however, is whether using standardized tests to dictate high-stakes actions is a valid use of those tests' outcomes.

Yes, there are people who would do away with standardized testing altogether. But most folks don't seem to have a problem with some level of standardized testing, nor with using test scores as part of an accountability system (although many, like me, would question why it's only the schools and districts that are held accountable, and not the legislators and executives at the state and federal level who consistently fail to provide schools the resources they need for success).

What they also understand, however -- and on this, the public seems to be ahead of many policymakers and researchers -- is that these are limited measures of school effectiveness, and that we are using them in ways that introduce corrupting pressures, which makes schools worse. That, more than any problem with the tests themselves, seems to be driving the backlash against high-stakes testing.

As Kane says: "Public claims require public justification." The burden of proof, then, is on those who would use tests to take all sorts of highly consequential actions. Their arguments need to be made clearly and publicly, and they have an obligation to not only demonstrate that the tests themselves are good measures of student learning; they also have to argue convincingly that the results should be used for each separate purpose for which policymakers would use them.

I would suggest to those who question the growing skepticism of high-stakes testing: go back and look at your arguments for their use in undertaking specific actions. Are those arguments as strong as they should be? If they aren't, perhaps you should reconsider the stakes, and not just the test.

[1] Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–100). Washington, DC: American Council on Education.

[2] Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.