I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Friday, February 13, 2015

What We DON'T Know About the PARCC

One of the more annoying aspects of the current debate about PARCC -- the new statewide, standardized tests coming to New Jersey and a dozen other states -- is how the test's advocates project such certainty in their claim that the PARCC is a superior test.

They will tell you the PARCC will help ensure that students are "college-ready." They will tell you the PARCC will "provide parents with important information." They will tell you the PARCC is "generations better" than previous standardized tests.

People are certainly entitled to their opinions, but let's be clear: at this point, there is very little evidence to back up any assertions of the PARCC's superiority. In truth, there is a great deal we don't know about the PARCC:

We don't know if the PARCC is more reliable or valid than the NJASK, or any other statewide standardized test.

Those who claim that the PARCC sample items that have been released are "better" than the questions on the old NJASK have the rest of us at a disadvantage: we never got to see the NJASK. In fact, any claims of the PARCC's superiority over the old tests fail if only because the NJASK was never properly studied; we don't really know how "good" or "bad" the NJASK actually was.

There are two major considerations for any test: validity and reliability. Validity speaks to whether the test measures what you want to measure; reliability deals with the consistency of a test's results. I've been looking, and, so far, I've found no evidence the PARCC is more reliable than any other standardized statewide test.

And we have very little information as to the external validity of the PARCC, if only because it is so new. We don't know if better results on the PARCC correlate more tightly with better outcomes in college or career. How could we? We haven't even administered the test yet!

When anyone asserts that the PARCC is somehow "better" than another test, they are offering an opinion based on personal preference. That's perfectly fine (and it's worth noting that some people's preferences are better-informed than others). But claims about PARCC's superiority over what came before it are not currently backed up by objective evidence, and PARCC's cheerleaders ought to be far more circumspect in making their claims.

We don't know if the PARCC has better predictive validity for "college and career readiness" than other standardized tests.

I'm going to make a bet right now: $50 (hey, I make a teacher's salary...) says scale scores on the PARCC and scale scores on the NJASK for individual students are highly correlated. Of course, no one with access to this data is going to take me up on this bet, because they know that a student who scores well on one standardized test will almost certainly score well on a different one.

The primary task of standardized tests is to rank and order students. If you doubt me, look at how the NJDOE is going to report the results: it's all based on how students do compared to other students.

The notion of "college and career readiness" (which I think is utterly phony anyway) isn't supposed to be tied to the ranking of students. It's supposed to be about whether students have acquired the knowledge needed to be successful adults. But ranking students is what the PARCC is designed to do; setting the proficiency levels comes later (see below).

I can guarantee you that the ranking and ordering of New Jersey's students on the PARCC will barely differ from their ranking on the NJASK.* If that's the case, what could possibly make the PARCC any "better"?
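For anyone who does have access to matched student scores, the bet above is easy to check. Here is a minimal sketch of how: Pearson correlation for the scale scores, Spearman rank correlation for the ordering. Everything in `simulate_two_tests` is an assumption made up for illustration (the score scales, the noise level, the idea that both tests measure one shared skill plus independent noise). The simulation builds in the very thing the bet predicts, so it is not evidence of anything; it only shows what the calculation looks like.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def simulate_two_tests(n_students=1000, noise=0.5, seed=0):
    """Hypothetical data: two tests driven by one skill plus independent noise.

    The scales (200 + 25z, 750 + 30z) are invented, not real NJASK or PARCC scales.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n_students)]
    old_test = [200 + 25 * (s + rng.gauss(0, noise)) for s in skill]
    new_test = [750 + 30 * (s + rng.gauss(0, noise)) for s in skill]
    return old_test, new_test


if __name__ == "__main__":
    old_test, new_test = simulate_two_tests()
    print("Pearson (scale scores):", round(pearson(old_test, new_test), 3))
    print("Spearman (rank order): ", round(spearman(old_test, new_test), 3))
```

With real data, you would replace `simulate_two_tests` with the matched scores for the same students on both exams. A Spearman value near 1 would mean the two tests put students in nearly the same order.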

We don't know the extent to which the "rigor" the PARCC is allegedly measuring is developmentally appropriate.

As parents and other stakeholders take a closer look at the sample items that have been released, they grow increasingly concerned that the PARCC is not developmentally appropriate. Russ Walsh has produced evidence that some PARCC sample tests overreach in the difficulty level of their reading passages.

There's no point in setting high standards for students if they can't reasonably achieve them. And I haven't seen any evidence that the standards the PARCC demands can be achieved by the large majority of our students.

What I do know is that tests like the PARCC must have items with various degrees of difficulty in order to create a normalized or "bell-curve" distribution of scores. This summer, a committee consisting of lord-knows-who will "benchmark" the test and set the performance levels -- after the test has been administered.

New York went through this process last year, leading to the crashing of proficiency levels and the wailing and gnashing of teeth by reformies like Governor Andrew Cuomo. He promptly decided to dump all the blame on teachers and ignore his own failure to provide adequate funding for New York's schools. This, of course, appears to be the real purpose of standardized tests: ammunition for politicians to get what they want.

It's perfectly fine to benchmark an exam after the fact. But doing so highlights the normative nature of setting proficiency levels. The criteria for setting "cut scores" -- the scores needed to show various levels of proficiency -- aren't based on some objective idea of learning; they're based on how all of the students did on the test. Which is why the cut scores are going to be set after the PARCC is administered, when the benchmarkers can see the results for each test item and determine how difficult it was.
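To see why a cut score set from the results is normative, here is a minimal sketch comparing two rules on made-up scores. One caution: real benchmarking committees use more elaborate procedures than this. The strict "top 30 percent" rule, the `share_proficient` parameter, the fixed cut of 65, and the scores themselves are all hypothetical simplifications chosen to isolate the idea.

```python
def percentile_cut(scores, share_proficient):
    """Norm-referenced cut: the lowest score still inside the top share of students."""
    ordered = sorted(scores, reverse=True)
    n_proficient = max(1, round(len(ordered) * share_proficient))
    return ordered[n_proficient - 1]


def proficiency_rate(scores, cut):
    """Fraction of students scoring at or above the cut."""
    return sum(s >= cut for s in scores) / len(scores)


if __name__ == "__main__":
    # Invented raw scores for ten students, then the same students 10 points higher.
    cohort = [52, 61, 48, 70, 66, 55, 59, 73, 44, 63]
    improved = [s + 10 for s in cohort]

    for label, scores in (("original", cohort), ("improved", improved)):
        # Cut set after the fact, from the score distribution itself.
        norm_cut = percentile_cut(scores, share_proficient=0.3)
        # Cut fixed in advance (hypothetical criterion of 65 points).
        fixed_cut = 65
        print(label,
              "| norm-referenced rate:", proficiency_rate(scores, norm_cut),
              "| fixed-cut rate:", proficiency_rate(scores, fixed_cut))
```

Under the percentile rule, every student improving by the same amount leaves the share labeled "proficient" unchanged, because the cut moves with the distribution. Under the fixed cut, the same improvement raises the proficiency rate.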

I know this is knotty stuff: lord knows I've struggled with writing about it before. But the critical point is this: given the variation in the abilities of our students and the amount of resources we are willing to devote to public schools, it is reasonable to question whether all students can achieve the levels of "rigor" the PARCC is calling for.

That isn't a statement excusing low expectations for children: it's a statement informed by the knowledge that this test is going to rank and order students. And, logically, not everyone can be above average.

We don't know how large the bias resulting from the computerized format of the PARCC will be.

I was just at a "Take the PARCC" event last night (more on that later). Even the parents and teachers I spoke with who didn't have a problem with the content of the PARCC admitted that children who are more computer literate are going to have an advantage on this exam.

I'll leave it to others to point out the design flaws in the user interface of the PARCC (scroll windows within scroll windows?). And I'll certainly acknowledge paper tests can and do have design flaws.

But there's no question in my mind that a child with regular access to a modern computer with high-speed internet access at home will feel far more comfortable in the PARCC testing environment than a child without that access. At the very least, we ought to study the extent of this bias before we make high-stakes decisions based on PARCC results.

We don't know if the PARCC is sensitive to changes in instruction.

I know regular readers have seen this a billion times, but once again...

It is impossible to deny the correlation between socio-economic status and standardized test scores. And yet we're using these scores to make high-stakes decisions about schools and teachers and even students (yes, we are) without appropriately acknowledging this bias.

Worse: even the PARCC people admit they don't know how this test does at measuring the quality and alignment of instruction. We are attributing all sorts of causes for the variations in PARCC test scores without even knowing the extent of the relationship between school and teacher effectiveness and those scores.

Do I even have to point out how insane this is?

Look, I'm not going to defend the NJASK or any other pre-PARCC test. As I've said many times: we barely knew anything about that test, or many of the other statewide tests that were administered in the wake of No Child Left Behind. 

I'll also risk alienating some of you by stating, once again, that I believe there is an appropriate and reasonable use for standardized testing, especially in the formulating of policy. I think test results can help inform decisions, even if using them to compel decisions is totally unwarranted and, frankly, ignorant.

Lord knows my job as a researcher and blogging smart-ass would be far more difficult if I didn't have test scores to work with. Much of my work in advocating for teacher workplace rights and fair/adequate school funding and reasonable charter school policies relies on standardized test results.

But when I and others use this data, we use it appropriately, with full acknowledgment of its limitations and flaws. And we certainly don't make unsubstantiated claims about how the tests themselves are going to radically improve instruction and outcomes for students.

It's time for the PARCC cheerleaders to take a step back and think more clearly about their claims. It's time for them to start showing a little more humility and a little more healthy skepticism. It's time for them to stop holding on to arguments that have little evidence to back them up.

We know way less about the PARCC than many would have us believe. We have very little evidence that it is "better" than what came before. Let's at least wait until we've studied it before we claim otherwise.

A lack of external validity.

* One caveat: we might see the ceiling go up a bit, especially in math. More on this later.


Anonymous said...

We know exactly three things:

1. It will be longer

2. It will be "harder" in the sense that more kids will get more questions "wrong"

3. The result of 2 will be politicians and opportunists demanding more and more reformy goodness

We SORT OF know that 1 & 2 were a setup for 3, but I can't find it in a memo.

Dienne said...

"I think tests results can help inform decisions...."

Such as? Here's what we've learned from 100 years of standardized testing: test scores predict the size of the houses the students who take them live in. We really haven't learned much beyond that. Why do we need to keep subjecting kids to tests (and losing instructional time to do so) when we already know that?

And, sorry, but the ease of your blogging life isn't worth children suffering either. Yes, because of test scores, we know that charter schools don't succeed even by their own metric of test scores. But what if we didn't even have test scores to "measure" with? It would have long ago been obvious that charters don't have any "secret sauce" if it weren't for test scores that they can game.

Dienne said...

BTW, you're the one who said, "The primary task of standardized tests is to rank and order students." Any defense of standardized testing must then justify why we need to rank and order students.

Anonymous said...

One needs to be open-minded when moving forward. Yes, it's very different. Yes, scores will drop all over the state. No, it's not earth-shattering. I'd love to start a dialogue with you on this if you wish... @iSuperEit

Giuseppe said...

The problem is massive overtesting, and especially the tsunami of high-stakes standardized testing, which can lead to school closings and the mass firing of teachers. What a wonderfully exquisite way of demoralizing and punishing teachers. Huge amounts of time go to test prep, practice tests and developing test-specific skills instead of actual classroom instruction. This is all in addition to the normal amount of testing that goes on in a classroom to evaluate the kids' progress and to generate grades for the report cards. When is enough enough?

Julie Mikula said...

And eventually, if all goes as planned, teachers will be evaluated on student performance on this test, which may have items not even in the curriculum, not to mention the unfairness of student IQs. I teach gifted students and basic skills students, so which scores will I be held responsible for? This makes no sense.