I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Monday, July 16, 2018

The PARCC, Phil Murphy, and Some Common Sense

Miss me?

I'll tell you what I've been up to soon, I promise. I'm actually still in the middle of it... but I've been reading and hearing a lot of stuff about education policy lately, and I've decided I can't just sit back -- even if my time is really at a premium these days -- and let some of it pass.

For example:
Gov. Phil Murphy just announced that he will start phasing out the PARCC test, our state's most powerful diagnostic tool for student achievement.

Like an MRI scan, it can detect hidden problems, pinpointing a child's weaknesses, and identifying where a particular teacher's strategy isn't working. This made it both invaluable and a political lightning rod.
That's from our old friends at the Star-Ledger op-ed page. And, of course, the NY Post never misses a chance to take down both a Democrat and the teachers unions:
New Jersey Gov. Phil Murphy is already making good on his promises to the teachers unions. Too bad it’s at the kids’ expense.
[...]
Officially, he wants the state to transition to a new testing system — one that’s less “high stakes and high stress.” It’s a safe bet that the future won’t hold anything like the PARCC exams, which are written by a multi-state consortium. Instead, they’ll be Jersey-only tests — far easier to water down into meaninglessness.

The sickest thing about this: A couple of years down the line, Murphy will be boasting about improved high-school graduation rates — without mentioning the fact that his “reforms” have made many of those diplomas worthless.
First of all -- and as I have pointed out in great detail -- it's the Chris Christie-appointed former superintendents of Camden and Newark, two districts under state control, who have done the most bragging about improved graduation rates. These "improvements" have taken place under PARCC; however, it's likely they are being driven by things like credit recovery programs, which have nothing to do with high school testing.

The Post wants us to believe that the worth of a high school diploma is somehow enhanced by implementing high school testing above and beyond what is required by federal law. But there's no evidence that's true.

In 2016-17, only 12 states required students to pass a test to graduate, and the only state besides New Jersey requiring the PARCC was New Mexico. Further, as Stan Karp at ELC has pointed out, the 2017 PARCC passing rate on the Grade 10 English Language Arts test was 46%; the passing rate on the Algebra I exam was 42%. That's three years after the test was first introduced in New Jersey.

Does the Post really want to withhold diplomas from more than half of New Jersey's students?

The PARCC was never designed to be a graduation exit exam. Its proficiency rates -- which I'll talk about more below -- were explicitly set up to measure college readiness. It's no surprise, then, that around 40 percent of students cleared the PARCC's proficiency bar: around 40 percent of adults in New Jersey have a bachelor's degree.

I don't know when we decided everyone should go to a four-year college. If we really believe that, we'll have a lot of over-educated people doing necessary work, and we'll have to more than double the number of college seats available. Anyone think that's a good idea? NY Post, should New Jersey jack up taxes by an insane amount to open up its state colleges to more than twice as many students as they have now?

Let's move on to the S-L's editorial. The idea that the PARCC is somehow the "most powerful diagnostic tool" for identifying an individual child's weaknesses, and therefore the flaws in an individual teacher's practice, is simply wrong. The most obvious reason why the PARCC is not used for diagnosing individual students' learning progress is that by the time the school gets the score back, the student has already moved on to the next grade and another teacher.

There are, in fact, many other assessment tools available to teachers -- including plenty of tests that are not designed by the student's teacher -- that can give actionable feedback on a student's learning progress. This is the day-to-day business of teaching, taught to those of us in the field at the very beginning of our training: set objectives, instruct, assess, adjust objectives and/or instruction, assess, etc.

The PARCC, like any statewide test, might have some information useful to school staff as a child moves from grade-to-grade. But the notion that it is "invaluable" for its MRI-like qualities is just not accurate. How do I know?

Because the very officials at NJDOE during the Christie administration who pushed the PARCC so hard admitted it was not designed to inform instruction:


ERLICHSON: In terms of testing the full breadth and depth of the standards in every grade level, yes, these are going to be tests that in fact are reliable and valid at multiple cluster scores, which is not true today in our NJASK. But there’s absolutely a… the word "diagnostic" here is also very important. As Jean sort of spoke to earlier: these are not intended to be the kind of through-course — what we’re talking about here, the PARCC end-of-year/end-of-course assessments — are not intended to be sort of the through-course diagnostic form of assessments, the benchmark assessments, that most of us are used to, that would diagnose and be able to inform instruction in the middle of the year.
These are in fact summative test scores that have a different purpose than the one that we’re talking about here in terms of diagnosis.
That purpose is accountability. That's something I am all for -- as is every other professional educator I know -- provided the tests are used correctly.

As I've written before, I am generally agnostic about the PARCC. From what I saw, the NJASK didn't seem to be a particularly great test... but I'll be the first to admit I am not a test designer, nor a content specialist in math or English language arts.

The sample questions I've seen from the PARCC look to me to be subject to something called construct-irrelevant variance, a fancy way of saying test scores can vary based on stuff you're trying not to measure. If a kid can't answer a math question because the question uses vocabulary the kid doesn't know, that question isn't a good assessor of the kid's mathematical ability; the scores on that item are going to vary based on something other than the things we really want to measure.
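If you want to see how construct-irrelevant variance plays out, here's a toy simulation in Python. All the numbers are made up for illustration: I give every simulated kid a "math ability" trait and a "vocabulary" trait, then score one item that depends only on math and one wordy item that also depends on vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # simulated students

# Two latent traits: the one we want to measure, and the one we don't.
math_ability = rng.normal(0, 1, n)
vocabulary = rng.normal(0, 1, n)

# A "pure" math item loads only on math ability; a wordy item also
# loads on vocabulary. The loadings (1.0, 0.8, 0.6) and the noise
# level (0.5) are arbitrary choices for this illustration.
pure_item = 1.0 * math_ability + rng.normal(0, 0.5, n)
wordy_item = 0.8 * math_ability + 0.6 * vocabulary + rng.normal(0, 0.5, n)

for name, item in [("pure item", pure_item), ("wordy item", wordy_item)]:
    r = np.corrcoef(item, math_ability)[0, 1]
    print(f"{name}: share of score variance explained by math ability = {r**2:.2f}")
```

In this made-up example, math ability explains about 80 percent of the variance on the pure item but only about half on the wordy one; the rest is vocabulary and noise. That leftover chunk is exactly the variance we're "trying not to measure."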

As I said, I'm not the best authority on the alleged merits of the PARCC over the NJASK (ask folks like this guy instead, who really knows what he's talking about when it comes to teaching kids how to read). I only wish the writers at the Star-Ledger had a similar understanding of their own limitations:
If this were truly for the sake of over-tested students, we wouldn't be starting with the PARCC. Unlike its predecessors, this test can tell educators exactly where kids struggle and how to better tailor their lessons. It's crucial for helping to close the achievement gap between black and white students; not just between cities and suburbs, but within racially mixed districts.
Again: the PARCC is a lousy tool for informing instruction, because that's not its job. The PARCC is an accountability measure -- and as such, there is very little reason to believe it is markedly better at identifying schools or teachers in need of remediation than any other standardized test.

Think about it this way: if the PARCC were really that much better than the NJASK, we'd expect the two tests to yield very different results. A school that was "lying" to its parents about its NJASK scores would instead be exposed by the PARCC as struggling. There would be little correlation between the two tests if one were so much better than the other, right?

Guess what?


These are the Grade 7 English Language Arts (ELA) test scores on the 2014 NJASK and the 2015 PARCC, the first year the PARCC was used in New Jersey. Each dot is a school around the state. Look at the strong relationship: if a school had a low score on the 2014 NJASK, it had a low score on the 2015 PARCC; similarly, if it was high on the 2014 NJASK, it was high on the 2015 PARCC. 80 percent of the variation in PARCC scores can be explained by the previous year's NJASK scores; that is a very strong relationship.
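(If you want to replicate this kind of check yourself, here's a minimal sketch. The data below are simulated stand-ins -- the real inputs would be NJDOE's public school-level score files, matched by school -- but the actual analysis is just the last two lines.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_schools = 500

# Stand-in data: one mean scale score per school in each year,
# simulated to be strongly related. Swap in the real NJASK (2014)
# and PARCC (2015) school means to run the genuine comparison.
school_quality = rng.normal(200, 20, n_schools)
njask_2014 = school_quality + rng.normal(0, 6, n_schools)
parcc_2015 = 0.9 * school_quality + rng.normal(0, 6, n_schools)

# Regress one year's school means on the other and report the fit.
fit = stats.linregress(njask_2014, parcc_2015)
print(f"r = {fit.rvalue:.2f}, R^2 = {fit.rvalue ** 2:.2f}")
# An R^2 near 0.8 means ~80% of the school-level variation on the
# new test is predictable from the old one.
```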

I'll put some more of these below, but let me point out one more thing: the students who took the Grade 7 NJASK in 2014 were not the same students who took the Grade 7 PARCC in 2015, because most students moved up a grade. How did the test scores of the same cohort compare when they moved from Grade 7, when they took the NJASK, to Grade 8, when they took the PARCC?


Still an extremely strong relationship.

No one who knows anything about testing is going to be surprised by this. Standardized tests, by design, yield normal, bell-curve distributions of scores: a few kids score low, a few score high, and most score in the middle. There's just no reason to believe the NJASK was "lying" back then any more than the PARCC is "lying" now.

And let me anticipate the argument about "proficiency":

Again, I've been over this more than a few times: "proficiency" rates are largely arbitrary. When you have a normal distribution of scores, you can set the rate pretty much wherever you want, depending on how you define "proficient." I know that makes some of you crazy, but it's true: there is no absolute definition of "proficient," any more than there's an absolute definition of "smart."
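Here's the whole game in a few lines of Python. The scale-score distribution is hypothetical (mean 200, standard deviation 30); the point is that the very same bell curve yields whatever "proficiency" rate you want, depending on where you draw the line:

```python
from scipy import stats

# One fixed, hypothetical distribution of scale scores.
scores = stats.norm(loc=200, scale=30)

# Same kids, same scores -- only the cut score moves.
for cut in (185, 200, 215, 230):
    pct_proficient = (1 - scores.cdf(cut)) * 100
    print(f"cut score {cut}: {pct_proficient:.0f}% 'proficient'")
```

Same distribution, four different "proficiency" rates, from 69 percent down to 16 percent. Nothing about the students changed; only the definition did.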

So, no, the NJASK wasn't "lying" about NJ students' proficiency; the state could have used the same distribution of scores from the older test* and set a different proficiency level. And no, the PARCC is not in any way important as a diagnostic tool, nor is there any evidence it is a much "better" test than the old NJASK.

Look, I know this bothers some of you, but I am for accountability testing. The S-L is correct in noting that these tests have played an important role in pointing out inequities within the education system. I am part of a team that works on these issues, and we've relied on standardized tests to show that there are serious problems with our nation's current school funding system.

But if that's the true purpose of these tests -- and it's clear that it is -- then we don't need to spend as much time or money on testing as we do now. If we choose to use test outcomes appropriately, we can cut back on testing and remove some of the corrupting pressures they can impose on the system.



ADDING: This is not the first time I've written about PARCC fetishism.

ADDING MORE: Does it strike any of you as odd that both the NY Post and the Star-Ledger came out with similar editorials beating up Governor Murphy and the teachers unions over his new PARCC policy -- on the very same day?

As I've documented here: when it comes to education (and many other topics), editorial writers often rely on the professional "reformers" in their Rolodexes to feed them ideas. If there is a structural advantage these "reformers" have over folks like me, it's that they get paid to make the time to influence op-ed writers and other policy influencers. They are subsidized, usually by very wealthy interests, to cultivate relationships with the media, which in turn bends the media toward their point of view.

One would hope editorial boards could see past this state of affairs. Alas...

ADDING MORE: From the NJDOE website:
A GUIDE TO PARENT/TEACHER CONVERSATIONS ABOUT THE PARCC SCORE REPORTS 
ABOUT INDIVIDUAL STUDENT SCORES
a) What if my child is doing well in the classroom and on his or her report card, but it is not reflected in the test score?
  • PARCC is only one of several measures that illustrate a child’s progress in math and ELA. Report card grades can include multiple sources of information like participation, work habits, group projects, homework, etc., that are not reflected in the PARCC score, so there may be a discrepancy.
Report cards can also reflect outcomes on tests made by teachers, districts, or outside vendors and administered multiple times. The PARCC, like any test, is subject to noise and bias. It is quite possible a report card grade is the better measure of an individual student's learning than a PARCC score.

If there is a disconnect between the PARCC and a report card, OK: parents, teachers, and administrators should look into that. But I take the above statement from NJDOE as an acknowledgment that the PARCC, like any other test, is a sample of learning at a particular time, and its outcomes are subject to error and bias like any other assessment.

Again: by all means, let's have accountability testing. But PARCC fetishism in the service of teachers union bashing is totally unwarranted. Stop the madness.


SCATTERPLOT FUN! Here are some other correlations between NJASK and PARCC scores at the school level. You'll see the same pattern in all grades and both exams (ELA and math) with the exception of Grade 8 math. Why? Because the PARCC introduced the Algebra 1 exam; Grade 8 students who take algebra take that exam, while those who don't take algebra take the Grade 8 Math exam.

The Algebra 1 results are some of the most interesting ones available, for a whole variety of reasons. I'll get into that in a bit...


* OK, I need to make this clear: there was an issue with the NJASK having a bit of a ceiling effect. I've always found it kind of funny when people got overly worried about this: as if the worst thing for the state was that so many kids were finding the old test so easy that too many were getting perfect scores!

Whether the PARCC broke through the ceiling with construct-relevant variance is an open question. My guess is a lot of the "higher-level" items are really measuring something aside from mathematical ability. In any case, the NJASK wasn't "lying" just because more kids aced it than the PARCC.

4 comments:

edblisa said...

Just an FYI for you....MD uses PARCC ELA 10 and Alg I as grad requirements, also. We are hearing "rumors" that they are getting rid of PARCC for the 2019 school year and the school board and Governor are looking at computer adaptive tests to be developed by New Meridian (or Meridian). Guess what? Meridian owns or manages PARCC. This association between New Meridian and PARCC is very incestuous. This will just be a rebranding same as with the rebranding of the Common Core standards. It's all just a lie and just another way for ed-tech and "rephormers" to extract more education tax dollars.

CrunchyMama said...

Noooo!!! How had this slipped under my radar? (Probably having an 8th-grader now LOL)

Thanks for the heads-up!

Duane Swacker said...

"the scores on that item are going to vary based on something other than the things we really want to measure."

Nothing is being "measured" in the standardized testing process. Why? From Ch.6 of "Infidelity to Truth: Education Malpractice in Public Education":

In addition to that and perhaps even worse is that the proponents of these standards claim that the CCSS are standards against which 'student achievement' can be measured. In doing so educational standards proponents claim the documentary standard (definition three) as a metrological standard (definition four). In doing so they are falsely claiming a meaning of standard that should not be given credence.
This confusion is compounded by what it means to measure something and the similar misuse of the meaning of the word measure by the proponents of the standards and testing regime. Assessment and evaluation perhaps can be used interchangeably but assessment and evaluation are not the same as measurement. Word usage matters!
The Merriam-Webster dictionary definition of measure includes the following:
1a (1): an adequate or due portion (2): a moderate degree; also: moderation, temperance (3): a fixed or suitable limit: bounds b: the dimensions, capacity, or amount of something ascertained by measuring c: an estimate of what is to be expected (as of a person or situation) d (1): a measured quantity (2): amount, degree
2a: an instrument or utensil for measuring b (1): a standard or unit of measurement—see weight table (2): a system of standard units of measure
3: the act or process of measuring
4a (1): melody, tune (2): dance; especially: a slow and stately dance b: rhythmic structure or movement: cadence: as (1): poetic rhythm measured by temporal quantity or accent; specifically: meter (2): musical time c (1): a grouping of a specified number of musical beats located between two consecutive vertical lines on a staff (2): a metrical unit: foot
5: an exact divisor of a number
6: a basis or standard of comparison <wealth is not a measure of happiness>
7: a step planned or taken as a means to an end; specifically: a proposed legislative act
Measure as commonly used in educational standard and measurement discourse comes under definitions 1d, 2, and 3, the rest not being pertinent other than to be used as an obfuscating meaning to cover for the fact that, indeed, there is no true measuring against a standard whatsoever in the educational standards and standardized testing regimes and even in the grading of students. What we are left with in this bastardization of the English language is a bewildering befuddle of confusion that can only serve to deceive many into buying into intellectually bankrupt schemes that invalidly sort, rate and rank students resulting in blatant discrimination with some students rewarded and others punished by various means such as denying opportunities to advance, to not being able to take courses or enroll in desired programs of study.

Duane Swacker said...

Continuing from above:


The most misleading concept/term in education is "measuring student achievement" or "measuring student learning". The concept has been misleading educators into deluding themselves that the teaching and learning process can be analyzed/assessed using "scientific" methods which are actually pseudo-scientific at best and at worst a complete bastardization of rationo-logical thinking and language usage.
There never has been and never will be any "measuring" of the teaching and learning process and what each individual student learns in their schooling. There is and always has been assessing, evaluating, judging of what students learn but never a true "measuring" of it.
The TESTS MEASURE NOTHING, quite literally when you realize what is actually happening with them. Richard Phelps, a staunch standardized test proponent (he has written at least two books defending the standardized testing malpractices) in the introduction to “Correcting Fallacies About Educational and Psychological Testing” unwittingly lets the cat out of the bag with this statement:
“Physical tests, such as those conducted by engineers, can be standardized, of course, but in this volume, we focus on the measurement of latent (i.e., nonobservable) mental, and not physical, traits.”
Notice how he is trying to assert by proximity that educational standardized testing and the testing done by engineers are basically the same, in other words a “truly scientific endeavor”. The same by proximity is not a good rhetorical/debating technique.
Since there is no agreement on a standard unit of learning, there is no exemplar of that standard unit and there is no measuring device calibrated against said non-existent standard unit, how is it possible to “measure the nonobservable”?
PURE LOGICAL INSANITY! (pgs 6-9)