I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Friday, September 25, 2015

Common Core Testing: Who's The Real "Liar"?

By way of Peter Greene, it looks like our SecEd is back on his crusade against all that awful "lying" going on in our schools:
U.S. Secretary of Education Arne Duncan said Pennsylvania won’t be alone in seeing lower standardized test scores after aligning its tests with core standards, and that the lower scores don’t mean students aren’t smart.
Test scores across the country are expected to be lower as they are released in the coming weeks because standards have been raised on the tests that are based on the Common Core. In Pennsylvania, the tests are aligned with Pennsylvania Core.
“Obviously, students aren’t going to be less smart than they were six months ago or a year ago,” Mr. Duncan said. “In far too many states, including Pennsylvania, politicians dummied down standards to make themselves look good.”
Mr. Duncan made his comments during a visit to Carnegie Mellon University, which was the final stop on his seven state, 10-stop tour titled “Ready for Success.”
The secretary said children and parents “were lied to and told they were on the track to be successful” when they weren’t. He called that “one of the most insidious things that happened in education.” [emphasis mine]
Arne's not alone: Michael Petrilli, unofficially America's Most Reasonable Reformer™, is also glad we've finally ditched the "lying":
Between 2010 and 2012, more than 40 states adopted the Common Core standards in reading and math, setting dramatically higher expectations for students in our elementary and secondary schools. Now comes a critical milestone in this effort. In the coming weeks, parents in most states will receive for the first time their children’s scores on new tests aligned to the standards. The news is expected to be sobering, and may come as a shock for many. Parents shouldn’t shoot the messenger. 
It is important to remember why so many states started down this path in the first place. Under federal law, every state must test children every year in grades 3-8 to ensure they are making progress. That’s a good idea. Parents deserve to know if their kids are learning and taxpayers are entitled to know if the money we spend on schools is being used wisely. 
But it is left to states to define what it means to be “proficient” in math and reading. Unfortunately, most states in the past set a very low bar. They “juked the stats.” The result was a comforting illusion that most of our children were on track to succeed in college, carve out satisfying careers and stand on their own two feet. 
To put it plainly, it was a lie. Most states set absurdly low academic standards before the Common Core, and their tests were even worse. In some cases children could even randomly guess the answers and be all but guaranteed to pass. Imagine being told year after year that you’re doing just fine, only to find out when you apply for college or a job, that you’re simply not as prepared as you need to be. [emphasis mine]
Uh-huh. Listen, fellas, I don't want to deprive you of your obviously enjoyable moments of righteous indignation...

But "lying"? Please. Let's go to New York and try to show you guys what you're just not understanding.

You see, The Empire State has a rather unique history when it comes to testing and "lying." Over the past several years, New York has monkeyed around with its proficiency rates more than once. And each time, pundits or politicians decried the "lying" that came before the change; in other words, the new, lower proficiency rates that the tests produced were more "honest" than the ones that came before.

Here, for example, are the proficiency rates in Grade 8 English Language Arts (ELA) from 2007 to 2014.*

Up until 2009, Mayor Mike Bloomberg and Schools Chancellor Joel Klein had nearly thrown out their shoulders patting themselves on the back about the rise in proficiency rates in New York City. But in 2010, New York State reset its passing rates; suddenly, Bloomberg and Klein weren't so smug.

Later, in 2013, New York introduced a new group of tests based on the Common Core State Standards; once again, proficiency rates (defined in New York as meeting Level 3 or Level 4 on the tests) took a nosedive.

But here's the thing: were changes in the tests causing those slips in proficiency rates? Or was something else going on?

Here's the distribution of scale scores from 2007 on the Grade 8 ELA test, back when the tests were so allegedly easy that the "lying" was hitting its peak:

This is a normal distribution, otherwise known as a bell curve:

The NY State scores are the average scale scores for schools and not individual student scores, but the idea is the same: a few schools have high scores, a few schools have low scores, and most are in the middle somewhere. Here's 2008:

And 2009:

Pretty much the same thing each year, with the statewide average somewhere north of 650 points: lots of schools in the middle, a few really high, and a few really low. 

Now, the next year was that first big drop in proficiency rates for New York -- you know, when the "lying" finally began to abate. What happened to the scores?

Do you see a big difference between the 2009 "lying" distribution and the 2010 "honest" one? Here, let me put them on top of each other after standardizing them (more on that in a minute):

The distribution of the scores is practically the same. But how could that be? Everyone knows the scores in 2009 were "lying," and 2010's scores were much more "honest." How could they be so similar?

The answer is that changes in a standardized test don't affect how its scores are ultimately distributed: it's always a normal, bell curve distribution. The tests have a variety of items: some easy, some moderately hard, some very hard. It wouldn't make sense to have a test where all the items were of the same difficulty, would it? How would you know who was at the top and who was at the bottom?

After grading, raw scores are converted to scale scores; here's a neat little policy brief from ETS on how and why that happens. Between the item construction, the item selection, and the scaling, the tests are all but guaranteed to yield bell-shaped distributions.

The point is this: Messing around with the proficiency rates, for reasons that may even be perfectly justifiable, does not affect the distribution of the scores themselves.

Don't believe me? Here's 2011:

And 2012:

And 2013:

Whoa, wait a minute! Look carefully at the x-axis: scores that were centered at around 650 are now centered around 300. What happened?

Think of it as changing from Fahrenheit to Celsius: "77 degrees" becomes "25 degrees," but the temperature still feels the same. And, in this case, the distribution of scores remains the same. The scores have different numerical values, but the change is really only a matter of how to describe the distribution.

By "standardizing" the scores, we can see how they compare to each other, even if they have different scales. Here's how 2012's standardized scores compare with 2013's:

Even though the numerical scale changed, the distribution of the scores remained pretty much the same. And that was true in 2014 as well:

In fact, if we overlay the distribution of those "easy" 2009 scores against the distribution of the "hard" 2014 scores (both standardized), guess what?

The tests might have changed in content and format, but they still yield the same normal, bell curve distribution of scale scores.
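For anyone who wants to see what "standardizing" does, here is a minimal sketch. The school scores are made up, and it assumes the move from the old scale to the new one was a simple linear conversion (like Fahrenheit to Celsius); the real rescaling in New York may well have been more complicated. The function name `standardize` is just for illustration.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert scores to z-scores: (score - mean) / standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / sigma for s in scores]

# Hypothetical school-average scale scores on an old scale centered near 650.
old_scale = [620.0, 640.0, 650.0, 660.0, 680.0]

# The same schools on a made-up new scale centered near 300.
# Assumption: the conversion is linear, like Fahrenheit to Celsius.
new_scale = [(s - 650.0) * 0.5 + 300.0 for s in old_scale]

z_old = standardize(old_scale)
z_new = standardize(new_scale)

# Each school's standardized score is the same on either scale.
for a, b in zip(z_old, z_new):
    print(f"{a:+.3f}  {b:+.3f}")
```

Once both years are expressed as distances from their own average, measured in standard deviations, the raw numbers on the x-axis stop mattering and the shapes can be laid on top of each other.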

Now some of you ought to be thinking: "Well, how do we know that a school that got better scores on the old, 'easy' tests was still getting better scores on the 'hard' Common Core tests?" That's simple enough to check:

I know the scatterplots make some of you crazy, but this one is pretty straightforward. Here are the standardized scale scores for each New York school in Grade 8 ELA for 2012 and 2013. The 2012 score is on the horizontal axis; 2013 is on the vertical. Simply put: if a school had a high 2012 score, it had a high 2013 score, with a little statistical noise thrown in. If a school had a low 2012 score, it had a low 2013 score. Schools that scored in the middle stayed in the middle.

In other words: Even though the test changed to a "hard," Common Core-aligned format in 2013, the relative scores for schools in New York stayed the same. 85 percent of the variation in 2013 scores can be explained by 2012 scores. Further:

Schools that scored relatively high in 2009 continued to score high in 2013. In fact:

Schools that scored high in 2007 continued to score high in 2014. Even though New York's tests had gone through two major revisions of setting proficiency rates, the relative positions of schools stayed constant.
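A figure like "85 percent of the variation explained" comes from squaring the correlation between the two years' scores (with a single predictor, R-squared is just the Pearson correlation squared). Here is a rough sketch of that calculation. The six schools and their scores are invented for illustration and are not the New York data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical standardized scores for six schools in two years.
scores_2012 = [-1.5, -0.8, -0.1, 0.2, 0.9, 1.3]
scores_2013 = [-1.2, -1.0, 0.1, -0.1, 1.1, 1.1]

r = pearson_r(scores_2012, scores_2013)
r_squared = r ** 2  # share of 2013 variation "explained" by 2012 scores

print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```

The closer R-squared is to 1, the more a school's position in one year tells you about its position in the other.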

So, where, exactly, is the "lying" that worries Duncan and Petrilli so much? After the Common Core tests were introduced, the test scores had the same distribution, and the schools pretty much retained their positions relative to each other. Everything had remained the same!

Except for this:

The changes in the construction of items may have dismayed parents, teachers and students. But when it comes to reporting outcomes, the only things that changed that really matter are the proficiency rates. And those can be set wherever those in power choose to set them.
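To make that concrete, here is a small sketch with made-up numbers: the same ten scores, judged against two different cut scores. None of these values are New York's actual scores or cut points, and `proficiency_rate` is just an illustrative name.

```python
def proficiency_rate(scores, cut_score):
    """Share of scores at or above the cut score."""
    return sum(1 for s in scores if s >= cut_score) / len(scores)

# Hypothetical scale scores; the exact same list is used for both cuts.
scores = [610, 625, 640, 648, 652, 655, 660, 668, 675, 690]

lenient_cut = 640  # made-up "old" cut score
strict_cut = 660   # made-up "new" cut score

print(proficiency_rate(scores, lenient_cut))  # 0.8
print(proficiency_rate(scores, strict_cut))   # 0.4
```

Nothing about the students or the scores changed between those two lines; only the line drawn through the distribution moved.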

So if you want to do what New York did, and make a nonsense case that everyone should be able to get a B-minus in a freshman class at a four-year university, and set that as a proficiency rate... well, if you're in charge, no one's going to stop you.

Even if you take the measure of this goal from an instrument -- the SAT -- that is explicitly designed to rank and order students, and use it to set the cut point for proficiency. Not everyone can be above average; someone has to be at the bottom of the distribution.

In addition, we only have enough of those 4-year college seats for, at best, 30 to 40 percent of the population to earn a degree, and we have millions of necessary jobs that do not require that education, and much of college is about credentialing and not acquiring economically productive skills, and even when college students do actually pursue acquiring those skills (like STEM), most of them won't get jobs requiring them anyway.

Let me make an important point here -- something I was reminded of when discussing all this with someone well-acquainted with these issues. There most certainly have been cases where states have set ridiculously low standards for proficiency. The end game for doing so was to keep the revenue going to education as small as possible; that way, states could underfund their schools and still claim their children were getting a great education.

At some point, we have to agree to a common set of standards if we're ever going to ensure system accountability. But raising standards by itself is not enough.

As I showed earlier this year, plenty of states have high standards: that doesn't mean they get good results. New York's standards are very high, but its actual performance lags well behind states like Massachusetts and New Jersey, whose standards are quite average. Why? Many reasons, perhaps -- but here's one that's rarely discussed in reformy circles:

Lord knows that Jersey hasn't been pulling its weight on school funding lately. New York, however, is a particularly awful mess when it comes to funding schools: many of its so-called "failing" districts are getting screwed. This even as New York's upstate cities wither economically, and childhood poverty rates soar.

Why don't I ever hear Arne Duncan talk about this? Why is he apparently convinced that moving around the cut score for proficiency on a standardized test is much more important than making sure that schools have the resources they need to do the jobs they are supposed to do?

Why don't I ever read anything from Michael Petrilli on this? Why does he think it's so urgent to move proficiency rate cut scores, even as he underplays the effects of childhood poverty on education outcomes?

If you really want high-performing schools, you have to make sure they have adequate funding. And if you want high-performing students, you need to make sure they come to school ready to learn. Setting the cut scores for proficiency rates may matter, but not nearly as much as funding and poverty.

And anyone who tells you otherwise is lying.

Would I lie to you?

* I calculated these from weighted averages by school; might be a little bit different from "official" statewide results.
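A weighted average of this kind can be sketched as follows. This assumes each school is weighted by its number of students tested, which is a guess at the approach rather than a description of the exact calculation; the schools and rates are invented.

```python
def weighted_rate(schools):
    """Overall rate as the average of school rates, weighted by students tested."""
    total_tested = sum(n for n, _ in schools)
    return sum(n * rate for n, rate in schools) / total_tested

# Hypothetical (students tested, proficiency rate) pairs for three schools.
schools = [(100, 0.50), (300, 0.30), (600, 0.40)]

overall = weighted_rate(schools)
simple_mean = sum(rate for _, rate in schools) / len(schools)

print(f"weighted: {overall:.3f}, unweighted: {simple_mean:.3f}")
```

Weighting keeps small schools from counting as much as large ones, which is one reason a figure built this way can differ a little from a state's published number.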


Unknown said...

If standards are so important, why is there no relationship between the supposed rigor of a state's standards and the state's performance on NAEP?

lch said...

Great article.

Steve Ruis said...

If all states having the same curriculum standards is so important to the eduformers, why are they not pushing for all states to have the same test standards? Why are these conservatives pushing for national control of a state function like this? Could it be we need to "follow the money"?

What job has Arne Duncan been promised after his tenure at DOE that he was eager to get the federal government sucked into this illegal intervention into educational curricula?

Unknown said...

Brilliant argument and graphics.

gfb9+2/3 said...

Excellent work, as usual. I love the graphs!