Jersey Jazzman: Data Wars, Episode I

Bruce Baker wrote a post yesterday about the appropriate use of data that really should be read by all engaged in education policy debates:

My next few blog posts will return to a common theme on this blog – appropriate use of publicly available data sources. I figure it’s time to put some positive, instructive stuff out there. Some guidance for more casual users (and more reckless ones) of public data sources and for those just making their way into the game. In this post, I provide a few tips on using publicly available New Jersey schools data. The guidance provided herein is largely in response to repeated errors I’ve seen over time in using and reporting New Jersey school data, where some of those errors are simple oversight and lack of deep understanding of the data, and others of those errors seem a bit more suspect. Most of these recommendations apply to using other states’ data as well. Notably, most of these are tips that a thoughtful data analyst would arrive at on his/her own, by engaging in the appropriate preliminary evaluations of the data. But sadly these days, it doesn’t seem to work that way.

I'll be the first to confess that I am still relatively new to "the game," and I'm as prone to mistakes as anyone. The data and analysis I've been putting out on this blog and elsewhere is, undoubtedly, open to scrutiny and critique. There may well be things I haven't thought of when making a criticism, and I'm more than willing to listen to a contrary point of view and debate it.

But here's the thing...

I'm not in charge of anything. I'm not the guy making policies: I'm the guy who, along with my fellow teachers across this state and the nation, has to live with them. So when I see a snake oil salesman like Joel Klein put out a blatantly deceptive graph, that just bugs the hell out of me. Klein ran the largest school system in the US; arguably, it is also the most studied, as many education researchers are headquartered in NYC. Yet here he is, using data in an utterly fraudulent way.

Or take his latest hire, former acting NJ Education Commissioner Chris Cerf, whose magic graph "proved" poverty just doesn't matter. Click through to see how totally disingenuous this "data-based" thinking really is. Yet, in spite of his persistent abuse of data, Cerf has had more influence on New Jersey's schools than any other commissioner in a generation.

And then there's Michelle Rhee, who can't read even the most basic research on education. Rhee gets the amount of time spent preparing for standardized tests wrong in this piece, which was originally published in the Washington Post. I wrote about it last Monday; and yet, incredibly, the Star-Ledger reprinted her inaccuracies just today.

(A message for Tom Moran, editor of the op-ed page at the S-L: you've admitted you read me, buddy. Do you do these things just to give me more fodder for my blog?)

I could go on: Arne Duncan, Rahm Emanuel, Bill Gates... all have enormous power over school policy. All claim to drive their decisions with research. But all turn out to be either woefully misinformed or outright mendacious when using data to justify their policy preferences.

Again: it would be one thing if these people were just voicing their opinions. But what they say and what they do matters. There are consequences to their actions. Their misuse of data has serious ramifications for students, teachers, and families.

Which brings me to our latest episode of data abuse:

The Newark Public Schools have been under state control for 19 years, ostensibly because this district was so poorly run when under local control. Ironically, however, NPS was years late in producing a Long-Range Facilities Plan (LRFP), as required by state law. But that changed last month, when the district finally "amended" its 2005 LRFP.

I'm not qualified to say whether this amended plan meets the requirements of the law. I can tell you, after having seen it, that it comes across as a mish-mosh of disparate reports and policy briefs and graphs and powerpoints, slapped together without any overarching organization. I'd tell you to go look and judge for yourself, but -- so far as I can tell -- the plan hasn't been released to the public.

In any case, as I was skimming through, this page stood out:

So what's going on here? Well, NPS is trying to use test outcomes to identify its "struggling" schools in this amended LRFP (which is weird in and of itself, as the old LRFP concentrated on facilities and didn't have anything to do with test scores). There are two measures in use:

"LAL % Prof +": Language arts proficiency, as defined by whether a student is deemed "proficient" or "advanced proficient" on the NJASK. I can't be sure, but it looks like the proficiency rates were averaged from Grade 3 to 8, which, according to Bruce's post, is a no-no. But we'll set that aside for right now.
"LAL SGP": The infamous Student Growth Percentiles, which have a host of problems of their own -- again, we'll put those aside. Here we have the median SGP in language arts for the entire school population, the state's measure for how a school's students "grow."

To identify which schools are "struggling," NPS has averaged the LAL proficiency rate with the SGP score. Presumably, they thought they could do this because SGPs and proficiency rates both use a 0-to-100 (or close enough: 1-99) scale. So they've got to be equivalent measures, right?

Wrong:

This is a little tricky, but it makes sense if you break it down a bit. What we have here are the distributions of SGPs and LAL proficiency rates by school for the Newark area, both public schools and charters. The green bars show the number of schools that got SGP scores within a certain "bin": in other words, there are 16 schools that got SGPs between 35 and 40, the largest bin (and, therefore, the largest bar) in the graph. Notice that the largest number of schools falls in a bin roughly at the mean (or average) score, which is about 41, and that the number of schools in neighboring bins roughly falls off in each direction: this is, very crudely, a normal distribution, aka a "bell curve."

The clear bars show the distribution for proficiency rates in Grade 8 LAL; the mean rate is about 61 percent. Notice that the bars don't follow that bell curve shape: the distribution isn't normal.
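For readers who want to see how a histogram like this gets built, here is a minimal Python sketch. The scores are invented for illustration, not the Newark numbers, and the 5-point bin width and half-open bins (a score of 40 goes in the 40-45 bin, not the 35-40 one) are my assumptions about how the chart was drawn.

```python
from collections import Counter

def bin_counts(scores, width=5):
    """Count schools per bin; each bin is keyed by its lower edge.

    Bins are half-open: 35 <= score < 40 lands in the "35" bin.
    """
    counts = Counter((int(s) // width) * width for s in scores)
    return dict(sorted(counts.items()))

def mean(scores):
    return sum(scores) / len(scores)

# Hypothetical median SGPs for a handful of schools (NOT the Newark data).
sgps = [28, 33, 36, 37, 38, 39, 41, 42, 44, 47, 52]

counts = bin_counts(sgps)
print(counts)                       # {25: 1, 30: 1, 35: 4, 40: 3, 45: 1, 50: 1}
print(max(counts, key=counts.get))  # 35, the tallest bar
print(round(mean(sgps), 1))         # 39.7
```

The tallest bar sits next to the mean and the counts thin out on either side, which is the rough bell shape described above.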

Why does this matter? Well, NPS's simple formula (averaging the two measures) makes an assumption: that a difference of a given size on one measure is equivalent to the same difference on the other. In other words, if your school is ten points higher in SGPs than another school, but that school is ten points higher than yours in proficiency, then your two schools are equal on the "struggling" index. Compared to other schools, your school and the one you're comparing it to are "struggling" (or not "struggling") the same amount.

Except this graph shows the comparisons are not equivalent. Being ten points above the mean on SGP is a much bigger deal than being ten points above the mean on proficiency. Because school SGPs are bunched more tightly around their mean than proficiency rates are, you beat out many more schools when you move ten points on SGPs than you do when you move the same amount on proficiency.
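Here's a small Python sketch of that problem. I'm assuming the "struggling" index is nothing more than a plain average of the two scores, which is how the LRFP page reads; the function names and both lists of scores are hypothetical, made up so that SGPs cluster tightly around 41 and proficiency rates spread widely around 61, roughly matching the shapes in the graph.

```python
def struggling_index(prof_rate, median_sgp):
    # Hypothetical reconstruction: a plain average of the two 0-100 measures.
    return (prof_rate + median_sgp) / 2

def share_below(scores, x):
    # Fraction of schools scoring strictly below x (a simple percentile rank).
    return sum(s < x for s in scores) / len(scores)

# Made-up distributions for illustration only (NOT the Newark data):
# SGPs bunched tightly around 41, proficiency rates spread widely around 61.
sgps = [30, 34, 36, 38, 39, 40, 41, 42, 43, 44, 46, 48, 52]
prof = [25, 35, 42, 50, 55, 60, 61, 64, 70, 75, 80, 88, 96]

# School A is ten points ahead on SGP; school B is ten points ahead on proficiency.
a = {"prof": 61, "sgp": 51}
b = {"prof": 71, "sgp": 41}

# The averaged index calls them identical...
print(struggling_index(a["prof"], a["sgp"]), struggling_index(b["prof"], b["sgp"]))

# ...but each ten-point edge passes a very different share of schools.
sgp_edge = share_below(sgps, a["sgp"]) - share_below(sgps, b["sgp"])
prof_edge = share_below(prof, b["prof"]) - share_below(prof, a["prof"])
print(f"A's SGP edge passes {sgp_edge:.0%} of schools; "
      f"B's proficiency edge passes {prof_edge:.0%}")
```

With these invented numbers both schools score 56.0 on the index, yet A's ten-point SGP advantage moves it past 46 percent of schools while B's ten-point proficiency advantage moves it past only 23 percent. Real data would give different figures; the point is that a raw average can't see the difference.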

In addition, the two measures don't have the same mean. Since the mean proficiency rate is about 61, a proficiency rate of 70 puts you only a little above the middle of the pack. But since the mean SGP is about 41, an SGP of 70 puts you at the top in growth. And, of course, there's no way to know that being at the bottom in SGPs is as good an indicator that your school is "struggling" as being at the bottom in proficiency. NPS is averaging two measures that have different means, different distributions, and different educational meanings.
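The arithmetic behind that comparison is short. This sketch uses only the approximate means quoted above (about 41 for SGPs, about 61 for Grade 8 LAL proficiency); the dictionary and function names are mine, for illustration.

```python
# Approximate means reported above for Newark-area schools.
MEAN = {"SGP": 41, "LAL proficiency": 61}

def points_above_mean(measure, score):
    """How far a raw score sits above (or below) that measure's mean."""
    return score - MEAN[measure]

for measure in MEAN:
    print(f"A {measure} score of 70 is "
          f"{points_above_mean(measure, 70)} points above the mean")
```

The same raw 70 sits 29 points above the SGP mean but only 9 points above the proficiency mean, which is why treating the two numbers as interchangeable misleads.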

And yet high-stakes decisions are being made on the basis of this misuse of data: whether schools should be closed, whether the buildings should be turned over to charter management organizations, whether staffs should be fired across the board. Again, it wouldn't matter so much if this were just someone's opinion, but it's not: the people who did this are in charge of making policy for Newark's schools.

There is, of course, nothing wrong with identifying which schools within a system are "struggling" and require intervention. But misusing data to come up with a simplified quantitative measure is a cop-out. This stuff may be complex, but Superintendent Cami Anderson and her staff signed up for the job on the premise that they were qualified to handle it. Stuff like this suggests they aren't up to the task.

More data wars to come...

Averaging non-equivalent measures: to the Dark Side it leads, young one...

Sunday, April 13, 2014
