Jersey Jazzman: Think-Tanky Thinking

Tuesday, May 14, 2013

Dr. Matthew Ladner is undoubtedly a very smart guy. I'm sure that's why Jeb! Bush's Foundation for Excellence in Education pays him to write blog posts that bolster Jeb!'s preferred policies:

I had the opportunity to discuss A-F school grading with a thoughtful skeptic yesterday. Sadly my doubting Thomas remained a skeptic at the end of our discussion. I showed him data about the trend for improving grades in Florida, and he produced data to show improving fuzzy labels from his state. I told him that Florida’s progress is confirmed by improving NAEP data, whereas his state has flatlined on NAEP over the last decade despite improved state scores. He wasn’t buying it.

My failure to persuade however got me to thinking about the Trial Urban District Assessment NAEP data. I ran the proficiency numbers for free and reduced lunch eligible students in all the districts and found the following for 4th grade reading:

Note that the top 3 performers all operate under an A-F school grading system Hillsborough (Tampa), Miami-Dade and New York City (NYC has operated under A-F longer than any non-Florida district). Obviously there are plenty of other factors at play than school grading, but note that a poor child in Tampa is almost six times more likely to be reading at a proficient level than a poor child in Detroit.

Everyone got the premise? See, Florida (thanks to Jeb!) and New York City (thanks to mediocre-at-best former chancellor Joel Klein) grade their schools - we're not talking about the kids, but the schools - on an A to F scale. And while Ladner generously grants that maybe a few things other than school-level accountability systems matter in student achievement, he clearly believes we have strong evidence here that the A-F school grading system improves student learning.

Let's set aside any doubts we may have that the National Assessment of Educational Progress gives us evidence, by itself, of the efficacy of any particular educational policy. Let's not worry about the big demographic differences between these school districts. We'll even throw away the fact that Ladner chided his "thoughtful skeptic" for not paying attention to growth in scores, while he himself uses evidence that is merely cross-sectional.

Let's, instead, play around with the data a bit: you know, just for kicks. We'll start by going to the source, the NAEP Data Explorer. Can we replicate Ladner's chart, showing the reading proficiency rates for students eligible for Free and Reduced Price Lunch (FRPL) on the NAEP?

OK, it appears we can; we know we're using the same data set. To make things easier, I've highlighted Ladner's three "top performers" - the districts that use A-F school report cards - in red.

Now, Ladner only refers to reading scores in his post (for 4th and 8th grade, the two main grade levels reported by the NAEP). But the NAEP has two major tests: reading and math. How did the three A-Fers do in math?

Not quite as clear-cut now, is it? New York City is at the top, but Tampa took a hit, and Miami suffered a big drop. But hold on...

We're looking at proficiency rates for the NAEP; there are two problems with this. The first is that the bar for "proficiency" on this test is quite high; it's a mistake to equate it with the layman's definition of "proficiency," or the way the term is used on state-level tests. The second is that proficiency rates don't tell the entire story about a school district's performance. A proficiency rate is just the share of students who clear a cut score: two districts with the same proficiency rate could have very different average scores.
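
To make that second problem concrete, here's a quick sketch in Python. Every number in it is made up for illustration: the two districts are imaginary, the ten scores apiece are invented, and the cut score of 240 is a hypothetical stand-in, not an official NAEP value. All it shows is that identical proficiency rates can sit on top of very different averages.

```python
from statistics import mean

CUT_SCORE = 240  # hypothetical cut score, NOT an official NAEP value

# Invented scores for two imaginary districts, ten students each.
district_a = [180, 190, 200, 210, 220, 225, 230, 235, 245, 250]
district_b = [215, 220, 225, 230, 232, 235, 237, 239, 241, 242]


def proficiency_rate(scores, cut=CUT_SCORE):
    """Share of students scoring at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)


for name, scores in [("A", district_a), ("B", district_b)]:
    rate = proficiency_rate(scores)
    print(f"District {name}: {rate:.0%} proficient, mean score {mean(scores):.1f}")
```

Both imaginary districts come out at 20% "proficient," yet District B's average is about 13 points higher than District A's. Rank them by proficiency rate and they tie; rank them by average score and they don't.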

Instead of proficiency rates, let's look at the average scale scores:

NYC moves down, and Ladner's hypothesis looks less and less likely. Of course, in Florida the A-F system is a statewide policy. How does the entire state fare against other states?

(I marked New York in red for consistency, but keep in mind that the A-F school evaluation system is not a statewide policy there; A-F is only used in New York City.)

One more issue: we're conflating Free Lunch eligibility with Reduced Price Lunch eligibility. That's a no-no; Free Lunch eligibility signals a deeper level of poverty, and that matters when measuring student achievement on tests. The NAEP has some issues with disaggregating FRPL data (use caution when approaching the scores of the states with asterisks), but here are those scores when looking only at Free Lunch, and not Reduced Price Lunch, eligibility:

And so now we come to the heart of the matter: the burden of proof. Because it's one thing to play around with numbers and pose questions and even publish your musings; I do it all the time. That's good clean fun and a great way to put off mowing the lawn.

But that's not what Ladner is doing here. Instead, he's attempting to provide ammunition to those - like Jeb!'s FEE and Michelle Rhee's StudentsFirst - who insist that A-F school report cards are a necessary policy change. And since he's taking the affirmative position, the burden of proof falls to him. He can't just pick and choose data that suits his fancy; he has to have a response to legitimate counterarguments, especially when they are based on the same data set he uses.

In this case, I don't think he'll have a response.

This, my friends, is a prime example of think-tanky thinking: define the policy you want first, pick the data you need to bolster your case, and simply ignore the counterarguments.

Again, I'm sure Matthew Ladner is a very smart guy. But when you live by the data, you gotta die by the data.

But I wanna pick my own data!

ADDING: Ladner cites a report by the Urban Institute to further back up his claims about the efficacy of A-F school reports. A critical review of this report, however, calls into question some of its claims. I recommend this review, written by - I can't believe this - Damian Betebenner.

Yeah, that Damian Betebenner. Ironical, ain't it?

2 comments:

Tom Hoffman said...: Am I reading this wrong or does the Urban Institute report at best, in math, give the grading system credit for 38% of a 6 to 14% of a SD gain? So... a 2% to 5% SD gain? They're hanging their hat on that?; May 14, 2013 at 8:27:00 AM PDT
Duke said...: That's how I read it, Tom. Next to nothing in practical terms.; May 16, 2013 at 6:32:00 PM PDT