Joel Klein is one of America's foremost proponents of corporate education "reform." He asserts that public education is in such a crisis that it constitutes a national security threat - a threat that can be dealt with by buying technology from Amplify, the firm he runs for Rupert Murdoch. He also believes that the "reforms" he and Mayor Michael Bloomberg instituted in New York City - including mayoral control, school closings, charter expansion, and test-based teacher evaluations - should be replicated across the country, based on what he terms the "compelling" improvements made under his tenure.
But does the record support his claims? What is the legacy of Joel Klein in New York City? I'm looking at the evidence to see if the facts support Klein's boasts. Here's the series so far:
Part I: Joel Klein has no problem twisting the facts to suit his ends. Has he done the same thing when crowing about his "success" in New York?
Part II: When you break down national test scores by student subgroups, Klein's "success" in New York isn't very impressive; in fact, it's downright disappointing.
Part III: A look at national test scores as a whole, comparing New York to other cities during Klein's reign. The results? Meh.
Part V: The demographics of the "Big Five" cities in New York State changed dramatically during Klein's tenure. Yet Klein doesn't mention this when discussing his record of "success."
Once again, here's former NYC Schools Chancellor Joel Klein on his - and Mayor Michael Bloomberg's - record of "success":
Similarly strong improvements were achieved on state tests, although the story here is harder to explain because the state changed the testing requirements several years ago, and so the number of kids passing (i.e., achieving the proficiency or advanced level) dropped because the questions were intentionally made harder and it took more correct answers to pass.
Nevertheless, if you compare New York City’s performance under Bloomberg to two other groups (the other so-called “big four” cities in New York State — Rochester, Buffalo, Syracuse and Yonkers — and the rest of the state), each of which took precisely the same tests as the city, the results are compelling.
In a nutshell, in 2002, when Bloomberg started, on all four tests, New York was much closer to the big four cities and far away from the rest of the state.
Today, it’s the other way around, showing that the city moved significantly forward when compared to the other two relevant groups. Indeed, despite the fact that the exams were more difficult and required more correct answers to pass, New York City increased its pass rate on all four tests; the other two groups didn’t come close. [emphasis mine]

As I showed in my last two posts, Klein is engaging in a serious deception of omission here, because he refuses to acknowledge the very large changes in both child poverty rates and student demographics in the "Big Five" over the last decade. Any fair reading of these statistics would concede that New York City has enjoyed an advantage over Rochester, Syracuse, and Buffalo (less so with Yonkers) that would help Gotham's test scores relative to the other cities.
But what if a researcher tried to control for these shifts in student populations? What if he could isolate the effect of New York City's "reforms" from both statewide trends in test scores and changes in student populations? To his credit, Klein briefly mentions one such study:
Independent studies by respected researchers like Caroline Hoxby and Margaret Raymond at Stanford, James Kemple at the Research Alliance and the independent public policy institute MDRC all support this conclusion. So do the numbers, which the critics conveniently ignore. [emphasis mine]

Here is the copy of Kemple's study I've been working off of; apparently, it's also been published in a book, which I have not yet read, in what Kemple's paper says is a "slightly edited form."
Let's start with an acknowledgement: this is a clever study, and Kemple is a serious researcher. But does the study back up Klein's claims? When we account for student differences, can we attribute relative gains in NYC's schools to Bloomberg's and Klein's policies?
Let's ask Kemple himself:
A second limitation is that although the analysis can shed light on the overall effect of reforms that occurred during the Children First era, it cannot isolate the specific features of the Children First reforms that may have been most responsible for these effects. As discussed elsewhere in this volume, the reforms have had many components, and these were designed to interact with each other and with other policies and school conditions. By exploring the variation in Children First effects across schools within New York City and by collecting data on the implementation of specific reform activities, further research can expound on the mechanisms that enhanced or limited the impact of Children First reforms; studies of this nature are being undertaken by a number of researchers, including the Research Alliance for New York City Schools.

So Kemple is cautioning against using his study in exactly the way Joel Klein uses it. Even if Kemple can show a gain in NYC's relative scores, we should be very cautious about saying those gains were caused by policies under Bloomberg and Klein.
As a final note before discussing the findings, the paper also presents differences and similarities between New York City and the remaining New York State districts in an effort to provide a further context for interpreting test score trends in New York City. These comparisons control statistically for test score trends prior to 2003 and for differences in school demographic characteristics. However, there are dramatic demographic and prior-performance differences between New York City and the rest of the state; greater caution should be exercised when using these comparisons to infer evidence about effects of reforms undertaken during the Children First era. (p. 8) [emphasis mine]
But even if we could say with certainty that relatively higher test score gains should be attributed to Klein's policies, I'd still say Klein is wrong to cite Kemple's study as proof of his "success" - because I don't think Kemple has conclusively proved that NYC had better student achievement than the other "Big Five" cities when accounting for student population changes.
Kemple uses a method called a "comparative interrupted time series analysis" to create what researchers call a "counterfactual": a hypothetical outcome of what would have occurred had Klein's policies not been in place. Here's an example:
This shows the "Percent of Students Scoring at Level 3 or 4, Grade 4, Mathematics School Year 1998-1999 to School Year 2009-2010." The color key is in the chart below.
Here we have the hypothetical "counterfactual" judged against the actual gains of NYC's students. Which raises the question: how did Kemple come up with his "counterfactual"? On this question, I'm afraid, the documentation is light: I would challenge anyone to replicate Kemple's results based on what's in this paper. Kemple does refer to other research that has used the same methods, which is fine, but it doesn't help us understand how he came up with his counterfactual. So we're left in the dark here on the critical question: is the counterfactual a valid comparison?
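To make the idea concrete, here's a minimal sketch of how a comparative interrupted time series counterfactual can be built. To be clear: the numbers below are invented, and this is one plausible specification - not Kemple's actual model, which isn't documented in enough detail to reproduce.

```python
import numpy as np

# Illustrative only: invented proficiency rates (percent at Level 3/4)
# for NYC and a pooled "Big Four" comparison group.
years = np.arange(1999, 2011)
reform_start = 2003  # first full year of Children First

nyc = np.array([48, 50, 51, 53, 57, 60, 64, 67, 71, 74, 77, 81.])
big4 = np.array([35, 36, 38, 39, 41, 42, 44, 46, 48, 50, 51, 53.])

pre = years < reform_start

# 1. Fit each group's linear trend over the pre-reform years.
nyc_trend = np.polyfit(years[pre], nyc[pre], 1)
big4_trend = np.polyfit(years[pre], big4[pre], 1)

# 2. The comparison group's post-reform deviation from its own trend
#    stands in for statewide shocks (new tests, changed cut scores).
big4_deviation = big4 - np.polyval(big4_trend, years)

# 3. Counterfactual NYC = NYC's pre-reform trend plus the statewide
#    deviation; the estimated "effect" is actual minus counterfactual.
nyc_counterfactual = np.polyval(nyc_trend, years) + big4_deviation
effect = nyc - nyc_counterfactual

for y, e in zip(years[~pre], effect[~pre]):
    print(y, round(e, 1))
```

Every choice baked into a sketch like this - linear pre-trends, pooling the comparison districts, how the statewide deviation is applied - changes the counterfactual. That's exactly why the missing documentation matters.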
Even though we don't have Kemple's exact methods, from what I can ascertain I'd say there are more than enough issues here that we should be very, very wary of the conclusions Joel Klein draws based on this study:
- Conflating Free Lunch and Reduced-Price Lunch. This is one of Bruce Baker's pet peeves, and rightly so: there is a real difference between the deep poverty of children who qualify for Free Lunch and the relatively milder poverty of those who qualify for Reduced-Price Lunch. It appears that Kemple conflates the two categories, and that may well mask some of the variation between New York City and the other Big Five; see the toy example below. Remember, NYC did not get hit with the sharp increases in child poverty that ravaged Upstate over the last decade.
(By the way: in Kemple's paper, it appears that he does not use any poverty data when constructing his "Estimated Counterfactual," but does use it when creating the "Adjusted New York State" estimate (p.10). Why? And why use it later (p.19)? Is this a publication error?)
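Here's what the conflation can hide - a toy illustration, with invented district names and figures:

```python
# Two hypothetical districts with identical combined free/reduced-price
# lunch (FRL) rates but very different depths of poverty.
districts = {
    #  name:       (% free lunch, % reduced-price lunch)
    "District A": (55, 15),  # deep poverty dominates
    "District B": (30, 40),  # milder poverty dominates
}

for name, (free, reduced) in districts.items():
    print(f"{name}: FRL = {free + reduced}%  "
          f"(free = {free}%, reduced-price = {reduced}%)")

# Both districts report FRL = 70%. A model that controls only for the
# combined rate treats them as demographically identical - exactly the
# kind of variation between NYC and the Upstate cities that gets masked.
```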
- Using Proficiency Rate changes instead of test score changes. This is one Matt DiCarlo harps on often: changes in proficiency rates are not the same as changes in test scores. Kemple himself acknowledges that simply dealing with proficiency rates may mask real improvements - or declines - in test scores.
A proficiency rate tells us how many students passed a certain minimal score on a test. Think of a track team: if five out of ten high jumpers on the team can clear six feet, we might say their proficiency rate is 50%. If we want to increase this rate, we would need to get more members of the team to jump over that six-foot bar (or lower the bar - which is what New York State did. Kemple says that's OK because all students in the state were affected - but were they affected equally? Hmm...).
But that doesn't tell us how high every athlete can jump, and it doesn't tell us about their improvement. If one team member starts the season at 4' 9" and ends at 5' 11", he will have made great progress, but he still won't be "proficient." And if another jumper starts the season at 6' 8" but ends at 6' 1", we won't see his decline measured in the team's "proficiency."
Kemple's use of proficiency rates is likely masking many changes in actual test scores, as the sketch below shows. His paper says he will be looking at changes in actual scores in the future; I don't know if this analysis has been released.
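Here's the high-jump analogy in code - made-up heights for a ten-jumper team, with "proficient" meaning clearing a six-foot (72-inch) bar:

```python
BAR = 72  # six feet, in inches

# Invented heights: jumper 0 improves from 4'9" to 5'11"; jumper 9
# declines from 6'8" to 6'1". Neither change crosses the bar.
season_start = [57, 62, 66, 70, 71, 73, 74, 76, 78, 80]
season_end   = [71, 66, 68, 71, 71, 72, 73, 74, 75, 73]

def proficiency(heights, bar=BAR):
    return 100 * sum(h >= bar for h in heights) / len(heights)

print(proficiency(season_start))  # 50.0 -- five of ten clear the bar
print(proficiency(season_end))    # 50.0 -- still five; the rate is flat
print(sum(season_end) - sum(season_start))  # but the heights moved

# And if the state "lowers the bar," the rate jumps with no jumping:
print(proficiency(season_end, bar=66))  # 100.0
```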
- Using data that isn't finely disaggregated. This is a problem in all educational research: the data sets just aren't very well-tuned. "Special education," for example, is a great big box that includes students who have everything from simple speech impediments to severe cognitive disabilities. There's not much a researcher can do about that, and it doesn't mean you should discount research on this basis alone. However...
Let's go back to the changes in student demographics in New York State over the past decade. Take white students:
I had to double-check the numbers when I saw this, but it's true: the Syracuse City School District had 9,799 white students in 2002-03, and 5,843 white students in 2010-11. Buffalo and Rochester saw significant declines as well. The Upstate urban cores have seen significant drops in white population over the last decade (p. 15), but I suspect the aging of the Upstate white population is also contributing to the student population changes. NYC's relative white student population has remained flat.

But in the comments, Ken Houghton makes an excellent point:
One of the problems with using percentages is that it can obscure population changes. 43% of x(2003) = 9,799, while 27% of x(2011) = 5,843. So there's about a 5% drop in the population in Syracuse over that time, but a 40% decline in the White population.

Which wouldn't matter if White were homogeneous. But it's not--and the people most likely to leave are those who have the resources to be able to leave.

It's a double effect: you have fewer White students, and significantly more poor White students as a percent of the White population.

If anything, your down-and-dirty method understates the negative impact, because the cohort has changed as well.

Houghton is right. Kemple's model may well account for a declining white population in Syracuse when he builds his counterfactual for New York City; what he can't account for, however, is the differences within that sub-population of white students: most likely, the families who leave are quite different from the families who stay.
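By the way, Houghton's arithmetic checks out - back out the implied total enrollments from the white-student shares and counts he cites:

```python
# White-student counts are from the state data cited above; the 43% and
# 27% shares are Houghton's figures.
white_2003, share_2003 = 9_799, 0.43
white_2011, share_2011 = 5_843, 0.27

total_2003 = white_2003 / share_2003  # ~22,800 students
total_2011 = white_2011 / share_2011  # ~21,600 students

print(f"total enrollment: {total_2011 / total_2003 - 1:+.0%}")  # about -5%
print(f"white enrollment: {white_2011 / white_2003 - 1:+.0%}")  # about -40%
```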
So we've got some real concerns here. I'd also add that the study - like any study that uses standardized tests results - naturally conflates student achievement with test scores. Even if we buy into the validity of Kemple's counterfactual, we have to take a further leap and assume that gains on New York state tests are actually gains in real learning. But it may well be that New York City's children didn't get relatively better at math or language arts; instead, they got better at taking state math and language arts tests.
Given that New York City's performance on the National Assessment of Educational Progress was relatively weak over the same time period, I'm inclined to believe that any relative gains on New York's tests were a manifestation of Campbell's Law: Klein made the state tests the focus of schooling, so the test scores rose.
Again, this is a serious study and it deserves a serious reading. But absent a theory as to why Klein's and Bloomberg's policies might have led to a relative rise in New York City's proficiency rates, Kemple's research has more than enough holes in it for us to approach it with great caution. It certainly isn't a strong enough counterweight to NYC's relatively poor showing on the NAEP; we shouldn't ignore the city's disappointing performance on national tests just because this one study points the other way.
In addition: we now have new reasons to believe that even the absolute gains NYC made on state tests are suspect. As in Washington, D.C. and Atlanta, evidence is beginning to emerge of a culture of cheating within New York City's schools. More on this next.
Uh-oh...
Thanks for all this.
My favorite point you made is "it may well be that New York City's children didn't get relatively better at math or language arts; instead, they got better at taking state math and language arts tests."
Wish we had the resources to Amplify that
Thanks; given the high-stakes environment in NYC and the fact that the state tests got more predictable over time, as even NYSED finally admitted, it's very easy to see how NYC students and teachers were able to "game" the system through excessive test prep. The fact that NYC's gains on the NYS exams were minimal when compared to the NAEP is another indication of this. The high-stakes environment of NYC schools, combined with the demographic trends you outline here and in your other posts, could easily explain why the "better than Buffalo" claim of Bloomberg, Klein et al. just doesn't cut it. The fact that Kemple didn't appear to consider either factor in his analysis makes his conclusions highly suspect.