What makes the Triple Lindy so difficult is that Rodney starts out on the high platform, then has to bounce from board to board over and over again before he finally slices through the water. It's one massive leap after another...
Just like the scoring on the New York State exams.
By now, you've undoubtedly heard about the huge dive test scores took this year in New York, thanks to new exams and new scoring methods that supporters claim are much more "realistic." This year, 31 percent of students were deemed "proficient" in reading and math; last year, 55 percent passed reading, and 65 percent passed math. I've argued that this was a deliberate ploy to make New York's schools look as bad as possible; reformies hope to usher in an era of privatization and deunionization on the "evidence" of these scores.
But how did NYSED come up with their definition of "proficient"? How did they determine the cut scores - the scores students would have to earn to be deemed to have achieved one of the four levels of "proficiency"?
Leonie Haimson pointed me to a fascinating document from NYSED that shows the method to this madness. Just like Rodney in the diving meet, NYSED bounced around from one external benchmark to another, taking massive leaps each time to justify where it placed its cut scores. By executing its own Triple Lindy, NYSED moved from its goal of defining "proficiency" as "college and career ready" to cut scores on the tests for levels as low as 3rd grade.
How about we play Greg Louganis for a bit and break down exactly how NYSED pulls off its Triple Lindy:
Starting Platform: "College and Career Ready." As I blogged before, this is a phony and useless phrase meant to conflate the education necessary to obtain a four-year bachelor's degree with the education needed for a job that should pay well but often doesn't. Reformy NYSED Commissioner John King, reformy SecEd Arne Duncan, reformy Regents Chancellor Merryl Tisch, and many other reformy stalwarts say this is the goal for all children, which means, right off the platform, we have to make a massive leap:
Springboard #1: First Year Grade Point Average at a Four-Year College. As the NYSED document clearly shows, the benchmark used for "proficiency" on the state test is earning a B- or better in a freshman English course, or earning at least a C+ in a math course, at a four-year college or university.
For this benchmarking, NYSED relied on the College Board, the folks in charge of the SAT. Demonstrating that the SAT predicts college GPA is one of the most important research tasks of the College Board: if the SAT didn't show some correlation between its score and college grades, there wouldn't be much of a point to using it for college admissions.
So the College Board regularly publishes "validity research," designed to show how well SAT scores match up with first year college grades (keep in mind: the College Board is hardly a disinterested party in this research). When you read these studies, you'll see that the researchers draw from a sample of around one to two hundred colleges and universities, all granting four-year degrees; community colleges need not apply. The researchers match different courses to different sections of the SAT: the reading section, for example, is matched to history and English courses, while the math section is matched with (surprise!) math courses.
There is a correlation: somewhere around 0.5, which is geek-speak for a moderate relationship, but hardly air-tight (the practical meaning of this correlation is naturally a subject of great debate). This is, of course, to be expected. After all, would anyone say that an "A" in Stochastic Calculus at M.I.T. was equivalent to an "A" in Intro To Statistics For Social Sciences at SUNY Binghamton? For that matter, how consistent are the grades given at the same institution for the same class taught by two different professors?
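If you want to see what a 0.5 correlation actually buys you, here's a quick back-of-the-envelope sketch in Python. The numbers are invented for illustration (this is not College Board data), but the arithmetic isn't:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
r = 0.5  # roughly the correlation reported in SAT validity studies

# Simulate standardized SAT scores and GPAs that correlate with them at about r.
sat_z = rng.standard_normal(n)
gpa_z = r * sat_z + np.sqrt(1 - r**2) * rng.standard_normal(n)

observed_r = np.corrcoef(sat_z, gpa_z)[0, 1]
print(f"correlation: {observed_r:.2f}")                   # ~0.50
print(f"variance explained (r^2): {observed_r**2:.2f}")   # ~0.25

# Even at r = 0.5, roughly three-quarters of the variation in first-year GPA
# is explained by something other than the SAT score.
```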
The notion that there is a uniform standard for a "B-" in freshman classes, consistent across course content, professors, and institutions, flies in the face of reason. But this is exactly what NYSED assumed: getting a "B-" in any course, with any professor, at any college or university in the sample is now equivalent to "college and career ready." Which gets us ready for our next big jump:
Springboard #2: SAT Scores. Again, there is a correlation between SAT scores and first-year college GPA. And I'm not saying there isn't a value for college admissions offices in SAT scores (again, let's put aside the controversy about using the SAT in college admissions). I am saying no admissions office I've ever heard of sets arbitrary cut scores for the SAT, because the test is hardly perfect when predicting college GPA. And, again, first-year GPA is not the only measure of college "success."
Unfortunately, NYSED ignored all these caveats when benchmarking the state tests: "Proficient," they say, "is equivalent to a 560 on the SAT Critical Reading section, a 530 in Writing, and a 540 in Math." We won't even get into the many ways the College Board and NYSED played around to get those numbers; for right now, it's enough to say that we've leapt from "college and career ready" to a largely arbitrary cut score on the SAT (and PSAT).
And, in fact, both the College Board researchers and NYSED acknowledge in the document that these cut scores on the SAT are prone to error. The SAT cut scores are matched to probabilities that a student will earn a specific grade in a college course. You'll notice there are no such nods to testing error on the NY state tests.
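For the curious: here's a toy sketch of what "matched to probabilities" looks like in practice. The logistic model and its coefficients below are made up for illustration (this is not the College Board's actual model), but it shows how a probability target gets converted into a cut score, and how many knobs you get to turn along the way:

```python
import numpy as np

def prob_b_minus_or_better(sat_score, intercept=-6.0, slope=0.011):
    """Hypothetical logistic model: P(earning a B- or better | SAT score)."""
    return 1.0 / (1.0 + np.exp(-(intercept + slope * sat_score)))

target = 0.65  # illustrative probability target for "ready"; pick your own
scores = np.arange(200, 810, 10)
probs = prob_b_minus_or_better(scores)
cut_score = scores[np.argmax(probs >= target)]  # first score meeting the target

print(f"cut score under this toy model: {cut_score}")
# The cut moves if you change the target probability, the grade threshold, or
# the sample of colleges; none of that is dictated by the test itself.
```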
So now the SAT/PSAT cut scores are set; time to ricochet over to another measure:
Springboard #3: 8th Grade New York State Test Scores. Diane Ravitch points us to a terrific post from Dr. Maria Baldassarre-Hopkins, Assistant Professor of Language, Literacy and Technology at Nazareth College. Baldassarre-Hopkins served on the committee that made the recommendations to Commissioner King as to where to set the cut scores - after the tests had been administered and graded. In the comments below the post, Baldassarre-Hopkins confirms the SAT and PSAT were used as external benchmarks.
Baldassarre-Hopkins describes the process for setting cut scores, known as "bookmarking":
(Another guilty pleasure: The Matrix trilogy. Can't get enough.)
Now, here's the critical point to understand in all this: it's not that these people sat in the Hilton and used their expertise to say, for example: "A 'proficient' 7th grader should be able to do this, this, and that." No, what Baldassarre-Hopkins makes clear is that the distribution of student performance on each test item would determine whether getting that item right demonstrated proficiency. In other words: a test question was considered "hard" or "easy" not because it required a particular skill, but because of how many students got it correct. In fact, Baldassarre-Hopkins tells us NYSED gave her group data showing how many students passed each test question.
So here's what happened: this group was given the percentage of 8th Graders who passed each item on the test. They were told that students who meet the "B- in English/C+ in math" college grades standard - a standard set by SAT scores - should be put at a level 3 for the state test. Keep in mind that these are 8th graders, but the SAT is typically taken in 11th grade. How did they account for the discrepancy? No one's saying.
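To make that concrete, here's a drastically simplified sketch of the "ordered item booklet" idea behind bookmarking. The items and pass rates are invented, and this is not NYSED's actual psychometric procedure; it's just meant to show how per-item pass rates, not skills, end up defining "hard" and "easy":

```python
# Hypothetical per-item percent-correct figures, like the panels were given.
percent_correct = {
    "item_01": 88, "item_02": 81, "item_03": 74, "item_04": 69,
    "item_05": 61, "item_06": 55, "item_07": 48, "item_08": 41,
    "item_09": 33, "item_10": 27,
}

# Step 1: build the "ordered item booklet": easiest first, hardest last. Note
# that "easy" and "hard" are defined by how many students answered each item
# correctly, not by what skill the item demands.
booklet = sorted(percent_correct, key=percent_correct.get, reverse=True)

# Step 2: a panelist pages through the booklet and places a bookmark at the
# last item a just-barely-"proficient" student should be expected to answer.
bookmark = 6  # hypothetical placement, after the 6th-easiest item

# Step 3: the cut score follows from where the bookmark lands; here, crudely,
# it's the number of booklet items sitting in front of it.
print("ordered booklet:", booklet)
print("toy raw cut score:", bookmark, "out of", len(booklet), "items")
```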
We'll talk later about the issues with using the SAT, a normative test, as a benchmark for what should be a criteria-based test. But it's now time to take another leap: this time, to the other grade levels:
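Here's that distinction in miniature, with invented scores (the 31 percent below is chosen only to echo this year's results): a criteria-based cut is fixed in advance and the pass rate falls wherever it falls; a normative cut is placed wherever it has to be so a predetermined share of kids comes up short:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(loc=50, scale=10, size=1000)  # hypothetical raw scores

# Criteria-based: the cut is fixed in advance by what students should know;
# how many clear it depends on how much they actually learned.
criterion_cut = 42
print("criteria-based pass rate:", round(float(np.mean(scores >= criterion_cut)), 2))

# Normative: the cut lands wherever it must so a chosen share of test-takers
# falls below it (somebody has to lose, by construction).
norm_cut = np.percentile(scores, 69)  # only the top 31 percent "pass"
print("normative cut:", round(float(norm_cut), 1), "-> pass rate fixed at ~31%")
```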
Springboard #4: 7th through 3rd Grade New York State Test Scores. Baldassarre-Hopkins makes clear that the cut scores for the other grades were set largely by comparing them to the 8th grade standard:
"We began the bookmarking process with grade 8, later repeating the entire process for grades 7 and 6. I will try to be as brief as possible..."
I'll bet that was the most interesting day of the entire week: getting Grade 6 and Grade 5 to match up must have been a bear. In any case, I think Baldassarre-Hopkins gives us an important clue when she says her group worked from the top down. Grade 8 was informed by the external benchmarks - the SAT and PSAT. Grade 7 was then informed by grade 8. Grade 6 was informed by grade 7, and the elementary grades then were made to match up with the middle school grades. But it all started with the Grade 8 alignment with the SAT/PSAT.
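I can only guess at the mechanics, but here's one way a top-down chain like that could work. Consider this a sketch of the flavor of the thing, with invented score distributions, not NYSED's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
grades = [8, 7, 6, 5, 4, 3]

# Pretend the SAT/PSAT benchmarking implies roughly 31% of 8th graders are
# "proficient," and each lower grade is then articulated against the grade
# above by holding that share (more or less) constant down the line.
proficient_share = 0.31
cut_scores = {}
for grade in grades:
    scale_scores = rng.normal(loc=300 + 10 * grade, scale=40, size=5000)  # fake
    cut_scores[grade] = np.percentile(scale_scores, 100 * (1 - proficient_share))

for grade in grades:
    print(f"grade {grade}: toy cut score {cut_scores[grade]:.0f}")
# Every number below grade 8 inherits whatever error was baked in upstream:
# the SAT proxy, the GPA benchmark, the "college and career ready" leap itself.
```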
Now, there's one other leap we have to make here - probably the biggest one of all:
Springboard #5: Teacher/Principal Evaluations, School Ratings, and Student Instruction Decisions. It would be one thing if these test scores didn't have high-stakes consequences. The results would be published; Mike Bloomberg would spin them as he is wont to do; parents, teachers, principals, and students would use the data to inform instruction and classroom practice as they saw fit; and that would be that.
But we live in John King's reformy New York, where test scores are used to evaluate teachers, dictate which schools should be closed, and determine whether children should be retained or given differentiated instruction. These test results have consequences - often serious consequences. So, even though the final cut scores are the result of a convoluted process, people's lives are profoundly affected by them.
Let's review the steps of the New York Test Score Triple Lindy one more time:
- Start with "college and career ready," an ill-defined phrase that could mean just about anything.
- Leap to freshman year GPA in selected courses at a limited number of four-year colleges. Could be graded on or off a curve (normative or criteria-based - more on this later); varies widely between professors, schools, and courses; doesn't necessarily indicate whether the student's entire college experience was "successful."
- Spring to SAT/PSAT scores, somewhat correlated to first year college GPA, but a normative assessment (meaning a set number of students must score at each percentile - someone's got to lose). This is a test, by the way, tightly correlated to family income.
- Bounce to 8th grade NY State test scores, which are given three years before the SAT.
- Carom (got a thesaurus?) to 3rd- through 7th-grade NY State test scores, which assumes all children follow the same learning trajectory.
- Jounce (SAT word!) to teacher/principal evaluations and school evaluations and student retention decisions.
May I give my informed opinion here? Pardon my technical language, but:
This is friggin' nuts.
No wonder John King, Merryl Tisch, Andrew Cuomo, and Arne Duncan can't get no respect. More to come...
I may get no respect, but I'm still "college and career" ready!
ADDING: In the middle of writing this, I realized where I had first heard this metaphor. Yeah, big surprise: it's you-know-who...