There is no arguing with Dr. Rockoff's academic qualifications or intellect; still, it speaks volumes about where we are in our education debate that an economics professor is brought before the NJBOE as an expert in education policy. As Bruce Baker says:
Behavioral economics is an interesting and potentially useful field of academic inquiry. At its best, real behavioral economics attempts to address some of the concerns I raise here. But many if not most assumptions about human behavior and response to incentives are not representative of behavioral economics at its best.
Specifically, I’m increasingly concerned with what I see as the simple-minded projection of economic thinking onto everyone and anyone else, leading to ridiculous policy recommendations – that amazingly – get taken seriously – at least by the media and punditocracy. [emphasis mine]
Each time I listen back to Dr. Rockoff's presentation, I find myself thinking about Dr. Baker's words. And that's why I'm going to take some time over the next several days to really pick apart Rockoff's arguments from a practical perspective.
It's all well and good to run statistical models and extrapolate theories based on tangentially related research. It's quite another to take these musings and apply them to everyday, real-life schools. I sometimes wonder if folks like Rockoff - brilliant as they are - are flying at 40,000 feet over those of us down in the trenches, dropping bombs and worrying little about the collateral damage.
But judge for yourself:
Before we dive into the details, let me make three points about a metaphor Rockoff uses several times in his speech:
Rockoff apparently likes baseball analogies. He spends a lot of time in the presentation talking about batting average (BA). Of course, he acknowledges, batting average is only one component of determining whether a player is valuable. The problem is that he then analogizes this to the "multiple measures" practice of teacher evaluation: combining test-based measures with observations and other qualitative measures.
Here's the first problem with Rockoff's analogy: in baseball, BA is far from the only quantitative measure used (alongside qualitative judgments) to determine a player's "value." Check out this post about players who hit under .300 and yet were still great offensive forces for their teams. You couldn't quantitatively determine their worth without considering many other stats: Plate Appearances, On Base Percentage, Runs, Hits, etc.
My wife's family is a bunch of baseball nerds, and they'll pore over this stuff for hours. It would be nice to think that we could do this for teachers, but, of course, we can't: it would be wildly expensive to create and collect so much data, it would take too much time from instruction, and we would be measuring the teacher on the performance of her students when only 10 to 15 percent of student outcomes can be attributed to teacher inputs.
Further, the "plate appearances" of a teacher in a test-based evaluation model come down to the number of students they teach: each student is a "batter" who gets one turn at the plate on the yearly NJASK. Would anyone try to guess whether Derek Jeter is contributing to the Yankees' offense on the basis of 25 or 30 plate appearances a season?
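To put a rough number on the plate-appearance problem, here is a minimal sketch that treats each at-bat as an independent trial with a fixed true hit probability. That binomial model is a simplifying assumption, not how hitting really works. Under it, the standard error of an observed average is sqrt(p(1 - p)/n), so the noise shrinks only with the square root of the number of chances. The function name and sample sizes are illustrative choices, not anything from Rockoff's presentation.

```python
import math

def batting_average_standard_error(true_average: float, at_bats: int) -> float:
    """Standard error of an observed batting average under a simple binomial model.

    ASSUMPTION: every at-bat is an independent trial with the same hit
    probability. This is a simplification made only for illustration.
    """
    return math.sqrt(true_average * (1 - true_average) / at_bats)

# A true .300 hitter observed over a classroom-sized "season" vs. a full MLB season
for at_bats in (25, 30, 600):
    se = batting_average_standard_error(0.300, at_bats)
    # Roughly 95% of observed averages fall within about two standard errors
    low, high = 0.300 - 2 * se, 0.300 + 2 * se
    print(f"{at_bats:>3} at-bats: SE = {se:.3f}, rough 95% range {low:.3f} to {high:.3f}")
```

Under these assumptions, a true .300 hitter with 25 at-bats could plausibly post anything from roughly .120 to .480, while over 600 at-bats the range tightens to roughly .260 to .340.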
Baseball has an easy way to incorporate a lot of statistical information into each player's evaluation; teaching does not. On that basis alone, Rockoff's metaphor is clearly wanting.
Second problem: Rockoff talks about a .300 batting average as the mark of a great hitter. This is a cut score: an arbitrary level that a player must cross to be categorized as "great." Would anyone really argue that a .299 hitter is practically much worse than a .300 hitter? Would we say that Mickey Mantle or Barry Bonds (putting the you-know-what problems aside), both of whom finished their careers at .298, weren't great hitters?
The NJDOE is proposing to take mSGP scores for teachers and convert them into ranked categories - what statisticians call ordinal measures*. This necessitates the use of cut scores; it's equivalent to saying: "You're only getting into the Hall of Fame if you hit .300." In this situation, little differences count for a lot - especially when the decision is being forced onto an administrator (more on this later).
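The cut-score problem is easy to see in a few lines of code. This minimal sketch assumes three hypothetical cut points that split mSGP into four ordered categories; the cut values and category labels are placeholders, not the NJDOE's actual proposal.

```python
from bisect import bisect_right

# HYPOTHETICAL cut scores on the mSGP scale; placeholders for illustration only,
# not the NJDOE's actual cut points.
CUT_SCORES = [35, 50, 65]
# ASSUMED labels for the four ordered categories
CATEGORIES = ["Ineffective", "Partially Effective", "Effective", "Highly Effective"]

def msgp_category(msgp: float) -> str:
    """Map a continuous mSGP to one of four ordered categories.

    bisect_right counts how many cut scores are <= msgp, so a score sitting
    exactly on a cut point lands in the higher category.
    """
    return CATEGORIES[bisect_right(CUT_SCORES, msgp)]

for score in (49, 50, 64):
    print(score, msgp_category(score))
```

With these made-up cut points, a one-point difference (49 vs. 50) moves a teacher across a category boundary, while a fourteen-point difference (50 vs. 64) changes nothing.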
This is a fundamental problem when you combine a highly variable measure like mSGP with a four-rank measure like observation scores: the test-based measure is only some of the evaluation, but it makes all of the decision. It would be like determining a player's worth by combining all of the other aspects of his game - power hitting, running, fielding, clubhouse leadership - into one of four categories, and then combining that with his batting average. The BA becomes the deciding factor, just as the mSGP becomes the deciding factor for the teacher.
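Here is a minimal sketch of why the more variable component ends up making the decision. It assumes hypothetical ratings for ten teachers: observation scores that cluster at 3 on a four-point scale, and mSGP-based ratings that spread across the whole scale, combined with an assumed 50/50 weighting. None of these numbers come from NJDOE data.

```python
from statistics import mean, pvariance

# HYPOTHETICAL ratings for ten teachers on a 1-4 scale (illustration only)
observation = [3, 3, 3, 3, 3, 3, 3, 3, 2, 4]  # observers rarely stray from "3"
msgp_rating = [1, 4, 2, 3, 1, 4, 2, 3, 3, 2]  # test-based ratings spread widely

# ASSUMED equal weighting of the two components
composite = [0.5 * o + 0.5 * m for o, m in zip(observation, msgp_rating)]

def pearson(xs, ys):
    """Population Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pvariance(xs) * pvariance(ys)) ** 0.5

print(f"variance of observation ratings: {pvariance(observation):.2f}")
print(f"variance of mSGP ratings:        {pvariance(msgp_rating):.2f}")
print(f"correlation of composite with observation: {pearson(composite, observation):.2f}")
print(f"correlation of composite with mSGP:        {pearson(composite, msgp_rating):.2f}")
```

With these made-up numbers, the composite correlates at about 0.90 with the mSGP rating but only about 0.22 with the observation rating, even though each component carries half the weight: the component that varies more drives the final ranking.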
Third problem: this post about the magical .300 BA is fascinating:
In a recent New York Times article Alan Schwarz talks about research by several University of Pennsylvania professors about the psychological importance of hitting .300.
This phenomenon can be demonstrated by looking at the season batting averages of all baseball hitters in the past 50 seasons. I collected the batting averages for all regular players since 1960 who had at least 300 opportunities to bat. For each player, I rounded the batting average to three decimal places as commonly done in professional baseball. The figure displays a line graph of the frequencies of the batting averages between .290 and .310.
This graph shows the importance of the .300 mark in baseball. Batting averages slightly smaller than .300 are unlikely but a batting average of .300 is very likely. Athletes might like to say that the performance statistics are not important, but there are patterns in this data that suggest otherwise.
So, why would we have this big jump between .299 and .300? To me, it's as good a demonstration of Campbell's Law as you could imagine.
It's quite conceivable that when a hitter is hovering around the .300 level, he starts changing his behavior. Maybe he asks the manager to let him get more chances at the plate when he's a little below; maybe he asks for fewer chances when he's just above. The arbitrary value assigned to getting above .300 creates a corrupting pressure (in the case of Barry Bonds and Michelle Rhee, perhaps an extreme corrupting pressure).
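The arithmetic behind that temptation is simple. Here is a minimal sketch, assuming batting average is hits divided by at-bats, rounded half-up to three places (the half-up rule is an assumption; it only matters at exact ties). The hit and at-bat totals are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def batting_average(hits: int, at_bats: int) -> Decimal:
    """Hits / at-bats, rounded half-up to three decimal places (ASSUMED rounding rule)."""
    return (Decimal(hits) / Decimal(at_bats)).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP
    )

# HYPOTHETICAL hitter sitting exactly at .300 with a few games left
print(batting_average(150, 500))  # sits out the rest of the season -> 0.300
print(batting_average(151, 505))  # plays on and goes 1-for-5       -> 0.299
print(batting_average(152, 505))  # plays on and goes 2-for-5       -> 0.301
```

Sitting out locks in .300; a 1-for-5 stretch, which happens to good hitters all the time, drops him below the line. The incentive is to protect the number rather than help the team.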
Why wouldn't we believe the same thing about using test scores in teacher evaluations? Why wouldn't we think the use of what will inevitably be an arbitrary cut score will create behaviors designed to game the system? And is that really in the best interests of children?
More to come...
NJDOE: They may be lost but they're making great time!
ADDING: Rockoff says that if you are in the bottom quartile of hitters in the majors, you don't get to bat. But there are terrible batters in the National League who bat regularly throughout the season: they're called pitchers.
* Actually, the SGP is an ordinal measure as well. More to come.