I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Sunday, April 7, 2013

"X Months of Learning" Is a Phony Metric

OK, a little hyperbolic on the headline, but now that I've got your attention...

I've been reading more and more studies these days that attempt to illustrate gains in "student achievement" in ways that lay people can understand. One of the most common methods I've seen is to translate test score gains into this phrase: "x months of learning." Matt DiCarlo points us to a classic example: Eric Hanushek's assertion that the difference between a "good" teacher and a "bad" teacher is a "year and a half" of learning:
So, here’s the deal (and this is strictly my opinion): There is a research consensus that estimated test-based teacher effects vary widely between the top and bottom of the distribution, but the “year and a half” assertion should probably be put out to pasture, at least when it’s used without elaboration or qualification.
It implies a precision that belies the diversity of findings within the research literature, and it ignores the importance of context, data availability, variation between test subjects, etc. There are plenty of ways to express the fact that teachers matter without boiling a large, nuanced body of evidence down to a single effect estimate.
Accessible generalizations certainly have their role in policy discussions, but oversimplification has really crippled the debate about value-added and other growth models, on both “sides” of the issue. [emphasis mine]
Unfortunately, one of the legacies of Hanushek's work seems to be exactly this sort of "oversimplification." The (in)famous CREDO charter school studies, for example, contain tables that translate "Growth (in standard deviations)" into "Gain (in months of learning)." Yes, there are cautions about not reading too much into the tables... which the credulous press and ideological politicians then proceed to ignore when reporting on the results.
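To see why those translations are so slippery, consider what the conversion actually requires: an assumption about how many standard deviations a "year of learning" represents - a number that varies by grade, subject, and test. Here's a minimal sketch (in Python, with hypothetical conversion factors, not CREDO's actual method) showing how the same effect size yields wildly different "months of learning" depending on that hidden assumption:

```python
# Sketch: converting an effect size (in SD units) into "months of
# learning" requires assuming how big a "year of learning" is in SD
# units. The factors below are hypothetical stand-ins, not CREDO's.

def months_of_learning(effect_size_sd, year_of_learning_sd, school_year_months=9):
    """Translate an effect size into months, given an assumed
    'year of learning' in SD units and a 9-month school year."""
    return school_year_months * effect_size_sd / year_of_learning_sd

effect = 0.15  # a typical reported gain, in standard deviations

for assumed_year in (0.25, 0.40, 0.60, 1.00):
    print(f"If a 'year of learning' = {assumed_year:.2f} SD: "
          f"{months_of_learning(effect, assumed_year):.1f} months")

# The same 0.15 SD gain reads as anywhere from ~1.4 to ~5.4 "months,"
# depending on an assumption the headline never mentions.
```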

Here's another example: the (in)famous Mathematica study on the "effects" of Teach For America on math instruction:
This impact is equivalent to an effect size of approximately 0.15 of a standard deviation and translates into roughly 10 percent of a grade equivalent, or about one additional month of math instruction. (p. xiv) [emphasis mine]
As Matt says, this statement implies a precision that just isn't warranted by the research itself. I'd further argue that as a practical illustration, this statement also fails pretty badly. What does "one month" of instruction look like? Is it higher proficiency at a particular skill, or simply new material? Is a child who scores a 75% on a test on Chapter 7 in a math text "one month ahead" of a child who scores a 95% on Chapter 6?
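For what it's worth, you can reverse-engineer the study's own arithmetic, and doing so shows how many unstated assumptions carry the conversion. A quick sketch (the implied SD-to-grade-equivalent ratio and the 10-month school year are my readings of the report, not stated constants):

```python
# Reverse-engineering the Mathematica-style conversion: 0.15 SD is
# called "roughly 10 percent of a grade equivalent, or about one
# additional month." That only works if ~1.5 SD equals one grade
# equivalent AND a "grade" lasts 10 months -- assumptions, not facts.

effect_size_sd = 0.15
grade_equiv_per_sd = 0.10 / 0.15  # implied: 0.15 SD ~= 0.10 grade equivalents
months_per_grade = 10             # assumed length of a "grade," in months

gain_in_grade_equiv = effect_size_sd * grade_equiv_per_sd  # ~0.10
gain_in_months = gain_in_grade_equiv * months_per_grade    # ~1.0

print(f"{gain_in_grade_equiv:.2f} grade equivalents "
      f"~= {gain_in_months:.1f} month(s) of instruction")
```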

The TFA study showed no difference in reading scores between students who had TFA teachers and those who did not. I'm a music teacher, so this isn't in my wheelhouse, but I'm actually not very surprised by this.* Math instruction has discrete units: find the area of a circle, add two-digit numbers, factor a polynomial, etc. Could it be that standardized math scores are measuring how many units of instruction a student has been exposed to, rather than whether they gained high proficiency in fewer units? If so, how does that then translate into "months of learning"?

I say it's time to retire the phrase "x months of learning" once and for all: it's misleading, it has no real practical applications, and it betrays a fundamental misunderstanding of the nature of learning. Instead, let's express "student achievement" in the most straightforward way possible - a way that avoids statistical jargon yet clearly spells out the effects of various treatments:

Let's express "student learning" in numbers of test score items answered correctly (or incorrectly).

Because that's what we're really measuring, right? All you'd have to say is: "Students who had this treatment - charter schools, TFA teachers, carbo-loading, whatever - got x more items correct on a test with a total of y questions than students who didn't get the treatment." That's honest, it's simple, it's not subject to wild misinterpretation, and it's easy to understand.
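Here's what that reporting could look like, sketched with my own made-up numbers (a 50-item test on which student scores have a standard deviation of 8 items correct - both hypothetical):

```python
# Sketch of the proposed reporting: translate an effect size back into
# raw test items. The test length and items-correct SD are hypothetical.

effect_size_sd = 0.15  # reported treatment effect, in SD units
total_items = 50       # hypothetical number of questions on the test
items_sd = 8           # hypothetical SD of items correct across students

extra_items = effect_size_sd * items_sd  # ~1.2 items

print(f"Students who got the treatment answered about {extra_items:.1f} "
      f"more items correctly on a {total_items}-question test.")
```

No grade equivalents, no "months," no hidden conversion factors: just items on a test.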

Of course, it also points out what we're really measuring. And that may not serve the agendas of certain folks well...
Arne, how many months of student learning is that?

* Rereading, I realize I'm glossing over this a bit. Probably should do another post, but for now: would it surprise anyone that TFAers, who were all good test takers themselves, would better understand how to pass tests? If they see that lower proficiency on more math units yields better scores than higher proficiency in fewer units, are they really providing "more learning" by adjusting their teaching to meet that objective? And is it possible this same strategy will not work in a domain that doesn't have units that are as "discrete," like language arts?

Again, this is outside of my expertise. But if you can't think out loud, what's the point of having a blog?

4 comments:

EC said...

You are exactly right. This kind of terrible sloppiness, usually in the service of an agenda, is characteristic of a lot of supposedly data-driven educational research. I discussed a couple of these issues, one of them very similar to the one you point out, in a post about John Hattie's work:

http://literacyinleafstrewn.blogspot.com/2012/12/can-we-trust-educational-research_20.html

Unknown said...
This comment has been removed by the author.
Unknown said...

Let's try this again...

http://www.quickmeme.com/meme/3ts8xx/

Unknown said...

So, what you are suggesting is that in order to measure student achievement, proven research methods should be employed in the education sector, just as they are in any other industry. To reiterate: "skewed" results are being broadcast as "reliable and credible" until somebody conducts a study without having a vested interest in what its results will yield? In other words, the only really reliable studies are ones conducted as "double-blind" studies - is this what you are expressing here? And lastly, what are the controls of these "studies"?