I ask because their disastrous teacher evaluation proposals, announced with great fanfare last week, betray an embarrassing misunderstanding of the fundamentals of mathematics. It will take a few posts to catalog them all, but let's start with this:
A large portion of a "tested" teacher's evaluation will now include a metric called a "Median Student Growth Percentile," or mSGP. In a previous post, I showed how SGPs are woefully inappropriate for use in teacher evaluation, because they are purely descriptive measures: they do not measure how a teacher contributes to student learning.
But even if we put aside the problems of SGPs, there are still obvious problems with mSGPs; obvious, that is, to anyone with a basic understanding of the difference between a median and a mean.
Here's how NJDOE describes the use of mSGPs, from the proposed regulations they released last week:
Proposed N.J.A.C. 6A:10-4.2(b) describes which teachers will receive a median student growth percentile, and proposed N.J.A.C. 6A:10-4.2(c) explains how the score will be calculated by the Department. The Department will provide a list of all courses that fall within a standardized-tested grade or subject for the purpose of student growth percentile. For instance, a third grade math class would not be used for student growth percentile because second graders do not take the standardized assessment under the current NJASK schedule and therefore growth from one year to the next cannot be measured. Additionally, teachers must teach a course for at least 60 percent of the time between the start of the year and the time the standardized test is administered and they must have at least 20 students attributed to his or her name through the school district’s course roster data system. If a teacher does not have at least 20 individual student growth percentile scores in a given academic year, up to three years of student data must be used to reach the minimum requirement of 20 students.
Proposed N.J.A.C. 6A:10-4.2(c) explains that the Department will calculate the student growth percentile by finding the median of all students who were enrolled in a course or a group within a course. [emphasis mine]
How would this work? Let's imagine a 4th Grade class of 21 students taught by Ms. Jones. Each of her students is assigned an SGP from 1 to 99 based on the math section of the NJASK-4. Here are their SGPs, in ranked order:
Ms. Jones's Class
40 40 41 42 46 47 48 49 49 50 50 59 68 78 88 91 92 92 93 95 97
Average: 65
Next door to Ms. Jones is Ms. Smith, who also teaches 21 Fourth Graders. Here are her students' SGPs, again in rank order:
Ms. Smith's Class
2 5 11 14 17 22 23 26 37 44 50 51 52 53 54 55 55 59 59 61 62
Average: 39*
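For anyone who wants to check the arithmetic, here's a minimal sketch using Python's standard-library `statistics` module. It reproduces the two class averages from the SGP lists above, rounded to whole numbers since the SGPs themselves are whole numbers (the variable names are just my labels for the two hypothetical classes):

```python
from statistics import mean

# SGPs for the two hypothetical classes above, in rank order
jones = [40, 40, 41, 42, 46, 47, 48, 49, 49, 50, 50,
         59, 68, 78, 88, 91, 92, 92, 93, 95, 97]
smith = [2, 5, 11, 14, 17, 22, 23, 26, 37, 44, 50,
         51, 52, 53, 54, 55, 55, 59, 59, 61, 62]

# SGPs are whole numbers, so the average is reported as a whole number too
for name, sgps in [("Ms. Jones", jones), ("Ms. Smith", smith)]:
    avg = mean(sgps)
    print(f"{name}: n={len(sgps)}, mean={avg:.2f}, reported={round(avg)}")
```

The unrounded means are about 64.52 and 38.67; rounded, they match the 65 and 39 reported above.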
Again, let's stay away from the question of whether Ms. Jones is a "better" teacher than Ms. Smith, and instead look at each class's "growth." Which one of these classes "grew" more? Most of us would say Ms. Jones's class did...
But not the NJDOE.
See, our reformy overlords are not interested in judging the classes (excuse me, the teachers) by their average SGPs, or the mean. They want to judge the teachers by the median: the SGP that the middle child in the rank order received. What would that be?
Ms. Jones's Class
40 40 41 42 46 47 48 49 49 50 [50] 59 68 78 88 91 92 92 93 95 97
Ms. Smith's Class
2 5 11 14 17 22 23 26 37 44 [50] 51 52 53 54 55 55 59 59 61 62
I've marked the median score for each class in brackets; they are the same: 50. Even though Ms. Jones's class had much greater average (mean) growth than Ms. Smith's, the two classes had the same median growth.
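With an odd number of students, the median is simply the middle score in rank order: for 21 students, that's the 11th value. Here's a quick sketch in Python that checks this both with `statistics.median` and by hand. The helper `middle_score` is my own illustration and only works for odd-sized classes:

```python
from statistics import mean, median

jones = [40, 40, 41, 42, 46, 47, 48, 49, 49, 50, 50,
         59, 68, 78, 88, 91, 92, 92, 93, 95, 97]
smith = [2, 5, 11, 14, 17, 22, 23, 26, 37, 44, 50,
         51, 52, 53, 54, 55, 55, 59, 59, 61, 62]

def middle_score(sgps):
    """Median of an odd-length list: the middle value after sorting."""
    ranked = sorted(sgps)
    return ranked[len(ranked) // 2]  # index 10 is the 11th of 21 scores

for name, sgps in [("Ms. Jones", jones), ("Ms. Smith", smith)]:
    print(name, "median:", median(sgps),
          "middle score:", middle_score(sgps),
          "mean:", round(mean(sgps)))
```

Both classes come out at a median of 50, even though their rounded means are 26 points apart.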
Let's graph it:
Ms. Jones's class is blue; Ms. Smith's is red. Notice that the lower half of Jones's class is clustered just below 50, while the upper half of Smith's class is clustered just above 50.
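The flip side is that the median never "sees" those clustered halves at all. The sketch below runs a purely hypothetical what-if, not anything in the proposed regulations. It drops every score below Ms. Jones's median to 1 and raises every score above Ms. Smith's median to 99. The medians don't budge, but the means swap order:

```python
from statistics import mean, median

jones = [40, 40, 41, 42, 46, 47, 48, 49, 49, 50, 50,
         59, 68, 78, 88, 91, 92, 92, 93, 95, 97]
smith = [2, 5, 11, 14, 17, 22, 23, 26, 37, 44, 50,
         51, 52, 53, 54, 55, 55, 59, 59, 61, 62]

mid = len(jones) // 2  # position of the median in a sorted list of 21

# Hypothetical: Jones's ten lowest-growth students all score 1 instead
jones_worse = [1] * mid + sorted(jones)[mid:]
# Hypothetical: Smith's ten highest-growth students all score 99 instead
smith_better = sorted(smith)[:mid + 1] + [99] * (len(smith) - mid - 1)

for label, sgps in [("Jones, bottom ten set to 1", jones_worse),
                    ("Smith, top ten set to 99", smith_better)]:
    print(f"{label}: median={median(sgps)}, mean={mean(sgps):.1f}")
```

After the change, Smith's class has the higher mean (about 59 versus about 43), yet both medians are still 50. Nearly half of each class can move anywhere on its side of the middle without changing the mSGP.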
Is this a typical example? I don't know, and neither does the NJDOE. We haven't tested this system, so we have no idea what we're going to find when it's rolled out next year. That makes it even more important that the NJDOE not lock school districts into one interpretation of test scores. At the very least, we should look at the median, the mean, and the standard deviation of SGPs for each tested teacher.
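Reporting all three numbers takes only a few lines. Here's a minimal sketch using Python's `statistics` module. The `summarize` helper is hypothetical. Whether to use the sample (`stdev`) or population (`pstdev`) standard deviation is a judgment call, and I've assumed the sample version here:

```python
from statistics import mean, median, stdev

classes = {
    "Ms. Jones": [40, 40, 41, 42, 46, 47, 48, 49, 49, 50, 50,
                  59, 68, 78, 88, 91, 92, 92, 93, 95, 97],
    "Ms. Smith": [2, 5, 11, 14, 17, 22, 23, 26, 37, 44, 50,
                  51, 52, 53, 54, 55, 55, 59, 59, 61, 62],
}

def summarize(sgps):
    """Median, mean, and sample standard deviation of a class's SGPs."""
    return {"median": median(sgps), "mean": mean(sgps), "sd": stdev(sgps)}

for teacher, sgps in classes.items():
    s = summarize(sgps)
    print(f"{teacher}: median={s['median']}, mean={s['mean']:.0f}, sd={s['sd']:.0f}")
```

On these made-up classes, the medians match, the means differ by about 26 points, and the spreads are similar. The median alone would hide the one difference that matters.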
But that would deny the NJDOE their opportunity to take personnel decisions out of the hands of local administrators and put them into an untested, mathematically illiterate system. Which seems to be the entire point...
More math fails coming up.
*NOTE - Dear NJDOE: Did you notice that when I averaged the scores, I expressed the average with the correct number of significant digits? Kind of important, don't you think?
Stand by...
Comments:
Jazzman,
Very impressive. Methinks your reference to significant digits will leave Bari, Pete, Chris, and Tim googling for hours. Tracy may actually be able to explain it, though.
Also, using a median score creates a moral hazard. Actually, the whole system is hazardous, BUT teachers will always know that the very lowest-growth students in their classes will never affect their mSGP. And neither will the scores of the highest-growth students!
To use the words of my long-dead grandfather, Cerf and crew are educated beyond their own intellectual abilities. And Eli could only teach them so much in ten weekends.
I have little faith in the accuracy or objectivity of any teacher evaluation. However, looking back on my second 'procedure,' I saw a new trend emerge: the administrators HAD TO listen to me in the pre- and post-conferences! The result has been documentation of the ISSUES affecting my ability to soar with the eagles: I don't have a classroom, I can't get from the 1st floor to the 3rd floor in zero minutes, I don't have access to computers, printers, or copy machines that work, and yes, it does matter that 6th graders did not have music in 3rd or 4th grade. All documented on the new instrument. I am loving this part. Also, the persistent follow-ups of the RACs: annoying, yes, but occasionally they get something right.