Jersey Jazzman: Teacher Evaluation and the Illusion of Precision

Wednesday, July 9, 2014

I testified before the NJ State Board of Education today about our year-old teacher evaluation system, AchieveNJ -- aka Operation Hindenburg. Time was limited to five minutes, so I focused on something I hope to be exploring more fully this month: the illusion of precision found in teacher evaluations.

For now, here's today's testimony.

* * *

AchieveNJ, as currently constituted, is fundamentally flawed. It is critical for this board and the Department of Education to understand that AchieveNJ violates the most basic laws of measurement and statistical practice; consequently, it is simply not viable.

AchieveNJ consists of three basic parts: a score based on teacher practice, a score based on Student Growth Objectives (SGOs), and, for teachers in tested grades and areas, a score based on Student Growth Percentiles (SGPs). These scores are weighted and combined to create a summative rating, which determines the final effectiveness rating of a teacher.

No doubt you are aware of the work of Dr. Bruce Baker, who has shown that the SGPs have inherent biases at the school level against schools with larger proportions of at-risk students.[1] If these measures are biased at the school level, there is every reason to believe they are biased at the teacher level as well. No teacher should be punished in his or her ratings simply for choosing to work with the neediest students.

You may also be aware that there is no evidence that SGOs are either valid or reliable, particularly in the many untested subjects in which they were used this past year.[2] The NJDOE has released research about SGOs on its website; my review of that literature, however, confirms that we have next to no evidence of any predictive validity for SGOs that would lead us to conclude that they are viable measures of student achievement, let alone teacher effectiveness.

These are serious concerns. But as my time is limited, I’d like to focus on one particular part of AchieveNJ that has received little attention, yet is probably its greatest flaw: the illusion of precision.

In both the creation of scores for each of its three components, and in the combination of those scores to create a summative rating, AchieveNJ creates scores that are more precise than they should be. Even though it is impossible under the laws of mathematics to create finely-grained scores for teacher evaluations, AchieveNJ violates those laws and does just that; consequently, the ratings teachers receive under this system are ultimately arbitrary and capricious.

To understand this phenomenon, it is necessary to understand the idea of “significant figures.” As any high school student will tell you, you can’t average measures without rounding the result to the number of significant figures the original measures support. The reason is that an average reported with many digits implies that the measuring instrument can make distinctions as fine as that average.

Unfortunately, AchieveNJ’s teacher practice scores violate this most basic of mathematical concepts. Take, for example, the Danielson instrument: in every component of this model, teachers receive a score of 1, 2, 3 or 4. Yet in the Teachscape system used by many districts, and in materials published by the NJDOE, the scores of each Danielson component are averaged to numbers in between these integers.

In an example on the NJDOE’s website[3], a hypothetical teacher is given a rating of 3.15 in their teacher practice score. This is, to put it bluntly, innumerate. Any student in a high school science class who averaged this way would fail. I know this because, ironically, the Common Core State Standards in Mathematics specifically require students to demonstrate the ability to “Choose a level of accuracy appropriate to limitations on measurement when reporting quantities.”[4] (CCSS.MATH.CONTENT.HSN.Q.A.3)

Even Charlotte Danielson herself would tell you her instrument is incapable of distinguishing between a teacher who gets a score of 3.15 and a teacher who gets a score of 3.25. AchieveNJ is perpetuating an illusion of precision.
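To make the arithmetic concrete, here is a minimal Python sketch. The number of components (20) and the individual ratings are hypothetical, invented only for illustration; they are not taken from the NJDOE scoring guide. The function names are mine as well. The point is simply that two teachers whose averages come out to 3.15 and 3.25 are indistinguishable once the result is reported on the whole-number scale the rubric actually uses.

```python
def average_rating(ratings):
    """Plain arithmetic mean of the component ratings."""
    return sum(ratings) / len(ratings)

def rating_at_instrument_precision(ratings):
    """Mean rounded back to the whole-number scale of the rubric."""
    return round(average_rating(ratings))

# Hypothetical ratings on 20 components, each scored 1, 2, 3 or 4.
teacher_a = [3] * 17 + [4] * 3   # 63 / 20 = 3.15
teacher_b = [3] * 15 + [4] * 5   # 65 / 20 = 3.25

for name, ratings in [("A", teacher_a), ("B", teacher_b)]:
    print(name, average_rating(ratings), rating_at_instrument_precision(ratings))
```

The extra decimal places come entirely from the division, not from anything the observer was able to see in the classroom.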

This is critically important because the cut scores set by the NJDOE are based on this illusion. AchieveNJ actually pretends that the teacher practice instruments can distinguish between a teacher who gets either below or above a 2.65, the cut score that determines “effectiveness.” Not only is there absolutely no research base to support this arbitrary cut score: the cut score itself is in violation of a mathematical concept we expect our high school students to comprehend.
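A rough sketch of what a 2.65 cut score means in practice, again in Python and again under stated assumptions: the 20 components and the ratings are hypothetical, and I am assuming that an average at or above the cut score counts as "effective." Under those assumptions, a one-point difference on a single component is enough to move a teacher across the line.

```python
from fractions import Fraction

CUT_SCORE = Fraction("2.65")  # the cut score cited in the testimony

def practice_average(ratings):
    """Exact average of integer component ratings (no float rounding)."""
    return Fraction(sum(ratings), len(ratings))

def is_rated_effective(ratings):
    # Assumption: an average at or above the cut score counts as "effective".
    return practice_average(ratings) >= CUT_SCORE

# Hypothetical teacher rated on 20 components, each scored 1 to 4.
before = [3] * 13 + [2] * 7   # 53 / 20 = 2.65
after = [3] * 12 + [2] * 8    # one component lowered from 3 to 2: 52 / 20 = 2.60

print(float(practice_average(before)), is_rated_effective(before))
print(float(practice_average(after)), is_rated_effective(after))
```

Fractions are used here only so the comparison against 2.65 is exact; the argument does not depend on that choice.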

Why did NJDOE decide to perpetuate this illusion? I can only guess, but I suspect it is because they had a problem with combining SGPs, which are on a 1-to-99 scale, with teacher practice scores, which should be on a 1-to-4 integer scale. But adding phony precision to teacher observations is not a solution to this problem.

An evaluation system that ignores the basics of measurement cannot and should not be trusted. I am afraid, looking at both the membership of the Educator Effectiveness Task Force and the list of witnesses brought before it, that this basic concept slipped the grasp of those who were charged with developing AchieveNJ.

I would urge this board to avail itself of the many excellent scholars and researchers in New Jersey who have expertise in this area, and give them extended time to explain to you these and the many other flaws found in AchieveNJ. And I would urge you to put a moratorium on any high-stakes decision based on AchieveNJ until such time as its many problems can be corrected.

Thank you for your time.

[1] http://schoolfinance101.wordpress.com/2014/01/31/an-update-on-new-jerseys-sgps-year-2-still-not-valid/

[2] http://jerseyjazzman.blogspot.com/2013/10/another-reformy-practice-not-grounded.html

[3] http://www.state.nj.us/education/AchieveNJ/resources/TeacherEvaluationScoringGuide.pdf

[4] http://www.corestandards.org/Math/Content/HSN/Q/

AchieveNJ, aka: Operation Hindenburg

1 comment:

cookie said...: SADLY THESE FLAWED EVALUATION AVERAGES ARE PLACING STIGMAS ON GREAT TEACHERS. THE ENTIRE PROCESS HAS WASTED MORE NEEDED MONEY THAT SHOULD BE USED FOR THE NEEDIEST SCHOOLS AND STUDENTS. EVERY YEAR PROGRAMS GET TESTED BECAUSE THE STATE IS LOOKING FOR SOME WAY TO EVALUATE AND IMPROVE EDUCATIONAL OUTCOMES. I REMEMBER TOSSING OUT WASTED PAGES OF RESEARCH AFTER SEVERAL CONFERENCES AT HOTELS TO DETERMINE A BEST TEACHING METHOD FOR EACH SCHOOL, ONLY TO REALIZE THAT THIS COMING TOGETHER WASTED SO MUCH TIME AND ENERGY BECAUSE NONE OF IT WAS GIVEN A FAIR CHANCE TO WORK. THE STATE CONTINUES TO CHANGE POLICY WITH EACH NEW GOVERNING BODY. THIS PROCESS GOES ON AND ON AND ALWAYS HAS FLAWS WITH EACH CHANGE OF GOVERNING BODIES. GREAT TEACHERS ARE LET GO BECAUSE THEIR EQUATIONS FOR EVALUATIONS ARE FLAWED AND THE STUDENTS ARE NOT REALLY THE MAIN OBJECT OF CONCERN. IT WILL ALWAYS BE ABOUT MORE MONEY FOR CORPORATIONS AND POLITICIANS AND THEIR CRONIES. IF YOU PAID STUDENTS TO LEARN IN THE POOR DISTRICTS I THINK THIS WOULD BE THE BEGINNING OF CHANGE. THE MAIN CAUSE IS POVERTY NOT THE TEACHERS.; July 10, 2014 at 7:12:00 AM PDT