But how will we make both work together? Lucky for you, America, the great state of Tennessee is leading the way on teacher evaluation:
In Murfreesboro City Schools, principals rated nearly half the teachers a five — the best score possible on the state’s new evaluation.
But in Fayette County Schools in far West Tennessee, only 1 percent garnered that rating.
The first glimpse of how educators fared under the system, which ultimately will affect whether they earn and keep tenure, demonstrated how subjective the process can be. The Tennessee Department of Education released principal observation data in December after The Tennessean and Williamson County Schools Director Mike Looney filed separate open records requests for it.
Looney said he wanted the data for comparison after Williamson principals rated 97 percent of teachers a three or higher, and state education officials questioned those ratings. He said his county has a high level of teacher talent plus motivated students, and state officials shouldn’t pressure districts to align scores with projections.
“To come to some conclusion that our scores are too high ... is preposterous,” Looney said. “We are not going to feel compelled or pushed into making our teachers fit some bell curve.”

Oh, Mr. Looney, you poor, deluded man. Don't you understand? Everyone knows that there must be a certain percentage of teachers who suck. Rich people say it's so; who are you to argue? We already do it with the kids; read Todd Farley. He tells stories of adjusting standardized test grades to make them fit pre-determined distributions drawn up by psychometricians. If we're doing it for the kids, why shouldn't the teachers also be subject to the statisticians' whims?
Of course, this logic requires that we all accept certain... contortions:
Tennessee and 16 other states redesigned teacher evaluation models in the past two years, tying ratings to student test scores, according to the National Council on Teacher Quality. Tennessee, further along than most, both designed and piloted its new system in the 2010-11 school year and put it into effect for all districts this school year.
Under the new system, 35 percent of the final score is on student learning gains and 15 percent on data the school chooses, such as ACT scores. Principals use a long list of measures for success to do their observations, which count for the other half.
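If you want to see how that weighting shakes out, here's a quick back-of-the-envelope sketch; the function and the numbers in the example are mine, invented purely for illustration, not anything from the state's actual formula:

```python
def composite_score(observation, value_added, school_chosen,
                    w_obs=0.50, w_va=0.35, w_school=0.15):
    """Weighted composite on the 1-5 scale: 50% principal observation,
    35% student learning gains (value-added), 15% school-chosen data."""
    assert abs((w_obs + w_va + w_school) - 1.0) < 1e-9  # weights must sum to 100%
    return w_obs * observation + w_va * value_added + w_school * school_chosen

# A teacher the principal rates a 5, with middling value-added (3) and a 4
# on the school-chosen measure, nets out to about 4.15 overall.
print(round(composite_score(observation=5, value_added=3, school_chosen=4), 2))
```

Notice who holds the biggest single lever here: the principal's observation, at half the score. Keep that in mind.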
The state predicted that districts would rate 3-5 percent of teachers as ones; 10-25 percent as twos; 40-50 percent as threes; 10-25 percent as fours and 5-10 percent as fives. No district that submitted data hit all those ranges.
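And here's what "hit all those ranges" means, in a few lines of code. The bands are the state's projections quoted above, but the example district's percentages are made up to roughly echo the Murfreesboro story:

```python
# State projections: expected share of teachers at each rating (1-5).
PROJECTED_RANGES = {
    1: (0.03, 0.05),
    2: (0.10, 0.25),
    3: (0.40, 0.50),
    4: (0.10, 0.25),
    5: (0.05, 0.10),
}

def hits_all_ranges(district_shares):
    """True only if every rating's share falls inside the projected band."""
    return all(lo <= district_shares.get(rating, 0.0) <= hi
               for rating, (lo, hi) in PROJECTED_RANGES.items())

# A hypothetical district where principals rated nearly half the staff a 5:
generous_district = {1: 0.01, 2: 0.04, 3: 0.20, 4: 0.27, 5: 0.48}
print(hits_all_ranges(generous_district))  # False -- misses almost every band
```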
The projections were based on value-added scores — which measure how much students learned in a year — how other districts using the same observation form distributed scores, and research from the National Institute for Effective Teaching, said Emily Barton, the state’s assistant commissioner for curriculum and instruction. [emphasis mine]

Oh dear - you mean the principal observations didn't match up with the value-added scores (the scores based on standardized test results)?
Wait - I thought the entire point of the principal evaluations was that they would mitigate against unreliable value-added scores. We all know the tests are insanely unreliable as measures of teacher effectiveness, but I thought we teachers didn't need to worry about that. These (usually irrelevant) test scores were only supposed to be a part of our evaluations: only 50%, or 30%, or even 20% of our final rating.
So it should be a good thing that the test scores don't match up with the principal evaluations. We should be happy about this - it means the system is working!
At the end of the school year, districts with observation scores that don’t closely align with value-added growth scores could penalize principals by taking 10 percent off their own evaluations, Barton said. For now, the state focus is that educators continue to get used to the new evaluations and that teachers get constructive feedback.

Let's recap:
- Test scores are admittedly unreliable measures of teacher quality.
- So we need principal observations to mitigate against their unreliability.
- But we're going to check principal observations against test scores.
- And if there is a discrepancy, we'll punish the principal.
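And just so we're clear about what that last bullet amounts to, here's a toy version of the alignment check. The article doesn't say what "closely align" means in practice, so the half-point tolerance and the function name below are entirely my assumptions:

```python
def principal_penalty(avg_observation, avg_value_added, tolerance=0.5):
    """Dock 10% of a principal's own evaluation if the average observation
    score strays more than `tolerance` from the average value-added score.
    The tolerance is an assumption; the article only says 'closely align'."""
    gap = abs(avg_observation - avg_value_added)
    return 0.10 if gap > tolerance else 0.0

# The exact scenario the observations were supposed to protect against:
# the tests say 2.8, the principal says 4.2 -- and the principal pays for it.
print(principal_penalty(avg_observation=4.2, avg_value_added=2.8))  # 0.1
```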