Testing Teachers — American RadioWorks — Are test scores the right measuring stick for teachers?

Are test scores the right measuring stick for teachers?

Education researchers gauge the quality of an individual teacher by looking at student test scores. If scores go up in a teacher's classroom, that's a sign the teacher is doing a good job. Education reformers are pushing schools to use test score growth as part of teacher performance evaluations. But experts are urging caution.

The Obama administration wants schools to do a better job identifying which teachers are effective in the classroom and which ones are not. The administration is offering billions of dollars through its Race to the Top initiative to get school systems to adopt new evaluation systems. To be eligible for the money, states cannot have laws that prevent teacher evaluation from being tied to student test scores. Teachers' unions fought hard for those laws; they have long opposed using test results to evaluate teachers. But some state legislatures are scrapping their bans to get federal money.

Many teachers and union leaders are upset. They say the American education system is too focused on test scores already, and measuring teachers by test scores will make the problem worse. "Using test scores to measure teacher effectiveness fosters a tendency to focus not on learning but on improving test scores," says Randi Weingarten, president of the American Federation of Teachers, one of the nation's most powerful teachers' unions. Weingarten says teaching is too complex to be measured by a test score alone.

On the other side of the argument are education leaders like Michelle Rhee, the public schools chancellor in Washington, D.C. "In order to have the privilege of teaching kids you have to be able to show that you can significantly move their academic achievement levels," she says. "And if you can't show that, then you need to find another profession." Rhee launched a controversial evaluation system in Washington where test score growth counts as 50 percent of a teacher's annual performance score.

The concept is simple: take each student's test score at the beginning of the school year, compare that to their score at the end of the year, and the amount the score has grown is what the teacher added to the student's knowledge that year. Experts call it a "value-added" score. While the concept is simple, coming up with this score is quite complex. Even the academics who invented the concept say it may not make sense to judge individual teachers this way.
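To make the basic idea concrete, here is a minimal sketch of the simple version described above: subtract each student's start-of-year score from the end-of-year score and average the gains for the class. The function name and the scores are hypothetical, and real value-added models are considerably more elaborate than this.

```python
from statistics import mean


def value_added(pre_scores, post_scores):
    """Average score gain across one class (a simplified illustration)."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each student needs both a start and an end score")
    # Gain for each student: end-of-year score minus start-of-year score.
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(gains)


# Hypothetical scores for a five-student class (not real data).
fall = [52, 61, 47, 70, 58]
spring = [60, 66, 55, 74, 65]
class_gain = value_added(fall, spring)
print(f"Average gain for the class: {class_gain:.1f} points")
```

The sketch also shows why two scores per student are required: a student missing either one cannot be counted.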

"How you actually use these scores is a complicated issue," says Stanford economist Eric Hanushek, one of the pioneering scholars in the field. It's one thing for researchers to draw conclusions and identify policy issues using value-added scores. He says it's another to tell teachers that their jobs and salaries depend on them. Hanushek shares Randi Weingarten's concern that teaching could become too test driven.

"Test scores are not the whole range of things that we care about in terms of student development," he says. And the standardized tests aren't that good right now. They're too easy, Hanushek says, and they may not be measuring what students really need to know.

There are also technical concerns. Researchers are particularly worried about sampling error. Average class size in the United States is between 16 and 24 students. Using such a small sample to assess the quality of a teacher each year leaves a lot of room for error. A study of teachers in San Diego found that 13 percent of the teachers who had the smallest increase in test scores one year ended up having the highest test score gains the next year.

"And that's just inexplicable," Harvard University researcher Heather Hill says. "We know that teachers change a little bit with regard to who's in their classroom and the type of rapport that they have with kids. But to go from the bottom to the top is just strange."

Hill says one way to account for such a dramatic shift is sampling error. These errors can be resolved by using several years' worth of test data to judge a teacher. But some school districts — including Washington, D.C. — use just one year's results.
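A small simulation can illustrate why more years of data help. The sketch below assumes a teacher whose true effect is zero, a class of 20 students, and random student-level noise of 10 points; all of these figures are assumptions chosen for illustration, not numbers from the studies cited here. It compares how widely the estimated effect varies when it rests on one year of data versus a three-year average.

```python
import random
from statistics import mean, pstdev


def yearly_estimate(rng, true_effect=0.0, class_size=20, student_noise=10.0):
    """Average gain for one simulated class: true effect plus per-student noise."""
    return mean([rng.gauss(true_effect, student_noise) for _ in range(class_size)])


def spread_of_estimates(years, teachers=2000, seed=0):
    """Standard deviation of estimated effects across many identical teachers."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = [
        mean([yearly_estimate(rng) for _ in range(years)])
        for _ in range(teachers)
    ]
    return pstdev(estimates)


one_year = spread_of_estimates(years=1)
three_years = spread_of_estimates(years=3)
print(f"Spread with one year of data:    {one_year:.2f} points")
print(f"Spread with three years of data: {three_years:.2f} points")
```

Every simulated teacher here is identical, so any difference between their estimates is pure sampling error. Averaging over three years shrinks that spread, though it does not eliminate it.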

When test scores shift dramatically, there may also be more disturbing explanations. Some teachers could be focusing excessively on "teaching to the test," while neglecting material that won't boost student scores. And some teachers might be cheating.

A study using data from Chicago Public Schools found unusual patterns in test scores that researchers decided could only result from cheating. Based on this evidence, they estimated that teachers or administrators cheat on standardized tests in at least five percent of elementary school classrooms.

"While the vast majority of teachers do not cheat," says Douglas Harris, author of a forthcoming book about testing, "this is such a harmful response that even a small amount of it is highly problematic." Harris and other experts say the more consequences attached to test scores, the more incentives schools and teachers have to game the system.

Another issue is that student learning is typically not the result of just one teacher's efforts. "Debates about education tend to assume, falsely, that teachers work alone with their students," Harris says. "But there are many ways in which teachers are woven together into teams and systems." Think of this common example: a student is struggling in math, so he's pulled out of his regular class two days a week and put in a remediation course. Who should get the credit — or the blame — for his test score at the end of the year, the regular teacher or the one who taught the remediation course?

Researchers are also concerned about how school principals assign students to teachers. Evidence suggests some teachers are routinely given the students with the most learning or behavior problems. This has an impact on a teacher's ability to raise test scores.

An additional problem with generating value-added scores is they depend on having two test scores for each student. One score represents what the student knew coming into a teacher's classroom, and a later score shows what the student learned in the class. It's actually harder than it might appear to generate enough test data to evaluate all teachers this way. Only some grades and subjects are tested. Students move from school to school, where tests might be different, making comparisons impossible. In Washington, D.C., the school system possessed enough test data to generate value-added scores for only 13 percent of its teachers. In response, school officials are expanding the testing program.

The main issue is that student scores reveal much less about teacher effectiveness than many people would like to believe, says researcher Heather Hill. A big gain in scores can indicate a teacher is doing a great job. But from there the picture gets fuzzy. Hill says there are various lessons one can draw from test scores, with varying degrees of reliability.

Great teachers raise their students' test scores. In other words (barring measurement error) you can't be a great teacher unless you raise test scores.
Teachers who fail to raise scores are not very good teachers.
But, you can be a poor teacher and still raise scores through a lot of review and test preparation.
Between the great and the mediocre, there are a lot of average teachers. They are the majority, and test scores don't reveal much about them.

Hill says schools need to look beyond test scores to decide which teachers should be fired, and which teachers need help getting better. "If I were being evaluated on these systems," Hill says, "I would think this is just a sham."

Hill and other researchers say the value-added approach may be a good way to evaluate teachers in the future, but statisticians and school officials still have a ways to go to make these systems reliable.
