One of the most significant changes in educational policy of the past two decades is the movement toward test-based accountability in the schools. From the beginning, this movement has elicited strongly held opinions about its design, impact, and desirability. We now have another addition to this packed field – from a distinguished panel of experts flying under the banner of the National Research Council. Unfortunately, it is unlikely to clear up the issues. Indeed it is more likely to leave the casual reader with just the wrong impression.
The fundamental problem with the report (Incentives and Test-based Accountability in Education) is dramatically evidenced in the first sentence of the first conclusion:
Test-based incentive programs, as designed and implemented in the programs that have been carefully studied, have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries.
This statement invites a sound bite such as ‘we should move away from accountability because it has not done much.’ Indeed, the first sentence of the National Academies’ own press release says, “Despite being used for several decades, test-based incentives have not consistently generated positive effects on student achievement.”
When you parse the conclusion, you see that it refers to programs: (1) as designed; (2) as implemented; (3) that have been carefully studied; and – the topper – (4) that have not by themselves closed the achievement gaps with the highest performing countries. Digging into the report, one finds a nice research review suggesting that conditions (1)–(3) rest on a fairly thin evidentiary base, but one that generally suggests positive impacts of accountability.
The biggest problem with the conclusion (and the report) lies in the casual “compared to what” measure that is adopted. Why would we discard an effective program just because it falls short of our hopes of producing the world’s best education? It is generally inappropriate to use words like “silly” in academic discussions, but . . .
Nowhere does the report indicate an alternative educational program that leads to as large an improvement in overall U.S. achievement as accountability. Nowhere does the report suggest any single program or package of reforms that would close the achievement gap with the highest performing countries. Nowhere does the report really make the case that alternative reform packages should not include an accountability component.
Let’s quickly put the report’s overall judgment into perspective. The report speaks quite dismissively of estimated achievement gains of 0.08 standard deviations. Ludger Woessmann and I have analyzed how achievement relates to national economic growth. If the future follows the patterns we have seen historically, the present value of achievement gains of this magnitude would be over $13 trillion. Given that U.S. GDP is currently $15 trillion, I personally think such gains are worth considering.
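The back-of-the-envelope logic here can be made concrete. The sketch below uses entirely hypothetical parameters (apart from the $15 trillion GDP figure, none of the numbers are drawn from the Hanushek–Woessmann analysis): treat higher achievement as a small permanent increase in the annual GDP growth rate, and discount the extra output it produces back to the present.

```python
# Illustrative sketch only. The growth and discount parameters below are
# hypothetical placeholders, not the estimates from the actual analysis.

def pv_of_growth_gain(gdp, base_growth, extra_growth, discount, years):
    """Present value of the additional GDP produced when annual growth
    rises permanently from base_growth to base_growth + extra_growth."""
    pv = 0.0
    for t in range(1, years + 1):
        baseline = gdp * (1 + base_growth) ** t
        improved = gdp * (1 + base_growth + extra_growth) ** t
        pv += (improved - baseline) / (1 + discount) ** t
    return pv

# Hypothetical inputs: $15T GDP, 1.5% baseline growth, an extra tenth of a
# percentage point of growth from higher achievement, a 3% discount rate,
# and an 80-year horizon.
gain = pv_of_growth_gain(15e12, 0.015, 0.001, 0.03, 80)
print(f"Present value of extra output: ${gain / 1e12:.1f} trillion")
```

The exact total moves around considerably with the assumed growth effect, discount rate, and horizon, but the qualitative point survives any reasonable parameterization: even a tiny permanent increase in growth compounds into a sum measured in trillions of dollars, which is why comparing it to a single year of GDP is instructive.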
The second major shortcoming of the report follows from its nice review of incentive and testing issues. Most scholars would agree that it is insufficient simply to trot out the well-known list of potential problems with accountability and incentive schemes. Many have done this before. Moreover, others have taken the list and the available evidence as the basis for framing how the existing but imperfect accountability schemes could be modified to improve on the first generation of plans. Unfortunately, that was not the focus or interest of this panel. Its members make virtually no effort to provide their professional opinions about how the incentives of accountability systems could be strengthened or how the testing of achievement could be improved. It seems like Shakespeare all over again: “I come to bury Caesar, not to praise him.”
The remarkable conclusion to be drawn from the evidence presented in the report is how much can be gained from a flawed accountability system – again, think trillions of dollars. Imagine what might be possible if we improved the system along the lines that many others have described (and that can be inferred from the analysis buried within this report).