What do state test scores measure?

In the decades since the passage of the federal No Child Left Behind Act, much of the rhetoric around K-12 school performance and student academic achievement has centered on the percentage of students labeled as meeting or exceeding standards on state standardized tests. This percentage is often misinterpreted as the percentage of children scoring at or above grade level.

In fact, the scores required to be labeled proficient or meeting standards have been moved around many times in the last two decades. They are now closely aligned with the National Assessment of Educational Progress (NAEP) category of proficient. However, NAEP’s proficient is not the same thing as grade-level performance.

Claims that the majority of children in Illinois are not performing on grade level, even in the wake of the drop in test scores due to the pandemic, are false.

Misinterpretation of what large-scale standardized tests actually tell us won’t lead to improvements in instructional practices or more equitable educational environments.

So, what do state test scores actually measure? What inferences can we draw from them about schools and students? Read our full explainer piece here: What do state test scores measure? (.pdf), or read the shorter summary below.

In 2001, Congress passed the No Child Left Behind Act (NCLB) with bipartisan support, and annual standardized testing became a rite of spring in U.S. public schools. NCLB attached high stakes to low test scores, including firing school staff and even closing schools.

This is particularly concerning given that research has consistently demonstrated that much of the variation in scores across groups of students can be accounted for by socioeconomic factors outside of school control. As research accumulated showing that making decisions like grade promotion contingent on test scores harmed students, Congress rewrote NCLB in 2015, replacing punitive measures with increased support for schools with low scores. But the requirement for annual testing remained.

Almost another decade has passed, but mistaken ideas about what test scores tell us about students and schools persist. Virtually all standardized tests are norm-referenced, designed to produce statistically normal distributions of scores that rank test-takers relative to one another—high-level information most useful for gauging relative performance of groups of students. However, NCLB only passed in the first place because it codified a compromise: mandated tests would not merely rank students, but instead provide information about whether they had mastered specific learning standards or academic skills.

Unfortunately, large-scale, norm-referenced standardized tests simply cannot validly provide that type of fine-grained information. Testing companies and policy-makers dodged this quandary by inventing cut scores at which test takers are purported to have met learning standards. This is also problematic, because where to set a cut score is a subjective, political decision, not a scientific one. Since 2001, the minimum scores required on state tests to earn the label proficient or meeting standards have shifted many times. The resulting changes in the percentage of students “meeting state standards” do not necessarily signify actual fluctuations in student achievement and school effectiveness. For example, the percentage of Illinois’ 3rd to 8th graders who met or exceeded state standards dropped from 82% in 2012 to 59% in 2013 to 33% in 2015.

Did Illinois students suddenly achieve at much lower levels in 2013 and 2015? No. Illinois’ cut scores were adjusted upwards to align with the National Assessment of Educational Progress (NAEP) category of proficient. But NAEP’s proficient label does not mean grade-level performance. Achieving at grade level means scoring at the 50th percentile of national norms, but for the last two decades NAEP’s cut score for proficient has hovered between the 60th and 70th percentiles. In fact, other than the drop due to the pandemic, there have been no dramatic shifts in test scores for Illinois students in recent decades. This is easily verified by looking at Illinois’ NAEP results, which have been similar to the national average and, in some cases, a little higher.
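To see why moving a cut score changes the headline percentage without any change in achievement, consider a rough sketch. It assumes, purely for illustration, that scores follow a normal distribution on a hypothetical scale (mean 250, standard deviation 30). These numbers are not Illinois or NAEP data, and the 65th percentile simply stands in for the middle of the 60th to 70th percentile range described above.

```python
from statistics import NormalDist

# Hypothetical score scale for illustration only (not real Illinois or NAEP data).
SCORES = NormalDist(mu=250, sigma=30)


def cut_score_at_percentile(percentile: float) -> float:
    """Return the scale score that sits at the given percentile (0-100)."""
    return SCORES.inv_cdf(percentile / 100)


def percent_meeting(cut_score: float) -> float:
    """Return the percentage of test takers scoring at or above the cut score."""
    return 100 * (1 - SCORES.cdf(cut_score))


# Same students, same scores; only the cut score moves.
for label, percentile in [("50th percentile (grade level)", 50),
                          ("65th percentile (assumed stand-in for NAEP proficient)", 65)]:
    cut = cut_score_at_percentile(percentile)
    print(f"cut at {label}: {percent_meeting(cut):.0f}% labeled as meeting standards")
```

Under these assumptions, the share labeled as meeting standards falls from about 50% to about 35% even though the underlying scores are identical. Real score distributions are not perfectly normal, so this shows only the direction and rough size of the effect.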

Confusion about test scores harms both schools and students.

Anti-public education groups, like the Illinois Policy Institute and Wirepoints, purposely misrepresent the meaning of “meeting standards” as “at grade level” so they can trumpet talking points like “only 1/3 of Illinois 3rd-8th graders can read and 1/4 can do math.” They blame these fake statistics of failure on education policies and programs they oppose—from diversity, equity and inclusion initiatives to a unionized teaching workforce—and call for defunding and privatizing public education.

But even many well-meaning supporters of public schools misunderstand what test scores measure. They perpetuate testing companies’ marketing claim that the output of nationally normed tests is useful for diagnosing students’ specific skill deficits, no matter how often assessment experts point out that this just isn’t so. Based on this false belief, schools often emphasize instructional practices that drill students on one skill or standard at a time. This systematically limits students’ opportunities to engage in deep learning about complex subject matter, a major source of education equity gaps.

In other words, “teaching to the test” exacerbates the very problem it intends to solve—low test scores!

What steps can we take to limit the harm these misconceptions lead to? Share information about and advocate for better forms and systems of assessment beyond just standardized testing. Support changing national and state laws on testing to provide more equitable educational experiences for all students. And vote for candidates at every level of government who themselves understand what test scores can and can’t tell us about what students know and can do.

Are you a school board candidate? Here's a great basic primer on assessment from the site Democracy and Education, a resource for school board candidates and members on issues and policies.

More resources on standardized testing and assessment

Ambitious Teaching and Equitable Assessment: A Vision for Prioritizing Learning, Not Testing. (2021) L. Shepard. American Federation of Teachers.

Authentic Intellectual Work and Standardized Tests: Conflict or Co-existence? (2002) F. Newmann, et al. Consortium on Chicago School Research

Classroom Assessment to Support Teaching and Learning (2019) L. Shepard. What Use Is Educational Assessment? Annals of the American Academy of Political and Social Science

Despite warnings, College Board redesigned SAT in way that may hurt neediest students (2016) R. Dudley. Reuters

How to Address Racial Bias in Standardized Testing (2020) W. Y. Choi. Next Generation Learning Challenges

Interviews - James Popham (2001) “Testing Our Schools” series. Frontline

A Pernicious Myth: Basics Before Deeper Learning (2018) J. Mehta. Education Week.

A Question District Leaders Need to Ask More Often: What Parts of Formative Assessment Can’t Be Outsourced? (2022) P. Zavitkovsky. UCEL

The Racist Beginnings of Standardized Testing (2021) J. Rosales and T. Walker. National Education Association

Taking Stock: How Standardized Test Reports Let Us Down Under No Child Left Behind…And How We Can Fix What’s Wrong (2016) P. Zavitkovsky, et al.

It’s Time to Fix Standardized Testing (2020) A. Bello

A Tricky Balance: The Challenges And Opportunities Of Balanced Systems Of Assessment. (2018) S. Marion, et al. National Council on Measurement in Education.

What the Marketplace Has Brought Us: Item-by-Item Teaching With Little Instructional Insight. (2010) L. Shepard. Peabody Journal of Education