Standards, accountability, assessment, and curriculum

Gary T. Henry, Shelby M. McNeill, Erica Harbatkin.

Test-based accountability pressures have been shown to result in transferring less effective teachers into untested early grades and more effective teachers to tested grades. In this paper, we evaluate whether a state initiative to turn around its lowest-performing schools reproduced a similar pattern of teacher assignment and produced unintended negative effects on the outcomes of younger students in untested grades. Using a sharp regression discontinuity design, we find consistent evidence of increased chronic absenteeism and grade retention in the first year. The findings also suggest negative effects on early literacy and reading comprehension in the first year of the reform that rebounded somewhat in the second year. Schools labeled low performing reassigned low-effectiveness teachers from tested grades into untested early grades, though these assignment practices were no more prevalent in reform schools than in control schools. Our results suggest that accountability-driven school reform can yield negative consequences for younger students that may undermine the success and sustainability of school turnaround efforts.
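As a rough illustration of the sharp regression discontinuity logic named in this abstract (not the authors' specification), the sketch below fits a separate line on each side of a cutoff and takes the difference between the two fitted values at the cutoff. The function names `fit_line` and `sharp_rd_estimate`, the bandwidth, the convention that units at or above the cutoff are treated, and the synthetic data are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_estimate(running, outcome, cutoff=0.0, bandwidth=1.0):
    """Local linear sharp RD: jump in the fitted outcome at the cutoff.

    Assumption for this sketch: units at or above the cutoff are treated.
    """
    left, right = [], []
    for r, y in zip(running, outcome):
        centered = r - cutoff  # center so each intercept is the fit at the cutoff
        if -bandwidth <= centered < 0:
            left.append((centered, y))
        elif 0 <= centered <= bandwidth:
            right.append((centered, y))
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Synthetic data (hypothetical): a true jump of 2.0 at the cutoff.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(2000)]
ys = [0.5 * x + (2.0 if x >= 0 else 0.0) + random.gauss(0, 0.1) for x in xs]
estimate = sharp_rd_estimate(xs, ys)
```

Applied work would typically also choose the bandwidth in a data-driven way and report standard errors; this sketch omits both.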

Sarah Guthery, Lauren P. Bailes.

This study investigates the influence of principal tenure on the retention rates of the teachers they hire over time. We analyzed the hiring practices and teacher retention rates of 11,717 Texas principals from 1999 to 2017, employing both individual and year fixed effects. Main findings indicate that a principal who stays in the same school for at least three years begins to hire teachers who stay to both three- and five-year benchmarks at increasingly higher rates. However, the average Texas principal leaves a school after four years, and while we do find small positive gains in the initial retention rates of teachers at the next school, the majority of a principal's improvement in teacher retention does not appear to be portable.

Dylan Conger, Mark C. Long, Raymond McGhee Jr.

To evaluate how Advanced Placement courses affect college-going, we randomly assigned the offer of enrollment into an AP science course to over 1,800 students in 23 schools that had not previously offered the course. We find no substantial AP course effects on students’ plans to enroll in college or on their college entrance exam scores. Yet AP course-takers enroll in less selective colleges than their control group counterparts. Negative treatment effects on college selectivity appear to be driven more by low student preparation than teacher inexperience and by students’ matriculation decisions rather than institutional admissions decisions. 

Heather C. Hill, Erica Litke, Kathleen Lynch.

For nearly three decades, policy-makers and researchers in the United States have promoted more intellectually rigorous standards for mathematics teaching and learning. Yet, to date, we have limited descriptive evidence on the extent to which reform-oriented instruction has been enacted at scale.

The purpose of the study is to examine the prevalence of reform-aligned mathematics instructional practices in five U.S. school districts. We also seek to describe the range of instruction students experience by presenting case studies of teachers at high, medium and low levels of reform alignment.

We draw on 1,735 video-recorded lessons from 329 elementary teachers in these five U.S. urban districts.

Research Design:
We present descriptive analyses of lesson scores on a mathematics-focused classroom observation instrument. We also draw upon interviews with district personnel, rater-written lesson summaries, and lesson video in order to develop case studies of instructional practice.

We find that teachers in our sample do use reform-aligned instructional practices, but that they do so within the confines of traditional lesson formats. We also find that the implementation of these instructional practices varies in quality. Furthermore, the prevalence and strength of these practices correspond to the coherence of district efforts at instructional reform.

Our findings suggest that, in contrast to other studies in which reform-oriented instruction rarely occurred (e.g., Kane & Staiger, 2012), reform practices do appear to some degree in study classrooms. In addition, our analyses suggest that implementation of these reform practices corresponds to the strength and coherence of district efforts to change instruction.

Samantha Viano, Gary T. Henry.

Credit recovery (CR) refers to online courses that high school students take after previously failing the course. Many have suggested that CR courses are helping students to graduate from high school without corresponding increases in academic skills. This study analyzes administrative data from the state of North Carolina to evaluate these claims using full data from public and private CR providers. Findings indicate that students who fail courses and enroll in CR have test scores up to two tenths of a standard deviation lower and are about seven percent more likely to graduate high school on time than students who repeat courses traditionally. Test score differences are particularly large for Biology compared to Math I and English II. Hispanic and economically disadvantaged CR students are more likely to graduate high school than their peers.

Emma M. Klugman, Andrew D. Ho.

State testing programs regularly release previously administered test items to the public. We provide an open-source recipe for state, district, and school assessment coordinators to combine these items flexibly to produce scores linked to established state score scales. These linked scores would enable estimation of student score distributions and achievement levels. We discuss how educators can use the resulting scores to estimate achievement distributions at the classroom and school level. We emphasize that any use of such tests should be tertiary, with no stakes for students, educators, and schools, particularly in the context of a crisis like the COVID-19 pandemic. These tests and their results should also be lower in priority than assessments of physical, mental, and social–emotional health, and lower in priority than classroom and district assessments that may already be in place. We encourage state testing programs to release all the ingredients for this recipe to support low-stakes, aggregate-level assessments. This is particularly urgent during a crisis where scores may be declining and gaps increasing at unknown rates.

Mark Murphy, Angela Johnson.

This study examines the effects of English Learner (EL) status on subsequent Special Education (SPED) placement. Through a research-practice partnership, we link student demographic data and initial English proficiency assessment data across seven cohorts of test takers and observe EL and SPED programmatic participation for these students over seven years. Our regression discontinuity estimates consistently differ substantively from results generated through regression analyses. We find evidence that the effect of EL status on SPED placement was either null or tied to slight under-identification. Our results suggest that under-identification occurred two years after EL classification. We also find that EL status led to under-identification for Spanish speakers and proportionate representation for Mandarin/Cantonese speakers and speakers of all other languages.

Christian Buerger, Seung Hyeong Lee, John D. Singleton.

A recent literature provides new evidence that school resources are important for student outcomes. In this paper, we show that school finance reform-induced increases in student performance are driven by those states that had test-based accountability policies in place at the time. By incentivizing school improvement, accountability systems (such as the federal No Child Left Behind Act) may raise the efficiency with which additional school funding gets spent. Our empirical approach leverages the timing of school finance reforms to compare funding impacts on student test scores between states that had accountability in place at the time of the reform and states that did not. The results indicate that finance reforms are three times more productive in low-income school districts when accompanied by test-based accountability. These findings shed new light on the role of accountability incentives in education production and the mechanisms supporting the effectiveness of school resources.

Beth E. Schueler, Catherine Armstrong Asher, Katherine E. Larned, Sarah Mehrotra, Cynthia Pollard.

The public narrative surrounding efforts to improve low-performing K-12 schools in the U.S. has been notably gloomy. Observers argue that either nothing works or we don’t know what works. At the same time, the federal government is asking localities to implement evidence-based interventions. But what is known empirically about whether school improvement works, how long it takes, which policies are most effective, and which contexts respond best to intervention? We meta-analyze 141 estimates from 67 studies of turnaround policies implemented post-NCLB. On average, these policies have had a moderate positive effect on math but no effect on ELA achievement as measured by high-stakes exams. We find evidence of positive impacts on low-stakes exams in STEM and humanities subjects and no evidence of harm on non-test outcomes. Some elements of reform, namely extended learning time and teacher replacements, predict greater effects. Contexts serving majority-Latinx populations have seen the largest improvements.
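To make the idea of pooling many effect estimates concrete, the sketch below computes a simple inverse-variance (fixed-effect) weighted average. This is a standard textbook technique swapped in for illustration; the abstract does not state which meta-analytic model the authors used, and a model for 141 estimates nested in 67 studies would likely need to account for that dependence. The function name `pooled_effect` and the example numbers are hypothetical.

```python
import math


def pooled_effect(estimates, std_errors):
    """Inverse-variance weighted mean of effect estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise estimates weigh more
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes (in standard deviation units) and standard errors.
example_effects = [0.10, 0.02, 0.06]
example_ses = [0.05, 0.04, 0.10]
pooled, pooled_se = pooled_effect(example_effects, example_ses)
```

Because weights are proportional to precision, a single large, precisely estimated study can dominate the pooled value, which is one reason random-effects models are often preferred when true effects vary across contexts.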

David D. Liebowitz.

Teacher evaluation policies seek to improve student outcomes by increasing the effort and skill levels of current and future teachers. Current policy and most prior research treat teacher evaluation as balancing two aims: accountability and skill development. Proper teacher evaluation design has been understood as successfully weighting the accountability and professional growth dimensions of policy and practice. I develop a model of teacher effectiveness that incorporates improvement from evaluation, and I detail the conditions that determine the effectiveness of teacher evaluation for growth and for accountability at improving student outcomes. Drawing on empirical evidence from the personnel economics, economics of education, and measurement literatures, I simulate the long-term effects of a set of teacher evaluation policies. I find that policies that treat evaluation for accountability and evaluation for growth as substitutes outperform policies that treat them as complements. I conclude that optimal teacher evaluation policies would impose accountability on teachers performing below a defined level, while teachers above that level would be subject to no accountability pressure but would receive intensive instructional supports.

