K-12 Education

Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, Wei Ai.

Assessing instruction quality is a fundamental component of any improvement effort in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Unlike prior research that focuses on low-inference instructional practices, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice that has been demonstrated to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy and long input data, and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) perform comparably to the agreement level of human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.

Matthew A. Kraft, Melissa Arnold Lyon.

We examine the state of the U.S. K-12 teaching profession over the last half century by compiling nationally representative time-series data on four interrelated constructs: occupational prestige, interest among students, the number of individuals preparing for entry, and on-the-job satisfaction. We find a consistent and dynamic pattern across every measure: a rapid decline in the 1970s, a swift rise in the 1980s extending into the mid 1990s, relative stability, and then a sustained decline beginning around 2010. The current state of the teaching profession is at or near its lowest levels in 50 years. We identify and explore a range of hypotheses that might explain these historical patterns including economic and sociopolitical factors, education policies, and school environments.

Ellen Sahlström, Mikko Silliman.

We study the extent and consequences of biases against immigrants exhibited by high school teachers in Finland. Compared to native students, immigrant students receive 0.06 standard deviation units lower scores from teachers than from blind graders. This effect is almost entirely driven by grading penalties incurred by high-performing immigrant students and is largest in subjects where teachers have more discretion in grading. While teacher-assigned grades on the matriculation exam are not used for tertiary enrollment decisions, we show that immigrant students who attend schools with biased teachers are less likely to continue to higher education.

Douglas D. Ready, Sierra G. McCormick, Rebecca J. Shmoys.

This paper describes a 12-week cluster randomized controlled trial that examined the efficacy of BookNook, a virtual tutoring platform focused on reading. Cohorts of first- through fourth-grade students attending six Rocketship public charter schools in Northern California were randomly assigned within grades to receive BookNook. Intent-to-Treat models indicate that students in cohorts assigned to BookNook outperformed their control-group peers by roughly 0.05 SDs. Given the substantial variability in usage rates among students enrolled in BookNook cohorts, we also leveraged Treatment-on-the-Treated approaches. These models suggest that students who completed 10 or more BookNook sessions experienced a reading advantage of 0.08 SDs, while those who completed 20 or more sessions—the recommended dosage—experienced a 0.26 SD developmental advantage.

NaYoung Hwang.

This study examines the impact of special education on academic and behavioral outcomes for students with learning disabilities (LD) by using statewide Indiana data covering kindergarten through eighth grade. The results from student fixed effects models show that special education services improve achievement in math and English Language Arts, but they also increase suspensions and absences for students with LD. These effects vary across student subgroups, including gender, race/ethnicity, eligibility for free or reduced-price lunch, and English language learner status. The findings reveal both the significant benefits and unintended consequences of special education services for students with LD, highlighting the complex dynamics and varying effects of special education.

Jesper Eriksen, Shaun M. Dougherty.

Vocational Education and Training (VET) programs are prevalent in a European context, but often struggle with drop-out rates that exceed those of general upper-secondary education. Using Danish administrative data, we study how reform-induced reductions in the share of VET students who did not pass their lower-secondary final exams affect VET students with passing GPAs. We find that passing students have a higher probability of remaining enrolled in VET after the first year of studies when entering a VET school with a higher share of below-passing peers. Studying outside options, we find that students become less likely to drop out of education entirely. The results are consistent with models of peer effects in which particularly unmotivated students become points of comparison for their peers, increasing their motivation and likelihood of remaining enrolled.

Arielle Boguslav.

Despite the common title of “coach,” definitions of high-quality coaching vary tremendously across models and programs. Yet few studies make comparisons across different models to understand what is most helpful, for whom, and under what circumstances. As a result, practitioners are left with many options and little evidence-based direction. This is exacerbated by the literature’s focus on more abstract features of coaching practice (e.g., building trust), leaving practitioners to figure out what concrete discourse strategies support these goals. This paper begins to address these challenges by introducing a taxonomy of coaching “moves,” parsing the concrete details of coach discourse. While the taxonomy is informed by the literature, it highlights conceptual possibilities rather than providing a list of empirically grounded or “evidence-based” strategies. In doing so, this taxonomy may serve as a common language to guide future work exploring how coach discourse shapes teacher development, synthesizing across studies, and supporting coach practice.

Olivia L. Chi, Andrew Bacher-Hicks, Ariel Tichnor-Wagner, Sidrah Baloch.

Much recent debate among policymakers and policy advocates focuses on whether states should reduce teacher licensure requirements to ease the burdens of recruiting high quality teachers to the workforce. We examine the effectiveness of individuals who entered the teacher workforce in Massachusetts during the pandemic by obtaining an emergency license, which requires only a bachelor’s degree. Our results show that, in 2021-22, newly hired emergency licensed teachers: 1) were largely rated as proficient (82%) in their performance evaluation ratings and 2) had similar measures of student test score growth as their traditionally licensed peers. However, we find suggestive evidence that emergency licensed teachers with no prior employment in Massachusetts public schools and no prior engagement with the teacher pipeline (i.e., enrollment in teacher preparation, attempting licensure exams) received lower performance ratings and had lower measures of student test score growth in English Language Arts. Taken together, these results encourage the creation of additional flexibility in licensure requirements for those who have demonstrated prior efforts to join the educator pipeline.

Melissa Arnold Lyon, Joshua Bleiberg, Beth E. Schueler.

State takeover of school districts—a form of political centralization that shifts decision-making power from locally elected leaders to the state—has increased in recent years, often with the purported goal of improving district financial condition. Takeover has affected millions of students throughout the U.S. since the first takeover in 1988 and is most common in larger districts and communities serving large shares of low-income students and students of color. While previous research finds takeovers do not benefit student academic achievement on average, we investigate whether takeovers achieve their goal of improving financial outcomes. Using an event study approach, we find takeovers from 1990 to 2019 increased annual school spending by roughly $2,000 per pupil after five years, on average, leading to improvements in financial condition. Increased funding came primarily from state sources and funded districts’ legacy costs. However, takeover did not affect spending for districts with majority-Black student populations—which are disproportionately targeted for takeover—adding to a growing literature suggesting that takeover unequally affects majority-Black communities.

Daniel Rodriguez-Segura, Savannah Tierney.

While learning outcomes in low- and middle-income countries are generally at low levels, the degree to which students and schools more broadly within education systems lag behind grade-level proficiency can vary significantly. A substantial portion of existing literature advocates for aligning curricula closer to the proficiency level of the “median child” within each system. Yet, amidst considerable between-school heterogeneity in learning outcomes, choosing a single instructional level for the entire system may still leave behind those students in schools far from this level. Hence, establishing system-wide curriculum expectations in the presence of substantial between-school heterogeneity poses a significant challenge for policymakers, especially as the issue of between-school heterogeneity has been relatively unexplored by researchers so far. This paper addresses the gap by leveraging a unique dataset on foundational literacy and numeracy outcomes, representative of six public educational systems encompassing over 900,000 enrolled children in South Asia and West Africa. With this dataset, we examine the current extent of between-school heterogeneity in learning outcomes, identify potential predictors of this heterogeneity, and explore its potential implications for setting national curricula for different grade levels and subjects. Our findings reveal that between-school heterogeneity can indeed present both a severe pedagogical hindrance and challenges for policymakers, particularly in contexts with relatively higher levels of performance and in the higher grades. In response to meaningful between-system heterogeneity, we also demonstrate through simulation that a more nuanced, data-driven targeting of curricular expectations for different schools within a system could empower policymakers to effectively reach a broader spectrum of students through classroom instruction.
