Methodology, measurement and data

Andreas de Barros.

Explaining the productivity paradox—the phenomenon where an introduction of information and communication technology (ICT) does not lead to improvements in labor productivity—is difficult, as changes in technology often coincide with adjustments to working hours and substitution of labor. I conduct a cluster-randomized trial in India to investigate the effects of a program that provides teachers with continuous training and materials, encouraging them to blend their instruction with high-quality videos. Teaching hours, teacher-to-student assignments, and the curriculum are held constant. Eleven months after its launch, I document negative effects on student learning in grades 9 and 10 in mathematics, and no effects in science. I also find detrimental effects on instructional quality, instructional practices, and student perceptions and attitudes towards mathematics and science. These findings suggest adjustment costs can serve as one explanation for the paradox.


Monnica Chan, Blake Heller.

Generally, need-based financial aid improves students’ academic outcomes. However, the largest source of need-based grant aid in the United States, the Federal Pell Grant Program (Pell), has a mixed evaluation record. We assess the minimum Pell Grant in a regression discontinuity framework, using Kentucky administrative data. We focus on whether and how year-to-year changes in aid eligibility and interactions with other sources of aid attenuate Pell’s estimated effects on post-secondary outcomes. This evaluation complements past work by assessing explanations for the null or muted impacts found in our analysis and other Pell evaluations. We also discuss the limitations of using regression discontinuity methods to evaluate Pell—or other interventions with dynamic eligibility criteria—with respect to generalizability and construct validity.


Danielle Sanderson Edwards, Matthew A. Kraft, Alvin Christian, Christopher A. Candelaria.

We develop a unifying conceptual framework for understanding and predicting teacher shortages at the state, region, district, and school levels. We then generate and test hypotheses about geographic and subject variation in teacher shortages using data on unfilled teaching positions in Tennessee during the fall of 2019. We find that teacher staffing challenges are highly localized, causing shortages and surpluses to coexist. Aggregate descriptions of staffing challenges mask considerable variation between schools and subjects within districts. Schools with fewer local early-career teachers, smaller district salary increases, worse working conditions, and higher historical attrition rates have higher vacancy rates. Our findings illustrate why viewpoints about, and solutions to, shortages depend critically on whether one takes an aggregate or local perspective.


Brendan Bartanen, Courtney Bell, Jessalynn James, Eric S. Taylor, James H. Wyckoff.

Novice teachers improve substantially in their first years on the job, but we know remarkably little about the nature of this skill development. Using data from Tennessee, we leverage a feature of the classroom observation protocol that asks school administrators to identify an item on which the teacher should focus their improvement efforts. This “area of refinement” overcomes a key measurement challenge endemic to inferring from classroom observation scores the development of specific teaching skills. We show that administrators disproportionately identify two teaching skills when observing novice teachers: classroom management and presenting content. Struggling with classroom management, in particular, is linked to high rates of novice teacher attrition. Among those who remain, we observe subsequent improvement in these skills.


William Delgado.

Does student-teacher match quality exist? Prior work has documented large disparities in teachers’ impacts across student types but has not distinguished between sorting and causal effects as the drivers of these disparities. I propose a disparate value-added model and derive a novel measure of teacher quality, revealed comparative advantage, that captures the degree to which teachers affect student outcome gaps. Quasi-experimental changes in teaching staff show that the comparative advantage measure accurately predicts teachers’ disparate impacts: a teacher whose revealed comparative advantage for black students is 1 standard deviation higher increases black students’ test scores by 1 standard deviation and has no effect on non-black students’ test scores. Teacher removal and teacher-to-classroom re-allocation simulations show substantial efficiency and equity gains from considering teachers’ comparative advantage.


Noam Angrist, Rachael Meager.

Targeted instruction is one of the most effective educational interventions in low- and middle-income countries, yet reported impacts vary by an order of magnitude. We study this variation by aggregating evidence from prior randomized trials across five contexts, and use the results to inform a new randomized trial. We find two factors explain most of the heterogeneity in effects across contexts: the degree of implementation (intention-to-treat or treatment-on-the-treated) and program delivery model (teachers or volunteers). Accounting for these implementation factors yields high generalizability, with similar effect sizes across studies. Thus, reporting treatment-on-the-treated effects, a practice which remains limited, can enhance external validity. We also introduce a new Bayesian framework to formally incorporate implementation metrics into evidence aggregation. Results show targeted instruction delivers average learning gains of 0.42 SD when taken up and 0.85 SD when implemented with high fidelity. To investigate how implementation can be improved in future settings, we run a new randomized trial of a targeted instruction program in Botswana. Results demonstrate that implementation can be improved in the context of a scaling program with large causal effects on learning. While research on implementation has been limited to date, our findings and framework reveal its importance for impact evaluation and generalizability.


Brian McManus, Jessica Howell, Michael Hurwitz.

The impact of test-optional college admissions policies depends on whether applicants act strategically in disclosing test scores. We analyze individual applicants’ standardized test scores and disclosure behavior to 50 major US colleges for entry in fall 2021, when Covid-19 prompted widespread adoption of test-optional policies. Applicants withheld low scores and disclosed high scores, including seeking admissions advantages by conditioning their disclosure choices on their other academic characteristics, colleges’ selectivity and testing policy statements, and the Covid-related test access challenges of the applicants’ local peers. We find only modest differences in test disclosure strategies by applicants’ race and socioeconomic characteristics.


Mei Tan, Dorottya Demszky.

Teachers’ attitudes and classroom management practices critically affect students’ academic and behavioral outcomes, contributing to the persistent issue of racial disparities in school discipline. Yet, identifying and improving classroom management at scale is challenging, as existing methods require expensive classroom observations by experts. We apply natural language processing methods to elementary math classroom transcripts to computationally measure the frequency of teachers’ classroom management language in instructional dialogue and the degree to which such language is reflective of punitive attitudes. We find that the frequency and punitiveness of classroom management language show strong and systematic correlations with human-rated observational measures of instructional quality, student and teacher perceptions of classroom climate, and student academic outcomes. Our analyses reveal racial disparities and patterns of escalation in classroom management language. We find that classrooms with higher proportions of Black students experience more frequent and more punitive classroom management. The frequency and punitiveness of classroom management language escalate over time during observations, and these escalations occur more severely for classrooms with higher proportions of Black students. Our results demonstrate the potential of automated measures and position everyday classroom management interactions as a critical site of intervention for addressing racial disparities, preventing escalation, and reducing punitive attitudes.


Wendy Castillo, David Gillborn.

‘QuantCrit’ (Quantitative Critical Race Theory) is a rapidly developing approach that seeks to challenge and improve the use of statistical data in social research by applying the insights of Critical Race Theory. As originally formulated, QuantCrit rests on five principles: 1) the centrality of racism; 2) numbers are not neutral; 3) categories are not natural; 4) voice and insight (data cannot ‘speak for itself’); and 5) a social justice/equity orientation (Gillborn et al., 2018). The approach has quickly developed an international and interdisciplinary character, including applications in medicine (Gerido, 2020) and literature (Hammond, 2019). Simultaneously, there has been ferocious criticism from detractors outraged by the suggestion that numbers are anything other than objective and scientific (Airaksinen, 2018). In this context it is vital that the approach establishes some common understandings about good practice, in order to sustain rigor, make QuantCrit accessible to academics, practitioners, and policymakers alike, and resist widespread attempts to over-simplify and pillory. This paper is intended to advance an iterative process of expanding and clarifying how to ‘QuantCrit’.


Paul T. von Hippel.

Educational researchers often report effect sizes in standard deviation units (SD), but SD effects are hard to interpret. Effects are easier to interpret in percentile points, but conversion from SDs to percentile points involves a calculation that is not intuitive to educational stakeholders. We point out that, if the outcome variable is normally distributed, simply multiplying the SD effect by 37 usually gives an excellent approximation to the percentile-point effect. For students in the middle half of the distribution, the approximation is accurate to within 1 percentile point for effect sizes of up to 0.8 SD (or 29 to 30 percentile points).
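The conversion described above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a standard normal outcome and a student who starts at a chosen percentile (the median by default), and compares the exact percentile-point gain with the multiply-by-37 shortcut. The function names `rule_of_37` and `exact_percentile_gain` are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1); assumes the outcome is normal.
STD_NORMAL = NormalDist()


def rule_of_37(effect_sd):
    """Approximate percentile-point effect: multiply the SD effect by 37."""
    return 37.0 * effect_sd


def exact_percentile_gain(effect_sd, start_percentile=50.0):
    """Exact percentile-point gain for a student who starts at
    start_percentile (strictly between 0 and 100) and moves up by effect_sd."""
    z_start = STD_NORMAL.inv_cdf(start_percentile / 100.0)
    end_percentile = 100.0 * STD_NORMAL.cdf(z_start + effect_sd)
    return end_percentile - start_percentile


if __name__ == "__main__":
    # Compare the shortcut with the exact conversion for a median student.
    for d in (0.1, 0.2, 0.4, 0.8):
        approx = rule_of_37(d)
        exact = exact_percentile_gain(d)
        print(f"d = {d:.1f} SD: rule of 37 = {approx:5.1f}, "
              f"exact = {exact:5.1f}, difference = {approx - exact:+.2f}")
```

Passing a different `start_percentile` shows how the exact gain depends on where the student begins in the distribution, which is why the abstract restricts its accuracy claim to the middle of the distribution.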