Luke W. Miratrix

Joshua B. Gilbert, James S. Kim, Luke W. Miratrix.

Analyses that reveal how treatment effects vary allow researchers, practitioners, and policymakers to better understand the efficacy of educational interventions. In practice, however, standard statistical methods for addressing Heterogeneous Treatment Effects (HTE) fail to address the HTE that may exist within outcome measures. In this study, we present a novel application of the Explanatory Item Response Model (EIRM) for assessing what we term “item-level” HTE (IL-HTE), in which a unique treatment effect is estimated for each item in an assessment. Simulation results reveal that when IL-HTE are present but ignored in the model, standard errors can be underestimated and false positive rates can increase. We then apply the EIRM to assess the impact of a literacy intervention focused on promoting transfer in reading comprehension on a digital formative assessment delivered online to approximately 8,000 third-grade students. We demonstrate that allowing for IL-HTE can reveal treatment effects at the item level that are masked by a null average treatment effect, and the EIRM can thus provide fine-grained information for researchers and policymakers on the potentially heterogeneous causal effects of educational interventions.
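As a rough illustration of what an item-specific treatment effect means, one common way to write a Rasch-type explanatory item response model with a random item-by-treatment slope is sketched below. The notation is assumed here for illustration and is not taken from the paper: student j answers item i, T_j is the treatment indicator, theta_j is student ability, b_i is item easiness, beta is the average treatment effect, and zeta_i is the deviation of item i's treatment effect from that average.

```latex
\operatorname{logit} \Pr(Y_{ij} = 1)
  = \theta_j + b_i + (\beta + \zeta_i)\, T_j,
\qquad
\theta_j \sim N(0, \sigma_\theta^2), \quad
b_i \sim N(0, \sigma_b^2), \quad
\zeta_i \sim N(0, \sigma_\zeta^2).
```

Under this sketch, a model that ignores IL-HTE corresponds to fixing \sigma_\zeta^2 = 0, and a null average effect (\beta = 0) can coexist with nonzero item-level effects \beta + \zeta_i whenever \sigma_\zeta^2 > 0.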


Reagan Mozer, Luke W. Miratrix, Jackie Eunjung Relyea, James S. Kim.

In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This process is both time- and labor-intensive, which creates a persistent barrier for large-scale assessments of text. Furthermore, enriching one's understanding of a found impact on text outcomes via secondary analyses can be difficult without additional scoring efforts. Machine-based text analytic and data mining tools offer one potential avenue to help facilitate research in this domain. For instance, we could augment a traditional impact analysis that examines a single human-coded outcome with a suite of automatically generated secondary outcomes. By analyzing impacts across a wide array of text-based features, we can then explore what an overall change signifies, in terms of how the text has evolved due to treatment. In this paper, we propose several different methods for supplementary analysis in this spirit. We then present a case study of using these methods to enrich an evaluation of a classroom intervention on young children’s writing. We argue that our rich array of findings moves us from “it worked” to “it worked because” by revealing how observed improvements in writing were likely due, in part, to the students having learned to marshal evidence and speak with more authority. Relying exclusively on human scoring, by contrast, is a lost opportunity.


Sophie Litschwartz, Luke W. Miratrix.

In multisite experiments, we can quantify treatment effect variation with the cross-site treatment effect variance. However, there is no standard method for estimating cross-site treatment effect variance in multisite regression discontinuity designs (RDD). This research addresses this gap in the literature by systematically exploring and evaluating methods for estimating the cross-site treatment effect variance in multisite RDDs. Specifically, we formalize a fixed intercepts/random coefficients (FIRC) RDD model and develop a random effects meta-analysis (Meta) RDD model for estimating cross-site treatment effect variance. We find that a restricted FIRC model works best when the running variables' relationship to the outcome is stable across sites but can be biased otherwise. In those instances, we recommend using either the unrestricted FIRC model or the meta-analysis model, with the unrestricted FIRC model generally performing better when the average number of in-bandwidth observations is less than 120 and the meta-analysis model performing better when the average number of in-bandwidth observations is above 120. We apply our models to a high school exit exam policy in Massachusetts that required students who passed the high school exit exam but were still determined to be nonproficient to complete an “Education Proficiency Plan” (EPP). We find that the EPP policy had, on average across sites, a positive local average treatment effect on whether students completed a math course in their senior year, but that the impact varied enough that a third of schools could have had a negative impact.
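To make the idea of a cross-site treatment effect variance concrete, the following is a minimal sketch of a generic random-effects meta-analysis using the DerSimonian-Laird moment estimator. It is not the Meta RDD model from the paper: it assumes per-site effect estimates and standard errors are already in hand (for example, from separate site-level RDD fits), and the function names `dersimonian_laird` and `share_negative` are hypothetical. The second function assumes site effects are normally distributed and returns the implied share of sites with a negative effect.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Return (pooled mean, cross-site variance tau^2) via DerSimonian-Laird.

    estimates  : per-site treatment effect estimates (assumed given)
    std_errors : their standard errors (assumed given, all > 0)
    """
    k = len(estimates)
    w = [1.0 / se**2 for se in std_errors]          # inverse-variance weights
    sw = sum(w)
    mean_fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)              # truncate at zero
    # Re-weight with the estimated cross-site variance for the pooled mean
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]
    mean_re = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return mean_re, tau2


def share_negative(mean, tau2):
    """Share of sites with a negative true effect, assuming N(mean, tau2)."""
    if tau2 == 0:
        return 1.0 if mean < 0 else 0.0
    z = -mean / math.sqrt(tau2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
```

Under the normality assumption, a positive average effect is compatible with a sizable share of negative site effects whenever the cross-site standard deviation is large relative to the mean. For instance, `share_negative(0.43, 1.0)` is close to one third; these inputs are illustrative and not taken from the paper.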
