Methodology, measurement and data
Low-socioeconomic status (SES), minority, and male students perform worse than their high-SES, non-minority, and female peers on standardized tests. This paper investigates how within-school differences in school quality contribute to these educational achievement gaps. Using individual-level data on the universe of public-school students in California, I estimate school quality using a value-added methodology that accounts for the fact that students sort to schools on observable characteristics. I allow for within-school heterogeneity by estimating a distinct value added for each school's low-/high-SES, minority/non-minority, and male/female students. Standard value-added models suggest that, on average, schools provide less value added to their low-SES, minority, and male students, particularly on postsecondary enrollment. However, value-added models that control for neighborhood, older-sibling, and peer characteristics suggest that schools provide similar value added to low-/high-SES students and minority/non-minority students but more value added to female students. Within-school heterogeneity accounts for 6% of the test-score achievement gap and 22% of the difference in postsecondary enrollment between men and women.
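As a rough illustration of what "a distinct value added for each school's" subgroups means, the sketch below fits one OLS regression with student controls plus one dummy for each (school, subgroup) cell; a school's within-school gap is then the difference between its two cell estimates. This is a hypothetical, simplified model: the function names `subgroup_value_added` and `mean_within_school_gap` are invented for illustration, and the sketch omits the sorting corrections, the neighborhood, sibling, and peer controls, and any shrinkage the paper may use.

```python
import numpy as np


def subgroup_value_added(y, controls, school, group):
    """Estimate a separate value added for each (school, subgroup) cell.

    Hypothetical simplified model: y = controls @ beta + mu[school, group] + e,
    fit by OLS with one dummy per (school, group) cell and no intercept.
    Returns a dict mapping (school, group) -> estimated mu.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(controls, dtype=float).reshape(len(y), -1)  # n x k controls
    cells = list(zip(np.asarray(school).tolist(), np.asarray(group).tolist()))
    labels = sorted(set(cells))
    index = {cell: j for j, cell in enumerate(labels)}

    # One indicator column per (school, group) cell; these absorb the intercept.
    dummies = np.zeros((len(y), len(labels)))
    dummies[np.arange(len(y)), [index[cell] for cell in cells]] = 1.0

    design = np.hstack([X, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    mu = coef[X.shape[1]:]  # cell effects follow the k control coefficients
    return {cell: mu[index[cell]] for cell in labels}


def mean_within_school_gap(va, focal=1, reference=0):
    """Average over schools of mu[school, focal] - mu[school, reference]."""
    schools = {s for s, _ in va}
    gaps = [va[(s, focal)] - va[(s, reference)]
            for s in schools if (s, focal) in va and (s, reference) in va]
    return float(np.mean(gaps))
```

The controls must not include a constant or the subgroup indicator itself, since either would be perfectly collinear with the cell dummies.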
This study uses implementation fidelity data from PreK to 1st grade in the Boston Public Schools (BPS) to measure instructional alignment and to examine whether stronger alignment is associated with sustained benefits of BPS PreK on children’s language, literacy, and math skills through first grade. The study includes N = 498 students (mean age = 5.47, SD = 0.30 in the fall of kindergarten). Compared with children who did not attend BPS PreK, children who experienced strong instructional alignment across grades had faster gains in literacy (0.47 SD) and math (0.28 SD) skills through the spring of first grade. Misalignment predicted faster convergence in literacy skills. Results highlight that instructional alignment may help sustain the initial benefits of PreK programs through first grade in a subset of outcome domains. Implications for practice and for further research measuring alignment in a broader range of settings are discussed.
When analyzing treatment effects on test score data, education researchers face many choices for scoring tests and modeling results. This study examines the impact of those choices through Monte Carlo simulation and an empirical application. Results show that estimates from multiple analytic methods applied to the same data will vary because, as predicted by Classical Test Theory, two-step models using sum or IRT-based scores provide downwardly biased standardized treatment effect coefficients compared to latent variable models. This bias dominates any other differences between models or features of the data generating process, such as the variability of item discrimination parameters. An errors-in-variables (EIV) correction successfully removes the bias from two-step models. Model performance is not substantially different in terms of precision, standard error calibration, false positive rates, or statistical power. An empirical application to data from a randomized controlled trial of a second-grade literacy intervention demonstrates the sensitivity of the results to model selection and tradeoffs between model selection and interpretation. This study shows that the psychometric principles most consequential in causal inference are related to attenuation bias rather than optimal scoring weights.
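To make the attenuation argument concrete, the following sketch simulates a randomized trial under an assumed toy model: a latent trait shifted by the treatment, and an observed score equal to that trait plus independent measurement error. Standardizing by the noisier score's SD shrinks the effect by roughly the square root of the reliability, and dividing by that factor (a simple errors-in-variables style correction that assumes the reliability is known) recovers the latent effect. The function name and parameter values are hypothetical, not taken from the study.

```python
import numpy as np


def simulate_standardized_effects(delta=0.3, reliability=0.7, n=200_000, seed=0):
    """Compare standardized treatment effects on a latent trait vs. a noisy score.

    Toy data-generating process (hypothetical, for illustration):
    latent trait theta = delta * T + e with e ~ N(0, 1), and observed
    score X = theta + u with u ~ N(0, sigma_u^2), where sigma_u^2 is set so
    that within-group reliability 1 / (1 + sigma_u^2) equals `reliability`.
    """
    rng = np.random.default_rng(seed)
    treat = rng.integers(0, 2, size=n)                 # random assignment
    theta = delta * treat + rng.normal(0.0, 1.0, size=n)
    sigma_u = np.sqrt(1.0 / reliability - 1.0)
    score = theta + rng.normal(0.0, sigma_u, size=n)   # two-step "observed" score

    def standardized_diff(y):
        # Mean difference divided by the pooled within-group SD.
        y1, y0 = y[treat == 1], y[treat == 0]
        pooled_sd = np.sqrt((y1.var(ddof=1) + y0.var(ddof=1)) / 2.0)
        return (y1.mean() - y0.mean()) / pooled_sd

    latent_es = standardized_diff(theta)
    observed_es = standardized_diff(score)               # shrunk by about sqrt(reliability)
    corrected_es = observed_es / np.sqrt(reliability)    # simple EIV-style correction
    return latent_es, observed_es, corrected_es
```

With the defaults, the observed standardized effect should land near 0.3 × √0.7 ≈ 0.25 while the latent and corrected effects stay near 0.3. In practice the reliability must itself be estimated (for example, from an IRT model or an internal-consistency coefficient), so a corrected estimate inherits that uncertainty.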
Teachers are the most important school-specific factor in student learning. Yet, little evidence exists linking teacher professional development programs and the strategies or activities that comprise them to student achievement. In this paper, we examine a fellowship model for professional development designed and implemented by Leading Educators, a national nonprofit organization that aims to bridge research and practice to improve instructional quality and accelerate learning across school systems. During the 2015-16 and 2016-17 school years, Leading Educators conducted its fellowship program for two cohorts of teachers and school leaders to provide these educators ongoing, collaborative, job-embedded professional development and to improve student achievement. Relying on quasi-experimental methods, we find that a school’s participation in the fellowship program significantly increased student proficiency rates in English language arts and math on state achievement exams. Student achievement benefited from a more sustained duration of participation in the fellowship program, varied depending on the share of a school’s educators who participated in the fellowship, and differed based on whether fellows independently selected into the program or were appointed to participate by their school leaders. Taken together, findings from this paper should inform professional learning organizations, schools, and policymakers on the design, implementation, and impact of educator professional development.
Anecdotal evidence points to the importance of school principals, but the limited existing research has neither provided consistent results nor indicated any set of essential characteristics of effective principals. This paper exploits extensive student-level panel data across six states to investigate both variations in principal performance and the relationship between effectiveness and key certification factors. While principal effectiveness varies widely across states, there is little indication that regulation of the background and training of principals yields consistently effective performance. Having prior teaching or management experience is not related to our estimates of principal value-added.
According to various international studies, greater school choice leads to lower demand for private tutoring, but this relationship has not been explicitly tested in the U.S. context. To estimate the causal effect of charter school openings on the prevalence of private tutoring in neighboring areas, we employ a comparative event study model combined with a longitudinal matching strategy that accommodates differing treatment years. In contrast to findings from other countries, we estimate that charter schools increase, rather than decrease, tutoring prevalence in the United States. We further find that the effect varies considerably with the characteristics of the treated neighborhood: areas with the highest income, educational attainment, and proportion of Asian residents show the greatest treatment effects, while the areas with the lowest levels show null effects. Methodologically, this investigation also offers a pipeline for flexibly estimating causal effects with observational, longitudinal, geographically located data.
Researchers use test outcomes to evaluate the effectiveness of education interventions across numerous randomized controlled trials (RCTs). Aggregate test data (for example, simple measures like the sum of correct responses) are compared across treatment and control groups to determine whether an intervention has had a positive impact on student achievement. We show that item-level data and psychometric analyses can provide information about treatment heterogeneity and improve the design of future experiments. We apply techniques typically used in the study of Differential Item Functioning (DIF) to examine variation in the degree to which items show treatment effects. That is, are observed treatment effects due to generalized gains on the aggregate achievement measures, or are they due to targeted gains on specific items? Based on our analysis of 7,244,566 item responses (265,732 students responding to 2,119 items) taken from 15 RCTs in low- and middle-income countries, we find clear evidence of variation in gains across items. DIF analyses identify items that are highly sensitive to the interventions (in one extreme case, a single item drives nearly 40% of the observed treatment effect) as well as items that are insensitive. We also show that variation in item-level sensitivity can have implications for the precision of effect estimates. Of the RCTs with significant effect estimates, 41% have patterns of item-level sensitivity to treatment that allow for the possibility of a null effect when this source of uncertainty is considered. Our findings demonstrate how researchers can gain more insight into the effects of interventions through additional analysis of item-level test data.
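Because a sum score is just the total of the item scores, the treatment-control difference in mean sum scores splits exactly into per-item differences in proportion correct. The sketch below computes each item's share of the aggregate effect. It is a simple descriptive decomposition, offered as an assumption about how an item's share of an effect can be read; it is not the DIF models the study applies, and the function name `item_contributions` is hypothetical.

```python
import numpy as np


def item_contributions(responses, treat):
    """Split a sum-score treatment effect into per-item contributions.

    responses: (n_students, n_items) array of 0/1 item scores.
    treat: length-n_students array of 0/1 treatment indicators.
    Since sum score = sum of item scores, the difference in mean sum scores
    equals the sum of per-item differences in proportion correct.
    """
    responses = np.asarray(responses, dtype=float)
    treat = np.asarray(treat).astype(bool)
    item_diffs = responses[treat].mean(axis=0) - responses[~treat].mean(axis=0)
    total = item_diffs.sum()        # equals the sum-score mean difference
    shares = item_diffs / total     # undefined if the total effect is zero
    return item_diffs, total, shares
```

For example, with two treated students answering [1, 0, 1] and [1, 1, 1] and two control students answering [0, 0, 1] and [0, 0, 0], the per-item differences are 1.0, 0.5, and 0.5, so the first item accounts for half of the 2-point sum-score effect.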
Books shape how children learn about society and norms, in part through representation of different characters. We introduce new artificial intelligence methods for systematically converting images into data and apply them, along with text analysis methods, to measure the representation of skin color, race, gender, and age in award-winning children’s books widely read in homes, classrooms, and libraries over the last century. We find that more characters with darker skin color appear over time, but the most influential books persistently depict characters with lighter skin color, on average, than other books, even after conditioning on race; we also find that children are depicted with lighter skin than adults on average. Relative to their growing share of the U.S. population, Black and Latinx people are underrepresented in these same books, while White males are overrepresented. Over time, females are increasingly present but appear less often in text than in images, suggesting greater symbolic inclusion in pictures than substantive inclusion in stories. We then analyze the supply of, and demand for, books with different levels of representation to better understand the economic behavior that may contribute to these patterns. On the demand side, we show that people consume books that center their own identities. On the supply side, we document higher prices for books that center non-dominant social identities and fewer copies of these books in libraries that serve predominantly White communities. Lastly, we show that the types of children's books purchased in a neighborhood are related to local political beliefs.
How progressive is school spending when spending is measured at the school level instead of the district level? We use the first dataset on school-level spending for schools throughout the United States to ask to what extent progressivity patterns previously examined across districts are amplified, nullified, or reversed upon disaggregation to schools. We find that progressivity is systematically greater in a school-level analysis than in a district-level analysis. This may be surprising, given the traditional view in public economics that local governments cannot effectively redistribute. We therefore probe the data for explanations of this pattern and uncover evidence that federal policies play an important role in driving progressive allocations within districts. In particular, the federal component of spending, plus allocations that are empirically attributable to special education and English language learning programs, explains about 83% of the within-district contribution to progressivity. Our findings are thus consistent with the traditional view that redistribution is primarily the purview of central governments, operationalized in this context through mandates.
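One way to see how a school-level analysis can reveal progressivity that district aggregates hide is the law of total covariance: the covariance between school spending and school poverty splits exactly into a between-district piece and a within-district piece. The sketch below assumes, purely for illustration, that progressivity is measured as the OLS slope of per-pupil spending on a school poverty share with equal weight per school; the study's actual measure and weighting may differ, and `decompose_progressivity` is a hypothetical name.

```python
import numpy as np


def decompose_progressivity(spend, poverty, district):
    """Split the school-level spending-poverty slope into between- and
    within-district pieces (a hypothetical progressivity measure).

    Law of total covariance with population (ddof=0) moments:
        Cov(S, P) = Cov(district mean S, district mean P)    [between]
                  + mean of (S - district mean S)(P - district mean P)  [within]
    Dividing each piece by Var(P) splits the OLS slope of S on P.
    """
    spend = np.asarray(spend, dtype=float)
    poverty = np.asarray(poverty, dtype=float)
    district = np.asarray(district)

    # District means, broadcast back to each school in that district.
    s_bar = np.empty_like(spend)
    p_bar = np.empty_like(poverty)
    for d in np.unique(district):
        mask = district == d
        s_bar[mask] = spend[mask].mean()
        p_bar[mask] = poverty[mask].mean()

    var_p = poverty.var()  # population variance (ddof=0)
    total = np.mean((spend - spend.mean()) * (poverty - poverty.mean())) / var_p
    between = np.mean((s_bar - spend.mean()) * (p_bar - poverty.mean())) / var_p
    within = np.mean((spend - s_bar) * (poverty - p_bar)) / var_p
    return total, between, within  # total == between + within (up to rounding)
```

In a toy example with two districts (poverty shares 0.2 and 0.6 with spending 10 and 12 in one; 0.4 and 0.8 with spending 9 and 11 in the other), the total slope is 3.0, the between-district piece is -1.0, and the within-district piece is 4.0: allocations within each district are progressive even though the higher-poverty district spends less.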
Challenging the conventional wisdom that the spread of democracy was a leading driver of the expansion of primary schooling, recent studies show that democratization in fact did not lead to an average increase in primary school enrollment rates. One reason for this null effect is that there was already considerable provision of primary education before democratization. Still, it is possible that the spread of democracy did impact other aspects of education systems, such as the content of education and the extent to which teaching jobs are politicized. Studying this possibility cross-nationally has been infeasible due to data limitations. To address this gap, we take advantage of an original dataset covering 160 countries from 1945 to 2021 that contains information about these aspects of education. We document that transitions to democracy tend to be preceded by a decline in the politicization of both education content and teaching jobs. However, soon after democratization occurs, this decline usually halts. Counterfactual estimates suggest that democratization roughly halves the degree to which teacher hiring and firing decisions are politicized, but has a smaller impact on the content of education. The empirical patterns that we uncover have important implications for future research.