Search EdWorkingPapers

Methodology, measurement and data

Vivian C. Wong, Kylie L. Anglin, Peter M. Steiner.

Recent interest in promoting and supporting replication efforts assumes that there is well-established methodological guidance for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses this gap by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. In direct replication designs, replication failure is evidence of bias or incorrect reporting in individual study estimates, while in conceptual replication designs, replication failure occurs because of effect variation due to differences in treatments, outcomes, settings, and participant characteristics. The paper demonstrates how multiple research designs may be combined in systematic replication studies, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.

Jing Liu, Julie Cohen.

Valid and reliable measurements of teaching quality facilitate school-level decision-making and policies pertaining to teachers. Using nearly 1,000 word-to-word transcriptions of 4th- and 5th-grade English language arts classes, we apply novel text-as-data methods to develop automated measures of teaching to complement classroom observations traditionally done by human raters. This approach is free of rater bias and enables the detection of three instructional factors that are well aligned with commonly used observation protocols: classroom management, interactive instruction, and teacher-centered instruction. The teacher-centered instruction factor is a consistent negative predictor of value-added scores, even after controlling for teachers’ average classroom observation scores. The interactive instruction factor predicts positive value-added scores. Our results suggest that the text-as-data approach has the potential to enhance existing classroom observation systems through collecting far more data on teaching with a lower cost, higher speed, and the detection of multifaceted classroom practices.

Rajeev Darolia, Andrew Sullivan.

There is no national consensus on how school districts calculate high school achievement disparities between students who experience homelessness and those who do not. Using administrative student-level data from a mid-sized public school district in the Southern United States, we show that commonly used ways of defining which students are considered homeless can yield markedly different estimates of the homelessness-housed student high school graduation gap. The key distinctions among homelessness definitions relate to how to classify homeless students who become housed and how to consider students who transfer out of the district or drop out of school. Eliminating housing insecurity-related achievement disparities necessitates understanding the link between homelessness and educational achievement; how districts quantify homelessness affects measured gaps.
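The sensitivity of the gap to the homelessness definition can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Student` fields, the two classification rules, and the toy cohort are hypothetical and are not the district data or the definitions used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical fields, chosen only for this illustration.
    homeless_years: int        # years identified as homeless during high school
    homeless_final_year: bool  # identified as homeless in the final year
    graduated: bool


def graduation_rate(students):
    """Share of students in the group who graduated."""
    return sum(s.graduated for s in students) / len(students)


def graduation_gap(students, is_homeless):
    """Housed graduation rate minus homeless graduation rate under one definition."""
    homeless = [s for s in students if is_homeless(s)]
    housed = [s for s in students if not is_homeless(s)]
    return graduation_rate(housed) - graduation_rate(homeless)


# Toy cohort (made-up records, not real data).
cohort = [
    Student(0, False, True),
    Student(0, False, True),
    Student(0, False, True),
    Student(0, False, False),
    Student(1, False, True),   # homeless earlier, housed by the final year
    Student(1, False, True),   # homeless earlier, housed by the final year
    Student(2, True, False),
    Student(2, True, True),
]

# Definition A: ever identified as homeless during high school.
gap_ever = graduation_gap(cohort, lambda s: s.homeless_years > 0)

# Definition B: identified as homeless in the final year only.
gap_final_year = graduation_gap(cohort, lambda s: s.homeless_final_year)

print(f"gap (ever homeless):       {gap_ever:.3f}")
print(f"gap (homeless final year): {gap_final_year:.3f}")
```

In this toy cohort, students who were homeless earlier but later housed are counted as homeless under the first rule and as housed under the second, so the same records yield two different gaps. Handling of transfers and dropouts, which the abstract also names, is left out of the sketch.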

Michael Gilraine, Odhrain McCarthy.

We show that fade out biases value-added estimates at the teacher level. To do so, we use administrative data from North Carolina and show that teachers' value-added depends on the quality of the teacher who preceded them. Value-added estimators that control for fade out feature no such teacher-level bias. Under a benchmark policy that releases teachers in the bottom five percent of the value-added distribution, fifteen percent of teachers released using traditional techniques are not released once fade out is accounted for. Our results highlight the importance of incorporating dynamic features of education production into the estimation of teacher quality.

Lina M. Anaya, Gema Zamarro.

International assessments are important to benchmark the quality of education across countries. However, on low-stakes tests, students’ incentives to invest their maximum effort may be minimal. Research stresses that ignoring students’ effort when interpreting results from low-stakes assessments can lead to biased interpretations of test performance across groups of examinees. We use data from the Programme for International Student Assessment (PISA), a low-stakes test, to analyze the extent to which student effort helps to explain test score heterogeneity across countries and by gender groups. Our results highlight the importance of accounting for differences in student effort to understand cross-country heterogeneity in performance and variations in gender achievement gaps across nations. We find that, once we account for differential student effort across gender groups, the estimated gender achievement gap in math and science could be up to 12 and 6 times wider, respectively, and up to 49 percent narrower in reading, in favor of boys. In math and science, the gap widens in most countries, even among some of the top 20 most gender-equal countries. Altogether, our effort measures on average explain between 36 and 40 percent of the cross-country variation in test scores.

Josh B. McGee, Jonathan Mills, Jessica Goldstein.

School district consolidation is one of the most widespread education reforms of the last century, but surprisingly little research has directly investigated its effectiveness. To examine the impact of consolidation on student achievement, this study takes advantage of a policy that requires the consolidation of all Arkansas school districts with enrollment of fewer than 350 students for two consecutive school years. Using a regression discontinuity model, we find that consolidation has either null or small positive impacts on student achievement in math and English Language Arts (ELA). We do not find evidence that consolidation in Arkansas results in positive economies of scale, either by reducing overall cost or allowing for a greater share of resources to be spent in the classroom.

Isaac M. Opper.

Researchers often include covariates when they analyze the results of randomized controlled trials (RCTs), valuing the increased precision of the estimates over the potential of inducing small-sample bias when doing so. In this paper, we develop a sufficient condition which ensures that the inclusion of covariates does not cause small-sample bias in the effect estimates. Using this result as a building block, we develop a novel approach that uses machine learning techniques to reduce the variance of the average treatment effect estimates while guaranteeing that the effect estimates remain unbiased. The framework also highlights how researchers can use data from outside the study sample to improve the precision of the treatment effect estimate by using the auxiliary data to better model the relationship between the covariates and the outcomes. We conclude with a simulation, which highlights the value of using the proposed approach.

James D. Paul, Patrick J. Wolf.

Virtual charter schools provide full-time, tuition-free K-12 education through internet-based instruction. Although virtual schools offer a personalized learning experience, most research suggests these schools are negatively associated with achievement. Few studies account for differential rates of student mobility, which may produce biased estimates if mobility is jointly associated with virtual school enrollment and subsequent test scores. We evaluate the effects of a single, large, anonymous virtual charter school on student achievement using a hybrid of exact and nearest-neighbor propensity score matching. Relative to their matched peers, we estimate that virtual students produce marginally worse ELA scores and significantly worse math scores after one year. When controlling for student mobility during the outcome year, estimates of virtual schooling are slightly less negative. These findings may be more reliable indicators of the independent effect of virtual schooling if matching on mobility proxies for otherwise unobservable negative selection factors.

C. Kirabo Jackson, Shanette C. Porter, John Q. Easton, Sebastian Kiguel.

We estimate the longer-run effects of attending an effective high school (one that improves a combination of test scores, survey measures of socio-emotional development, and behaviours in 9th grade) for students who are more versus less educationally advantaged (i.e., likely to attain more years of education based on 8th-grade characteristics). All students benefit from attending effective schools. However, the least advantaged students experience the largest improvements in high-school graduation, college-going, and school-based arrests. These patterns are driven by the least advantaged students benefiting the most from school impacts on the non-test-score dimensions of school quality. However, while there is considerable overlap in the effectiveness of schools attended by more and less advantaged students, it is the most advantaged students that are most likely to attend highly effective schools. These patterns underscore the importance of quality schools, and the non-test score components of quality schools, for improving the longer-run outcomes for less advantaged students.

Paul T. von Hippel, Laura Bellows.

At least sixteen US states have taken steps toward holding teacher preparation programs (TPPs) accountable for teacher value-added to student test scores. Yet it is unclear whether teacher quality differences between TPPs are large enough to make an accountability system worthwhile. Several statistical practices can make differences between TPPs appear larger and more significant than they are. We reanalyze TPP evaluations from six states (New York, Louisiana, Missouri, Washington, Texas, and Florida) using appropriate methods implemented by our new caterpillar command for Stata. Our results show that teacher quality differences between most TPPs are negligible (0.01 to 0.03 standard deviations in student test scores), even in states where larger differences were reported previously. While ranking all of a state’s TPPs may not be possible or desirable, in some states and subjects we can find a single TPP whose teachers stand out as significantly above or below average. Such exceptional TPPs may reward further study.
