Methodology, measurement and data

Todd Pugatch, Elizabeth Schroeder, Nicholas Wilson.

We design a commitment contract for college students, "Study More Tomorrow," and conduct a randomized control trial testing a model of its demand. The contract commits students to attend peer tutoring if their midterm grade falls below a pre-specified threshold. The contract carries a financial penalty for noncompliance, in contrast to other commitment devices for studying tested in the literature. We find demand for the contract, with take-up of 10% among students randomly assigned a contract offer. Contract demand is not higher among students randomly assigned to a lower contract price, plausibly because a lower contract price also means a lower commitment benefit of the contract. Students with the highest perceived utility for peer tutoring have greater demand for commitment, consistent with our model. Contrary to the model's predictions, we fail to find evidence of increased demand among present-biased students or among those with higher self-reported tendency to procrastinate. Our results show that college students are willing to pay for study commitment devices. The sources of this demand do not align fully with behavioral theories, however.

Daniel Rodriguez-Segura, Beth E. Schueler.

A significant share of education and development research uses data collected by workers called “enumerators.” It is well documented that “enumerator effects” (inconsistent practices between the individual people who administer measurement tools) can be a key source of error in survey data collection. However, it is less understood whether this is a problem for academic assessments or performance tasks. We leverage a remote phone-based mathematics assessment of primary school students and survey of their parents in Kenya. Enumerators were randomized to students to study the presence of enumerator effects. We find that both the academic assessment and the survey were prone to enumerator effects and use simulation to show that these effects were large enough to lead to spurious results at a troubling rate in the context of impact evaluation. We therefore recommend assessment administrators randomize enumerators at the student level and focus on training enumerators to minimize bias.
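
The logic of the simulation argument above can be illustrated with a minimal sketch. This is not the authors' simulation: the function name, the number of enumerators, the sample sizes, the size of the enumerator effect, and the naive z-test are all assumptions chosen for illustration. The true treatment effect is set to zero, so any "significant" result is spurious.

```python
import random
from statistics import fmean, variance


def false_positive_rate(enum_sd, randomize_within, n_sims=300,
                        n_enum=10, per_enum=30, seed=0):
    """Share of simulated trials with a spurious 'significant' effect.

    Hypothetical setup: the true treatment effect is zero, and each
    enumerator shifts every score they record by a random amount with
    standard deviation `enum_sd` (in student-level SD units).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        treated, control = [], []
        for e in range(n_enum):
            enum_effect = rng.gauss(0.0, enum_sd)
            for _ in range(per_enum):
                score = enum_effect + rng.gauss(0.0, 1.0)
                if randomize_within:
                    # Each student's arm is independent of the enumerator.
                    in_treatment = rng.random() < 0.5
                else:
                    # Each enumerator assesses only one study arm.
                    in_treatment = e % 2 == 0
                (treated if in_treatment else control).append(score)
        # Naive two-sample z-test that ignores enumerator clustering.
        se = (variance(treated) / len(treated)
              + variance(control) / len(control)) ** 0.5
        z = (fmean(treated) - fmean(control)) / se
        rejections += abs(z) > 1.96
    return rejections / n_sims
```

Calling `false_positive_rate(0.3, randomize_within=False)` and `false_positive_rate(0.3, randomize_within=True)` compares the two designs under these assumed values. When enumerators are confounded with study arm, the naive test should reject a true null far more often than the nominal 5 percent, which is the intuition behind randomizing enumerators at the student level.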

Edward J. Kim.

This study introduces the signal weighted teacher value-added model (SW VAM), a value-added model that weights student-level observations based on each student’s capacity to signal their assigned teacher’s quality. Specifically, the model leverages the repeated appearance of a given student to estimate student reliability and sensitivity parameters, whereas traditional VAMs represent a special case where all students exhibit identical parameters. Simulation study results indicate that SW VAMs outperform traditional VAMs at recovering true teacher quality when the assumption of student parameter invariance is met but have mixed performance under alternative assumptions of the true data generating process depending on data availability and the choice of priors. Evidence using an empirical data set suggests that SW VAM and traditional VAM results may disagree meaningfully in practice. These findings suggest that SW VAMs have promising potential to recover true teacher value-added in practical applications and, as a version of value-added models that attends to student differences, can be used to test the validity of traditional VAM assumptions in empirical contexts.

Daniela Alvarez-Vargas, Sirui Wan, Lynn S. Fuchs, Alice Klein, Drew H. Bailey.

Despite policy relevance, longer-term evaluations of educational interventions are relatively rare. A common approach to this problem has been to rely on longitudinal research to determine targets for intervention by looking at the correlation between children’s early skills (e.g., preschool numeracy) and medium-term outcomes (e.g., first-grade math achievement). However, this approach has sometimes over- or under-predicted the long-term effects (e.g., fifth-grade math achievement) of successfully improving early math skills. Using a within-study comparison design, we assess various approaches to forecasting medium-term impacts of early math skill-building interventions. The most accurate forecasts were obtained when including comprehensive baseline controls and using a combination of conceptually proximal and distal short-term outcomes (in the nonexperimental longitudinal data). Researchers can use our approach to establish a set of designs and analyses to predict the impacts of their interventions up to two years post-treatment. The approach can also be applied to power analyses, model checking, and theory revisions to understand mechanisms contributing to medium-term outcomes.

Johnathan G. Conzelmann, Steven W. Hemelt, Brad J. Hershbein, Shawn Martin, Andrew Simon, Kevin Stange.

This paper introduces a new measure of the labor markets served by colleges and universities across the United States. About 50 percent of recent college graduates are living and working in the metro area nearest the institution they attended, with this figure climbing to 67 percent in-state. The geographic dispersion of alumni is more than twice as great for highly selective 4-year institutions as for 2-year institutions. However, more than one-quarter of 2-year institutions disperse alumni more diversely than the average public 4-year institution. In one application of these data, we find that the average strength of the labor market to which a college sends its graduates predicts college-specific intergenerational economic mobility. In a second application, we quantify the extent of “brain drain” across areas and illustrate the importance of considering migration patterns of college graduates when estimating the social return on public investment in higher education.

Brian Heseung Kim, Katharine Meyer, Alice Choe.

Interactive, text message-based advising programs have become an increasingly common strategy to support college access and success for underrepresented student populations. Despite the proliferation of these programs, we know relatively little about how students engage in these text-based advising opportunities and whether that relates to stronger student outcomes – factors that could help explain why we’ve seen relatively mixed evidence about their efficacy to date. In this paper, we use data from a large-scale, two-way text advising experiment focused on improving college completion to explore variation in student engagement using nuanced interaction metrics and automated text analysis techniques (i.e., natural language processing). We then explore whether student engagement patterns are associated with key outcomes including persistence, GPA, credit accumulation, and degree completion. Our results reveal substantial variation in engagement measures across students, indicating the importance of analyzing engagement as a multi-dimensional construct. We moreover find that many of these nuanced engagement measures have strong correlations with student outcomes, even after controlling for student baseline characteristics and academic performance. Especially as virtual advising interventions proliferate across higher education institutions, we show the value of applying a more codified, comprehensive lens for examining student engagement in these programs and chart a path to potentially improving the efficacy of these programs in the future.

Open source code on GitHub.

Christine Mulhern, Isaac M. Opper.

There is an emerging consensus that teachers impact multiple student outcomes, but it remains unclear how to measure and summarize the multiple dimensions of teacher effectiveness into simple metrics for research or personnel decisions. We present a multidimensional empirical Bayes framework and illustrate how to use noisy estimates of teacher effectiveness to assess the dimensionality and predictive power of teachers' true effects. We find that it is possible to efficiently summarize many dimensions of effectiveness and most summary measures lead to similar teacher rankings; however, focusing on any one specific measure alone misses important dimensions of teacher quality.

Michael F. Lovenheim, Jonathan Smith.

Early research on the returns to higher education treated the postsecondary system as a monolith. In reality, postsecondary education in the United States and around the world is highly differentiated, with a variety of options that differ by credential (associate's degree, bachelor’s degree, diploma, certificate, graduate degree), the control of the institution (public, private not-for-profit, private for-profit), the quality/resources of the institution, field of study, and exposure to remedial education. In this Chapter, we review the literature on the returns to these different types of higher education investments, which has received increasing attention in recent decades. We first provide an overview of the structure of higher education in the U.S. and around the world, followed by a model that helps clarify and articulate the assumptions employed by different estimators used in the literature. We then discuss the research on the return to institution type, focusing on the return to two-year, four-year, and for-profit institutions as well as the return to college quality within and across these institution types. We also present the research on the return to different educational programs, including vocational credentials, remedial education, field of study, and graduate school. The wide variation in the returns to different postsecondary investments that we document leads to the question of how students from different backgrounds sort into these different institutions and programs. We discuss the emerging research showing that lower-SES students, especially in the U.S., are more likely to sort into colleges and programs with lower returns as well as results from recent U.S.-based interventions and policies designed to support success among students from disadvantaged backgrounds. The Chapter concludes with some broad directions for future research.

Brendan Bartanen, Aliza N. Husain.

A growing literature uses value-added (VA) models to quantify principals' contributions to improving student outcomes. Principal VA is typically estimated using a connected networks model that includes both principal and school fixed effects (FE) to isolate principal effectiveness from fixed school factors that principals cannot control. While conceptually appealing, high-dimensional FE regression models require sufficient variation to produce accurate VA estimates. Using simulation methods applied to administrative data from Tennessee and New York City, we show that limited mobility of principals among schools yields connected networks that are extremely sparse, where VA estimates are either highly localized or statistically unreliable. Employing a random effects shrinkage estimator, however, can alleviate estimation error to increase the reliability of principal VA.
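
As a rough illustration of how shrinkage reduces estimation error, the sketch below applies a generic textbook empirical Bayes estimator to a set of noisy effect estimates. It is not the authors' estimator: the function name, the method-of-moments variance estimate, and shrinkage toward the grand mean are assumptions made for illustration.

```python
from statistics import fmean, pvariance


def shrink_estimates(estimates, std_errors):
    """Shrink noisy effect estimates toward their grand mean.

    Each estimate is weighted by its reliability, taken here as
    signal variance / (signal variance + squared standard error).
    """
    grand_mean = fmean(estimates)
    # Method-of-moments guess at the variance of true effects:
    # observed variance minus average sampling variance, floored at 0.
    noise_var = fmean([se ** 2 for se in std_errors])
    signal_var = max(pvariance(estimates) - noise_var, 0.0)

    shrunk = []
    for est, se in zip(estimates, std_errors):
        total_var = signal_var + se ** 2
        reliability = signal_var / total_var if total_var > 0 else 0.0
        shrunk.append(grand_mean + reliability * (est - grand_mean))
    return shrunk
```

Under this sketch, an estimate with a large standard error (for example, a principal observed in few school-years) is pulled strongly toward the mean, while a precisely estimated effect stays close to its raw value.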

Wendy Castillo, David Gillborn.

‘QuantCrit’ (Quantitative Critical Race Theory) is a rapidly developing approach that seeks to challenge and improve the use of statistical data in social research by applying the insights of Critical Race Theory. As originally formulated, QuantCrit rests on five principles: 1) the centrality of racism; 2) numbers are not neutral; 3) categories are not natural; 4) voice and insight (data cannot ‘speak for itself’); and 5) a social justice/equity orientation (Gillborn et al., 2018). The approach has quickly developed an international and interdisciplinary character, including applications in medicine (Gerido, 2020) and literature (Hammond, 2019). Simultaneously, there has been ferocious criticism from detractors outraged by the suggestion that numbers are anything other than objective and scientific (Airaksinen, 2018). In this context it is vital that the approach establishes some common understandings about good practice, in order to sustain rigor, make QuantCrit accessible to academics, practitioners, and policymakers alike, and resist widespread attempts to over-simplify and pillory. This paper is intended to advance an iterative process of expanding and clarifying how to ‘QuantCrit’.

