Program and policy effects

Katharine Meyer, Kelli A. Bird, Benjamin L. Castleman.
With rapid technological transformations to the labor market, many working adults return to college after graduation to obtain additional training or credentials. Using a comparative individual fixed effects strategy and an administrative panel dataset of enrollment and employment in Virginia, we provide the first causal estimates of credential “stacking” – earning two or more community college certificates or degrees – among working adults. We find stacking increases employment by four percentage points and quarterly wages by $375 (four percent). Returns are larger for individuals studying in Health and who return to college after first completing a short-term certificate.



Jing Liu, Michael S. Hayes, Seth Gershenson.

We use novel data on disciplinary referrals, including those that do not lead to suspensions, to better understand the origins of racial disparities in exclusionary discipline. We find significant differences between Black and white students in both referral rates and the rate at which referrals convert to suspensions. An infraction fixed-effects research design that compares the disciplinary outcomes of white and non-white students who were involved in the same multi-student incident identifies systematic racial biases in sentencing decisions. On both the intensive and extensive margins, Black and Hispanic students receive harsher sentences than their white co-conspirators. This result is driven by high school infractions and mainly applies to “more severe” infractions that involve fights or drugs. Reducing racial disparities in exclusionary discipline will require addressing underlying gaps in disciplinary referrals and the systematic biases that appear in the adjudication process.



Emily Morton.

Four-day school weeks have proliferated across the United States in recent years, reaching over 650 public school districts in 24 states as of 2019, but little is known about their implementation and there is no consensus on their effects on students. This study uses district level panel data from Oklahoma and a difference-in-differences research design to provide estimates of the causal effect of the four-day school week on high school students’ ACT scores, attendance, and disciplinary incidents during school. Results indicate that four-day school weeks decrease per-pupil bullying incidents by approximately 39% and per-pupil fighting incidents by approximately 31%, but have no detectable effect on other incident types, ACT scores, or attendance.



Benjamin W. Arold, Ludger Woessmann, Larissa Zierow.

We study whether compulsory religious education in schools affects students' religiosity as adults. We exploit the staggered termination of compulsory religious education across German states in models with state and cohort fixed effects. Using three different datasets, we find that abolishing compulsory religious education significantly reduced religiosity of affected students in adulthood. It also reduced the religious actions of personal prayer, church-going, and church membership. Beyond religious attitudes, the reform led to more equalized gender roles, fewer marriages and children, and higher labor-market participation and earnings. The reform did not affect ethical and political values or non-religious school outcomes.



Robert P. Strauss.

This paper compares and contrasts two required building level school violence measures under NCLB, arrests and incidents of well-defined school misconduct acts, across 20 years of Pennsylvania’s approximately 3,000 public school buildings. Generally, both arrests for school violence and incidents of school violence are rare events. Over 20 years, the third quartile arrest rate was zero, and the third quartile incident rate was 3.3%. Relatively few, 4.1% overall, of Pennsylvania’s school buildings were persistently dangerous as defined and reported pursuant to Pennsylvania’s state plan to the US Department of Education; however, these buildings represented about 7.8% of the student population statewide. When we measure whether or not a school building is dangerous based on reported school violence incidents, that is, without an arrest requirement, fully 36.9% of Pennsylvania’s school buildings were dangerous, and they represented 46.7% of the students statewide. Both Philadelphia and Pittsburgh public school buildings were disproportionately unsafe and among the top 20 districts in the state which were unsafe over the 20-year study period.

Exploratory regression analysis of mean building scale scores for math and language arts explained about 58% of the variation in such learning outcome measures. As expected, household poverty, holding all else constant, has very strong, negative effects on learning outcomes. A school building composed entirely of low income students will score about 240 scale points lower, about 1.24 standard deviations lower, than a school building without any low income students. A school building at the 90th percentile in terms of student misconduct and poverty rates would have lower student test scores by about 1 to 1.28 standard deviations. Were a school administrator to reduce student misconduct rates from the 90th percentile to the 50th percentile, our regression coefficients predict learning gains on the order of (100-43) = 2/3 of a standard deviation in mean scale scores.



Sarah A. Cordes, Christopher Rick, Amy Ellen Schwartz.

School buses may be a critical education policy lever, breaking the link between schools and neighborhoods and facilitating access to school choice. Yet little is known about the commute for bus riders, including the average length of the bus ride or whether long commutes harm academic outcomes. We begin to fill this gap using data from New York City to explore the morning commutes of over 120,000 bus riders. We find that long bus rides are uncommon and that those with long bus rides are disproportionately Black and more likely to attend charter or district-choice schools. We find deleterious effects of long bus rides on attendance and chronic absenteeism of district-choice students.



Zachary Bleemer, Aashish Mehta.

Underrepresented minority (URM) college students have been steadily earning degrees in relatively less-lucrative fields of study since the mid-1990s. A decomposition reveals that this widening gap is principally explained by rising stratification at public research universities, many of which increasingly enforce GPA restriction policies that prohibit students with poor introductory grades from declaring popular majors. We investigate these GPA restrictions by constructing a novel 50-year dataset covering four public research universities' student transcripts and employing a staggered difference-in-difference design around the implementation of 29 restrictions. Restricted majors’ average URM enrollment share falls by 20 percent, which matches observational patterns and can be explained by URM students’ poorer average pre-college academic preparation. Using first-term course enrollments to identify students who intend to earn restricted majors, we find that major restrictions disproportionately lead URM students from their intended major toward less-lucrative fields, driving within-institution ethnic stratification and likely exacerbating labor market disparities.



M. Danish Shakeel, Paul E. Peterson.

Principals (policymakers) disagree as to whether U.S. student performance has changed over the past half century. To inform conversations, agents administered seven million psychometrically linked tests in math (m) and reading (rd) in 160 survey waves to national probability samples of cohorts born between 1954 and 2007. Estimated change in standard deviations (sd) per decade varies by agent (m: -0.10sd to 0.27sd, rd: -0.02sd to 0.12sd). Consistent with Flynn effects, median trends show larger gains in m (0.19sd) than rd (0.04sd), though rates of progress for cohorts born since 1990 have increased in rd but slowed in m. Greater progress is shown by students tested at younger ages (m: 0.31sd, rd: 0.08sd) than when tested in middle years of schooling (m: 0.17sd, rd: 0.03sd) or toward end of schooling (m: 0.06sd, rd: 0.02sd). Young white students progress more slowly (m: 0.28sd, rd: 0.09sd) than Asian (m: 0.46sd, rd: 0.28sd), black (m: 0.36sd, rd: 0.19sd) and Hispanic (m: 0.29sd, rd: 0.13sd) students. These ethnic differences generally attenuate as students age. Young students in the bottom quartile of the SES distribution show greater progress than those in the top quartile (difference in m: 0.08sd, in rd: 0.15sd), but the reverse is true for older students. Moderators likely include not only changes in families and schools but also improvements in nutrition, health care, and protection from contagious diseases and environmental risks. International data suggest that subject and age differentials may be due to moderators more general than just the United States.



Reagan Mozer, Luke W. Miratrix, Jackie Eunjung Relyea, James S. Kim.

In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This process is both time- and labor-intensive, which creates a persistent barrier for large-scale assessments of text. Furthermore, enriching one's understanding of a found impact on text outcomes via secondary analyses can be difficult without additional scoring efforts. Machine-based text analytic and data mining tools offer one potential avenue to help facilitate research in this domain. For instance, we could augment a traditional impact analysis that examines a single human-coded outcome with a suite of automatically generated secondary outcomes. By analyzing impacts across a wide array of text-based features, we can then explore what an overall change signifies, in terms of how the text has evolved due to treatment. In this paper, we propose several different methods for supplementary analysis in this spirit. We then present a case study of using these methods to enrich an evaluation of a classroom intervention on young children’s writing. We argue that our rich array of findings moves us from “it worked” to “it worked because” by revealing how observed improvements in writing were likely due, in part, to the students having learned to marshal evidence and speak with more authority. Relying exclusively on human scoring, by contrast, is a lost opportunity.



Biraj Bisht, Zachary LeClair, Susanna Loeb, Min Sun.

Paraeducators perform multiple roles in U.S. classrooms, including, among others, preparing classroom activities, working with students individually and in small groups, supporting individualized programming for students with disabilities, managing classroom behavior, and engaging with parents and communities. Yet little research provides insights into this key group of educators. This study combines an analysis of national administrative data to describe the paraeducator labor market with a systematic review of collective bargaining agreements and other job-defining documents in ten case-study districts. We find a large and expanding labor market of paraeducators, far more diverse along ethnic and racial lines than certified teachers but with far lower wages, fewer performance incentives, less professional development, and fewer opportunities for advancement within the profession.
