Standards, accountability, assessment, and curriculum

Ishtiaque Fazlul, Cory Koedel, Eric Parsons.

Measures of student disadvantage—or risk—are critical components of equity-focused education policies. However, the risk measures used in contemporary policies have significant limitations, and despite continued advances in data infrastructure and analytic capacity, there has been little innovation in these measures for decades. We develop a new measure of student risk for use in education policies, which we call Predicted Academic Performance (PAP). PAP is a flexible, data-rich indicator that identifies students at risk of poor academic outcomes. It blends concepts from emerging early warning systems with principles of incentive design to balance the competing priorities of accurate risk measurement and suitability for policy use. In proof-of-concept policy simulations using data from Missouri, we show PAP is more effective than common alternatives at identifying students who are at risk of poor academic outcomes and can be used to target resources toward these students—and students who belong to several other associated risk categories—more efficiently.
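To illustrate the general idea behind a prediction-based risk flag, the sketch below fits an ordinary least squares model on an earlier cohort, predicts a later outcome for a new cohort, and flags the quarter of students with the lowest predicted performance. Everything here is an assumption made for illustration: the predictors (prior score, absences, a subsidized-meal indicator), the linear model, the 25% cutoff, the function names, and the simulated data. It is not the authors' PAP specification, which draws on richer administrative data.

```python
import numpy as np

def predicted_risk_flags(X_train, y_train, X_new, share=0.25):
    """Fit OLS of a later outcome on earlier predictors for a past cohort,
    predict the outcome for a new cohort, and flag the `share` of new
    students with the lowest predicted performance as "at risk"."""
    # Prepend an intercept column to both design matrices.
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    predicted = A_new @ beta
    cutoff = np.quantile(predicted, share)
    return predicted, predicted <= cutoff

def simulate_cohort(rng, n):
    """Hypothetical cohort: prior score, absences, subsidized-meal indicator."""
    prior = rng.normal(size=n)
    absences = rng.poisson(5, size=n)
    meals = rng.integers(0, 2, size=n)
    outcome = 0.7 * prior - 0.05 * absences - 0.2 * meals + rng.normal(scale=0.5, size=n)
    return np.column_stack([prior, absences, meals]), outcome

rng = np.random.default_rng(0)
X_old, y_old = simulate_cohort(rng, 1000)
X_new, y_new = simulate_cohort(rng, 1000)
predicted, at_risk = predicted_risk_flags(X_old, y_old, X_new)
print(at_risk.sum(), round(np.corrcoef(predicted, y_new)[0, 1], 2))
```

Fitting on an earlier cohort and scoring a later one means the flags do not depend on the new cohort's realized outcomes.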

Kate Antonovics, Sandra E. Black, Julie Berry Cullen, Akiva Yonah Meiselman.

Schools often track students into classes based on ability. Proponents of tracking argue that it is a low-cost tool for improving learning, since instruction is more effective when students are more homogeneous, while opponents argue that it exacerbates initial differences in opportunity without strong evidence of efficacy. Yet little is known about the pervasiveness or determinants of ability tracking in the US. To fill this gap, we use detailed administrative data from Texas to estimate the extent of tracking within schools for grades 4 through 8 over the years 2011 to 2019. We find substantial tracking: tracking within schools overwhelms any sorting by ability that takes place across schools. The most important determinant of tracking is heterogeneity in student ability, and schools operationalize tracking through the classification of students into categories such as gifted and disabled, as well as through curricular differentiation. When we examine how tracking changes in response to educational policies, we see that schools decrease tracking in response to accountability pressures. Finally, when we explore how exposure to tracking correlates with student mobility in the achievement distribution, we find positive effects on high-achieving students and no negative effects on low-achieving students, suggesting that tracking may increase inequality by raising the ceiling.
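One simple way to quantify within-school tracking is the share of a school's score variance that lies between classroom means (often called eta squared). The minimal sketch below computes it for a toy school of six students in two classrooms; the index, the function name, and the data are illustrative assumptions, not necessarily the measure used in the study.

```python
import numpy as np

def between_classroom_share(scores, classrooms):
    """Share of a school's total score variation that lies between classroom
    means (eta squared): 0 when every classroom has the same mean score,
    1 when all variation is between classrooms and none within them."""
    scores = np.asarray(scores, dtype=float)
    classrooms = np.asarray(classrooms)
    grand_mean = scores.mean()
    total_ss = np.sum((scores - grand_mean) ** 2)
    between_ss = 0.0
    for c in np.unique(classrooms):
        in_class = scores[classrooms == c]
        between_ss += in_class.size * (in_class.mean() - grand_mean) ** 2
    return float(between_ss / total_ss)

scores = [1, 2, 3, 4, 5, 6]
tracked = between_classroom_share(scores, ["a", "a", "a", "b", "b", "b"])
mixed = between_classroom_share(scores, ["a", "b", "a", "b", "a", "b"])
print(round(tracked, 3), round(mixed, 3))  # 0.771 0.086
```

Sorting the same six scores into classrooms by rank raises the index from about 0.09 to about 0.77; it stays below 1 because students within each tracked classroom still differ.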

John Papay, Ann Mantil, Richard J. Murnane.

Many states use high-school exit examinations to assess students’ career and college readiness in core subjects. We find meaningful consequences of barely passing the mathematics examination in Massachusetts, as opposed to just failing it. However, these impacts operate at different educational attainment margins for low-income and higher-income students. As in previous work, we find that barely passing increases the probability of graduating from high school for low-income (particularly urban low-income) students, but not for higher-income students. This pattern is reversed for 4-year college graduation. For higher-income students only, just passing the examination increases the probability of completing a 4-year college degree by 2.1 percentage points, a sizable effect given that only 13% of these students near the cutoff earn a 4-year degree.
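To put the 2.1 percentage-point effect in relative terms, and assuming (our reading) that the 13% figure is the 4-year completion rate for higher-income students near the cutoff, the implied relative increase is roughly one-sixth:

```latex
\frac{2.1 \text{ percentage points}}{13\%} = \frac{0.021}{0.13} \approx 0.16
```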

Walter Herring.

Because high-stakes testing for school accountability does not begin until third grade, accountability ratings for elementary schools do not directly measure students’ academic progress in grades K through 2. While it is possible that children’s test scores in grades 3 and above are highly correlated with children’s outcomes in the untested grades, research provides reasons to believe that this might not be the case in all schools. This study explores whether measures of school quality based on test scores in grades 3 through 5 serve as a strong proxy for children’s academic outcomes in grades K through 2. The results show that directly accounting for children’s test scores in the early grades could lead to meaningful changes in schools’ test-based performance ratings. The findings have important implications for accountability policy.

Morgan S. Polikoff, Laura M. Desimone, Andrew C. Porter, Michael S. Garet, Amy Stornaiuolo, Katie Pak, Toni M. Smith, Mengli Song, Nelson Flores, Lynn S. Fuchs, Douglas Fuchs, T. Philip Nichols.

Standards have been at the heart of state and federal efforts to improve education for several decades. Most recently, standards-based reforms have evolved with a focus on more ambitious "college- and career-ready" (CCR) standards. This paper synthesizes the findings of a seven-year national research center focused on the implementation and effects of CCR standards. The paper draws on evidence from a quasi-experimental longitudinal study using NAEP data, a cluster-randomized trial of an alignment feedback intervention, and detailed implementation data from state-representative surveys and case studies of five districts. Situating our work in "policy attributes theory," we find important gaps in the theory of change underlying current standards-based reform efforts. We conclude that the CCR standards movement is not achieving its desired outcomes. We make specific suggestions for improving instructional policy, including a) providing more specific instructional guidance, b) reconceptualizing professional learning, c) building buy-in through the involvement of trusted leaders, d) providing better supports for differentiation, e) devoting attention and guidance to the intersection of content and pedagogy, and f) addressing persistent deficit thinking among educators.

Monnica Chan, Zachary Mabel, Preeya Pandya Mbekeani.

Performance-based funding models for higher education, which tie state support for institutions to performance on student outcomes, have proliferated in recent decades. Some states have designed these policies to also address educational attainment gaps by including bonus payments for traditionally low-performing groups. Using a Synthetic Control Method research design, we examine the impact of these funding regimes on race-based completion gaps in Tennessee and Ohio. We find no evidence that performance-based funding narrowed race-based completion gaps. In fact, contrary to their intended purpose, we find that performance-based funding widened existing gaps in certificate completion in Tennessee. Across both states, the estimated impacts on associate degree outcomes are also directionally consistent with performance-based funding exacerbating racial inequities in associate degree attainment.
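The core of a synthetic control design is a set of donor weights that make a weighted average of untreated units track the treated unit before the policy. The sketch below is a minimal, outcomes-only version under stated assumptions: the data are simulated, the hypothetical donors stand in for comparison states, and the weights are chosen by constrained least squares with SciPy's SLSQP solver. Published applications typically also match on covariates and choose a predictor-weighting matrix, which this sketch omits.

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(y_treated_pre, Y_donors_pre):
    """Donor weights that are nonnegative, sum to one, and minimize the
    pre-treatment mean squared gap between the treated unit and the
    weighted combination of donor units (outcomes only, no covariates)."""
    T0, J = Y_donors_pre.shape

    def loss(w):
        resid = y_treated_pre - Y_donors_pre @ w
        return resid @ resid / T0

    def grad(w):
        resid = y_treated_pre - Y_donors_pre @ w
        return -2.0 * (Y_donors_pre.T @ resid) / T0

    result = minimize(
        loss,
        x0=np.full(J, 1.0 / J),  # start from equal weights
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x

# Hypothetical data: 5 donor units, 10 pre-policy and 5 post-policy years.
rng = np.random.default_rng(1)
T0, T1, J = 10, 5, 5
Y_donors = rng.normal(size=(T0 + T1, J))
true_w = np.array([0.6, 0.0, 0.4, 0.0, 0.0])
y_treated = Y_donors @ true_w
y_treated[T0:] += 0.3  # hypothetical post-policy shift in the treated unit

w = synthetic_control_weights(y_treated[:T0], Y_donors[:T0])
effect = y_treated[T0:] - Y_donors[T0:] @ w  # treated minus synthetic, per year
print(np.round(w, 3), round(float(effect.mean()), 3))
```

The post-period gap between the treated unit and its synthetic counterpart is the estimated policy effect; in this simulation it should be close to the 0.3 shift built into the data.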

Dan Goldhaber, Zeyu Jin, Richard Startz.

We present new estimates of the importance of early-grade teachers for later-grade outcomes. Unlike the existing literature that examines teacher “fade-out,” we directly compare the contribution of early-grade teachers to later-year outcomes against the contributions of later-year teachers to the same outcomes. Whereas the prior literature finds that much of the contribution of early teachers fades away, we find that the contributions of early-year teachers remain important in later grades. The difference in contributions to eighth-grade outcomes between an effective and an ineffective fourth-grade teacher is about half the difference among eighth-grade teachers. The effect on eighth-grade outcomes of replacing a fourth-grade teacher who is below the 5th percentile with a median teacher is about half the underrepresented minority (URM)/non-URM achievement gap. Our results reinforce earlier conclusions in the literature that teachers in all grades are important for student achievement.
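The abstract does not report the spread of teacher effects, but under an assumed normal distribution the replacement scenario can be expressed in standard deviations of teacher quality. Because "below the 5th percentile" can be read either as a teacher exactly at that percentile or as the average teacher below it, the sketch computes both gaps relative to the median using Python's standard-library `NormalDist`; the normality assumption and both readings are ours, not the study's.

```python
from statistics import NormalDist

# Assumption: teacher effects ~ Normal(0, 1), i.e., measured in SDs of teacher quality.
teacher = NormalDist(mu=0.0, sigma=1.0)
median = teacher.inv_cdf(0.5)
z_p5 = teacher.inv_cdf(0.05)  # 5th percentile

# Reading 1: replace a teacher exactly at the 5th percentile with a median teacher.
gap_at_p5 = median - z_p5

# Reading 2: replace the average teacher below the 5th percentile.
# For a standard normal, E[Z | Z < z] = -pdf(z) / cdf(z), and cdf(z_p5) = 0.05.
mean_below_p5 = -teacher.pdf(z_p5) / 0.05
gap_below_p5 = median - mean_below_p5

print(round(gap_at_p5, 3), round(gap_below_p5, 3))  # 1.645 2.063
```

Under these assumptions the scenario corresponds to roughly 1.6 to 2.1 standard deviations of teacher quality; converting that to achievement units would require the study's estimated variance of teacher effects.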

Todd Pugatch, Paul Thompson.

Can public university honors programs deliver the benefits of selective undergraduate education within otherwise nonselective institutions? We evaluate the impact of admission to the Honors College at Oregon State University, a large nonselective public university. Admission to the Honors College depends heavily on a numerical application score. Nonlinearities in admissions probabilities as a function of this score allow us to compare applicants with similar scores, but different admissions outcomes, via a fuzzy regression kink design. The first stage is strong, with take-up of Honors College programming closely following nonlinearities in admissions probabilities. To estimate the causal effect of Honors College admission on human capital formation, we use these nonlinearities in the admissions function as instruments, combined with course-section fixed effects to account for strategic course selection. Honors College admission increases course grades by 0.10 grade points on the 0-4 scale, or 0.14 standard deviations. Effects are concentrated at the top of the course grade distribution. Previous exposure to Honors sections of courses in the same subject is a leading potential channel for increased grades. However, course grades of first-generation students decrease in response to Honors admission, driven by low performance in natural science courses. Results suggest that selective Honors programs can accelerate skill acquisition for high-achieving students at public universities, but not all students benefit from Honors admission.
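A fuzzy regression kink design divides the change in the outcome's slope at the kink by the change in the treatment's slope (the first stage). The sketch below shows that ratio on simulated data with one kink at an application score of 0, using separate linear fits on each side within a uniform bandwidth. The data, kink location, bandwidth, effect size, and function names are hypothetical, and the study's own estimator, which uses several nonlinearities as instruments together with course-section fixed effects, is more elaborate.

```python
import numpy as np

def ols_slope(x, v):
    """Slope from an OLS regression of v on x with an intercept."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return coef[1]

def fuzzy_rkd(score, treated, outcome, kink=0.0, bandwidth=1.0):
    """Fuzzy regression kink estimate: change in the outcome's slope at the
    kink divided by the change in the treatment's slope (the first stage).
    Uses separate linear fits on each side within a uniform bandwidth."""
    x = score - kink
    left = (x >= -bandwidth) & (x < 0)
    right = (x >= 0) & (x <= bandwidth)
    first_stage = ols_slope(x[right], treated[right]) - ols_slope(x[left], treated[left])
    reduced_form = ols_slope(x[right], outcome[right]) - ols_slope(x[left], outcome[left])
    return reduced_form / first_stage, first_stage

# Hypothetical data: the slope of admission probability in the score rises
# from 0.1 to 0.5 at score 0; the true effect of admission on grades is 0.5.
rng = np.random.default_rng(2)
n = 100_000
score = rng.uniform(-1.0, 1.0, n)
p_admit = 0.3 + 0.1 * score + 0.4 * np.maximum(score, 0.0)
admitted = (rng.uniform(size=n) < p_admit).astype(float)
grades = 0.5 * admitted + 0.2 * score + rng.normal(scale=0.2, size=n)

effect, first_stage = fuzzy_rkd(score, admitted, grades)
print(round(effect, 2), round(first_stage, 2))
```

As a separate check on the reported magnitudes, 0.10 grade points equaling 0.14 standard deviations implies a course-grade standard deviation of about 0.10 / 0.14 ≈ 0.71 grade points.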

Edward J. Kim.

This study introduces the signal weighted teacher value-added model (SW VAM), a value-added model that weights student-level observations based on each student’s capacity to signal their assigned teacher’s quality. Specifically, the model leverages the repeated appearance of a given student to estimate student reliability and sensitivity parameters, whereas traditional VAMs represent a special case where all students exhibit identical parameters. Simulation study results indicate that SW VAMs outperform traditional VAMs at recovering true teacher quality when the assumption of student parameter invariance is met but have mixed performance under alternative assumptions of the true data generating process depending on data availability and the choice of priors. Evidence using an empirical data set suggests that SW VAM and traditional VAM results may disagree meaningfully in practice. These findings suggest that SW VAMs have promising potential to recover true teacher value-added in practical applications and, as a version of value-added models that attends to student differences, can be used to test the validity of traditional VAM assumptions in empirical contexts.
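The intuition of weighting students by their capacity to signal teacher quality can be illustrated with weighted least squares for a single teacher. The sketch assumes a hypothetical model in which each student's residual equals a student-specific sensitivity times the teacher effect plus noise with a student-specific variance, and it treats those parameters as known. The SW VAM instead estimates reliability and sensitivity from students' repeated appearances, with priors, so this is a simplified illustration rather than the author's estimator; the function name and data are ours.

```python
import numpy as np

def signal_weighted_effect(residuals, sensitivity, noise_var):
    """Weighted least squares estimate of one teacher's effect theta under the
    illustrative model residual_i = sensitivity_i * theta + e_i with
    Var(e_i) = noise_var_i. More sensitive or less noisy students get more
    weight; equal parameters across students reduce it to the plain mean."""
    residuals = np.asarray(residuals, dtype=float)
    sensitivity = np.asarray(sensitivity, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)
    w = sensitivity / noise_var
    return float(np.sum(w * residuals) / np.sum(w * sensitivity))

# Hypothetical classroom of 25 students with known student parameters.
rng = np.random.default_rng(3)
theta = 0.3
sensitivity = rng.uniform(0.5, 1.5, 25)
noise_var = rng.uniform(0.1, 1.0, 25)
residuals = sensitivity * theta + rng.normal(scale=np.sqrt(noise_var))
print(signal_weighted_effect(residuals, sensitivity, noise_var), residuals.mean())
```

Setting every sensitivity to 1 and every noise variance to the same value returns the plain mean of the residuals, which mirrors the abstract's point that traditional VAMs are a special case in which all students share identical parameters.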

Ana P. Cañedo, Paul T. von Hippel.

Von Hippel & Cañedo (2021) reported that US kindergarten teachers placed girls, Asian-Americans, and children from families of high socioeconomic status (SES) into higher ability groups than their test scores alone would warrant. The results fit the view that teachers were biased.

This comment asks whether parents’ lobbying for higher placement might explain these results. The answer, for the most part, is no. Measures of parent-teacher contact explained little variation in children’s ability group placement and did not account for the higher placement of girls, Asian-Americans, or high-SES children. In fact, Asian-American parents had less teacher contact than did white parents. It appears that the biases observed by von Hippel & Cañedo resided primarily in teachers, not in parents.

We also ask whether teachers who used more objective assessment techniques were less biased in placing children into higher and lower ability groups. The answer, again, is no. Unfortunately, biases persisted in the face of objective information about students’ skills. Fortunately, the biases were not especially large.

