A vast research literature documents racial bias in teachers’ evaluations of students. Theory suggests bias may be larger on grading scales with vague or overly general criteria versus scales with clearly specified criteria, raising the possibility that well-designed grading policies may mitigate bias. This study offers relevant evidence through a randomized web-based experiment with 1,549 teachers. On a vague grade-level evaluation scale, teachers rated a student writing sample lower when it was randomly signaled to have a Black author, versus a White author. However, there was no evidence of racial bias when teachers used a rubric with more clearly defined evaluation criteria. Contrary to expectation, I found no evidence that the magnitude of grading bias depends on teachers’ implicit or explicit racial attitudes.
This study examines the relationship between county-level estimates of implicit racial bias and black-white test score gaps in U.S. schools. Data from over 1 million respondents from across the United States who completed an online version of the Race Implicit Association Test (IAT) were combined with data from the Stanford Education Data Archive covering over 300 million test scores from U.S. schoolchildren in grades 3 through 8. Two key findings emerged. First, in both bivariate and multivariate models, counties with higher levels of racial bias had larger black-white test score disparities. The magnitudes of these associations were on par with other widely accepted predictors of racial test score gaps, including racial gaps in family income and racial gaps in single parenthood. Second, the observed relationship between collective rates of racial bias and racial test score gaps was explained by the fact that counties with higher rates of racial bias had schools that were characterized by more racial segregation and larger racial gaps in gifted and talented assignment as well as special education placement. This pattern is consistent with a theoretical model in which aggregate rates of racial bias affect educational opportunity through sorting mechanisms that operate both within and beyond schools.
We study racial bias and the persistence of first impressions in the context of education. Teachers who begin their careers in classrooms with large black-white score gaps carry negative views into evaluations of future cohorts of black students. Our evidence is based on novel data on blind evaluations and non-blind public school teacher assessments of fourth and fifth graders in North Carolina. Negative first impressions lead teachers to be significantly less likely to over-rate but not more likely to under-rate black students’ math and reading skills relative to their white classmates. Teachers' perceptions are sensitive to the lowest-performing black students in early classrooms, but non-responsive to the highest-performing ones. This is consistent with the operation of confirmatory biases. Since teacher expectations can shape grading patterns and sorting into academic tracks as well as students’ own beliefs and behaviors, these findings suggest that novice teachers' initial experiences may contribute to the persistence of racial gaps in educational achievement and attainment.
Valid and reliable measurements of teaching quality facilitate school-level decision-making and policies pertaining to teachers, but conventional classroom observations are costly, prone to rater bias, and hard to implement at scale. Using nearly 1,000 word-for-word transcriptions of 4th- and 5th-grade English language arts classes, we apply novel text-as-data methods to develop automated, objective measures of teaching to complement classroom observations. This approach is free of rater bias and enables the detection of three instructional factors that are well aligned with commonly used observation protocols: classroom management, interactive instruction, and teacher-centered instruction. The teacher-centered instruction factor is a consistent negative predictor of value-added scores, even after controlling for teachers’ average classroom observation scores. The interactive instruction factor predicts positive value-added scores.
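As a toy illustration of a text-as-data feature (the transcript, speaker labels, and talk-share proxy below are hypothetical assumptions, not the study's actual factor model), teacher-centered instruction could be crudely proxied by the teacher's share of spoken words in a transcript:

```python
# Illustrative only: a minimal transcript feature in the spirit of
# text-as-data measures. The transcript and speaker tags are invented.
transcript = [
    ("teacher", "today we are going to read the next chapter together"),
    ("student", "can I read first"),
    ("teacher", "yes go ahead and start at the top of the page"),
    ("student", "okay"),
]

# Count words spoken by the teacher vs. everyone in the transcript.
teacher_words = sum(len(text.split()) for who, text in transcript if who == "teacher")
total_words = sum(len(text.split()) for _, text in transcript)

# A high share of teacher talk is one crude proxy for
# teacher-centered (as opposed to interactive) instruction.
teacher_share = teacher_words / total_words
```

In practice such features would be computed over many transcripts and combined with other indicators before entering a factor model.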
States and districts are increasingly incorporating measures of achievement growth into their school accountability systems, but there is little research on how these changes affect the public’s perceptions of school quality. We conduct a nationally representative online survey experiment to identify the effects of providing participants with information about their local school districts’ average achievement status and/or average achievement growth. In the control group, participants who live in higher status districts tend to grade their local schools more favorably. The provision of status information does not fundamentally alter this relationship. The provision of growth information, however, reshapes Americans’ views about educational performance. Once informed, participants’ evaluations of their local public schools better reflect the variation in district growth.
The “achievement gap” has long dominated mainstream conversations about race and education. Some scholars warn that the discourse around racial gaps perpetuates stereotypes and promotes the adoption of deficit-based explanations that fail to appreciate the role of structural inequities. I investigate these concerns through three randomized experiments. Results indicate that a TV news story about racial achievement gaps (versus a control or counter-stereotypical video) led viewers to express more exaggerated stereotypes of Black Americans as lacking education (study 1: ES=.30 SD; study 2: ES=.38 SD) and may have increased viewers’ implicit stereotyping of Black students as less competent than White students (study 1: ES=.22 SD; study 2: ES=.12 SD, n.s.). The video did not affect viewers’ explicit competence-related racial stereotyping, the explanations they gave for achievement inequalities, or their prioritization of ending achievement inequalities. After two weeks, the effect on stereotype exaggeration faded. Future research should probe how we can most productively frame educational inequality by race.
Nearly one in five U.S. students attends a rural school, yet we know very little about achievement gaps and academic growth in rural schools. This study leverages a unique dataset that includes longitudinal test scores for more than five million 3rd to 8th grade students in approximately 17,000 public schools across the 50 states, including 900,000 students attending 4,727 rural schools. We find rural achievement and growth to be slightly above the national public school average. But there is considerable heterogeneity by student race/ethnicity. For all grades and subjects, White-Black and White-Hispanic gaps are smaller in rural schools than gaps nationwide, and White-Native American gaps are larger in rural schools than gaps nationwide. Separate analyses by racial/ethnic subgroup show that rural Black, Hispanic, and Native American students often grow more slowly than their respective subgroup national averages. In contrast, White students often grow faster than the national average for White students.
Clustered observational studies (COSs) are a critical analytic tool for educational effectiveness research. We present a design framework for the development and critique of COSs. The framework is built on the counterfactual model for causal inference and promotes the concept of designing COSs that emulate the targeted randomized trial that would have been conducted were it feasible. We emphasize the key role of understanding the assignment mechanism in study design. We review methods for statistical adjustment and highlight a recently developed form of matching designed specifically for COSs. We review how regression models can be profitably combined with matching and note best practices for estimating statistical uncertainty. Finally, we review how sensitivity analyses can determine whether conclusions are sensitive to bias from potential unobserved confounders. We demonstrate concepts with an evaluation of a summer school reading intervention in Wake County, North Carolina.
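The target-trial idea can be sketched on hypothetical school-level data (the variable names, data, and greedy nearest-neighbor rule below are illustrative assumptions, not the paper's matching method): match each treated school to the control school with the closest baseline covariate value, so the matched comparison mimics a trial that randomized schools.

```python
import random

random.seed(0)

# Hypothetical school-level data: (school_id, baseline_mean, treated).
# The first 10 schools are "treated" (e.g., adopted the intervention).
schools = [(i, random.gauss(50, 10), i < 10) for i in range(40)]
treated = [s for s in schools if s[2]]
controls = [s for s in schools if not s[2]]

# Greedy nearest-neighbor matching on the baseline covariate,
# without replacement: each control school is used at most once.
matches = []
available = controls.copy()
for t in sorted(treated, key=lambda s: s[1]):
    best = min(available, key=lambda c: abs(c[1] - t[1]))
    available.remove(best)
    matches.append((t, best))

# Balance check: matched controls' baselines should sit close to
# the treated schools' baselines.
mean_gap = sum(abs(t[1] - c[1]) for t, c in matches) / len(matches)
```

Real COS matching would use many covariates (and often student-level data within clusters), but the structure is the same: construct a matched sample first, then analyze it as if it came from the emulated trial.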
Summer learning loss (SLL) is a familiar and much-studied phenomenon, yet new concerns that measurement artifacts distorted canonical SLL findings create a need to revisit basic research on SLL. Though race/ethnicity and SES account for only about 4% of the variance in SLL, nearly all prior work focuses on these factors. We zoom out to the full spread of differential SLL and its contribution to students’ positions in the eighth grade achievement distribution. Using a large, longitudinal Northwest Evaluation Association dataset, we document dramatic variability in SLL. While some students actually maintain their school-year learning rate, others lose nearly all their school-year progress. Moreover, decrements are not randomly distributed: 52% of students lose ground in all five consecutive years (ELA).
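The quantities involved can be sketched with invented scores (all numbers and names below are illustrative, not the study's data or code): a student-year SLL is the drop from a spring score to the following fall score, and the repeated-loss share counts students who lose ground in every observed summer.

```python
# Hypothetical longitudinal test scores per student:
# a list of (spring_score, next_fall_score) pairs for consecutive years.
scores = {
    "a": [(210, 205), (218, 211), (226, 220)],
    "b": [(200, 202), (209, 208), (215, 216)],
}

# Summer learning loss for each student-year: spring minus the
# following fall (positive values mean ground was lost).
losses = {sid: [spring - fall for spring, fall in pairs]
          for sid, pairs in scores.items()}

# Share of students losing ground in every observed summer,
# analogous to the paper's repeated-loss statistic.
always_lose = sum(all(l > 0 for l in ls) for ls in losses.values()) / len(losses)
```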
Many interventions in education occur in settings where treatments are applied to groups. For example, a reading intervention may be implemented for all students in some schools and withheld from students in other schools. When such treatments are non-randomly allocated, outcomes across the treated and control groups may differ due to the treatment or due to baseline differences between groups. When this is the case, researchers can use statistical adjustment to make treated and control groups similar in terms of observed characteristics. Recent work in statistics has developed matching methods designed for contexts where treatments are clustered. This form of matching, known as multilevel matching, may be well suited to many education applications where treatments are assigned to schools. In this article, we provide an extensive evaluation of multilevel matching and compare it to multilevel regression modeling. We evaluate multilevel matching methods in two ways. First, we use these matching methods to recover treatment effect estimates from three clustered randomized trials using a within-study comparison design. Second, we conduct a simulation study. We find evidence that generally favors an analytic approach to statistical adjustment that combines multilevel matching with regression adjustment. We conclude with an empirical application.
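A minimal sketch of the combined approach on simulated data (the data-generating process, numbers, and the simple bias-corrected estimator below are illustrative assumptions, not the article's implementation): match clusters on an observed covariate, then apply regression adjustment to the matched differences.

```python
import random

random.seed(1)

# Simulated clustered data: each school has a baseline covariate x,
# an outcome y, and treatment is non-random (favors high-x schools).
def make_school(treated):
    x = random.gauss(52 if treated else 48, 8)
    y = 0.6 * x + (3.0 if treated else 0.0) + random.gauss(0, 2)  # true effect = 3
    return (x, y, treated)

treated = [make_school(True) for _ in range(15)]
controls = [make_school(False) for _ in range(45)]

# Step 1: school-level matching (nearest neighbor on x, without
# replacement), standing in for multilevel matching.
pairs = []
pool = controls.copy()
for t in sorted(treated, key=lambda s: s[0]):
    c = min(pool, key=lambda s: abs(s[0] - t[0]))
    pool.remove(c)
    pairs.append((t, c))

# Step 2: regression adjustment. Estimate the slope of y on x among
# controls, then correct each matched difference for the residual
# covariate gap (a simple bias-corrected matching estimator).
mx = sum(c[0] for c in controls) / len(controls)
my = sum(c[1] for c in controls) / len(controls)
beta = (sum((c[0] - mx) * (c[1] - my) for c in controls)
        / sum((c[0] - mx) ** 2 for c in controls))

effect = sum((t[1] - c[1]) - beta * (t[0] - c[0]) for t, c in pairs) / len(pairs)
```

The point of the combination is that matching removes most of the covariate imbalance, and the regression step mops up what matching leaves behind, which is the pattern the evaluation's results favor.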