Standards, accountability, assessment, and curriculum
A large share of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to study students' development on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students typically answer a consistent set of self-report survey items across multiple school years, and growth is measured using either sum scores or scale scores produced with item response theory (IRT) methods. While there is a great deal of guidance on scaling and linking IRT-based large-scale educational assessments to facilitate the estimation of examinee growth, little of this expertise is brought to bear in scaling psychological and social-emotional constructs. Through a series of simulation and empirical studies, we produce scores in a single-cohort repeated-measures design using sum scores as well as multiple IRT approaches, and we compare how well longitudinal growth models recover growth estimates from each set of scores. Results indicate that growth models using scores from multidimensional IRT approaches that account for latent variable covariances over time recover growth parameters better than models using sum scores or other IRT approaches.
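One reason the choice of score matters can be shown with a toy simulation. The sketch below is not the authors' procedure; it assumes a graded-response-type model with discrimination fixed at 1, three hypothetical thresholds shared by five items, and constant latent growth of 0.5 per wave. Because sum scores are bounded, their wave-to-wave gains shrink as students approach the top of the scale even though latent growth is constant.

```python
import math
import random
import statistics


def simulate(n_students=5000, n_waves=4, growth=0.5,
             thresholds=(-2.0, -1.0, 0.0), n_items=5, seed=1):
    """Toy longitudinal simulation; every setting here is hypothetical.

    Each student's latent trait theta rises by `growth` per wave. Item
    responses follow a graded-response-type model with discrimination 1:
    one standard logistic draw per item, and the response category is the
    number of thresholds that theta + noise exceeds, so that
    P(Y >= k) = logistic(theta - b_k).
    """
    rng = random.Random(seed)
    thetas = [[] for _ in range(n_waves)]
    sum_scores = [[] for _ in range(n_waves)]
    for _ in range(n_students):
        baseline = rng.gauss(0.0, 1.0)
        for t in range(n_waves):
            theta = baseline + growth * t
            total = 0
            for _ in range(n_items):
                u = rng.uniform(1e-12, 1 - 1e-12)   # avoid log(0)
                noisy = theta + math.log(u / (1 - u))  # standard logistic draw
                total += sum(noisy > b for b in thresholds)
            thetas[t].append(theta)
            sum_scores[t].append(total)
    return thetas, sum_scores


def wave_gains(scores_by_wave):
    """Mean change from each wave to the next."""
    means = [statistics.fmean(wave) for wave in scores_by_wave]
    return [later - earlier for earlier, later in zip(means, means[1:])]


thetas, sum_scores = simulate()
print("latent gains:   ", [round(g, 3) for g in wave_gains(thetas)])
print("sum-score gains:", [round(g, 3) for g in wave_gains(sum_scores)])
```

With these settings the latent gains are all 0.5, while the sum-score gains decline from wave to wave. Scoring on the latent (IRT) metric is meant to avoid this kind of compression, although estimated scale scores carry their own measurement error.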
Survey respondents exhibit different response styles when they use the categories of a Likert scale differently despite having the same true score on the construct of interest. For example, some respondents may be more likely to use the extremes of the response scale regardless of their true score. Prior research shows that differing response styles can introduce a construct-irrelevant source of bias that distorts fundamental inferences made from survey data. While a few initial studies examine the effect of response styles on survey scores in longitudinal analyses, how response styles affect estimates of growth has received little attention. In this study, we conducted empirical and simulation analyses in which we scored surveys using item response theory (IRT) models that do and do not account for response styles, used those different scores in growth models, and compared the results. Generally, we found that response styles can affect estimates of growth parameters, including the slope, but that the effects vary by psychological construct, response style, and model used.
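A small deterministic sketch can show how an extreme response style (ERS) might change a growth slope even when latent growth is identical. It assumes, hypothetically, that ERS can be mimicked by squeezing an item's thresholds toward their center, and it tracks expected item scores under a graded-response-type model with discrimination 1 rather than simulated responses. The study's own IRT response-style models are not reproduced here.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_item_score(theta, thresholds):
    """Expected category (0..len(thresholds)) under a graded-response-type
    model with discrimination 1: E[Y] = sum_k P(Y >= k) = sum_k logistic(theta - b_k)."""
    return sum(logistic(theta - b) for b in thresholds)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical thresholds: the "extreme" responder's thresholds are squeezed
# toward the center, so the end categories absorb more of the trait range.
TYPICAL = (-1.5, 0.0, 1.5)
EXTREME = (-0.5, 0.0, 0.5)

waves = [0, 1, 2, 3]
latent = [0.5 * t for t in waves]  # identical latent growth for both respondents

slopes = {}
for label, thresholds in (("typical", TYPICAL), ("extreme", EXTREME)):
    scores = [expected_item_score(theta, thresholds) for theta in latent]
    slopes[label] = ols_slope(waves, scores)
    print(label, [round(s, 3) for s in scores], "slope:", round(slopes[label], 3))
```

Both respondents start at the same expected score at theta = 0, but the extreme responder's expected scores rise faster over this range, so a growth model fit to raw scores would assign that respondent a steeper slope. The direction and size of the distortion depend on where respondents sit relative to the thresholds, which is one way effects can differ across constructs and response styles.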
One of the mysteries of education reform is how leaders and educators can successfully instantiate, sustain, and spread student-centered pedagogical practices from a few schools to many others. Advocates for deeper learning grapple with this mystery as they seek to transform teaching and learning to prepare students to meet the demands of the 21st century and to close the opportunity gap between advantaged and disadvantaged groups. While research suggests that deeper learning strategies that support critical thinking and problem-solving can yield improved student outcomes, implementing these strategies is not easy, as they require reimagining school environments and changing traditional approaches to teaching. This report highlights how three networks of schools engaged in deeper learning have managed this feat. It describes the systems and structures the networks have used to instantiate their equitable deeper learning models in diverse public school settings to serve students in more personalized and productive ways.
Research showing that high-quality preschool benefits children’s early learning and later life outcomes has led to increased state engagement in public preschool. However, mixed results from evaluations of two programs, Tennessee’s Voluntary Pre-K program and Head Start, have left many policymakers unsure how to ensure productive investments. This report presents the most rigorous evidence on the effects of preschool and clarifies how the findings from Tennessee and Head Start relate to the larger body of research. That research shows that high-quality preschool enhances children’s school readiness, so that participants make substantial early learning gains compared with children who do not attend preschool, and that it can have lasting impacts far into children’s later years of school and life. Therefore, the issue is not whether preschool “works,” but how to design and implement programs that ensure public preschool investments consistently deliver on their promise.
Although there is considerable research on the elements of high-quality preschool and its many benefits, particularly for low-income children and English learners, little information is available to policymakers about how to convert their visions of good early education into on-the-ground reality. This study fills that gap by describing and analyzing how four states—Michigan, West Virginia, Washington, and North Carolina—have built high-quality early education systems. Among the common elements of their success are strategies that prioritize quality and continuous improvement, invest in training and coaching for program staff, coordinate the administration of birth-through-grade-3 programs, strategically combine multiple funding sources to increase access and improve quality, and create broad-based coalitions and support.
While school choice may enhance competition, incentives for public schools to raise productivity may be muted if public education is viewed as imperfectly substitutable with alternatives. This paper estimates the aggregate effect of charter school expansion on education quality while accounting for the horizontal differentiation of charter school programs. To do so, we combine student-level administrative data with novel information about the educational programs of charter schools that opened in North Carolina following the removal of the statewide cap in 2011. The dataset contains students' standardized test scores as well as geocoded residential addresses, which allow us to compare the test score changes of students who lived near the new charters prior to the policy change with those of students who lived farther away. We apply this research design to estimate separate treatment effects for exposure to charter schools that are and are not horizontally differentiated from public school instruction. The results indicate learning gains for treated students that are driven entirely by non-horizontally differentiated charter schools: we find that non-horizontally differentiated charter school expansion causes a 0.05 SD increase in math scores. These learning gains are driven by public schools responding to increased competition.
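The comparison described above, test score changes for students living near new charters versus students living farther away, has the structure of a difference-in-differences design. A minimal two-group, two-period version on group means is sketched below using made-up numbers. The abstract does not give the paper's full specification (covariates, fixed effects, or how "near" is defined), and nothing here reproduces its estimates.

```python
from statistics import fmean


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Two-group, two-period difference-in-differences on group means:
    (mean change for students near a new charter)
    minus (mean change for students farther away)."""
    return (fmean(treat_post) - fmean(treat_pre)) - (fmean(ctrl_post) - fmean(ctrl_pre))


# Made-up standardized math scores, for illustration only (not the paper's data).
far_pre, far_post = [0.10, -0.20, 0.05], [0.12, -0.18, 0.07]
near_diff_pre, near_diff_post = [0.00, 0.30, -0.10], [0.05, 0.35, -0.05]
near_nondiff_pre, near_nondiff_post = [-0.05, 0.15, 0.20], [0.07, 0.27, 0.32]

# Separate estimates for exposure to horizontally differentiated and
# non-differentiated charters, each compared with the same far-away group.
print("differentiated:    ",
      round(diff_in_diff(near_diff_pre, near_diff_post, far_pre, far_post), 3))
print("non-differentiated:",
      round(diff_in_diff(near_nondiff_pre, near_nondiff_post, far_pre, far_post), 3))
```

Differencing out the far-away group's change removes trends common to both groups; the design relies on the assumption that, absent new charters, nearby and distant students' scores would have moved in parallel.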
Despite wide achievement gaps across California between students from different racial and socioeconomic backgrounds, some school districts have excelled at supporting the learning of all their students. This analysis identifies these positive outlier districts—those in which students of color, as well as White students, consistently achieve at higher levels than students from similar racial/ethnic backgrounds and from families of similar income and education levels in most other districts. These results are predicted, in significant part, by the qualifications of districts’ teachers, as measured by their certification and experience. In particular, the proportion of underprepared teachers—those teaching on emergency permits, waivers, and intern credentials—is associated with decreased achievement for all students, while teaching experience is associated with increased achievement, especially for students of color.
Teacher evaluation policies seek to improve student outcomes by increasing the effort and skill levels of current and future teachers. Current policy and most prior research treat teacher evaluation as balancing two aims: accountability and growth. Proper teacher evaluation design has been understood as appropriately weighting the accountability and growth dimensions of policy and practice. I detail six assumptions underlying teacher evaluation for growth and accountability and assess their reasonableness in light of empirical evidence from the personnel economics, social psychology, and management literatures. I simulate a set of teacher evaluation policies and find that those that treat evaluation for accountability and evaluation for growth as substitutes modestly outperform policies that treat them as complements. Teachers’ rates of learning through evaluation and the labor market effects of evaluation are critical in determining its impact. I conclude with recommendations for the design of teacher evaluation policies.
Despite frequent political and policy debates, the effects of imposing accountability pressures on public school teachers are empirically indeterminate. In this paper, we study the effects of accountability in the context of teacher responses to student behavioral infractions in the aftermath of teacher evaluation reforms. We leverage cross-state variation in the timing of state policy implementation to estimate whether teachers change the rate at which they remove students from their classrooms. We find that higher-stakes teacher evaluation had no causal effect on the rates of disciplinary referrals, and we find no evidence of heterogeneous effects for grades subject to greater accountability pressures or in schools facing differing levels of disciplinary infractions. Our results are precisely estimated and robust to a battery of specification checks. Our findings provide insights into the effects of accountability policy on the black box of classroom practice and highlight the loose coupling between education policy and teacher behaviors.
Despite large schooling and learning gains in many developing countries, children in highly deprived areas are often unlikely to achieve even basic literacy and numeracy. We study how much of this problem can be resolved by a multi-pronged intervention that combines several components known to be effective in isolation. We conducted a cluster-randomized trial in The Gambia evaluating a literacy and numeracy intervention designed for primary-aged children in remote parts of poor countries. The intervention combines para teachers delivering after-school supplementary classes, scripted lesson plans, and frequent monitoring focused on improving teacher practice (coaching). A similar intervention previously produced large learning gains in a cluster-randomized trial in rural India. After three academic years, Gambian children receiving the intervention scored 46 percentage points (3.2 SD) higher than control children on a combined literacy and numeracy test. This intervention holds great promise for addressing low learning levels in other poor, remote settings.