Methodology, measurement and data
Many kindergarten teachers place students in higher and lower “ability groups” to learn math and reading. Ability group placement should depend on student achievement, but critics charge that placement is biased by socioeconomic status (SES), gender, and race/ethnicity. We predict group placement in the Early Childhood Longitudinal Study of the Kindergarten class of 2010-11, using linear and ordinal regression models with classroom fixed effects. The best predictors of group placement are test scores, but girls, high-SES students, and Asian Americans receive higher placements than their test scores alone would predict. One third of students move groups during kindergarten, and some movement is predicted by changes in test scores, but high-SES students move up more than score gains would predict, and Hispanic children move up less. Net of SES and test scores, there is no bias in the placement of African American children. Differences in teacher-reported behaviors explain the higher placement of girls, but do little to explain the higher or lower placement of other groups. Although achievement is the best predictor of ability group placement, there are signs of bias.
This paper uses meta-analytic techniques to estimate the separate effects of the starting age, program duration, and persistence of impacts of early childhood education programs on children’s cognitive and achievement outcomes. It concentrates on studies published before the wide-scale penetration of state pre-K programs. Specifically, data are drawn from 67 high-quality evaluation studies conducted between 1960 and 2007, which provide 993 effect sizes for analyses. When weighted for differential precision, effect sizes averaged .26 sd at the end of these programs. We find larger effect sizes for programs starting in infancy/toddlerhood than in the preschool years and, surprisingly, smaller average effect sizes at the end of longer as opposed to shorter programs. Our findings suggest that, on average, impacts decline geometrically following program completion, losing nearly half of their size within one year after the end of treatment. Taken together, these findings reflect a moderate level of effectiveness across a wide range of center-based programs and underscore the need for innovative intervention strategies to produce larger and more persistent impacts.
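To make the fade-out pattern concrete, the following is a minimal sketch of geometric decay applied to the reported end-of-program average of .26 sd. The function name `projected_effect` and the annual retention factor of 0.5 are illustrative assumptions: the abstract says only that impacts lose "nearly half" of their size within a year, so the true factor is somewhat above 0.5, and this is not the authors' estimated model.

```python
def projected_effect(initial_sd, years_after, annual_retention=0.5):
    """Effect size remaining `years_after` years past program end.

    Geometric decay: each year the effect keeps a fixed share of the
    previous year's value. The 0.5 default is an assumed round number
    standing in for "nearly half lost per year", not a reported estimate.
    """
    if years_after < 0:
        raise ValueError("years_after must be non-negative")
    return initial_sd * annual_retention ** years_after


# .26 sd is the precision-weighted average reported at program end.
for year in range(4):
    print(year, round(projected_effect(0.26, year), 4))
```

Under these assumptions the average effect falls to roughly .13 sd one year after treatment ends and keeps halving each year after that, which is the sense in which the decline is geometric rather than linear.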
Survey respondents use different response styles when they use the categories of the Likert scale differently despite having the same true score on the construct of interest. For example, respondents may be more likely to use the extremes of the response scale independent of their true score. Research already shows that differing response styles can create a construct-irrelevant source of bias that distorts fundamental inferences made based on survey data. While some initial studies examine the effect of response styles on survey scores in longitudinal analyses, the issue of how response styles affect estimates of growth is underexamined. In this study, we conducted empirical and simulation analyses in which we scored surveys using item response theory (IRT) models that do and do not account for response styles, and then used those different scores in growth models and compared results. Generally, we found that response styles can affect estimates of growth parameters including the slope, but that the effects vary by psychological construct, response style, and model used.
A huge portion of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to understand the development of students on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students are typically administered a consistent set of self-report survey items across multiple school years, and growth is measured either based on sum scores or scale scores produced based on item response theory (IRT) methods. While there is a great deal of guidance on scaling and linking IRT-based large-scale educational assessment to facilitate the estimation of examinee growth, little of this expertise is brought to bear in the scaling of psychological and social-emotional constructs. Through a series of simulation and empirical studies, we produce scores in a single-cohort repeated measure design using sum scores as well as multiple IRT approaches and compare the recovery of growth estimates from longitudinal growth models using each set of scores. Results indicate that using scores from multidimensional IRT approaches that account for latent variable covariances over time in growth models leads to better recovery of growth parameters relative to models using sum scores and other IRT approaches.
This report reviews findings from 35 major studies that speak to the question of principal turnover. Within these studies, researchers have examined principal turnover nationally and within states and districts, primarily investigating the relationships between principal turnover and various characteristics of principals, schools, students, and policies. While there is some consistency across studies, there is a good deal of variation in research questions, methods, and measurement of turnover. Further, few studies consider all the possible pathways out of the principalship, and few isolate the ways in which specific conditions or features of the principalship impact principals’ decisions to leave or districts’ decisions to retain principals. Despite these limitations, we found that, when examined together, these studies provided important information to help policymakers, education leaders, and other stakeholders understand and address principal turnover.
Much is known about how to attract, develop, and retain a strong and stable teacher workforce, and states across the country are taking action to address their teacher shortages in ways that strengthen their overall teacher workforce. This report highlights research on six evidence-based policies that have been used to address teacher shortages and boost teacher recruitment and retention: service scholarships and loan forgiveness, high-retention pathways into teaching, mentoring and induction for new teachers, developing high-quality school principals, competitive compensation, and recruitment policies to expand the pool of qualified educators.
Research showing that high-quality preschool benefits children’s early learning and later life outcomes has led to increased state engagement in public preschool. However, mixed results from evaluations of two programs, Tennessee’s Voluntary Pre-K program and Head Start, have left many policymakers unsure about how to ensure productive investments. This report presents the most rigorous evidence on the effects of preschool and clarifies how the findings from Tennessee and Head Start relate to the larger body of research. That research shows that high-quality preschool enhances children’s school readiness by supporting substantial early learning gains relative to children who do not experience preschool, and that it can have lasting impacts far into children’s later years of school and life. Therefore, the issue is not whether preschool “works,” but how to design and implement programs that ensure public preschool investments consistently deliver on their promise.
Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added measures as indicators of teacher job performance. In this paper, we conduct a new test of the validity of value-added models. Using administrative student data from New York City, we apply commonly estimated value-added models to an outcome teachers cannot plausibly affect: student height. We find the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement, raising obvious questions about validity. Subsequent analysis finds these “effects” are largely spurious variation (noise), rather than bias resulting from sorting on unobserved factors related to achievement. Given the difficulty of differentiating signal from noise in real-world teacher effect estimates, this paper serves as a cautionary tale for their use in practice.
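The noise finding can be illustrated with a small simulation. This is a minimal sketch, not the paper's value-added specification: it assumes a hypothetical outcome that is pure student-level noise (standard normal), assigns no true teacher effect at all, and uses raw classroom means as the "teacher effect" estimates. The function name, the 200 teachers, and the class size of 25 are all illustrative assumptions.

```python
import random
import statistics


def placebo_teacher_effect_sd(n_teachers=200, class_size=25, seed=0):
    """SD of estimated 'teacher effects' when the true effect is zero.

    Each student's outcome is an independent N(0, 1) draw, so any
    spread across classroom means is sampling noise alone.
    """
    rng = random.Random(seed)
    class_means = []
    for _ in range(n_teachers):
        outcomes = [rng.gauss(0.0, 1.0) for _ in range(class_size)]
        class_means.append(statistics.fmean(outcomes))
    return statistics.pstdev(class_means)


sd = placebo_teacher_effect_sd()
print(f"SD of placebo teacher effects: {sd:.3f}")
```

With one classroom of 25 students per teacher, the spread of classroom means should be close to 1/sqrt(25) = 0.2 student-level standard deviations even though no teacher affects the outcome. That is the kind of spurious variation the height analysis points to, and it shrinks only as the number of students per teacher grows.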
Despite wide achievement gaps across California between students from different racial and socioeconomic backgrounds, some school districts have excelled at supporting the learning of all their students. This analysis identifies these positive outlier districts—those in which students of color, as well as White students, consistently achieve at higher levels than students from similar racial/ethnic backgrounds and from families of similar income and education levels in most other districts. These results are predicted, in significant part, by the qualifications of districts’ teachers, as measured by their certification and experience. In particular, the proportion of underprepared teachers—those teaching on emergency permits, waivers, and intern credentials—is associated with decreased achievement for all students, while teaching experience is associated with increased achievement, especially for students of color.
We show that grit, a skill that has been shown to be highly predictive of achievement, is malleable in childhood and can be fostered in the classroom environment. We evaluate a randomized educational intervention implemented in two independent elementary school samples. Outcomes are measured via a novel incentivized real effort task and performance in standardized tests. We find that treated students are more likely to exert effort to accumulate task-specific ability, and hence, more likely to succeed. In a follow-up 2.5 years after the intervention, we estimate an effect of about 0.2 standard deviations on a standardized math test.