Methodology, measurement and data
Using rich longitudinal data from one of the largest teacher education programs in Texas, we examine the measurement of pre-service teacher (PST) quality and its relationship with entry into the K–12 public school teacher workforce. Drawing on rubric-based observations of PSTs during clinical teaching, we find that little of the variation in observation scores is attributable to actual differences between PSTs. Instead, differences in scores largely reflect differences in the rating standards of field supervisors. We also find that men and PSTs of color receive systematically lower scores. Finally, higher-scoring PSTs are slightly more likely to enter the teacher workforce and substantially more likely to be hired at the same school as their clinical teaching placement.
Important educational policy decisions, like whether to shorten or extend the school year, often require accurate estimates of how much students learn during the year. Yet, related research relies on a mostly untested assumption: that growth in achievement is linear throughout the entire school year. We examine this assumption using a data set containing math and reading test scores for over seven million students in kindergarten through 8th grade across the fall, winter, and spring of the 2016-17 school year. Our results indicate that assuming linear within-year growth is often not justified, particularly in reading. Implications for investments in extending the school year, summer learning loss, and racial/ethnic achievement gaps are discussed.
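A minimal sketch of what checking the linearity assumption can look like with three test occasions. It is not the authors' model: the function name, the scale scores, and the four-month spacing between test windows are hypothetical, chosen only to illustrate the comparison.

```python
def monthly_gains(fall, winter, spring,
                  months_fall_to_winter=4.0, months_winter_to_spring=4.0):
    """Return per-month gains for each half of the school year.

    Under linear within-year growth the two rates should be (about) equal.
    The month spacings are assumptions, not values from the paper.
    """
    first_half = (winter - fall) / months_fall_to_winter
    second_half = (spring - winter) / months_winter_to_spring
    return first_half, second_half


# Hypothetical mean scale scores for one grade (illustrative only).
first_half, second_half = monthly_gains(200.0, 208.0, 212.0)
print(f"fall-to-winter gain per month:   {first_half:.2f}")
print(f"winter-to-spring gain per month: {second_half:.2f}")
print(f"departure from linearity:        {first_half - second_half:.2f}")
```

With these made-up numbers, growth is twice as fast in the first half of the year, so extrapolating a fall-to-spring average to extra school days would misstate the expected gain.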
High school graduation rates have increased dramatically in the past two decades. Some skepticism has arisen, however, because of the confluence of the graduation rise and the start of high-stakes accountability for graduation rates with No Child Left Behind (NCLB). In this study we provide some of the first evidence about the role of accountability versus strategic behavior, especially the degree to which the recent graduation rate rise represents increased human capital. First, using national difference-in-differences (DD) analysis of within-state, cross-district variation in proximity to state graduation rate thresholds, we confirm that NCLB accountability increased graduation rates. However, we find limited evidence that this is due to strategic behavior. To test for lowering of graduation standards, we examined graduation rates in states with and without graduation exams and trends in GEDs; neither analysis suggests that the graduation rate rise is due to strategic behavior. We also examined the effects of “credit recovery” courses using Louisiana micro data; while our results suggest an increase in credit recovery, consistent with some lowering of standards, the size of the effect is not nearly enough to explain the rise in graduation rates. Finally, we examine other forms of strategic behavior by schools, though these can only explain inflation of school/district-level graduation rates, not national rates. Overall, the evidence suggests that the rise in the national graduation rates reflects some strategic behavior, but also a substantial increase in the nation’s stock of human capital. Graduation accountability was a key contributor.
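For readers unfamiliar with DD analysis, the core arithmetic of the simplest two-group, two-period case can be sketched as follows. The grouping (districts below versus above a state threshold) and all graduation rates are hypothetical, and the paper's specification is presumably richer than this.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period DD estimate.

    The change in the comparison group stands in for what would have
    happened to the treated group without the policy.
    """
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical graduation rates (percent), before and after accountability.
below_threshold = (70.0, 80.0)  # assumed to face more accountability pressure
above_threshold = (75.0, 80.0)  # assumed comparison districts

estimate = diff_in_diff(*below_threshold, *above_threshold)
print(f"DD estimate: {estimate:.1f} percentage points")
```

Here both groups improve, but the districts under more pressure improve by 5 points more, and that difference is the estimate attributed to accountability under the usual parallel-trends assumption.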
This paper uses meta-analytic techniques to estimate the separate effects of the starting age, program duration, and persistence of impacts of early childhood education programs on children’s cognitive and achievement outcomes. It concentrates on studies published before the wide-scale penetration of state pre-K programs. Specifically, data are drawn from 67 high-quality evaluation studies conducted between 1960 and 2007, which provide 993 effect sizes for analyses. When weighted for differential precision, effect sizes averaged 0.26 SD at the end of these programs. We find larger effect sizes for programs starting in infancy/toddlerhood than in the preschool years and, surprisingly, smaller average effect sizes at the end of longer as opposed to shorter programs. Our findings suggest that, on average, impacts decline geometrically following program completion, losing nearly half of their size within one year after the end of treatment. Taken together, these findings reflect a moderate level of effectiveness across a wide range of center-based programs and underscore the need for innovative intervention strategies to produce larger and more persistent impacts.
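Two ideas in this abstract can be illustrated with a short sketch: weighting effect sizes by their precision, and geometric fade-out. Inverse-variance weighting is assumed here as one common way to weight for precision; the abstract does not say which scheme was used. The retention factor of 0.5 is a round stand-in for "nearly half," and the example effect sizes and variances are hypothetical.

```python
def precision_weighted_mean(effects, variances):
    """Inverse-variance weighted mean: more precise estimates count more."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def faded_effect(initial_effect, years_after, yearly_retention=0.5):
    """Geometric decline: the effect keeps a fixed fraction each year."""
    return initial_effect * yearly_retention ** years_after


# Hypothetical effect sizes (SD units) and sampling variances.
pooled = precision_weighted_mean([0.2, 0.4], [0.01, 0.04])
print(f"pooled effect: {pooled:.2f} SD")

# Start from the reported end-of-program average of 0.26 SD.
for year in range(4):
    print(f"year {year}: {faded_effect(0.26, year):.3f} SD")
```

Under these assumptions an end-of-program effect of 0.26 SD shrinks to about 0.13 SD after one year and about 0.03 SD after three, which shows why persistence matters as much as the initial impact.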
Survey respondents use different response styles when they use the categories of the Likert scale differently despite having the same true score on the construct of interest. For example, respondents may be more likely to use the extremes of the response scale independent of their true score. Research already shows that differing response styles can create a construct-irrelevant source of bias that distorts fundamental inferences made based on survey data. While some initial studies examine the effect of response styles on survey scores in longitudinal analyses, the issue of how response styles affect estimates of growth is underexamined. In this study, we conducted empirical and simulation analyses in which we scored surveys using item response theory (IRT) models that do and do not account for response styles, and then used those different scores in growth models and compared results. Generally, we found that response styles can affect estimates of growth parameters including the slope, but that the effects vary by psychological construct, response style, and model used.
A huge portion of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to understand the development of students on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students are typically administered a consistent set of self-report survey items across multiple school years, and growth is measured either based on sum scores or scale scores produced based on item response theory (IRT) methods. While there is a great deal of guidance on scaling and linking IRT-based large-scale educational assessment to facilitate the estimation of examinee growth, little of this expertise is brought to bear in the scaling of psychological and social-emotional constructs. Through a series of simulation and empirical studies, we produce scores in a single-cohort repeated measure design using sum scores as well as multiple IRT approaches and compare the recovery of growth estimates from longitudinal growth models using each set of scores. Results indicate that using scores from multidimensional IRT approaches that account for latent variable covariances over time in growth models leads to better recovery of growth parameters relative to models using sum scores and other IRT approaches.
This report reviews findings from 35 major studies that speak to the question of principal turnover. Within these studies, researchers have examined principal turnover nationally and within states and districts, primarily investigating the relationships between principal turnover and various characteristics of principals, schools, students, and policies. While there is some consistency across studies, there is a good deal of variation in research questions, methods, and measurement of turnover. Further, few studies consider all the possible pathways out of the principalship, and few isolate the ways in which specific conditions or features of the principalship impact principals’ decisions to leave or districts’ decisions to retain principals. Despite these limitations, we found that, when examined together, these studies provided important information to help policymakers, education leaders, and other stakeholders understand and address principal turnover.
Much is known about how to attract, develop, and retain a strong and stable teacher workforce, and states across the country are taking action to address their teacher shortages in ways that strengthen their overall teacher workforce. This report highlights research on six evidence-based policies that have been used to address teacher shortages and boost teacher recruitment and retention: service scholarships and loan forgiveness, high-retention pathways into teaching, mentoring and induction for new teachers, developing high-quality school principals, competitive compensation, and recruitment policies to expand the pool of qualified educators.
Research showing that high-quality preschool benefits children’s early learning and later life outcomes has led to increased state engagement in public preschool. However, mixed results from evaluations of two programs, Tennessee’s Voluntary Pre-K program and Head Start, have left many policymakers unsure about how to ensure productive investments. This report presents the most rigorous evidence on the effects of preschool and clarifies how the findings from Tennessee and Head Start relate to the larger body of research. That research shows that high-quality preschool enhances children’s school readiness by supporting substantial early learning gains in comparison to children who do not experience preschool, and that it can have lasting impacts far into children’s later years of school and life. Therefore, the issue is not whether preschool “works,” but how to design and implement programs that ensure public preschool investments consistently deliver on their promise.
Estimates of teacher “value-added” suggest teachers vary substantially in their ability to promote student learning. Prompted by this finding, many states and school districts have adopted value-added measures as indicators of teacher job performance. In this paper, we conduct a new test of the validity of value-added models. Using administrative student data from New York City, we apply commonly estimated value-added models to an outcome teachers cannot plausibly affect: student height. We find the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement, raising obvious questions about validity. Subsequent analysis finds these “effects” are largely spurious variation (noise), rather than bias resulting from sorting on unobserved factors related to achievement. Given the difficulty of differentiating signal from noise in real-world teacher effect estimates, this paper serves as a cautionary tale for their use in practice.
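The noise finding can be illustrated with a small simulation. This is a simplified sketch, not the paper's value-added model: it uses raw classroom means with no covariates, and the class size, height spread, and number of teachers are hypothetical. Every simulated teacher has a true effect of exactly zero, yet the estimated "teacher effects" still vary.

```python
import random
import statistics


def placebo_teacher_sd(n_teachers=2000, class_size=25, height_sd=7.0, seed=0):
    """SD of classroom mean heights when teachers have no effect at all.

    Heights are drawn independently of the teacher, so any spread in
    the classroom means is pure sampling noise.
    """
    rng = random.Random(seed)
    class_means = []
    for _ in range(n_teachers):
        heights = [rng.gauss(0.0, height_sd) for _ in range(class_size)]
        class_means.append(statistics.mean(heights))
    return statistics.pstdev(class_means)


observed = placebo_teacher_sd()
theoretical = 7.0 / 25 ** 0.5  # student SD divided by sqrt(class size)
print(f"SD of estimated 'effects': {observed:.2f}")
print(f"expected from noise alone: {theoretical:.2f}")
```

The spread of the estimated effects tracks the student-level standard deviation divided by the square root of class size, so small classes alone can produce apparent teacher differences on an outcome no teacher can influence.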