Educator preparation, professional development, performance and evaluation
At least sixteen US states have taken steps toward holding teacher preparation programs (TPPs) accountable for teacher value-added to student test scores. Yet it is unclear whether teacher quality differences between TPPs are large enough to make an accountability system worthwhile. Several statistical practices can make differences between TPPs appear larger and more significant than they are. We reanalyze TPP evaluations from six states (New York, Louisiana, Missouri, Washington, Texas, and Florida) using appropriate methods implemented by our new caterpillar command for Stata. Our results show that teacher quality differences between most TPPs are negligible (.01-.03 standard deviations in student test scores), even in states where larger differences were reported previously. While ranking all of a state’s TPPs may not be possible or desirable, in some states and subjects we can find a single TPP whose teachers stand out as significantly above or below average. Such exceptional TPPs may reward further study.
For nearly three decades, policy-makers and researchers in the United States have promoted more intellectually rigorous standards for mathematics teaching and learning. Yet, to date, we have limited descriptive evidence on the extent to which reform-oriented instruction has been enacted at scale.
The purpose of the study is to examine the prevalence of reform-aligned mathematics instructional practices in five U.S. school districts. We also seek to describe the range of instruction students experience by presenting case studies of teachers at high, medium and low levels of reform alignment.
We draw on 1,735 video-recorded lessons from 329 elementary teachers in these five U.S. urban districts.
We present descriptive analyses of lesson scores on a mathematics-focused classroom observation instrument. We also draw upon interviews with district personnel, rater-written lesson summaries, and lesson video in order to develop case studies of instructional practice.
We find that teachers in our sample do use reform-aligned instructional practices, but that they do so within the confines of traditional lesson formats. We also find that the implementation of these instructional practices varies in quality. Furthermore, the prevalence and strength of these practices correspond to the coherence of district efforts at instructional reform.
Our findings suggest that, unlike in other studies where reform-oriented instruction rarely occurred (e.g., Kane & Staiger, 2012), reform practices do appear to some degree in study classrooms. In addition, our analyses suggest that implementation of these reform practices corresponds to the strength and coherence of district efforts to change instruction.
Recent interest in promoting and supporting replication efforts assumes that there is well-established methodological guidance for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses these challenges by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. In direct replication designs, replication failure is evidence of bias or incorrect reporting in individual study estimates, while in conceptual replication designs, replication failure occurs because of effect variation due to differences in treatments, outcomes, settings, and participant characteristics. The paper demonstrates how multiple research designs may be combined in systematic replication studies, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.
Researchers are rarely satisfied to learn only whether an intervention works; they also want to understand why and under what circumstances interventions produce their intended effects. These questions have led to increasing calls for implementation research to be included in high-quality studies with strong causal claims. Of critical importance is determining whether an intervention can be delivered with adherence to a standardized protocol, and the extent to which an intervention protocol can be replicated across sessions, sites, and studies. When an intervention protocol is highly standardized and delivered through verbal interactions with participants, a set of natural language processing (NLP) techniques termed semantic similarity can be used to provide quantitative summary measures of how closely intervention sessions adhere to a standardized protocol, as well as how consistently the protocol is replicated across sessions. Given the intense methodological, budgetary, and logistical challenges of conducting implementation research, semantic similarity approaches have the benefit of being low-cost, scalable, and context-agnostic. In this paper, we demonstrate how semantic similarity approaches may be utilized in an experimental evaluation of a coaching protocol on teacher pedagogical skills in a simulated classroom environment. We discuss strengths and limitations of the approach, and the most appropriate contexts for applying this method.
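As a rough illustration of the two summary measures described above (adherence to a protocol and consistency across sessions), the following is a minimal sketch only. The abstract does not say which semantic similarity technique the authors use; this example assumes a simple term-frequency cosine similarity, and the function names (`adherence_scores`, `replication_consistency`) and the example texts are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as sharing nothing with any other text
    return dot / (norm_a * norm_b)


def adherence_scores(protocol: str, sessions: list[str]) -> list[float]:
    """Assumed adherence measure: similarity of each session to the protocol."""
    return [cosine_similarity(protocol, s) for s in sessions]


def replication_consistency(sessions: list[str]) -> float:
    """Assumed consistency measure: mean pairwise similarity between sessions."""
    pairs = list(combinations(sessions, 2))
    if not pairs:
        raise ValueError("need at least two sessions")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical protocol script and session transcripts, for illustration only.
protocol = "Praise one strength, then name one target behavior and practice it."
sessions = [
    "I will praise one strength and then name one target behavior to practice.",
    "Let's talk about your weekend plans and the weather.",
]
scores = adherence_scores(protocol, sessions)
consistency = replication_consistency(sessions)
```

Here `scores` holds one value per session, with higher values indicating closer overlap with the protocol wording. Word-count overlap is only a crude stand-in for meaning: embedding-based measures would capture paraphrases that this sketch misses.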
We study the adoption and implementation of a new mobile communication app among a sample of 132 New York City public schools. The app provides a platform for sharing general announcements and news as well as engaging in personalized two-way communication with individual parents. We provide participating schools with free access to the app and randomize schools to receive intensive support (training, guidance, monitoring, and encouragement) for maximizing the efficacy of the app. Although user supports led to higher levels of communication within the app in the treatment year, overall usage remained low and declined in the following year when treatment schools no longer received intensive supports. We find few subsequent effects on perceptions of communication quality or student outcomes. We leverage rich internal user data to explore how take-up and usage patterns varied across staff and school characteristics. These analyses help to identify early adopters and reluctant users, revealing both opportunities and obstacles to engaging parents through new communication technology.
This article takes stock of where the field of behavioral science applied to education policy stands, which avenues seem promising, and which seem like dead ends. I present a curated set of studies rather than an exhaustive literature review, categorizing interventions by whether they nudge (keep options intact) or “shove” (restrict choice), and by whether they are high touch or low touch (that is, whether or not they use face-to-face interaction). Many recent attempts to test large-scale low touch nudges find precisely estimated null effects, suggesting we should not expect letters, text messages, and online exercises to serve as panaceas for addressing education policy’s key challenges. Programs that impose more choice-limiting structure on a youth’s routine, like mandated tutoring, or programs that nudge parents, appear more promising.
COVID-19 shuttered schools across the United States, upending traditional approaches to education. We examine teachers’ experiences during emergency remote teaching in the spring of 2020 using responses to a working conditions survey from a sample of 7,841 teachers across 206 schools and 9 states. Teachers reported a range of challenges related to engaging students in remote learning and balancing their professional and personal responsibilities. Teachers in high-poverty and majority Black schools perceived these challenges to be the most severe, suggesting the pandemic further increased existing educational inequities. Using data from both pre-post and retrospective surveys, we find that the pandemic and pivot to emergency remote teaching resulted in a sudden, large drop in teachers’ sense of success. We also demonstrate how supportive working conditions in schools played a critical role in helping teachers to sustain their sense of success. Teachers who could depend on their district and school-based leadership for strong communication, targeted training, meaningful collaboration, fair expectations, and recognition of their efforts were least likely to experience declines in their sense of success.
High rates of principal turnover nationally mean that school districts are constantly called on to recruit and select new principals. The importance of a school’s principal makes choosing candidates who will be effective paramount, yet we have little evidence linking information known to school districts at the time of selection to principals’ future job performance. Using data from Tennessee, we test the degree to which observable information about novice principals from prior to entry, including qualifications, work history information, and effectiveness in prior roles, predicts practice ratings assigned to them in their initial years in the principalship. We find that educational attainment and years of experience in other jobs hold little predictive power. Performance ratings received as an assistant principal (AP) or teacher, however, do predict principal effectiveness. Moreover, APs who previously worked in schools with highly rated principals are more likely to be effective upon transitioning into the principalship.
Assistant principals are important education personnel, both as essential members of school leadership teams and apprentice principals. However, empirical evidence on their career outcomes remains scarce. Using statewide administrative data from Tennessee and Missouri, we provide the first comprehensive analysis of AP mobility. While prior work focuses only on AP promotions into principal positions, we also account for APs who exit school leadership and transfer to a different school. We find yearly mobility rates of 25–28%, with 10% of APs leaving school leadership, 7.5% changing schools, and 7.5–10% becoming principals. We also document a strong relationship between AP mobility and principal turnover, where higher-performing APs are substantially more likely to replace their departing principal. Principal transitions also appear to increase the likelihood that APs exit school leadership and change schools, highlighting an additional cost of high rates of principal churn.
Teacher evaluation policies seek to improve student outcomes by increasing the effort and skill levels of current and future teachers. Current policy and most prior research treat teacher evaluation as balancing two aims: accountability and skill development. Proper teacher evaluation design has been understood as successfully weighting the accountability and professional growth dimensions of policy and practice. I develop a model of teacher effectiveness that incorporates improvement from evaluation, and I detail the conditions that determine how effective evaluation for growth and evaluation for accountability are at improving student outcomes. Drawing on empirical evidence from the personnel economics, economics of education, and measurement literatures, I simulate the long-term effects of a set of teacher evaluation policies. I find that those that treat evaluation for accountability and evaluation for growth as substitutes outperform policies that treat them as complements. I conclude that optimal teacher evaluation policies would impose accountability on teachers performing below a defined level, while teachers above that level would face no accountability pressure but would receive intensive instructional supports.