Over the last twenty years, education researchers have increasingly conducted randomised experiments with the goal of informing the decisions of educators and policymakers. Such experiments have generally employed broad, consequential, standardised outcome measures in the hope that this would allow decisionmakers to compare effectiveness of different approaches. However, a combination of small effect sizes, wide confidence intervals, and treatment effect heterogeneity means that researchers have largely failed to achieve this goal. We argue that quasiexperimental methods and multi-site trials will often be superior for informing educators’ decisions on the grounds that they can achieve greater precision and better address heterogeneity. Experimental research remains valuable in applied education research. However, it should primarily be used to test theoretical models, which can in turn inform educators’ mental models, rather than attempting to directly inform decision making. Since comparable effect size estimates are not of interest when testing educational theory, researchers can and should improve the power of theory-informing experiments by using more closely aligned (i.e., valid) outcome measures. We argue that this approach would reduce wasteful research spending and make the research that does go ahead more statistically informative, thus improving the return on investment in educational research.