Welcome to prevention science meets sci-fi.

Categories: Prevention, Programs, Efficacy, Implementation

Author: Francisco Cardozo

Published: March 19, 2025

AI is knocking at the door of prevention science, carrying predictions instead of theories and probabilities instead of certainties. It’s exciting, slightly unsettling, and entirely unavoidable: welcome to efficacy’s existential crisis.

ChatGPT 4.0

Prevention science has long been grounded in a stable architecture of evaluation. For decades, researchers have relied on established Standards of Evidence to determine whether interventions are efficacious, effective, and ready to be scaled. These criteria, first articulated by Flay and colleagues, emphasized randomized controlled trials (RCTs), replication, and theoretical clarity as the hallmarks of trustworthy interventions. A decade later, the Society for Prevention Research refined and expanded these standards, incorporating dimensions such as implementation quality, adaptation documentation, and contextual fit. These evolving guidelines built a shared language for what counts as evidence in prevention science, offering both clarity and continuity in an otherwise complex and variable field. They were not just methodological requirements; they were epistemological anchors.

But the clarity of that system is now encountering something much harder to categorize. Since the release of generative AI systems like ChatGPT, the boundary between intervention and evaluation has begun to blur. AI doesn’t merely observe or assess an intervention; it can become one. And unlike traditional programs with predefined modules, AI-driven tools can respond dynamically, shaping interventions in real time based on each participant’s input. Imagine a participant initiating contact with a digital facilitator and, within seconds, receiving a personalized series of activities, reflections, or feedback loops: no two interactions the same, no fixed manual to follow. The program writes itself as it unfolds. And then rewrites itself again.
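To make this concrete, here is a deliberately minimal Python sketch of what an adaptive "digital facilitator" loop could look like. The activity pool, topic keywords, and selection rule are hypothetical placeholders, not any existing program's logic; a real AI-driven tool would generate content with a language model rather than sample from a fixed list. The point is only that the delivered program is assembled at runtime from the participant's input.

```python
import random

# Hypothetical activity pool; a real AI-driven tool would generate content
# dynamically rather than sample from a fixed list like this one.
ACTIVITY_POOL = {
    "stress": ["guided breathing exercise", "reframe one stressful thought"],
    "goals": ["set a small goal for this week", "identify one barrier and one support"],
    "general": ["reflect on a recent success", "name one person you can reach out to"],
}

def plan_session(participant_message: str, n_activities: int = 2) -> list[str]:
    """Assemble a session plan keyed to what the participant raises.

    Messages that match no topic fall back to the general pool, so every
    exchange yields its own plan rather than a fixed module sequence.
    """
    text = participant_message.lower()
    matched = [acts for topic, acts in ACTIVITY_POOL.items() if topic in text]
    candidates = [a for acts in matched for a in acts] or ACTIVITY_POOL["general"]
    return random.sample(candidates, k=min(n_activities, len(candidates)))

# Two participants, two different "programs" assembled at runtime.
print(plan_session("I'm under a lot of stress at school"))
print(plan_session("I don't really know where to start"))
```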

This new reality introduces a kind of productive chaos. Where prevention science has traditionally depended on standardization (of delivery, of dosage, of populations), AI introduces variability as a feature, not a bug. The gold-standard RCT, so central to establishing causal inference, assumes a stable treatment across units. But what happens when the treatment shifts with each user? What does fidelity mean when adaptation is the method, not the exception? The field now faces interventions that resist the very conditions under which traditional efficacy is measured. In such a landscape, many of our familiar assumptions (about replication, generalizability, even the boundaries of what an intervention is) begin to feel less like scientific necessities and more like historical artifacts.
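One way to see the strain on the standard design is a toy simulation, sketched below in Python with purely illustrative numbers (no real trial data). Every treated participant receives a slightly different AI-tailored variant with its own true effect; the randomized comparison still yields an average effect, but that single number no longer describes a fixed, replicable treatment.

```python
import random
import statistics

random.seed(7)

def simulate_adaptive_trial(n: int = 2000) -> None:
    """Toy RCT in which every treated participant gets a different AI-tailored variant.

    Illustrative assumption: each variant's true effect is drawn from a normal
    distribution with mean 0.30 and SD 0.25 (standardized outcome units).
    """
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    variant_effects = [random.gauss(0.30, 0.25) for _ in range(n)]
    treated = [random.gauss(effect, 1.0) for effect in variant_effects]

    ate = statistics.mean(treated) - statistics.mean(control)
    print(f"Estimated average effect: {ate:.2f}")
    print(f"Strongest variant's true effect: {max(variant_effects):.2f}")
    print(f"Weakest variant's true effect:  {min(variant_effects):.2f}")
    # The average is still estimable, but "the treatment" it summarizes is
    # really n distinct, non-repeating variants -- replication gets slippery.

simulate_adaptive_trial()
```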

This is not merely a methodological disruption; it is a conceptual one. The systems we’ve built to judge quality and effectiveness are designed for interventions that stay put. AI does not stay put. It learns, it adapts, and increasingly, it collaborates. Evaluation, as we’ve known it, is being asked to evaluate something that moves. And in doing so, it reveals its own limitations: not as a failure of science, but as a sign that science must now evolve to accommodate intelligence that is not only artificial, but creative.

It may be time to imagine a new framework for evidence, something we might call Evidence 3.0. This does not mean abandoning rigor, but rather expanding its scope. New standards might require that AI-driven interventions meet performance benchmarks, undergo regular bias audits, and maintain transparent, open-source models. Simulation-based evidence might gain credibility, and real-time fidelity tracking could become the norm. In this context, evaluation itself becomes adaptive, built to monitor programs that do not repeat but respond.
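As a rough illustration of what real-time fidelity tracking might mean in practice, the Python sketch below logs each AI-delivered session against a set of assumed core components and keeps simple running indicators, including a crude completion-rate comparison across subgroups as a stand-in for a bias audit. The component labels, groups, and metrics are assumptions for illustration, not an existing standard or toolkit.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical core components the intervention is expected to touch.
CORE_COMPONENTS = {"goal_setting", "skill_practice", "feedback"}

@dataclass
class FidelityMonitor:
    """Accumulates per-session logs and reports running fidelity and subgroup gaps."""
    sessions: int = 0
    component_hits: dict = field(default_factory=lambda: defaultdict(int))
    completions_by_group: dict = field(default_factory=lambda: defaultdict(list))

    def log_session(self, components_delivered: set[str], group: str, completed: bool) -> None:
        # Record which core components this adaptive session actually covered.
        self.sessions += 1
        for c in components_delivered & CORE_COMPONENTS:
            self.component_hits[c] += 1
        self.completions_by_group[group].append(1 if completed else 0)

    def report(self) -> None:
        print(f"Sessions logged: {self.sessions}")
        for c in sorted(CORE_COMPONENTS):
            share = self.component_hits[c] / self.sessions if self.sessions else 0.0
            print(f"  coverage of '{c}': {share:.0%}")
        # Crude stand-in for a bias audit: compare completion rates by group.
        for group, outcomes in self.completions_by_group.items():
            rate = sum(outcomes) / len(outcomes)
            print(f"  completion rate, group {group}: {rate:.0%}")

monitor = FidelityMonitor()
monitor.log_session({"goal_setting", "feedback"}, group="A", completed=True)
monitor.log_session({"skill_practice"}, group="B", completed=False)
monitor.log_session({"goal_setting", "skill_practice", "feedback"}, group="A", completed=True)
monitor.report()
```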

What began as a stable, organized system is now moving into fluid territory. But the movement is not a collapse; it is a transformation. Prevention science remains anchored in its commitment to improving lives. That commitment now has to stretch across a more dynamic, less predictable, and potentially more powerful set of tools. If the early decades of prevention science were about constructing a solid foundation, this moment is about learning to build on terrain that shifts beneath us. The questions we ask may stay the same, but the answers, and the ways we find them, are changing fast.
