How the Replication Crisis Reshaped Psychology
When landmark behavioral studies like "power posing" and "ego depletion" failed to replicate in rigorous follow-up experiments, it triggered a painful reckoning known as the replication crisis. Uncovering these flawed research practices did not destroy the behavioral sciences, but rather exposed the dangers of small sample sizes and selective reporting. Ultimately, this crisis catalyzed a global "credibility revolution" focused on open data, pre-registered methodologies, and transparent international collaboration.
The Allure of Simple Solutions to Complex Behaviors
From the late 1990s through the early 2010s, behavioral science enjoyed a golden age of public attention and media saturation. Researchers were discovering seemingly profound, highly intuitive insights about human nature that promised easy interventions for everyday problems. The appeal was obvious: if complex human struggles with self-control, confidence, and success could be hacked with minor behavioral tweaks, the implications for self-improvement were limitless. Two of the most famous phenomena to emerge from this era were ego depletion and power posing. Both concepts were backed by peer-reviewed literature, both made intuitive sense, and both promised massive personal benefits from minimal effort.
Ego Depletion: Is Willpower a Muscle?
The concept of ego depletion was introduced to the world in a 1998 paper by social psychologists Roy Baumeister, Ellen Bratslavsky, Mark Muraven, and Dianne Tice 12. The researchers proposed a highly relatable "strength model" of self-control. They posited that willpower operates much like a physical muscle: it draws upon a limited pool of conscious mental energy or resources 1. When that energy is exhausted by repeated use - a state they termed "ego depletion" - the muscle fatigues, and an individual's capacity to exert self-control on subsequent, unrelated tasks is severely impaired 12.
In a now-famous experiment, Baumeister and colleagues brought hungry participants into a laboratory that smelled of freshly baked chocolate chip cookies. Some participants were allowed to eat the cookies, while others were forced to exert extreme self-control by eating raw radishes instead 2. Afterward, all participants were asked to solve a puzzle that, unbeknownst to them, was actually impossible. The researchers found that the participants who had exhausted their willpower resisting the cookies quit the puzzle much faster than those who had been allowed to indulge 2.
The idea that self-control is a finite, depletable resource resonated deeply with both the public and the academic community. It seemingly explained everything from why we break our diets at the end of a stressful workday to why consumers make impulsive purchases 1. The study initiated a massive wave of research spanning consumer behavior, dieting, and athletic performance, and for over a decade, the ego-depletion effect was considered a foundational truth of human psychology 13.
Power Posing: Faking It Till You Make It
A little over a decade later, another blockbuster psychological concept arrived: power posing. In a 2010 paper published in the journal Psychological Science, researchers Dana Carney, Amy Cuddy, and Andy Yap claimed that briefly adopting expansive, "high-power" physical postures could fundamentally alter a person's neuroendocrinology and behavior 45.
The study, based on a remarkably small sample of 42 participants, reported that individuals who held expansive poses (like leaning back with hands behind the head, or standing with hands on hips like Wonder Woman) for just two minutes showed measurable physiological changes. Specifically, they exhibited an increase in testosterone (a hormone associated with dominance), a decrease in cortisol (a hormone associated with stress), and an increased willingness to take risks in a gambling task compared to participants who held contractive, "low-power" poses 5.
The researchers framed this as a powerful "life hack" with immediate real-world applications for high-stakes situations like job interviews and public speaking 45. The concept exploded into the mainstream following a 2012 TED talk by Amy Cuddy. Driven by her passionate delivery and the memorable mantra "fake it till you become it," the presentation became one of the most viewed in TED's history 46.
For years, both theories were treated as established scientific fact. Hundreds of subsequent studies appeared to build upon these foundations, creating vast literatures of conceptual replications and extensions 14. However, beneath the surface of these celebrated findings, a methodological crisis was quietly brewing, one that would soon threaten to upend the entire discipline.
The Replication Crisis Hits Psychology
Historically, the reproducibility of empirical results has been the ultimate cornerstone of the scientific method 6. If a phenomenon is real and robust, an independent laboratory following the exact same procedures should be able to observe it. In the early 2010s, facing growing skepticism about the statistical validity of "flashy" behavioral research, psychologists began undertaking large-scale, systematic efforts to replicate classic studies 67.
The results of these audits were deeply alarming. In 2015, the Open Science Collaboration published a landmark project attempting to replicate 100 published psychological studies. They found that only 36 percent of the original significant findings could be successfully reproduced 89. Furthermore, among the studies that did replicate, the effect sizes in the replications were, on average, half the magnitude of the originals 6. This widespread failure to reproduce published scientific results became known as the "replication crisis" 611.
The pillars of both ego depletion and power posing quickly buckled under this new wave of rigorous scrutiny.

The Collapse of the Willpower Muscle
For ego depletion, the first major blow came from independent scrutiny of the underlying statistics. A 2010 meta-analysis of 83 studies by Martin Hagger, Nikos Chatzisarantis, and colleagues had reported a moderate effect size (d = 0.62) for ego depletion, but researchers such as Evan Carter and Michael McCullough soon pointed out that this literature was likely plagued by severe publication bias 13.
To test the effect definitively, Hagger and Chatzisarantis subsequently organized a high-profile, pre-registered replication involving 23 independent laboratories. The results were devastating: across the large combined sample, the estimated ego-depletion effect was close to zero 13. Hoping to settle the debate with an even more robust test, a later multi-lab replication project led by Kathleen Vohs involved 36 laboratories and 3,531 participants 1. It also failed to find a meaningful ego-depletion effect, returning an effect size of d = 0.06, roughly an order of magnitude smaller than the original estimate of d = 0.62 and practically indistinguishable from zero 1.
In light of the sheer volume of failed replications, many researchers concluded that the ego-depletion effect, as a universal physiological phenomenon, might simply not exist, and that its literature represented a massive false positive generated by a flawed scientific culture 312.
The Deflation of Power Posing
Power posing faced a nearly identical trajectory. In 2015, a research team led by Eva Ranehill attempted to replicate the original 2010 Carney, Cuddy, and Yap study at a much larger scale 810. Ranehill's team used a sample of 200 participants, nearly five times the original, and relied on computerized instructions to minimize experimenter bias and demand characteristics 1011.
The Ranehill replication found absolutely no support for the original biological or behavioral effects 810. While participants did self-report feeling more powerful after striking a pose - a subjective measure highly susceptible to placebo effects and the participants' own assumptions about the experiment - there were no significant changes in testosterone, cortisol, or objective risk-taking behavior 1012.
The evidence against the physiological claims of power posing quickly compounded as multiple other labs failed to replicate the hormonal shifts 813. In 2016, Dana Carney, the lead author of the original 2010 study, posted a public statement completely abandoning the theory. She explicitly stated she no longer believed that power poses produced the hormonal or behavioral effects they had originally claimed, leaving Amy Cuddy as the sole visible proponent of the theory in the public sphere 45.
How Did the Original Studies Get It So Wrong?
For the general public, the abrupt reversal of seemingly established science was profoundly confusing. If the original papers were peer-reviewed and published in prestigious journals by researchers at elite universities, how could they be entirely wrong?
The answer lies in the statistical and structural norms of psychological research during that era. In the vast majority of cases, researchers were not committing intentional, malicious fraud. Rather, they were engaging in a suite of widely accepted but statistically flawed practices known collectively as Questionable Research Practices (QRPs) 1718. The replication crisis was essentially the field waking up to the statistical consequences of its own leniency.
The Danger of the "N-Heuristic" and Small Sample Sizes
A primary structural driver of the replication crisis was the field's heavy reliance on extremely small sample sizes 1415. The original power posing study relied on just 42 participants divided into two conditions 5. In behavioral science, small samples suffer from drastically low statistical power, making them highly susceptible to random noise and natural human variation 1214.
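To make "low statistical power" concrete, here is a minimal sketch that approximates the power of a two-group comparison using the normal distribution rather than the exact t-distribution. The function names and the example effect sizes (d = 0.2, 0.5, 0.8) are illustrative assumptions, and 21 participants per group only roughly mirrors the original 42-participant design.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = d * math.sqrt(n_per_group / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return STD_NORMAL.cdf(noncentrality - z_crit)

def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate participants needed per group to reach the target power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power with 21 per group = {approx_power(d, 21):.2f}, "
          f"n per group for 80% power = {n_per_group_for_power(d)}")
```

Under this approximation, a study with 21 participants per group has only about a 37 percent chance of detecting a true medium-sized effect (d = 0.5), and reaching 80 percent power for that effect takes roughly 63 per group; exact t-based calculations give slightly larger numbers.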
Researchers and the public often assume that a significant result from a small sample must reflect an unusually strong effect, but meta-scientific analyses point the other way: small samples tend to exaggerate effect sizes 1421. In a sample of 40 people, only a large observed difference can clear the significance threshold, so any study of that size that does reach significance, often with help from chance, will necessarily report a large effect. When the same study is run with 2,000 people, sampling error shrinks and the true, often negligible, effect emerges 15. This phenomenon highlights the danger of what methodologists call the "N-Heuristic," where researchers historically prioritized quick, low-cost studies over adequately powered investigations 15.
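The inflation of effects in small samples can be reproduced with a short simulation. The sketch below assumes a small true effect (d = 0.2) and unit-variance scores, and it keeps only the studies that reach significance (two-sided p < .05) in the predicted direction as a stand-in for what gets published; all of these are illustrative choices, not estimates from the studies discussed.

```python
import random
from statistics import NormalDist, fmean

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = .05

def published_effects(true_d, n_per_group, n_studies=5_000, seed=1):
    """Observed effects from the studies that reach p < .05 in the predicted direction."""
    rng = random.Random(seed)
    group_sd_of_mean = (1 / n_per_group) ** 0.5  # SD of a group mean when individual SD = 1
    se_diff = (2 / n_per_group) ** 0.5           # SE of the difference between two group means
    significant = []
    for _ in range(n_studies):
        # Drawing group means directly is equivalent to averaging n unit-variance scores.
        control_mean = rng.gauss(0.0, group_sd_of_mean)
        treatment_mean = rng.gauss(true_d, group_sd_of_mean)
        observed_d = treatment_mean - control_mean  # with SD = 1 this is Cohen's d
        if observed_d / se_diff > Z_CRIT:
            significant.append(observed_d)
    return significant

for n in (20, 1_000):
    effects = published_effects(true_d=0.2, n_per_group=n)
    print(f"n = {n} per group: {len(effects)} significant studies, "
          f"mean observed d = {fmean(effects):.2f}")
```

With 20 participants per group, an observed d must exceed about 0.62 to be significant, so every "published" study overstates the true 0.2 at least threefold; with 1,000 per group, nearly every study is significant and the average observed d sits close to 0.2.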
The Multiverse of P-Hacking
Perhaps the most destructive practice uncovered during the credibility revolution is "p-hacking," which exploits what methodologists call "researcher degrees of freedom." It occurs when researchers, consciously or unconsciously, make flexible data-analysis decisions until their data yield a statistically significant result, conventionally a p-value below .05 101716.
If a study doesn't immediately yield a significant finding, a researcher might exclude certain participants as "outliers," control for different demographic variables, look at a different dependent variable, or collect a few more data points and check again 1617. Each of these choices gives noise another chance to cross the .05 threshold, so by exploring many analytical pathways researchers inadvertently capitalize on chance, making it very likely that something will eventually look significant even when no real effect exists.
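A simulation shows how quickly this flexibility inflates false positives. In the sketch below there is never a real effect; a hypothetical researcher either tests several independent outcome measures and stops at the first significant one, or peeks at the data and adds 10 participants per group (up to 50) whenever the result is not yet significant. It models only two of the practices named above and, as a simplification, uses a z-test that assumes a known standard deviation of 1.

```python
import random
from statistics import NormalDist, fmean

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = .05

def significant(a, b):
    """Two-sample z-test assuming a known SD of 1 in each group (a simplification)."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    return abs(fmean(a) - fmean(b)) / se > Z_CRIT

def false_positive_rate(n_outcomes=1, optional_stopping=False, n_sims=10_000, seed=7):
    """Share of null studies that end up 'significant' under a flexible strategy."""
    rng = random.Random(seed)

    def draw(k):
        return [rng.gauss(0.0, 1.0) for _ in range(k)]  # the true effect is always zero

    hits = 0
    for _ in range(n_sims):
        found = False
        for _ in range(n_outcomes):  # try each outcome measure in turn
            a, b = draw(20), draw(20)
            while not found:
                found = significant(a, b)
                if found or not optional_stopping or len(a) >= 50:
                    break
                # Not significant yet: add 10 more participants per group and test again.
                a += draw(10)
                b += draw(10)
            if found:
                break
        hits += found
    return hits / n_sims

print("single planned test:   ", false_positive_rate())
print("three outcome measures:", false_positive_rate(n_outcomes=3))
print("peeking up to n = 50:  ", false_positive_rate(optional_stopping=True))
```

A single planned test should come out near the nominal 5 percent. Three independent outcomes push the rate toward 1 - 0.95³ ≈ 14 percent, and repeated peeking also lifts it well above 5 percent, even though every dataset is pure noise.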
To demonstrate just how prevalent and impactful p-hacking can be, methodologists Marcus Credé and Leigh Phillips conducted a "multiverse analysis" of the original Carney, Cuddy, and Yap power posing data 1618. A multiverse analysis looks at every single plausible way a dataset could be analyzed. Credé and Phillips demonstrated that there were 54 different ways to analyze the power pose hormone data depending on how outliers were identified, how the dependent variable was specified (e.g., final hormone level vs. change in hormone level), and whether gender was controlled for 1618.
Depending on which specific combination of analytical choices a researcher made, the effect of power posing on testosterone ranged from a massive, highly significant effect to absolutely zero 16. The original authors had simply reported the one specific, optimistic pathway through the data that yielded a significant result, ignoring the vast "multiverse" of analytical pathways that showed no effect 1618.
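The logic of a multiverse analysis can be sketched on a simulated dataset. Everything below is hypothetical: 42 invented participants with no true pose effect, and three arbitrary analytic forks (outlier cutoff, final level versus change from baseline, and analyzing everyone versus one sex). Subgrouping by sex stands in for "controlling for gender," which would require a regression model, and p-values come from a crude normal approximation. This is not Credé and Phillips's actual procedure or data.

```python
import itertools
import math
import random
from statistics import NormalDist, fmean, stdev

def approx_p(a, b):
    """Two-sided p-value from a z approximation using sample SDs (crude for small groups)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def make_null_dataset(n=42, seed=2010):
    """Hypothetical pose study with NO true pose effect on the hormone measure."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        baseline = rng.gauss(0.0, 1.0)
        rows.append({
            "pose": "high" if i % 2 == 0 else "low",
            "sex": "woman" if i % 4 < 2 else "man",
            "baseline": baseline,
            "final": 0.6 * baseline + rng.gauss(0.0, 0.8),  # unrelated to pose
        })
    return rows

def run_path(rows, outlier_cutoff, dv, subgroup):
    """One analytic pathway: pick a subgroup, a DV definition, and an outlier rule."""
    rows = [r for r in rows if subgroup == "all" or r["sex"] == subgroup]
    values = [r["final"] if dv == "final" else r["final"] - r["baseline"] for r in rows]
    if outlier_cutoff is not None:
        m, s = fmean(values), stdev(values)
        kept = [(r, v) for r, v in zip(rows, values) if abs(v - m) / s <= outlier_cutoff]
        rows, values = [r for r, _ in kept], [v for _, v in kept]
    high = [v for r, v in zip(rows, values) if r["pose"] == "high"]
    low = [v for r, v in zip(rows, values) if r["pose"] == "low"]
    return approx_p(high, low)

choices = itertools.product([None, 2.5, 2.0], ["final", "change"], ["all", "woman", "man"])
data = make_null_dataset()
multiverse = {path: run_path(data, *path) for path in choices}

best_path = min(multiverse, key=multiverse.get)
print(f"{len(multiverse)} pathways; p ranges from {min(multiverse.values()):.3f} "
      f"to {max(multiverse.values()):.3f}; smallest p comes from {best_path}")
```

Reporting only `best_path` while staying silent about the other pathways is exactly the kind of selective reporting that a multiverse analysis is designed to expose.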
| Feature | Original Studies (Pre-2015 Norms) | Rigorous Replications (Post-2015 Norms) |
|---|---|---|
| Sample Size (N) | Typically underpowered (e.g., N=42) | Massive, highly powered (e.g., N=200 to N=3,500+) |
| Data Transparency | Data held privately by researchers | Open datasets, shared materials, and open code |
| Analysis Plan | Flexible, determined after data collection (open to p-hacking) | Pre-registered publicly before data collection begins |
| Publication Bias | "File-drawer" effect; primarily positive results published | Registered Reports; published regardless of outcome |
| Result Replicability | Low (estimated ~36% success rate) | High (up to 86% success with rigorous methods) |
The File Drawer Problem and Flat P-Curves
Finally, the scientific literature was heavily distorted by severe publication bias, commonly referred to as the "file drawer problem" 17. Academic journals have historically favored publishing novel, surprising, and statistically significant results 17. Consequently, if a researcher ran an ego depletion study and found nothing, that study was relegated to a filing cabinet, never to be published or shared.
Roy Baumeister, the chief architect of ego depletion, readily admitted to this practice. In personal communications regarding his research, he stated that his laboratory ran multiple studies, acknowledging that "some of which did not work, and some of which worked better than others." He defended dropping the insignificant results by stating, "You may think that not reporting the less successful studies is wrong, but that is how the field works" 12. By hiding the failed experiments, the published literature created a powerful illusion of overwhelming, uniform evidence for a phenomenon that may have simply been the result of random statistical chance 312.
To test whether the power-posing literature reflected a real effect or merely selective reporting, researchers Uri Simonsohn and Joe Simmons applied a statistical tool called "p-curve analysis." A p-curve examines the distribution of statistically significant p-values across a body of literature to determine whether the findings possess actual "evidential value" 1225.
If an effect is real and studies have reasonable power, highly significant p-values (e.g., p < .01) should be much more common than barely significant ones (e.g., p = .04), producing a right-skewed curve. When Simonsohn and Simmons analyzed the 33 supportive studies frequently cited by power-posing defenders, they found that the p-curve was flat 2519. A flat p-curve means the literature lacks evidential value: its pattern of p-values is what one would expect if the true effect were zero and the significant results had arisen through selective reporting and p-hacking 1219.
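The contrast between a right-skewed and a flat p-curve can be illustrated by simulation. This sketch assumes two-group z-tests with unit variance and bins significant p-values into .01-wide intervals; it is a simplified illustration of the idea behind p-curve, not Simonsohn and Simmons's actual p-curve procedure, which also includes formal tests of skew.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_curve(true_d, n_per_group, n_studies=40_000, seed=3):
    """Share of significant p-values falling in each .01-wide bin below .05."""
    rng = random.Random(seed)
    # For a two-group z-test with SD 1, the test statistic is N(d * sqrt(n/2), 1).
    noncentrality = true_d * (n_per_group / 2) ** 0.5
    counts = [0] * 5
    for _ in range(n_studies):
        z = rng.gauss(noncentrality, 1.0)
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
        if p < 0.05:
            counts[min(int(p / 0.01), 4)] += 1  # min() guards against float rounding at .05
    total = sum(counts)
    return [c / total for c in counts]

labels = ["<.01", ".01-.02", ".02-.03", ".03-.04", ".04-.05"]
for name, d, n in (("no effect", 0.0, 20), ("true d = 0.5", 0.5, 64)):
    shares = p_curve(d, n)
    print(name, {label: round(s, 2) for label, s in zip(labels, shares)})
```

With no true effect, about a fifth of significant p-values land in each bin, which is the flat pattern; with a true d = 0.5 and 64 participants per group (roughly 80 percent power), around three-quarters of significant results fall below .01.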
The Defenders and the Culture War
The revelation that massive swaths of textbook psychology might be false did not go over smoothly. Instead, a bitter debate erupted within the scientific community, taking on the characteristics of an academic culture war.
Researchers whose entire careers, TED talks, and book deals were built on phenomena like ego depletion and power posing reacted defensively. Baumeister and other proponents of the willpower muscle argued that the replication failures did not invalidate their theory. Instead, they argued that the replication teams lacked the "expertise" to properly execute the psychological manipulations, failing to perfectly recreate the delicate psychological conditions required to elicit the effect 127. They suggested that subtle differences in the tasks used, the instructions given, or the context of the laboratory had destroyed the effect 13.
Critics quickly pointed out the logical inconsistency in this defense: if a psychological effect is supposedly robust enough to dictate human behavior in chaotic, everyday life - such as deciding whether to buy a car or break a diet - it should not completely disappear simply because a laboratory used a slightly different computer task 27. Furthermore, methodologists noted a conceptual crisis: many of the tasks used in ego depletion research had never been independently validated as actual measures of self-control, making it impossible to derive unambiguous predictions 3.
Cuddy similarly defended power posing, arguing that critics were ignoring the evidence and focusing too heavily on physiological markers like hormones rather than on subjective, self-reported feelings of power 45. She also argued that holding a pose for three minutes, as some replications required, was too long and uncomfortable and could undermine the confidence-boosting effects reported for two-minute poses 28.
The discourse occasionally turned toxic. In a 2016 column, Princeton psychologist Susan Fiske sharply criticized the reform movement, condemning independent researchers and statisticians who publicly flagged anomalies as "self-appointed data police" engaged in "methodological terrorism," and accusing them of bullying researchers and undermining the public's trust in science 6.
However, the reformers pressed on. Independent watchdogs such as Retraction Watch meticulously tracked scientific retractions and fraudulent data. Founded in 2010, when journal retractions were still rare, Retraction Watch built a database that, by 2024, had cataloged more than 50,000 retracted papers globally 2030. These data showed that the rot in the scientific literature was not isolated to a few quirky social psychology studies. It extended into medicine, biology, and computer science, fueled by paper mills, forged data, and an academic culture that prioritized publication quantity over rigor 2030.
The Credibility Revolution: Rebuilding Science
Far from destroying the field, the replication crisis catalyzed what is now known as the "credibility revolution" 11. The painful realization that standard, unquestioned practices were producing a literature littered with false positives led to a systemic, community-driven overhaul of how behavioral science is conducted and evaluated.
The cornerstone of this revolution is the Open Science movement. Spearheaded by organizations like the Center for Open Science (COS), the movement advocates for total transparency across the entire research lifecycle 2122.
Pre-registration and Registered Reports
One of the most powerful methodological tools to emerge from this era is pre-registration. Before collecting a single data point, researchers publicly log their exact hypotheses, intended sample size, and data analysis plan on platforms like the Open Science Framework (OSF) 1721. This sharply limits the room for p-hacking and selective reporting, because peer reviewers and the wider scientific community can compare the final published paper against the original, time-stamped plan and spot any undisclosed deviations 1733.
Journals have also introduced a publication model known as "Registered Reports." Under this model, a journal peer-reviews a study's introduction and methodology before the study is conducted. If the methods are sound and the question is important, the journal commits in advance to publishing the results, provided the authors follow the approved protocol, regardless of whether the outcome is positive, negative, or null 1723. This directly neutralizes the file-drawer problem for these studies, realigning incentives so that researchers are rewarded for asking good questions rigorously rather than for finding shiny, statistically significant anomalies.
Rigor Yields Replicability
There is empirical evidence that these new methods work. In a six-year project published in late 2023 in the journal Nature Human Behaviour, a coalition of laboratories from institutions including UC Berkeley and Stanford attempted to discover and then replicate 16 novel psychological findings 7. Instead of the old playbook, they used open science best practices, including large sample sizes and rigorous pre-registration 7.
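The result was an 86 percent replicability rate 727. The authors argued that this rate was close to the maximum achievable given the studies' statistical power, suggesting that when researchers abandon questionable shortcuts and adhere to rigorous methodological standards, psychological science can be highly reliable 7. In 2024, however, the journal retracted the paper, citing concerns about how accurately it described its pre-registration, so this figure should be treated with some caution.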
A Global Shift Toward Open Science
The shockwaves of the replication crisis have extended far beyond Western psychology departments, sparking a global policy shift toward transparent research infrastructure 2425. In 2021, UNESCO's 193 member states adopted a landmark Recommendation on Open Science, recognizing that the democratization of scientific knowledge is critical for accelerating innovation and solving global crises 262728.
However, the transition to open science has occurred unevenly around the world, hindered by stark disparities in internet connectivity, funding, and institutional support 242627. While the United States and Western Europe still account for roughly 85 percent of open publication and data repositories, other regions are rapidly pioneering their own unique models to circumvent the paywalls of traditional commercial publishers 2729.
Latin America's Diamond Open Access Leadership
Latin America has long been a global pioneer in open science, establishing robust, non-commercial infrastructures decades before the replication crisis forced the issue in the Global North 2930. Initiatives like the Scientific Electronic Library Online (SciELO), created in Brazil in 1996, and Redalyc in Mexico (2003), provide vast, interconnected digital libraries of open-access journals 3042.
These platforms rely overwhelmingly on the "Diamond Open Access" model. Unlike the author-pays "Gold" open access model common in the US and Europe, where researchers or their grants must pay often-steep Article Processing Charges (APCs) to publish their work (a heavy, often exclusionary burden for scientists in developing nations), Diamond Open Access is free for both readers and authors 273132. Analyses of open science publishing reveal that nearly 90 percent of Latin American journals indexed in SciELO use this highly equitable Diamond model 31. Subsidized entirely by academic institutions, government funding, and university presses, Latin America has successfully insulated a massive portion of its scientific output from the commercial logic and profit motives of Western publishing oligopolies 2931.
| Open Access Model | Who Pays to Read? | Who Pays to Publish? | Regional Dominance |
|---|---|---|---|
| Traditional Subscription | Reader / Institution | Free for Author | Global North (Historically) |
| Gold Open Access | Free for Reader | Author pays APCs | Global North (Increasingly) |
| Diamond Open Access | Free for Reader | Free for Author (Institution-funded) | Latin America |
Grassroots Networks in Africa
In Africa, where scientific funding and infrastructure face significant constraints, the push for open science and replicability is being driven by powerful grassroots community organizing 3346. Organizations like the African Reproducibility Network (AREN), officially established in 2022, are actively working to bridge the gap in open science advocacy 4634.
AREN operates a comprehensive, tiered training program to develop Local Network Leads (LNLs) across the continent 4635. Rather than relying on top-down mandates, the program trains grassroots researchers in practical open science tools, such as how to properly pre-register studies, share data transparently, and conduct rigorous power analyses 3536. By the end of 2024, the program had trained 28 researchers representing 15 different African countries 35. These trained champions return to their home institutions to establish local communities of practice, teaching their peers how to navigate the shifting requirements of global research standards 3537. By fostering local expertise and acknowledging regional challenges, these networks are building sustainable, culturally relevant open science ecosystems that elevate the quality of global research 3337.
Bottom line
The spectacular collapse of blockbuster theories like ego depletion and power posing served as a painful but profoundly necessary reckoning for the behavioral sciences. By exposing the invisible dangers of small sample sizes, selective reporting, and p-hacking, the replication crisis forced the academic community to abandon a culture that rewarded flashy, fragile findings in favor of strict methodological rigor. Today, propelled by the rise of pre-registration, Registered Reports, and equitable global publishing models, the scientific enterprise is steadily rebuilding its foundation to ensure that the discoveries of tomorrow are built on robust, reproducible facts rather than statistical illusions.