How information spreads through a social network: the math of contagion models

Key takeaways

  • Simple contagions like memes and fake news require only one exposure and spread rapidly across loose, weak ties.
  • Complex contagions like behavioral changes require multiple points of social reinforcement and rely on dense network clusters.
  • False news spreads significantly faster and deeper than true news because it triggers high-arousal emotions and psychological novelty.
  • Modern platform algorithms optimize for user attention and watch time, creating closed feedback loops that shape human behavior.
  • Predicting exact cultural success before release remains a mathematical impossibility due to random early advantages and social influence.
  • Relentless algorithmic optimization has triggered severe user fatigue, driving people back toward human-curated and un-optimized feeds.
Digital information spreads through social networks using precise mathematical patterns that closely mimic biological viral outbreaks. While simple ideas and fake news spread rapidly across casual connections, adopting new behaviors requires dense social reinforcement. Modern platforms utilize complex recommendation algorithms to maximize attention by exploiting our psychological biases for novelty. However, despite this aggressive mathematical optimization, the exact spread of individual content remains fundamentally unpredictable and has sparked a growing wave of user algorithm fatigue.

How Math Explains the Spread of Information

Information cascades through social networks not by magic, but through mathematical patterns that closely resemble viral disease outbreaks. While simple facts and rumors spread rapidly across loose, casual connections, adopting new behaviors requires "complex contagions" that rely on dense, overlapping social reinforcement. Ultimately, modern recommendation algorithms leverage these mathematical principles to exploit our psychological biases for novelty, creating digital environments where the macroscopic spread of information is engineered, even if the success of an individual post remains fundamentally unpredictable.

The Biological Roots of Digital Contagions

When we colloquially state that a video or an idea "went viral," we are utilizing a biological analogy that is mathematically and structurally precise. In the physical sciences, a biological virus is defined as an infective agent - typically a nucleic acid molecule enclosed in a protein coat - that is entirely dependent on a living host cell to replicate 12. Viruses like Ebola or smallpox contain remarkably simple genetic code, yet their replication mechanism is devastatingly efficient: they latch onto a host, penetrate its defenses, and hijack the cell's internal machinery to manufacture copies of themselves 1.

A digital phenomenon operates on the exact same fundamental logic, merely substituting biological code for ideological code. Coined by evolutionary biologist Richard Dawkins in his 1976 book The Selfish Gene, a "meme" is a unit of cultural transmission that relies on human brains as its host cells 12. Just as a biological virus uses a host's enzymes to replicate its RNA or DNA, a digital meme utilizes a human's social capital, cognitive attention, and digital hardware to replicate its message across the internet. Both entities are replicators subject to Darwinian evolutionary principles 1.

RNA, Ambigrams, and Network Virology

Interestingly, mathematical insights into the geometry of biological viruses have increasingly influenced how we map and understand digital networks 12. For instance, researchers studying infectious diseases have modeled how viral RNA helps pull together protein shells (capsids) to protect genetic material during transmission 2. Some real-world RNA viruses, like the "narnavirus," act like ambigrams - their genetic material can be read both forward and backward by host cells, a protective mechanism evolved to elude host immune defenses 1.

Digital information behaves in much the same way. A meme or a piece of viral misinformation can mutate, be remixed, or be interpreted ironically (effectively read backward) by a new demographic while retaining its core replicative power. However, while the biological analogy is incredibly useful, it has distinct limits. A physical virus typically requires only proximity or physical contact to spread. A digital idea requires attention, belief, and occasionally a fundamental change in human behavior 34. To model this cognitive requirement, network scientists rely on the mathematics of epidemiology.

The Classic Epidemic Models: SIR and Beyond

To understand how a tweet, a viral dance trend, or a conspiracy theory moves through a population, mathematicians borrow heavily from epidemiological frameworks. The foundational mathematical model for this is the SIR model, originally developed by W.O. Kermack and A.G. McKendrick in 1927 56.

In the classic SIR model, a closed population is divided into three distinct compartments: * Susceptible (S): Individuals who have not yet encountered the information but are vulnerable to it. * Infected (I): Individuals who have received the information and are actively spreading it to others. * Removed/Recovered (R): Individuals who have either lost interest, forgotten the information, or become immune to it (meaning they will no longer share it) 56.

The movement of people between these states is governed by differential equations. The rate at which susceptible people become infected ($dS/dt$) depends on the contact rate between the Susceptible and Infected groups, multiplied by the probability of transmission, often denoted as $\beta$ 7. Simultaneously, the rate at which infected people are removed from the active spreading pool is governed by a recovery rate, denoted as $\gamma$ 7.

From these variables, epidemiologists derive the basic reproduction number ($R_0$), which represents the average number of secondary cases generated by a single infected node in a fully susceptible population 8. Mathematically, if $R_0 > 1$, the information experiences exponential growth and goes viral; if $R_0 < 1$, the cascade will eventually die out.

Variations on the Mathematical Theme

While the standard SIR model is elegant, human social networks and cognitive attention spans are highly dynamic. Traditional models often assume homogeneous mixing - the unrealistic idea that everyone in a population has an equal chance of contacting everyone else 5. To better capture the realities of digital life, network scientists have developed several extensions:

Epidemic Model Compartment Flow Digital Network Application
SIR Susceptible $\to$ Infected $\to$ Recovered Standard viral trends or breaking news where a user shares an item once and then moves on, becoming "immune" to sharing it again 459.
SIS Susceptible $\to$ Infected $\to$ Susceptible Recurring memes or cyclical seasonal trends where users can "catch" the idea, share it, forget it, and become vulnerable to sharing it again later 810.
SEIR Susceptible $\to$ Exposed $\to$ Infected $\to$ Recovered Information that requires an incubation period. A user is "exposed" (reads a long article) but takes time to process the information before transitioning to "infected" (sharing it) 810.
SVFR Susceptible $\to$ View $\to$ Forward $\to$ Removed A specialized model developed specifically for messaging apps like WeChat, separating the passive act of viewing from the active act of forwarding 91314.

In recent years, researchers applying the SVFR model to large-scale WeChat datasets found that it perfectly captures the tree-size distribution and degree variance of information cascades 914. The SVFR model mathematically demonstrates that a user with a massive number of connections (a highly connected hub) may actually have a limited influence on spreading a specific piece of content, simply because human attention is finite and must be divided among thousands of incoming signals 9.

Mapping the Social Network: Independent Cascades vs. Linear Thresholds

Biological epidemiology equations are excellent for tracking macroscopic trends, but social networks add layers of individual psychological complexity. When a person decides whether or not to hit "retweet," they are not acting like a passive cell; they are making a decision. To account for human psychology, computer scientists rely on two primary algorithmic models to simulate micro-level information cascades: the Independent Cascade (IC) Model and the Linear Threshold (LT) Model 1112131419.

The Independent Cascade (IC) Model

In the IC model, the spread of information is fundamentally sender-centric. When a user becomes "infected" (i.e., they adopt a piece of information or technology), they are given a single, independent opportunity to infect each of their direct neighbors in the network 1113.

The success of this transmission is based on a predetermined probability weight ($p_{u,v}$) assigned to the edge connecting node $u$ to node $v$ 111319. If the sender succeeds, the neighbor adopts the information. If the attempt fails, the sender never gets another chance to infect that specific neighbor with that specific information in future time steps 13. This model perfectly mirrors the passive, ephemeral scroll of social media: you see a post from a friend in your feed, and you either share it immediately or scroll past it, never to interact with that specific instance again.

The Linear Threshold (LT) Model

Conversely, the Linear Threshold (LT) model is receiver-centric. In this framework, every node in the network is assigned a personal "threshold" ($\theta$) chosen uniformly at random between 0 and 1 111213. This threshold represents how resistant a specific individual is to adopting a new idea or behavior.

A user will only adopt the information if the combined, normalized weight of their already-infected neighbors exceeds their personal threshold 11. This means that a single exposure from one friend is rarely enough to trigger activation. The user must be surrounded by the idea, receiving cumulative social pressure from multiple directions until their psychological resistance breaks 1315.

The Challenge of Influence Maximization

Both the IC and LT models are frequently used by data scientists to solve the "Influence Maximization" problem: if a marketing firm or public health organization can only afford to "seed" a limited number of initial users with a message, which users should they choose to guarantee the largest possible cascade? 111319.

Finding the mathematically optimal set of initial adopters is computationally hard (NP-hard). However, researchers like Kempe et al. proved that for both the stochastic linear threshold and independent cascade models, the expected spread of influence functions as a "normalized, monotone, submodular function" 13. Because of this submodularity (which essentially means diminishing returns), data scientists can use simple "greedy algorithms" to select seed nodes one by one, mathematically guaranteeing a cascade size that is within a known factor of the absolute optimal choice 13.

Simple vs. Complex Contagions: Why Behaviors Stall

For decades, the standard assumption in sociology, marketing, and computer science was that all contagions behaved roughly the same way. Whether it was a rumor, a computer virus, or a new fashion trend, the math of simple diffusion models was applied universally 16. But this assumption was fundamentally flawed. An idea is not a disease, and adopting a new behavior requires an entirely different network topology.

This breakthrough distinction was pioneered by Damon Centola, a professor at the University of Pennsylvania, who introduced the theory of Simple vs. Complex Contagions 3172318.

A simple contagion requires only a single exposure to spread 317. Infectious diseases are classic simple contagions. If a stranger sneezes on you on the subway, you can catch a virus; you do not need to know them, and you do not need three other people to sneeze on you to confirm that catching the virus is a good idea 18. In the digital realm, breaking news, funny memes, and job openings act as simple contagions. If an acquaintance shares a link to an amusing video, one exposure is usually enough to prompt you to watch and potentially share it 1725.

A complex contagion, however, requires multiple sources of social reinforcement before an individual adopts the change 31625. Complex contagions involve friction and risk. Examples include adopting a new preventative health behavior, migrating to an unproven social media platform, or participating in a high-risk political protest 31719.

Centola and his colleague Michael Macy mathematically identified four mechanisms that explain why complex contagions require multiple exposures: 1. Strategic Complementarity (Coordination): The value of the adoption depends heavily on others. A communication platform like WhatsApp is useless if you are the only one using it; it requires a critical mass of your social circle to have value 3. 2. Credibility: Innovations often lack legitimacy until adopted by multiple neighbors. Hearing a wild rumor from one isolated person might be dismissed as fiction; hearing the same story from three independent friends grants it credibility 316. 3. Legitimacy: The social risk of deviating from established norms is high. Multiple adopters validate the safety and social acceptability of the new behavior 3. 4. Emotional Contagion: Complex actions, such as joining a social movement, often require deep emotional mobilization, which compounds with repeated exposure from trusted peers 16.

The Illusion of the Small World

This paradigm fundamentally upended one of the most famous theories in 20th-century sociology: Mark Granovetter's 1973 theory of the "strength of weak ties" 1920. Granovetter famously argued that our close friends (strong ties) all tend to know each other, creating redundant, clustered information loops. Casual acquaintances (weak ties), however, bridge the gap between entirely different social clusters, acting as high-speed highways for new information to traverse vast social distances 1820.

Network scientists Duncan Watts and Steven Strogatz later formalized this into the "Small World" model. They demonstrated mathematically that if you take a highly clustered, lattice-like network and randomly rewire just a few ties to distant nodes, you dramatically shrink the degrees of separation across the network, exponentially accelerating the diffusion of simple contagions 31620.

But Centola's research revealed a massive blind spot: for complex contagions, Granovetter's prized weak ties are actually a structural weakness 19. Because a weak tie is, by definition, a single link bridging two distant communities, it acts as a narrow bridge 2520. If a new, risky behavior needs to cross from Community A to Community B, one casual acquaintance mentioning it will not provide the threshold of social reinforcement required by the Linear Threshold model to trigger adoption. The contagion hits the narrow bridge and stalls completely 1619.

Wide Bridges and Social Reinforcement

Instead of weak ties, complex contagions require wide bridges. A wide bridge exists when multiple, redundant ties connect two otherwise distinct social clusters 1825. If you see three different colleagues from an entirely distinct department using a new software tool, the bridge is wide enough to provide the redundancy and confirmation necessary for you to adopt it.

To prove this mathematically, Centola designed a massive online experiment involving roughly 1,500 participants assigned to artificially structured online health communities 21. He tested random networks (with many long, weak ties) against highly clustered lattice networks. Defying 50 years of sociological conventional wisdom, he found that the rate of adopting a specific behavioral change was four times greater when individuals were organized in dense, clustered networks rather than random ones 2122. This reveals why major social movements and high-risk innovations tend to diffuse spatially and incrementally - spreading slowly from neighborhood to neighborhood like a fishing net - rather than exploding universally like fireworks 162320.

Research chart 1

The Mathematics of Fake News

If behavioral changes spread so slowly through dense clusters, how do we explain the explosive, lightning-fast spread of political misinformation and fake news on public platforms? The answer is that, unlike behaviors, fake news acts as a super-charged simple contagion.

In 2018, researchers Soroush Vosoughi, Deb Roy, and Sinan Aral published a landmark, peer-reviewed study on the cover of Science magazine that mathematically mapped the differential diffusion of truth and falsehood 232425. Having analyzed a massive dataset of roughly 126,000 verified rumor cascades spreading on Twitter between 2006 and 2017 - which were tweeted by roughly 3 million people over 4.5 million times - their findings were staggering 2426.

Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in every single category of information analyzed 2324.

Spreading Metric True News Performance False News Performance
Maximum Cascade Depth (Number of retweet hops from origin) Rarely diffused past a depth of 10 2326. Reached a depth of 19 nearly 10 times faster than truth reached 10 2326.
Overall Audience Reach Rarely reached more than 1,000 people 2325. The top 1% of cascades routinely reached between 1,000 and 100,000 people 2325.
Velocity of Spread Took roughly 6 times longer to reach 1,500 people 2326. Reached 1,500 people in a fraction of the time 2326.
Probability of Transmission Baseline probability of sharing. Users were ~70% more likely to retweet false news than true news 2735.

Why Falsehood Achieves a Higher $R_0$

The conventional wisdom at the time blamed this massive asymmetry on automated bot networks. However, Vosoughi's team utilized state-of-the-art bot-detection algorithms to systematically remove automated accounts from the data. They found that the variance remained entirely intact: bots accelerated the spread of true and false news at exactly the same rate 2324. The culprit driving the massive reproduction number ($R_0$) of fake news was human psychology.

The researchers hypothesized that the primary mathematical parameter driving this was novelty 23273536. Humans are evolutionary wired to pay attention to novel information, and we gain social status by being the first to share surprising "inside information" with our peers 3536. Falsehood is fundamentally unconstrained by reality, meaning it can be perfectly optimized and engineered for maximum novelty.

By utilizing natural language processing to analyze the sentiment of the replies to these tweets, the researchers found that false stories reliably inspired high-arousal emotions like fear, disgust, and surprise 232427. True stories, inevitably bound by the mundane realities and nuanced constraints of the physical world, inspired lower-arousal emotions like anticipation, sadness, joy, and trust 23. In the language of the Independent Cascade (IC) model, the "payoff" or transmission probability assigned to an edge between two users is heavily influenced by emotional arousal. Because falsehood maximizes this payoff, its velocity outpaces reality. (Notably, a later 2021 study by Ugander and Juul replicated this data, confirming the core finding that fake news reaches more people while spreading deeper and faster 35).

Closed Networks: The WhatsApp Effect

While public platforms like Twitter function like vast open spaces where simple contagions spread rapidly via weak ties, closed messaging platforms like WhatsApp function quite differently. Because they map onto real-world phone numbers and personal contacts, they are digital recreations of high-clustering networks - making them perfect incubators for complex contagions.

Recent cross-cultural research into closed network ecosystems across India, Indonesia, Colombia, and among U.S. Latinos has revealed a troubling phenomenon. Misinformation on these platforms is not just rapidly shared; it is socially reinforced within trusted, localized clusters, transforming it from a passing rumor into a solidified belief 37382829.

For example, a 2024 study by the Digital Democracy Institute of the Americas (DDIA) tracked over 1,400 public, Latino-led WhatsApp groups in the U.S., reaching roughly 3.4 million Spanish- and Portuguese-speaking users 29. They found that election disinformation and falsehoods regarding global conflicts spread via highly emotional language and ad-hominem attacks 29.

Dense Clusters and Information Velocity

The mathematical threat of closed networks is the presence of homophily - the tendency of individuals to associate with others who hold similar views and backgrounds. In a WhatsApp group of extended family members or local villagers, the network density is incredibly high. If a false political narrative enters the group, it does not just hit a user once. It is echoed by a cousin, an uncle, and a trusted community leader 3738.

This satisfies the threshold requirement of the Linear Threshold model for a complex contagion. Once a falsehood is legitimized by dense social proof, it transitions from a simple piece of fake news into a deeply entrenched, identity-based behavior. This mathematical dynamic explains why fact-checking debunk articles - which operate as simple information contagions - rarely succeed in overriding misinformation inside closed networks 373828.

Furthermore, researchers analyzing large-scale WeChat datasets have proven a "distance decay effect" in closed networks. The probability of information dissemination drops significantly as physical geographic distance increases between users, yet the velocity of the spread remains remarkably independent of geographic distance 30. This suggests that local spatial clusters become rapid echo chambers, intensely solidifying narratives within a specific physical geography before wide bridges eventually carry the contagion to the next town.

The Recommendation Era: TikTok's Infinite Scroll Math

If Granovetter and Centola defined the era of the "Social Graph" - where information spread depended on the specific topology of who you were friends with - TikTok ushered in the era of the "Interest Graph."

On highly algorithmic platforms, the mathematical model shifts away from a traditional network propagation model toward a recommender system 31. There is no traditional social network required for the math to work; the ability for users to follow or connect with others is secondary or even irrelevant to what surfaces on their feed 31. Instead, the platform relies on vast matrices of user behavior.

TikTok's "For You" algorithm acts as an omniscient matchmaker, utilizing probability theory, statistics, and linear algebra to optimize for human attention 3233. At its core, the algorithm maps users and videos as vectors in a latent space using matrix factorization 33. When you linger on a video, the system updates the matrix in real-time, calculating the mathematical distance between your preferences and millions of other users to find content that "people like you" have watched 32. It even utilizes Fourier transforms to perform trend analysis on the actual sound waves of the audio tracks, predicting the success of specific music beats and speech patterns 33.

The Algorithmic Scoring Equation

Internal documents leaked from the company, notably one titled "TikTok Algo 101," revealed the rough mathematical equation driving video scores:

$Score = (P_{like} \times V_{like}) + (P_{comment} \times V_{comment}) + (E_{playtime} \times V_{playtime}) + (P_{play} \times V_{play})$ 34

In this formula, $P$ represents the algorithm's calculated probability that a user will take a specific action, $E$ represents the expected duration of that action, and $V$ represents the value or weight the platform's engineers assign to that action 34.

If the algorithm predicts a 50% chance you will leave a comment (an action assigned high value) or a 90% chance you will watch a 10-second video three times in a row (high expected playtime), the video's aggregate score skyrockets, and it is dynamically pushed into your feed 34. This is a classic optimization problem designed strictly to maximize total watch time 32.

The algorithm tests users continuously. During a user's first 30 videos (the "cold start" phase), the system deliberately injects a highly curated mix of trending topics, emotional triggers, niche subcultures, and engagement bait to rapidly map their psychological pressure points 46. Researchers at the University of Washington who audited 9.2 million video recommendations found that within a user's first 1,000 videos, TikTok actively exploits a user's identified interests 30% to 50% of the time, while using the rest of the feed to explore new niches 35. This mathematical exploitation is devastatingly effective; the researchers found that over users' first 120 days, average daily time on the platform nearly doubled from 29 minutes to 50 minutes 35.

Because the algorithm ruthlessly optimizes for expected playtime, content creators respond by structuring videos to exploit the math. They use overarching text like "wait for it until the end" to artificially inflate expected watch time, or generate polarizing "rage-bait" to force high-value comments 34. The system becomes a closed feedback loop: human psychological biases train the algorithm, and the algorithm, in turn, shapes human behavior.

The Limits of Algorithmic Prediction

Given these massive behavioral datasets and sophisticated linear algebra equations, can data scientists mathematically predict exactly what will go viral before it is released? The short answer is no.

In a seminal 2006 computational sociology experiment known as the "Music Lab," researchers Matthew Salganik, Peter Sheridan Dodds, and Duncan Watts set out to test the predictability of cultural markets 363738. They created an artificial music downloading platform featuring 48 unknown songs by obscure bands and recruited over 14,000 participants 3738.

Participants were split into two main groups. One group operated in an "independent" world, where they only saw the song names and made choices blindly. The second group was split into eight parallel "social influence" worlds, where they could see exactly how many times other people in their specific world had already downloaded a song 3738.

Inherent Unpredictability and Inequality

The results exposed a fundamental, mathematical limit to prediction in social systems. In the social influence worlds, cumulative advantage took over. Early, entirely random initial advantages snowballed through information cascades 3637. A song that became a massive hit in World 1 might end up totally ignored in World 4 38.

The researchers measured this using the Gini coefficient (a standard metric for inequality) and found that increasing the strength of the social signal vastly increased the inequality of success across the board 3638. While the very "best" songs (as rated by the independent world) rarely failed completely and the very "worst" rarely became mega-hits, the vast majority of the content sitting in the middle experienced inherently unpredictable outcomes 38. Because these vastly different outcomes occurred among statistically indistinguishable groups of participants evaluating the exact same songs, the unpredictability is considered an inherent mathematical property of the network, not a lack of data 3839.

Network scientists have refined this slightly in recent years, proving that you can partially predict the future virality of a meme by quantifying its early spreading pattern. If a meme heavily permeates across multiple distinct community boundaries very early in its lifecycle, it has a much higher probability of going viral globally 524041. However, predicting cultural success before an item is released into the network remains a mathematical impossibility.

The Rise of Algorithm Fatigue

By 2025 and 2026, the relentless mathematical optimization of recommender systems led to an inevitable psychological backlash: algorithm fatigue 425657.

When algorithms become too efficient at delivering perfectly targeted, hyper-optimized content, users begin to feel trapped. In psychological frameworks like the Stimulus-Organism-Response model, this manifests as "social media overload" - where the continuous exposure to an endless, customized feed creates severe mental and emotional exhaustion 4243.

Consumers report craving uniqueness but finding themselves in a "sea of sameness" 44. The feedback loops of matrix factorization result in aesthetic homogenization; creators must adhere to invisible mathematical rules to survive, resulting in identical video transitions, trending audios, and lighting formats across all platforms 6045. Algorithm fatigue leads directly to active resistance behaviors, with consumers actively muting their environments, clearing their cookies to reset the system, or simply abandoning platforms altogether because they recognize they are not being curated for, but processed 436062.

Escaping the Sea of Sameness

This fatigue has sparked a noticeable anti-algorithm counter-trend in 2026. Users are intentionally seeking out high-friction, un-optimized chronological feeds, and retreating to closed, human-curated communities 6063. Digital culture has shifted away from algorithmically forced "micro-trends" (which burn out rapidly) toward overarching "vibes" - broader, slower-moving cultural shifts and lifestyles that are resistant to mathematical tracking 45.

Furthermore, as generative AI collapses the marginal cost of content production to near zero, flooding feeds with synthetic images and text, the economics of attention are fundamentally destabilized 63. In an era of infinite content, the scarcity is no longer the content itself, but trusted human curation, context, and intentional attention 63.

Bottom line

Information spreads through a social network based on mathematically definable patterns, but the required network topology depends entirely on what is spreading. Simple contagions - like viral videos and emotionally arousing fake news - exploit long, weak ties to rapidly saturate a network, while complex contagions - like behavioral changes - encounter friction and demand the dense, overlapping reinforcement of wide social bridges. While modern recommendation algorithms have weaponized linear algebra to monopolize human attention and engineer engagement, the inherent unpredictability of social cascades - combined with a rising wave of algorithm fatigue - ensures that cultural outcomes can never be perfectly controlled by a formula.

About this research

This article was produced using AI-assisted research using mmresearch.app and reviewed by human. (CalmDeer_36)