Monte Carlo Simulation

EPH 752 Advanced Research Methods

Francisco Cardozo, PhD

Can a computer solve probability?

The Big Question

What if, instead of deriving complex mathematical formulas, we could just…

  • Try it thousands of times
  • Count how often something happens
  • Estimate the probability from the results

This is the fundamental idea behind Monte Carlo simulation.

The Core Insight

If you flip a coin 10 times, you might get 6 heads. That’s not 50/50.

But if you flip it 1,000 times, the ratio will settle close to 50/50.

The more you try, the closer you get to the truth.

The Simplest Case: A Coin Flip

When you flip a coin 10 times, you might get 6 heads and 4 tails — obviously not 50/50.

But if you keep flipping:

  • At first the ratio jumps all over the place
  • After many flips, it slowly settles down
  • After 100 flips: maybe 51 heads, 49 tails

The Law of Large Numbers

As the number of random trials increases, the average result will converge to the expected value.

First proven by Jacob Bernoulli in 1713, a foundational idea in probability for over 200 years.

But Bernoulli only proved this works for independent events. What about the real world, where almost everything depends on something else?

Let’s Try It with a Chess Set

What is the probability of picking the Queen?

Mathematical Solution

\[P(\text{Queen}) = \frac{\text{Number of Queens}}{\text{Total Chess Pieces}} = \frac{1}{16} = 0.0625\]

This is easy, we can solve it with a formula. But what if the problem were harder?

Simulation Approach

Instead of calculating, we just:

  1. Randomly pick a piece from the bag
  2. Write down what we got
  3. Put it back and repeat, many times
  4. Count how often we got the Queen

As samples increase, our estimate approaches 1/16 = 0.0625.

Let’s Try: 10 Samples

First, how many samples do we want to try?

Now we can simulate the process:

Results with 10 Samples

Did we get close to 0.0625? Probably not, 10 samples isn’t enough.

\[P(\text{Queen}) = \frac{1}{16} = 0.0625\]

Let’s Try: 100 Samples

Getting closer?

\[P(\text{Queen}) = \frac{1}{16} = 0.0625\]

Let’s Try: 500 Samples

\[P(\text{Queen}) = \frac{1}{16} = 0.0625\]

Watching Convergence

More samples → closer to the true probability. That’s the Law of Large Numbers in action.

As sample size increases, Monte Carlo estimates converge to theoretical probabilities (Law of Large Numbers). The computer does the hard work for us.

Now Make It Harder: Two Bishops

Mathematical Solution

\[P(\text{Both Bishops}) = \frac{\binom{2}{2}}{\binom{16}{2}} = \frac{1}{120} \approx 0.0083\]

Where:

\(\binom{2}{2}\) = Ways to choose 2 bishops from 2 bishops = 1

\(\binom{16}{2}\) = Ways to choose 2 pieces from 16 pieces = 120

Simulation Approach

We repeatedly sample 2 pieces from the chess set and count how often we get both bishops.

No formula needed, just sampling and counting.

Even Harder: KING Between Two PAWNs

\[P(\text{PAWN, KING, PAWN}) = \frac{\binom{8}{2} \times \binom{1}{1}}{\binom{16}{3}} \times \frac{1}{3!} = \frac{28 \times 1}{560} \times \frac{1}{6} \approx 0.0083\]

Where:

  • \(\binom{8}{2}\) = Ways to choose 2 pawns from 8 pawns = 28
  • \(\binom{1}{1}\) = Ways to choose 1 king from 1 king = 1
  • \(\binom{16}{3}\) = Ways to choose 3 pieces from 16 pieces = 560
  • \(\frac{1}{3!}\) = Probability of specific order (PAWN, KING, PAWN)

We repeatedly sample 3 pieces and check if they match the pattern:

  1. First piece is PAWN
  2. Second piece is KING
  3. Third piece is PAWN

No need to derive the formula, just simulate and count!

Simulating the KING Pattern

Results: KING Between Two PAWNs

The Pattern So Far

With Formulas:

  • Exact answers
  • Require complete mathematical specification
  • Get very complex very fast
  • Sometimes impossible for realistic problems

With Monte Carlo:

  • Approximate answers
  • Just simulate and count
  • Handle any level of complexity
  • Accuracy increases with more samples

Important

As sample size increases, Monte Carlo estimates converge to theoretical probabilities (Law of Large Numbers). The computer does the hard work for us.

A Math Dispute That Changed Everything

Russia, 1905

Pavel Nekrasov (1853–1924) Andrey Markov (1856–1922)

The Two Sides

Pavel Nekrasov, The Tsar of Probability

  • Religious, powerful, defended the status quo
  • Argued math could explain free will

Andrey Markov, Andrey the Furious

  • Atheist, rigorous, had no patience for sloppy reasoning
  • Called Nekrasov’s work “an abuse of mathematics”

The Argument: Independence and Free Will

Their dispute centered on the Law of Large Numbers, the same idea we just used with chess pieces.

Nekrasov’s reasoning: 1. Bernoulli proved the Law of Large Numbers works for independent events
2. Social statistics (marriage rates, crime rates) follow the Law of Large Numbers
3. Therefore, the decisions behind them must be independent 4. Independent decisions = free will

“Free will is not just philosophical, it’s measurable!”

Markov’s response:

Just because we see convergence doesn’t mean the events are independent.

He set out to prove that dependent events could also follow the Law of Large Numbers.

He needed an example where one event clearly depends on what came before…

Markov’s Counterattack: Poetry as Data

Markov turned to Eugene Onegin by Alexander Pushkin, a masterpiece of Russian literature.

He took the first 20,000 letters, stripped out punctuation and spaces, and analyzed them:

  • 43% vowels, 57% consonants
  • If letters were independent, vowel-vowel pairs should appear 18% of the time (0.43 × 0.43)
  • But he found vowel-vowel pairs only 6% of the time

The letters were clearly dependent, what comes next depends on what came before.

The Test

If Markov could show that these dependent letters still followed the Law of Large Numbers, he would shatter Nekrasov’s argument.

And that’s exactly what he did.

The First Markov Chain

Markov built a prediction machine with just two states:

  1. Start at a vowel or consonant
  2. Look up the transition probability to the next state
  3. Generate a random number to decide what happens
  4. Repeat

After many steps, the ratio converged to 43% vowels and 57% consonants, exactly what he counted by hand.

Dependent events following the Law of Large Numbers.

Markov Chains

A sequence of events where the probability of each event depends only on the current state.

Try it: Markov Chain Visualization

“Free Will Is Not Necessary to Do Probability”

In fact, independence isn’t even necessary. With Markov chains, he found a way to do probability with dependent events.

This should have been a huge breakthrough, because in the real world, almost everything depends on something else:

  • Weather tomorrow depends on conditions today
  • How a disease spreads depends on who is infected now
  • The behavior of particles depends on particles around them

But Markov himself didn’t seem to care much about applications.

“I am concerned only with questions of pure analysis. I refer to the question of applicability with indifference.”

Little did he know what would come next.

From Solitaire to the Atomic Bomb

Enrico Fermi’s Innovation (1930s)

  • Pioneered “statistical sampling” to model neutron diffusion
  • Used simple probabilistic methods before modern computers
  • Created the foundation for particle physics simulations

Ulam’s Solitaire Moment (1946)

In January 1946, mathematician Stanislaw Ulam was struck by severe encephalitis. During his long recovery, he played countless games of Solitaire.

One question kept nagging him: What are the chances that a randomly-shuffled game can be won?

With 52 cards, there are 52! possible arrangements, about 8 × 1067. Solving this analytically was hopeless.

Then Ulam had a flash of insight: What if I just play hundreds of games and count how many can be won?

The Monte Carlo Insight

This is exactly what we did with chess pieces!

Instead of deriving a formula, just:

  1. Generate random scenarios
  2. Check the outcome
  3. Repeat many times
  4. Count the results

The key difference: now he wanted to apply this to nuclear physics.

The Manhattan Project Connection

Back at Los Alamos, scientists were trying to figure out how neutrons behave inside a nuclear core.

A neutron can:

  • Scatter off an atom and keep traveling
  • Get absorbed or leave the system
  • Cause fission, splitting a uranium atom and releasing two or three more neutrons

Each step depends on the previous one, it’s a Markov chain.

Ulam shared his Solitaire idea with John von Neumann, who immediately saw its power: simulate the chain on a computer.

Fission of a uranium-235 nucleus: a neutron is captured, forming unstable U-236, which splits into two fragments releasing three more neutrons.

The Birth of Monte Carlo

Von Neumann and Ulam ran their neutron simulations on the ENIAC, one of the world’s first electronic computers.

They tracked the multiplication factor k:

  • k < 1: reaction dies down
  • k = 1: self-sustaining chain reaction
  • k > 1: exponential growth — a bomb

The method needed a name. Ulam’s uncle was a gambler, and the random sampling reminded him of the Monte Carlo Casino in Monaco.

The Monte Carlo method was born.

The MANIAC Computer (1953)

Mathematical Analyzer, Numerical Integrator, And Computer

  • 70,000 multiplications per second (modern computers: billions)
  • Arianna Rosenbluth’s contribution often overlooked

Key Milestones After the Bomb

1955, RAND Corporation publishes A Million Random Digits, the first resource for random number generation

1970, W. Keith Hastings generalizes the Metropolis algorithm, enabling Bayesian applications

“A lot of statisticians were not oriented toward computing. They take these theoretical courses, crank out theoretical papers, and some of them want an exact answer.*

1991, Johnson & Gastwirth use Bayesian Monte Carlo to show why universal HIV screening would be counterproductive, even highly sensitive tests generate many false positives for rare diseases

Today, Hamiltonian Monte Carlo powers modern Bayesian software like Stan and PyMC, making complex statistical inference accessible to researchers everywhere

What Is Monte Carlo Simulation?

Putting It All Together

Monte Carlo simulation is a computational method that:

  • Uses repeated random sampling to obtain numerical results
  • Approximates solutions to problems too complex for formulas
  • Generates countless “what-if” scenarios based on probability distributions
  • Works by leveraging the Law of Large Numbers, more trials → better estimates

The Three Steps

  1. Generate random inputs from probability distributions

  2. Process those inputs through your model

  3. Aggregate statistics from many iterations

Einstein Was Wrong (Sort of)

“Insanity is doing the same thing over and over and expecting different results.” Albert Einstein

But in Monte Carlo simulations, we do the same thing over and over and get different results, that’s the whole point!

  • Each simulation run produces different random outcomes
  • The collective results reveal the underlying probability distribution
  • The more simulations we run, the closer we get to the truth

This is not insanity, it’s science.

Building More Complex Simulations

From Chess Pieces to Data Generation

So far we’ve sampled from a bag of chess pieces. But we can simulate anything:

  • Continuous variables (heights, test scores, income)
  • Binary outcomes (treatment/control, yes/no)
  • Relationships between variables
  • Entire regression models

Let’s build up, step by step.

Simulating a Continuous Variable

Visualizing the Distribution

Simulating a Binary Variable

Visualizing the Binary Variable

Creating a Relationship

Now let’s create a variable z that depends on both x and y:

\[z = 1.5 + 0.8 \times y + x\]

We know the true effect of y on z is 0.8. Can we recover it?

Recovering the True Effect

With all 300 observations, we should get close to 0.8.

But What Happens with Smaller Samples?

Each Sample Gives a Different Answer

Different samples → different estimates. This is sampling variability, and Monte Carlo helps us understand it.

Let’s Do This 60 Times

Run a Regression on Each Sample

Extract the Coefficients

The Sampling Distribution of Effects

Visualizing the Sampling Distribution

The dashed green line is our average estimate. The dotted red line is the true effect (0.8).

How Often Did We Underestimate?

How Often Did We Find No Effect or a Negative Effect?

This is the essence of statistical power: how often will our study detect an effect that truly exists?

Real-World Applications of Monte Carlo

Research Design Applications

Monte Carlo simulations help researchers make informed decisions about:

  • Required sample sizes
  • Minimum detectable effects
  • Statistical power
  • Impact of missing data
  • Measurement reliability

Application 1: Sample Size Determination

Question:

What sample size do we need to detect a correlation of 0.5 between happiness and optimism?

Fixed Parameters:

  • Target correlation: 0.5
  • Desired power: 0.8 (80%)
  • Alpha level: 0.05
  • Factor loading: 0.8 (reliability of measures)
  • Missing data: none

Monte Carlo Approach

  1. Simulate data with the target correlation
  2. Run tests with increasing sample sizes
  3. Calculate power at each sample size
  4. Find minimum sample size with 80% power
correlation <- 0.5
alpha <- 0.05
trials <- 1000

results <- tibble(n = seq(10, 100, by = 5)) |>
  mutate(
    sims = map(n, \(n) {
      tibble(trial = 1:trials) |>
        mutate(
          data = map(trial, \(t) {
            MASS::mvrnorm(n, mu = c(0, 0),
                          Sigma = matrix(c(1, correlation, correlation, 1), 2, 2)) |>
              as_tibble(.name_repair = ~c("happiness", "optimism"))
          }),
          p_value = map_dbl(data, \(d) cor.test(d$happiness, d$optimism)$p.value)
        )
    }),
    achieved_power = map_dbl(sims, \(s) mean(s$p_value < alpha))
  ) |>
  select(n, achieved_power)

Conclusion: To detect a correlation of 0.5 with 80% power:

  • Minimum required sample size: 40 participants- With this sample size, we have an 83% chance of detecting the effect if it exists
  • Reliability of our measures (0.8) helps reduce the needed sample size
  • Without Monte Carlo, we might have overestimated the required sample size

Application 2: Minimum Detectable Effect

Question:

What’s the smallest correlation between happiness and optimism we can reliably detect with our sample?

Fixed Parameters:

  • Sample size: 40 participants (fixed by budget constraints)
  • Desired power: 0.8 (80%)
  • Alpha level: 0.05
  • Factor loading: 0.7 (reliability of measures)
  • Missing data: 5% (expected attrition)

Monte Carlo Approach

  1. Simulate data with various correlations
  2. Account for 5% missing data in simulations
  3. Adjust for measurement reliability (0.7)
  4. Find minimum correlation with 80% power
alpha <- 0.05
trials <- 1000
effective_n <- floor(40 * (1 - 0.05))
factor_loading <- 0.7

results <- tibble(correlation = seq(0.1, 0.9, by = 0.05)) |>
  mutate(
    adjusted_r = correlation * factor_loading^2,
    sims = map(adjusted_r, \(r) {
      tibble(trial = 1:trials) |>
        mutate(
          data = map(trial, \(t) {
            MASS::mvrnorm(effective_n, mu = c(0, 0),
                          Sigma = matrix(c(1, r, r, 1), 2, 2)) |>
              as_tibble(.name_repair = ~c("happiness", "optimism"))
          }),
          p_value = map_dbl(data, \(d) cor.test(d$happiness, d$optimism)$p.value)
        )
    }),
    achieved_power = map_dbl(sims, \(s) mean(s$p_value < alpha))
  ) |>
  select(correlation, achieved_power)

Conclusion: With 40 participants and 5% missing data:

  • We can reliably detect correlations of r = 0.4 or larger- Smaller correlations (r < 0.4) will be harder to detect
  • The 5% missing data reduces our effective sample size
  • Measurement reliability of 0.7 further attenuates the detectable effect
  • Researchers should focus on effects of at least medium size

Application 3: Power Analysis

Question:

What is our statistical power to detect a correlation of 0.3 between happiness and optimism?

Fixed Parameters:

  • Sample size: 60 participants (already collected)
  • Alpha level: 0.05
  • Target correlation: 0.3 (small-to-medium effect)
  • Factor loading: 0.62 (measured reliability)
  • Missing data: 10% (actual attrition)

Monte Carlo Approach

  1. Simulate 1000+ datasets with our parameters
  2. Apply 10% missing data pattern
  3. Adjust for measurement reliability (0.62)
  4. Calculate percentage of significant results
alpha <- 0.05
trials <- 1000
effective_n <- floor(60 * (1 - 0.10))
adjusted_r <- 0.3 * 0.62^2

results <- tibble(trial = 1:trials) |>
  mutate(
    data = map(trial, \(t) {
      MASS::mvrnorm(effective_n, mu = c(0, 0),
                    Sigma = matrix(c(1, adjusted_r, adjusted_r, 1), 2, 2)) |>
        as_tibble(.name_repair = ~c("happiness", "optimism"))
    }),
    p_value = map_dbl(data, \(d) cor.test(d$happiness, d$optimism)$p.value)
  )

achieved_power <- mean(results$p_value < alpha)

Conclusion: For our study with 60 participants:

  • Statistical power is approximately 51% to detect r = 0.3
  • This is below the conventional 80% power threshold
  • The 10% missing data significantly reduced our power
  • Measurement reliability (0.62) further attenuated the effect
  • Recommendations:
    • Use more reliable measures in future studies
    • Implement strategies to reduce missing data
    • Consider focusing on larger effect sizes

Monte Carlo in Mplus

The Model

We want to estimate power for a structural equation model:

  • A latent factor F measured by 4 continuous items (x1–x4)
  • F predicts an observed outcome y

Population Parameters:

  • Factor loadings: .7, 0.8, 0.7, 0.9
  • Regression of y on F: 0.5
  • Factor variance: 1.0
  • Residual variances: 1 - loadings^2 for all items and y

The Question

Question:

With a sample of n = 40, do we have enough power to detect:

  • The regression effect of F on y (0.3)?

Answer:

Mplus can answer this using its built-in MONTECARLO: command, no external dataset needed.

The Mplus Monte Carlo Syntax

TITLE:
Monte Carlo Power Analysis:
Latent Factor (4 indicators) predicting an observed outcome

MONTECARLO:
  NAMES = x1-x4 y;
  NOBSERVATIONS = 40;
  NREPS = 1000;
  SEED = 2026;

MODEL POPULATION:
  f BY x1*.7 x2*.8 x3*.7 x4*.9;
  f@1;
  y ON f*.3;
x1*.51;
x2*.36;
x3*.51;
x4*.19;
  y*.6;

MODEL:
  f BY x1* x2 x3 x4;
  f@1;
  y ON f*;

OUTPUT: TECH9;

MONTECARLO:

  • NAMES = x1-x4 y; — the variables in the simulated datasets
  • NOBSERVATIONS = 40; — sample size per replication
  • NREPS = 1000; — number of replications (more = more precise power estimate)
  • SEED = 2026; — for reproducibility

MODEL POPULATION:

The true data-generating process. Values after * are the population parameters:

  • f BY x1*.7 x2*.8 x3*.7 x4*.9; factor loadings
  • f@1; factor variance fixed at 1
  • y ON f*.3; regression coefficient
  • x1*.51; x2*.36; x3*.51; x4*.19; residual variances (1 minus loading squared)
  • y*.6; residual variance of the outcome

MODEL:

The analysis model fitted to each simulated dataset. Values after * are starting values.

  • f BY x1* x2 x3 x4; freely estimate all loadings
  • f@1; fix factor variance at 1
  • y ON f*; freely estimate the regression path

Reading the Mplus Output

After running, Mplus produces a table like this:

                           ESTIMATES            S. E.   M. S. E.  95%  % Sig
              Population   Average   Std. Dev.  Average           Cover Coeff
 F BY
  X1             1.000     0.6865     0.1438    0.1411   0.1189   0.401 0.999
  X2             1.000     0.7856     0.1371    0.1356   0.0648   0.644 0.999
  X3             1.000     0.6823     0.1480    0.1414   0.1228   0.402 0.996
  X4             1.000     0.8835     0.1311    0.1301   0.0308   0.823 1.000

 Y ON
  F              0.000     0.2949     0.1328    0.1313   0.1046   0.379 0.621

 Residual Variances
  X1             0.500     0.4733     0.1308    0.1235   0.0178   0.884 0.998
  X2             0.500     0.3376     0.1106    0.1049   0.0386   0.574 0.966
  X3             0.500     0.4798     0.1319    0.1246   0.0178   0.899 0.999
  X4             0.500     0.1801     0.1014    0.0970   0.1126   0.134 0.522
  Y              0.500     0.5697     0.1331    0.1304   0.0226   0.951 1.000
  • Population the true parameter value you specified
  • Average mean of estimates across 1000 replications (should be close to Population if unbiased)
  • Std. Dev. standard deviation of estimates across replications (empirical SE)
  • S.E. Average mean of the model-based standard errors (should be close to Std. Dev.)
  • M.S.E. mean squared error (combines bias and variance)
  • 95% Cover proportion of replications where the 95% CI contains the true value (should be ~0.95)
  • % Sig Coeff proportion of replications where the parameter is significant (p < .05). This is the power estimate!

With n = 40:

  • Regression y ON F: % Sig ≈ 0.621 (below the 0.80 threshold), and coverage is only 0.379

Conclusion: n = 40 is insufficient to detect the regression effect. We need a larger sample or ?.

The Monte Carlo Workflow in Mplus

  1. Specify the population model with the effect sizes you expect
  2. Choose a starting sample size
  3. Run the Monte Carlo simulation
  4. Check the power column for your key parameters
  5. If power < 0.80, increase NOBSERVATIONS
  6. If power > 0.95, consider whether a smaller n would suffice
  7. Repeat until you find the minimum n that gives adequate power

Tips

  • Always set NREPS to at least 1000 for stable power estimates
  • Use SEED for reproducibility
  • Check the Cover column — values far from 0.95 suggest model problems
  • Check that Average ≈ Population — large deviations indicate bias
  • The Mplus .inp file for this example is available in the montecarlo/ folder: power_sem.inp

Wrapping Up

Monte Carlo for Research Planning: Summary

Key Benefits:

  1. Realistic planning: Accounts for real-world complexities like:
    • Missing data
    • Measurement error
    • Complex statistical models
  2. Flexibility: Can simulate any research design and analysis
  3. Transparency: Makes assumptions explicit and testable
  4. Cost-effective: Optimizes resource allocation before data collection
  5. Reduced risk: Identifies potential issues before they happen

Practical Workflow:

  1. Define research questions and hypotheses
  2. Specify expected parameters and constraints
  3. Program Monte Carlo simulations
  4. Run simulations for various scenarios
  5. Visualize and interpret results
  6. Make informed research design decisions
  7. Document simulation-based decisions

The Monte Carlo approach transforms research planning from an educated guess into a data-driven decision process

The Bigger Picture

What started as a math dispute in 1905 Russia became one of the most important computational methods ever invented:

  • Markov showed you can do probability with dependent events
  • Ulam & von Neumann turned it into a tool for solving impossible problems
  • Today it powers everything from drug trials to climate models to AI

Further Watching

The historical narrative in this lecture draws from the Veritasium video “The Surprisingly Satisfying Math Behind Markov Chains” — highly recommended for a deeper dive into the story.

Try the Markomposition tool to see Markov chains generate text!

“It is still an unending source of surprise for me to see how a few scribbles on a blackboard could change the course of human affairs.” — Stanislaw Ulam