
In the Monte Carlo method, a new sample of the same size is created by randomly selecting values from the original sample with replacement. Statistics are then computed from this new sample. Repeating this process many times yields an estimate of the population statistic (B. Efron 1979).
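
The resampling loop described above can be sketched on a toy vector. This is a minimal illustration, not the analysis code used in the experiments: the values in `observed` and the helper `resample_mean` are hypothetical.

```r
# Toy data standing in for an observed sample (hypothetical values)
set.seed(42)
observed <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2, 14.9, 18.3)

# Draw one resample of the same size, with replacement, and return its mean
resample_mean <- function(x) mean(sample(x, length(x), replace = TRUE))

# Repeat the resampling many times to approximate the sampling distribution
boot_means <- replicate(1000, resample_mean(observed))

# Bootstrap estimate of the mean and of its standard error
mean(boot_means)
sd(boot_means)
```
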

The exact version of Bootstrap case resampling is similar to the Monte Carlo method, except that every possible resample of the initial sample is enumerated. The downside is that there are \({2n - 1 \choose n} = \frac{(2n - 1)!}{n!(n - 1)!}\) possible samples, so the process becomes very computationally intensive for large sample sizes (Hossain 2000).
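
To give a sense of how quickly this count grows, the sketch below evaluates the formula and enumerates the resamples of a three-value toy sample. The names `count_resamples` and `toy_sample` are illustrative assumptions, not part of the original analysis.

```r
# Number of distinct resamples (multisets of size n drawn from n values)
count_resamples <- function(n) choose(2 * n - 1, n)

count_resamples(3)   # 10
count_resamples(20)  # 68923264410

# Enumerate the distinct resamples explicitly for a toy sample
toy_sample <- c(2, 5, 9)
n <- length(toy_sample)

# All ordered index tuples; keep only non-decreasing ones so that
# each multiset of indices appears exactly once
grid <- expand.grid(rep(list(seq_len(n)), n))
is_sorted <- apply(grid, 1, function(idx) !is.unsorted(idx))
resamples <- grid[is_sorted, , drop = FALSE]

# Statistic of interest for every distinct resample
exact_means <- apply(resamples, 1, function(idx) mean(toy_sample[idx]))
```

The distinct resamples are not equally likely under sampling with replacement, so an exact bootstrap distribution would weight each one by its multinomial probability rather than counting them uniformly.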

| Name | Type |
|---|---|
| Entity | Nominal |
| Population | Discrete |
| Number_of_Deaths | Discrete |
| Deaths_CardiovascularDiseases | Discrete |
| Deaths_InterpersonalViolence | Discrete |
| Deaths_CardiovascularDiseasesRate | Continuous |

The average rate of deaths caused by cardiovascular diseases is 32.73%, with a standard deviation of 13.09%.


    # Load plotting packages (grid.arrange() comes from gridExtra)
    library(ggplot2)
    library(gridExtra)

    # Create a histogram of the cardiovascular diseases death rate
    cardioHistogram <- ggplot(boot_data,
                              aes(x = Deaths_CardiovascularDiseasesRate)) +
      xlab('Cardiovascular Diseases Death Rate') +
      ylab('Count') +
      geom_histogram()

    # Scatter plot of interpersonal violence deaths against population
    # (both axes on a log 10 scale)
    violenceHistogram <- ggplot(boot_data,
                                aes(x = Deaths_InterpersonalViolence,
                                    y = Population)) +
      scale_x_log10() +
      scale_y_log10() +
      xlab('Interpersonal Violence Deaths (log 10)') +
      ylab('Population (log 10)') +
      geom_point()

    # Display plots side by side
    grid.arrange(cardioHistogram, violenceHistogram, ncol = 2)

Monte Carlo Method

Both experiments use 1,000 Bootstrap samples drawn from an initial sample of size \(n = 20\). The first estimates the population mean and standard deviation; the second analyzes regression model variability.
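
The code that follows indexes `boot_data` by `boot_initial_sample`, which suggests that the initial sample is stored as a vector of row indices; how it was drawn is not shown here. A minimal sketch of one way to draw such a sample, assuming simple random sampling without replacement and using a hypothetical stand-in data frame `toy_data`:

```r
# Hypothetical stand-in for boot_data: 200 rows with a single rate column
set.seed(1)
toy_data <- data.frame(rate = runif(200, min = 0, max = 60))

# Assumption: the initial sample is 20 distinct row indices
toy_initial_sample <- sample(nrow(toy_data), 20)

# Rows of the initial sample, selected by index
toy_data[toy_initial_sample, , drop = FALSE]
```
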

    # Preallocate vectors to store the resample means and standard deviations
    boot_estimated_means <- numeric(1000)
    boot_estimated_sds <- numeric(1000)

    # Create 1,000 resamples of the initial sample (row indices of boot_data)
    # and save each mean and standard deviation
    for (i in 1:1000) {
      boot_new_sample <- sample(boot_initial_sample, 20, replace = TRUE)
      boot_new_rates <- boot_data$Deaths_CardiovascularDiseasesRate[boot_new_sample]
      boot_estimated_means[i] <- mean(boot_new_rates)
      boot_estimated_sds[i] <- sd(boot_new_rates)
    }

    # Sort the estimated means from smallest to largest
    boot_estimated_means <- sort(boot_estimated_means)
    # Sort the estimated standard deviations from smallest to largest
    boot_estimated_sds <- sort(boot_estimated_sds)

    # Trim the top and bottom 2.5% (25 values from each end of 1,000)
    n_boot <- length(boot_estimated_means)
    trim_start <- round(n_boot * 0.025) + 1
    trim_end <- round(n_boot * 0.975)
    boot_estimated_means <- boot_estimated_means[trim_start:trim_end]
    boot_estimated_sds <- boot_estimated_sds[trim_start:trim_end]

Select the first and last remaining values to find the 95% confidence intervals.

Compare the intervals against the true population mean and standard deviation.

Estimated population mean

Estimated population standard deviation
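
The sort-and-trim step is one way to obtain a percentile interval. As a hedged alternative sketch, base R's `quantile()` gives nearly the same limits in one call. The vector `toy_boot_means` below is simulated and only stands in for `boot_estimated_means`.

```r
set.seed(7)
# Hypothetical bootstrap means standing in for boot_estimated_means
toy_boot_means <- rnorm(1000, mean = 32, sd = 3)

# Percentile interval: 2.5th and 97.5th percentiles of the estimates
ci_quantile <- quantile(toy_boot_means, probs = c(0.025, 0.975))

# Same idea by sorting and dropping 25 values from each end
sorted_means <- sort(toy_boot_means)
ci_trimmed <- c(sorted_means[26], sorted_means[975])

ci_quantile
ci_trimmed
```

The two results differ slightly because `quantile()` interpolates between neighbouring order statistics by default, while trimming picks specific sorted values.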

    # Preallocate vectors to store the bootstrap intercepts and slopes
    boot_estimated_intercepts <- numeric(1000)
    boot_estimated_regressionparameters <- numeric(1000)

    # Create 1,000 resamples, refit the model, and save the coefficients
    for (i in 1:1000) {
      boot_new_reg_sample <- sample(boot_initial_sample, 20, replace = TRUE)
      boot_new_lm <- lm(Deaths_InterpersonalViolence ~ Population,
                        data = boot_data[boot_new_reg_sample, ])
      boot_estimated_intercepts[i] <- coef(boot_new_lm)[[1]]
      boot_estimated_regressionparameters[i] <- coef(boot_new_lm)[[2]]
    }

    # Create population model
    pop_lm <- lm(Deaths_InterpersonalViolence ~ Population, data = boot_data)
    # Create initial sample model
    sample_lm <- lm(Deaths_InterpersonalViolence ~ Population,
                    data = boot_data[boot_initial_sample, ])
    # Find Bootstrapping average values
    boot_lm_intercept <- mean(boot_estimated_intercepts)
    boot_lm_x1 <- mean(boot_estimated_regressionparameters)

| | Population | Sample | Bootstrap |
|---|---|---|---|
| (Intercept) | 1183.7273 | -1801.4852 | -1500.0582 |
| x | \(2.4 \times 10^{-5}\) | \(1.74 \times 10^{-4}\) | \(1.54 \times 10^{-4}\) |
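
Beyond averaging the coefficients, their spread across resamples can also be summarized. The sketch below is an illustration on simulated data, not a result from the dataset above: the data frame `toy` and its coefficients are hypothetical. It refits a simple linear model on case resamples and reports bootstrap standard errors and a percentile interval for the slope.

```r
set.seed(123)
# Hypothetical data: y depends linearly on x plus noise
toy <- data.frame(x = runif(20, 0, 100))
toy$y <- 5 + 2 * toy$x + rnorm(20, sd = 10)

# Refit the model on 1,000 case resamples and keep both coefficients
boot_coefs <- t(replicate(1000, {
  rows <- sample(nrow(toy), replace = TRUE)
  coef(lm(y ~ x, data = toy[rows, ]))
}))

# Bootstrap standard errors of the intercept and slope
apply(boot_coefs, 2, sd)

# 95% percentile interval for the slope
slope_ci <- quantile(boot_coefs[, "x"], probs = c(0.025, 0.975))
slope_ci
```
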

    # Plot the data, population model (blue), sample model (green), and
    # final Bootstrap model (red, average of all models)
    final_plot <- ggplot(boot_data,
                         aes(x = Population, y = Deaths_InterpersonalViolence)) +
      geom_point() +
      geom_abline(intercept = coef(pop_lm)[1],
                  slope = coef(pop_lm)[2],
                  color = 'blue') +
      geom_abline(intercept = coef(sample_lm)[1],
                  slope = coef(sample_lm)[2],
                  color = 'green') +
      geom_abline(intercept = mean(boot_estimated_intercepts),
                  slope = mean(boot_estimated_regressionparameters),
                  color = 'red') +
      xlab('Population') +
      ylab('Interpersonal Violence Deaths') +
      ggtitle('Regression Comparison')

    # Add all Bootstrap models as a single faint layer
    boot_lines <- data.frame(intercept = boot_estimated_intercepts,
                             slope = boot_estimated_regressionparameters)
    final_plot <- final_plot +
      geom_abline(data = boot_lines,
                  aes(intercept = intercept, slope = slope),
                  alpha = 0.025)

    # Display final plot
    final_plot

In the first experiment, the population mean and standard deviation were estimated using the Bootstrap method. The true population mean and standard deviation both fell within the estimated confidence intervals, and the population had a normal distribution.

In the second experiment, the simple linear regression model that estimates the number of deaths caused by interpersonal violence from a country's population was analyzed for variability. The Bootstrap model usually followed the sample model more closely than the full population model.