Prevention Science and Community Health
July 30, 2024
“Public health is the art and science of preventing disease, prolonging life, and promoting health through the organized efforts of society” - (Winslow 1920)
“Public health is primarily concerned with the health of the entire population, rather than the health of individuals. Its features include an emphasis on the promotion of health and the prevention of disease and disability; the collection and use of epidemiological data, population surveillance, and other forms of empirical quantitative assessment; a recognition of the multidimensional nature of the determinants of health; and a focus on the complex interactions of many factors - biological, behavioral, social, and environmental - in developing effective interventions.” - (Childress et al. 2002)
For a scoping review of definitions of public health, see Azari and Borisch (2023).
What are the underlying causes of a health problem? For example, what are the causes of youth alcohol initiation?
What are the risk factors associated with a health problem? Which demographic and behavioral factors increase the likelihood of developing type 2 diabetes among adults?
I suggest these three concepts are at the heart of effective prediction:
Bonus: Machine learning is optimized for prediction.
#| standalone: true
#| viewerHeight: 500

library(shiny)
library(glmnet)

# Define UI for application
ui <- fluidPage(
  titlePanel("Effect of L1 Penalty (Lasso) on Regression Slope"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("penalty",
                  "Penalty (Lambda):",
                  min = 0,
                  max = 1,
                  value = 0.1,
                  step = 0.01)
    ),
    mainPanel(
      plotOutput("regressionPlot"),
      verbatimTextOutput("modelSummary")
    )
  )
)

# Define server logic
server <- function(input, output) {
  output$regressionPlot <- renderPlot({
    # Generate synthetic data: 100 observations, 2 predictors
    set.seed(123)
    x <- matrix(rnorm(200), ncol = 2)
    y <- 3 * x[, 1] + 2 * x[, 2] + rnorm(100)

    # Fit model with Lasso penalty (alpha = 1) using both columns of x
    fit <- glmnet(x, y, alpha = 1, lambda = input$penalty)

    # Extract coefficients
    intercept <- coef(fit)[1]
    slope1 <- coef(fit)[2]
    slope2 <- coef(fit)[3]

    # Calculate predicted values using intercept and slopes
    y_pred <- intercept + x[, 1] * slope1 + x[, 2] * slope2

    # Combine data for plotting
    data <- data.frame(X1 = x[, 1], Y = y, Y_Pred = y_pred)

    par(mfrow = c(1, 2))
    # Plot 1: Regression line
    plot(data$X1, data$Y, main = "Lasso Regression Line (X1 vs Y)",
         xlab = "X1", ylab = "Y", pch = 19, col = "red")
    abline(intercept, slope1, col = "blue", lwd = 2)
    # Plot 2: Actual vs Predicted values
    plot(data$X1, data$Y, main = "Actual vs Predicted (X1 vs Y)",
         xlab = "X1", ylab = "Y", pch = 19, col = "red")
    points(data$X1, data$Y_Pred, col = "blue", pch = 19)
    par(mfrow = c(1, 1))
  })

  output$modelSummary <- renderPrint({
    # Regenerate the same synthetic data and refit at the chosen penalty
    set.seed(123)
    x <- matrix(rnorm(200), ncol = 2)
    y <- 3 * x[, 1] + 2 * x[, 2] + rnorm(100)
    fit <- glmnet(x, y, alpha = 1, lambda = input$penalty)

    # Display coefficients, including the intercept
    cat("Intercept:", coef(fit)[1], "\n")
    cat("Slope for X1:", coef(fit)[2], "\n")
    cat("Slope for X2:", coef(fit)[3], "\n")
  })
}

# Run the application
shinyApp(ui = ui, server = server)

Bias: error from overly simplistic assumptions, so the model misses real structure in the data (underfitting). Variance: error from overly complex models that track noise in the training sample (overfitting). The goal is to minimize the total error.
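This trade-off is the standard bias-variance decomposition of expected squared prediction error: for data generated as $y = f(x) + \varepsilon$ with noise variance $\sigma^2$,

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$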
Let’s simulate a data set and apply machine learning techniques to predict the outcome.
Estimate these models in a few cases, progressively increasing the sample size one unit at a time and calculating the training and testing errors at each step.
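Below is a minimal sketch of that procedure in R, assuming a simple linear data-generating process and ordinary least squares as the model (the seed, sample sizes, and coefficients here are illustrative choices, not taken from the slides):

set.seed(123)
n_max <- 200
x_all <- rnorm(n_max)                     # illustrative linear truth: y = 3x + noise
y_all <- 3 * x_all + rnorm(n_max)

# Fixed held-out test set for estimating generalization error
x_test <- rnorm(500)
y_test <- 3 * x_test + rnorm(500)

sizes <- 10:n_max                         # grow the training sample one unit at a time
train_err <- numeric(length(sizes))
test_err  <- numeric(length(sizes))

for (i in seq_along(sizes)) {
  n <- sizes[i]
  fit <- lm(y ~ x, data = data.frame(x = x_all[1:n], y = y_all[1:n]))
  train_err[i] <- mean(residuals(fit)^2)  # training MSE
  test_err[i]  <- mean((y_test - predict(fit, newdata = data.frame(x = x_test)))^2)
}

plot(sizes, test_err, type = "l", col = "blue",
     ylim = range(c(train_err, test_err)),
     xlab = "Training sample size", ylab = "Mean squared error",
     main = "Training vs. testing error as n grows")
lines(sizes, train_err, col = "red")
legend("topright", legend = c("Testing error", "Training error"),
       col = c("blue", "red"), lty = 1)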
Four cases to compare (the sketch after this list fits each one):

A linear problem - linear regression.
A non-linear problem - linear regression.
A non-linear problem - support vector machine.
A linear problem - random forest.
Why Machine Learning?