Computing Sensitivity and Specificity using tidymodels

Categories: Sensitivity, Tidymodels, Specificity
Authors: Francisco Cardozo, Kyle Grealis
Published: July 16, 2023
Sensitivity and specificity are two crucial metrics used to evaluate the performance of binary classification models. In this article, we delve into the process of computing these metrics using the ‘tidymodels’ package in R.
We begin by simulating a dataset. Assuming we have access to the true value of the phenomenon under consideration, as well as our model’s predictions, we can proceed to compute the sensitivity and specificity of our model.
Refresh your memory on the meaning of these terms by looking at the following table:
|                      | True 1 (Positive) | True 0 (Negative) |
|----------------------|-------------------|-------------------|
| Predict 1 (Positive) | TP = 40           | FP = 15           |
| Predict 0 (Negative) | FN = 10           | TN = 35           |
This table above is the confusion matrix, featuring the following terms:
TP (True Positive): Instances where the model correctly predicted the positive class.
TN (True Negative): Instances where the model correctly predicted the negative class.
FP (False Positive): Instances where the model incorrectly predicted the positive class.
FN (False Negative): Instances where the model incorrectly predicted the negative class.
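From these four counts, sensitivity is the proportion of actual positives that the model identifies correctly, and specificity is the proportion of actual negatives that the model identifies correctly. Plugging in the counts from the table above:

Sensitivity = TP / (TP + FN) = 40 / (40 + 10) = 0.80
Specificity = TN / (TN + FP) = 35 / (35 + 15) = 0.70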
However, the burning question remains: how do we calculate these in R? Let’s explore this while evaluating the performance of our models using the ‘tidymodels’ package.
Modeling with tidymodels
Let’s now assume that we have a dataset comprising Y and X values, and we aim to create a model that predicts Y based on X. We’ll use the same dataset we created earlier.
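That dataset is not reproduced in this section, so here is a minimal sketch of what confusion_matrix_long could look like, rebuilt from the counts in the table above. The labels used for the predicted class are assumptions; only the counts come from the table.

```r
# A minimal sketch (assumed): a long-format confusion matrix with one row
# per cell and the cell count. The predicted-class labels are assumptions.
library(tidyverse)

confusion_matrix_long <- tribble(
  ~`True Class`, ~`Predicted Class`, ~Count,
  "True 1",      "1",                40L,  # true positives
  "True 0",      "1",                15L,  # false positives
  "True 1",      "0",                10L,  # false negatives
  "True 0",      "0",                35L   # true negatives
) |>
  # The modeling code below expects factor columns
  mutate(across(c(`True Class`, `Predicted Class`), factor))
```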
```r
# Expand the counts into one row per observation and rename the columns
# to shorter names
df <- confusion_matrix_long |>
  uncount(Count) |>
  rename(y = `True Class`, x = `Predicted Class`)
```
First, we’ll load the tidymodels library and specify our model.
```r
# Load the tidymodels package
library(tidymodels)
```
```r
# Define a logistic regression model specification
model_spec <- logistic_reg() |>
  set_engine("glm") |>
  set_mode("classification")

# Define a recipe for preprocessing the data
recipe_glm <- recipe(y ~ x, data = df) |>
  # Convert all nominal variables to dummy variables
  step_dummy(all_nominal(), -all_outcomes())
```
Let’s examine the dataset after preprocessing.
```r
# Preprocess the data using the recipe
# (this converts nominal variables to dummy variables)
recipe_glm |>
  prep() |>
  # Apply the recipe to new data
  bake(new_data = df) |>
  # Summarize the preprocessed data
  skimr::skim()
```
Data summary

|                        |                            |
|------------------------|----------------------------|
| Name                   | bake(prep(recipe_glm), ne… |
| Number of rows         | 100                        |
| Number of columns      | 2                          |
| Column type frequency: |                            |
| factor                 | 1                          |
| numeric                | 1                          |
| Group variables        | None                       |

Variable type: factor

| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts       |
|---------------|-----------|---------------|---------|----------|------------------|
| y             | 0         | 1             | FALSE   | 2        | Tru: 50, Tru: 50 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd  | p0 | p25 | p50 | p75 | p100 | hist  |
|---------------|-----------|---------------|------|-----|----|-----|-----|-----|------|-------|
| x_X1          | 0         | 1             | 0.55 | 0.5 | 0  | 0   | 1   | 1   | 1    | ▆▁▁▁▇ |
Remember that our outcome is a binary variable with levels True 0 and True 1.
Next, we can create a workflow that combines our recipe and the model specification, and fit the model.
```r
# Define a workflow for fitting the logistic regression model
wr_glm <- workflow() |>
  add_recipe(recipe_glm) |>
  add_model(model_spec)

# Fit the logistic regression model to the preprocessed data
model <- wr_glm |>
  fit(data = df)

# Extract the model coefficients and create a summary table
model |>
  extract_fit_parsnip() |>
  tbl_regression()
```
| Characteristic | log(OR)¹ | 95% CI¹  | p-value |
|----------------|----------|----------|---------|
| x_X1           | 2.2      | 1.3, 3.2 | <0.001  |

¹ OR = Odds Ratio, CI = Confidence Interval
So, we’ve established that the model predicts the Y values based on the X values. But how do we calculate the sensitivity and specificity using the model results?
First, we need to generate predictions. The augment() function, a broom generic with methods provided by the tidymodels packages, is an excellent choice for this task.
```r
# Load the reactable and knitr packages
library(reactable)
library(knitr)

# Use the augment() function to add predicted values to the data frame
the_predictions <- model |>
  augment(df) |>
  # Rename the "y" column to "Observed"
  rename("Observed" = "y")

# Create an interactive table using reactable::reactable()
the_predictions |>
  reactable(
    groupBy = "Observed",
    columns = list(
      x = colDef(aggregate = "unique"),
      .pred_class = colDef(aggregate = "unique"),
      `.pred_True 0` = colDef(aggregate = "mean", format = colFormat(digits = 2)),
      `.pred_True 1` = colDef(aggregate = "mean", format = colFormat(digits = 2))
    )
  )
```
At this point, we have both the predicted class and the observed class. We can either use the same method we employed in the earlier demonstration (tbl_cross()), or use the sensitivity() and specificity() functions from the yardstick package. Let’s look at how this works.
Computing sensitivity and specificity using a cross table
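One option is to cross-tabulate the observed and predicted classes with tbl_cross() from the gtsummary package (the same package that produced the regression table above). A minimal sketch, assuming the prediction columns created earlier:

```r
# A sketch: cross-tabulate predicted vs. observed classes with gtsummary
library(gtsummary)

the_predictions |>
  tbl_cross(
    row = .pred_class,  # predicted class
    col = Observed,     # observed (true) class
    percent = "column"  # column percentages relate directly to sensitivity/specificity
  )
```

Reading down the Observed = True 1 column, the percentage of cases predicted as True 1 is the sensitivity; the corresponding percentage of True 0 cases predicted as True 0 is the specificity.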
Computing sensitivity and specificity using the yardstick package
Finally, we can use the sensitivity() and specificity() functions from the yardstick package to calculate these metrics for our model.
In the context of logistic regression, R’s default behavior is to take the first level of a factor as the reference category when using the glm function. This behavior becomes particularly important when we’re dealing with binary outcomes, often coded as 0 (absence of the event) and 1 (presence of the event). By default, R will take 0 as the reference level and compare it against 1, due to 0 coming before 1 in numerical order.
However, when we’re working with factors, the order in which they’re arranged is somewhat arbitrary and it doesn’t necessarily make sense to always treat 0 as the absence of an event and 1 as the presence of an event (Kuhn and Silge 2022).
This point is discussed on page 116 of the tidymodels book.
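For our data, the first level of the outcome should be True 0 and the second True 1; a quick check:

```r
# glm() treats the first level as the reference and models the probability
# of the second level ("True 1" here)
levels(df$y)
#> [1] "True 0" "True 1"
```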
The yardstick package interprets this setup differently: by default, it treats the first factor level as the event of interest. Because our event (True 1) is the second level, this default can effectively swap our sensitivity and specificity measures.
To maintain consistency with our earlier computations, we need to explicitly set the event level in the sensitivity() and specificity() functions from the yardstick package, using the event_level = "second" argument. This ensures that the factor levels are interpreted in a way that aligns with our initial demonstration.
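A minimal sketch of that computation on the predictions generated earlier, treating the second level (True 1) as the event:

```r
# Sensitivity: proportion of observed positives ("True 1") predicted as positive
the_predictions |>
  sensitivity(
    truth = Observed,
    estimate = .pred_class,
    event_level = "second"  # treat the second factor level as the event
  )

# Specificity: proportion of observed negatives ("True 0") predicted as negative
the_predictions |>
  specificity(
    truth = Observed,
    estimate = .pred_class,
    event_level = "second"
  )
```

With this simulated dataset, the results should match the manual calculation from the confusion matrix (0.80 and 0.70).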
The same default event level applies to metrics returned by collect_metrics(), so it is essential to check the event level there as well; failing to do so may result in computing the metrics for the wrong event level.
Summary
In this blog post, we’ve walked through the process of computing sensitivity and specificity using the tidymodels package in R, demonstrating it with a simulated dataset. These metrics are indispensable for evaluating the performance of binary classification models. Don’t forget: for the most accurate and realistic results, always evaluate your models using separate test data.