Overview

BreakNBuild is designed to evaluate model performance on progressively larger samples of training data. It offers a structured way to analyze how a model’s accuracy, error, or other metrics evolve as the amount of training data increases. This iterative sampling approach is particularly useful for identifying bias-variance trade-offs, diagnosing overfitting or underfitting, and estimating how much data is needed to reach acceptable performance. With BreakNBuild, users can visualize learning curves to fine-tune algorithms, assess generalization, and debug machine learning models efficiently.
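
To make the idea concrete, here is a small, hand-rolled sketch of a learning curve computed without BreakNBuild: a linear model is refit on progressively larger training subsets of mtcars and scored on a fixed validation set. The dataset, model, and subset sizes are illustrative assumptions, not part of the package.

# Hand-rolled learning curve for illustration only (not the BreakNBuild API)
set.seed(123)
idx <- sample(nrow(mtcars))
val_idx <- idx[1:8]          # fixed validation rows
train_idx <- idx[-(1:8)]     # pool of training rows

sizes <- seq(10, length(train_idx), by = 4)
rmse <- sapply(sizes, function(n) {
  fit  <- lm(mpg ~ wt + hp, data = mtcars[train_idx[1:n], ])
  pred <- predict(fit, newdata = mtcars[val_idx, ])
  sqrt(mean((mtcars$mpg[val_idx] - pred)^2))
})

plot(sizes, rmse, type = "b",
     xlab = "Training set size", ylab = "Validation RMSE")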

Features

  • Progressive Data Splitting: Partition your dataset into a fixed validation set and a series of progressively larger training subsets.
  • Customizable Sample Sizes: Control the size of your training data to compare model performance at different sample sizes.
  • Easy Integration: Built on the rsample package, BreakNBuild integrates with the tidymodels framework (see the sketch after the figure below).

![Schematic of progressive data splits](man/figures/schema_progressive_splits.svg)
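
Because the package is built on rsample, its splits follow rsample conventions. For comparison, here is plain rsample creating a single training/validation split; the mtcars data and the 0.8 proportion are arbitrary choices for illustration.

library(rsample)

# A single 80/20 split with plain rsample, for comparison with BreakNBuild
simple_split <- initial_split(mtcars, prop = 0.8)
head(analysis(simple_split))     # training rows
head(assessment(simple_split))   # held-out validation rows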

Installation

To install the latest version from GitHub, use:

# install.packages("devtools")
devtools::install_github("focardozom/BreakNBuild")

Usage

Here’s a quick example to get you started:

library(BreakNBuild)

# `data` stands in for your own modeling data frame
splits <- progressive_splits(data, validation_size = 0.2, start_size = 10)

This creates a splits object that you can use to train and evaluate your model within the tidymodels ecosystem.
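
From there, a typical next step is to fit a model across the progressive splits and watch how the validation metrics change with training-set size. The following is a minimal sketch assuming the splits object is compatible with tune::fit_resamples(), as other rsample-based resampling objects are; the model and formula are placeholders, and the vignette documents the package's recommended workflow.

library(tidymodels)

# Assumption: `splits` works with fit_resamples() like a standard rsample
# resampling set. `outcome ~ .` is a placeholder formula for your data.
model_spec <- linear_reg() |> set_engine("lm")

results <- fit_resamples(model_spec, outcome ~ ., resamples = splits)

collect_metrics(results, summarize = FALSE)  # per-split metrics for a learning curve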

For more details on how to use the BreakNBuild package, please refer to the package vignette.