viralpreds {viralmodels}    R Documentation
Train and Evaluate Many Regression Models for Predicting Viral Load or CD4 Counts
Description
This function builds, trains, and evaluates a set of statistical learning models for predicting viral load or CD4 counts. It combines several pre-processing options (simple, normalized, full quadratic) with several model types (MARS, neural network, k-nearest neighbors) and selects the model with the lowest root mean square error (RMSE).
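As an illustration of selection by RMSE, the following minimal sketch scores three sets of predictions against observed values and keeps the name of the lowest-scoring one. It is not taken from the package source: the `rmse` helper, the `observed` values, and the `candidates` list are hypothetical and exist only for this example.

```r
# Root mean squared error between observed and predicted values
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Hypothetical observed CD4 counts and predictions from three candidate models
observed <- c(500, 620, 480, 710)
candidates <- list(
  mars = c(510, 600, 470, 700),
  nnet = c(450, 700, 520, 650),
  knn  = c(505, 615, 490, 705)
)

# Score each candidate and keep the one with the lowest RMSE
scores <- vapply(candidates, rmse, numeric(1), observed = observed)
best <- names(scores)[which.min(scores)]
best
```

Lower RMSE means predictions sit closer to the observed values on average, in the units of the target variable.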
Usage
viralpreds(target, pliegues, repeticiones, rejilla, semilla, data)
Arguments
target
A character string specifying the column name of the target variable to predict.
pliegues
An integer specifying the number of folds for cross-validation.
repeticiones
An integer specifying the number of times the cross-validation should be repeated.
rejilla
An integer specifying the number of grid search iterations for tuning hyperparameters.
semilla
An integer specifying the seed for random number generation to ensure reproducibility.
data
A data frame containing the predictors and the target variable.
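To make the roles of pliegues, repeticiones, and semilla concrete, here is a minimal base R sketch of repeated k-fold assignment. It is an assumption about the general technique, not the package's internal code; the row count `n` and the `assignments` object are hypothetical.

```r
pliegues <- 5      # number of folds
repeticiones <- 2  # number of repeats
semilla <- 123     # seed for reproducibility
n <- 23            # hypothetical number of rows in `data`

set.seed(semilla)
# One random fold assignment per repeat: each row gets a fold id in 1..pliegues
assignments <- lapply(seq_len(repeticiones), function(r) {
  sample(rep(seq_len(pliegues), length.out = n))
})

# Each (repeat, fold) pair is one resample: fit on the other folds, assess on this one
n_resamples <- pliegues * repeticiones
n_resamples
```

Under this scheme each candidate model is fitted once per resample and per hyperparameter setting, so raising any of pliegues, repeticiones, or rejilla increases run time.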
Value
A list containing two elements: predictions (a vector of predicted values for the target variable) and RMSE (the root mean square error of the best model).
Examples
library(tidyverse)
library(baguette)
library(kernlab)
library(kknn)
library(ranger)
library(rules)
library(glmnet)
# Define the function to impute values in the undetectable range (<= 40)
set.seed(123)
impute_undetectable <- function(column) {
  # Draw one candidate per element so ifelse() does not recycle a shorter vector
  ifelse(column <= 40,
         rexp(length(column), rate = 1/13) + 1,
         column)
}
# Apply the function to all vl columns using dplyr's across()
library(viraldomain)
data("viral", package = "viraldomain")
viral_imputed <- viral |>
  mutate(across(starts_with("vl"), ~impute_undetectable(.x)))
traindata <- viral_imputed
target <- "cd_2022"
viralvars <- c("vl_2019", "vl_2021", "vl_2022")
logbase <- 10
pliegues <- 5
repeticiones <- 2
rejilla <- 2
semilla <- 123
viralpreds(target, pliegues, repeticiones, rejilla, semilla, traindata)
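The imputation step in the example can be examined on its own. The sketch below applies the same idea to a small vector: values at or below 40 are replaced by a draw from an exponential distribution with rate 1/13, shifted by 1, so imputed values are always above 1 and average about 14. The `vl` values are hypothetical and are not taken from the viral data set.

```r
set.seed(123)
vl <- c(20, 40, 150, 35, 9000)  # hypothetical viral load values
undetectable <- vl <= 40

# Draw one exponential value per undetectable entry (mean 13), shifted by 1
vl_imputed <- vl
vl_imputed[undetectable] <- rexp(sum(undetectable), rate = 1/13) + 1
vl_imputed
```

Because the exponential distribution is unbounded above, an imputed value can occasionally exceed 40; detectable values are left unchanged.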