rf_domain_score {viraldomain} | R Documentation |
Calculate the Random Forest Model Domain Applicability Score
Description
This function fits a Random Forest model to the provided data and computes a domain applicability score based on PCA distances.
Usage
rf_domain_score(
featured_col,
train_data,
rf_hyperparameters,
test_data,
threshold_value
)
Arguments
featured_col |
A character string specifying the name of the response variable to predict. |
train_data |
A data frame containing predictor variables and the response variable for training the model. |
rf_hyperparameters |
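A list of hyperparameters for the Random Forest model, including mtry (number of predictors sampled at each split), min_n (minimum number of observations a node must contain to be split further), and trees (number of trees in the forest); see Examples.
|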
test_data |
A data frame of new observations for which domain applicability scores are computed. |
threshold_value |
A numeric threshold value used for computing domain applicability scores. |
Details
Random Forest builds a large number of decision trees, each independent of the others, and combines their individual predictions into the final prediction (for regression, by averaging them). This function uses the ranger engine for fitting regression models.
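As a rough illustration of the fitting step described above, the sketch below specifies a random forest regression with the ranger engine through the parsnip package. That rf_domain_score uses parsnip internally is an assumption (the hyperparameter names mtry, min_n and trees match parsnip's rand_forest arguments, but the source does not confirm it), and the toy data frame and the names hp, rf_spec and rf_fit are hypothetical.

```r
library(parsnip)  # the ranger package must also be installed

set.seed(123)

# Synthetic data, made up purely for illustration
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y <- 2 * toy$x1 - toy$x2 + rnorm(100, sd = 0.1)

# Hyperparameters named as in the Examples section
hp <- list(mtry = 2, min_n = 5, trees = 500)

# Model specification: random forest, ranger engine, regression mode
rf_spec <- rand_forest(mtry = hp$mtry, trees = hp$trees, min_n = hp$min_n)
rf_spec <- set_engine(rf_spec, "ranger")
rf_spec <- set_mode(rf_spec, "regression")

# Fit on the toy data and predict for a few rows
rf_fit <- fit(rf_spec, y ~ x1 + x2, data = toy)
preds <- predict(rf_fit, new_data = toy[1:5, ])
```

Here preds is a tibble with one row per new observation and a single .pred column holding the averaged tree predictions.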
Value
A data frame containing the computed domain applicability scores for each observation in the test dataset.
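To make the idea of a PCA-distance score concrete, here is a minimal base R sketch. It is not the package's implementation: the source states only that the score is based on PCA distances, so treating the threshold as the share of variance that the retained components must explain, and reporting a percentile relative to the training distances, are both assumptions. All object names and the synthetic data are hypothetical.

```r
set.seed(123)

# Synthetic predictors, made up purely for illustration
train_x <- data.frame(a = rnorm(200), b = rnorm(200), c = rnorm(200))
test_x  <- data.frame(a = c(0, 6), b = c(0, 6), c = c(0, 6))

# Assumed meaning: share of variance the retained components must explain
threshold <- 0.99

# PCA on the centred and scaled training predictors
pca <- prcomp(train_x, center = TRUE, scale. = TRUE)
var_share <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_keep <- which(var_share >= threshold)[1]

# Euclidean distance from the training centre (the origin in PC space)
pc_distance <- function(pc_scores) {
  sqrt(rowSums(pc_scores[, seq_len(n_keep), drop = FALSE]^2))
}

train_dist <- pc_distance(pca$x)
test_dist  <- pc_distance(predict(pca, newdata = test_x))

# Percentile of each test distance within the training distances
scores <- data.frame(
  distance      = test_dist,
  distance_pctl = 100 * ecdf(train_dist)(test_dist)
)
```

In this toy setup the second test row lies far outside the training cloud, so its distance exceeds every training distance and its percentile is 100, which would mark it as outside the model's domain of applicability.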
Examples
set.seed(123)
library(dplyr)
featured_col <- "cd_2022"
train_data <- viral %>%
dplyr::select(cd_2022, vl_2022)
test_data <- sero
rf_hyperparameters <- list(mtry = 2, min_n = 5, trees = 500)
threshold_value <- 0.99
# Call the function
rf_domain_score(featured_col, train_data, rf_hyperparameters, test_data, threshold_value)