fastml {fastml}                                        R Documentation

Fast Machine Learning Function

Description

Trains and evaluates multiple classification models.

Usage

fastml(
  data,
  label,
  algorithms = c("xgboost", "random_forest", "svm_radial"),
  test_size = 0.2,
  resampling_method = "cv",
  folds = 5,
  tune_params = NULL,
  metric = "Accuracy",
  n_cores = 1,
  stratify = TRUE,
  impute_method = NULL,
  encode_categoricals = TRUE,
  scaling_methods = c("center", "scale"),
  summaryFunction = NULL,
  seed = 123
)

Arguments

data

A data frame containing the features and target variable.

label

A string specifying the name of the target variable.

algorithms

A vector of algorithm names to use. Default is c("xgboost", "random_forest", "svm_radial"). Use "all" to run all supported algorithms.

test_size

A numeric value between 0 and 1 indicating the proportion of the data to use for testing. Default is 0.2.

resampling_method

A string specifying the resampling method used during model training. Default is "cv" (k-fold cross-validation). Other options include "none" (no resampling), "boot" (bootstrap), and "repeatedcv" (repeated cross-validation).

folds

An integer specifying the number of folds for cross-validation. Default is 5.
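As an illustration of what k-fold resampling with folds = 5 means, the following base R sketch assigns each row to one of five folds and holds out one fold at a time. How fastml builds its folds internally is not documented here, so this is a conceptual sketch only; the names fold_id and holdout_sizes are hypothetical.

```r
# Conceptual sketch of 5-fold assignment in base R (not fastml internals).
data(iris)
set.seed(123)

k <- 5
n <- nrow(iris)

# Repeat the fold labels 1..k to length n, then shuffle them
fold_id <- sample(rep(seq_len(k), length.out = n))

# Each fold serves once as the hold-out set; the rest is used for fitting
holdout_sizes <- sapply(seq_len(k), function(i) sum(fold_id == i))
print(holdout_sizes)
```

With 150 rows and 5 folds, every fold holds out 30 rows and fits on the remaining 120.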

tune_params

A named list specifying hyperparameter tuning ranges, with one element per algorithm (for example, list(random_forest = expand.grid(mtry = 1:4))). Default is NULL.
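A tuning grid of the kind used in Example 4 can be built with expand.grid(), which returns one row per combination of candidate values. In the sketch below, the random_forest entry mirrors Example 4; the svm_radial entry and its parameter names sigma and C are assumptions made for illustration, since the parameter names fastml expects for each algorithm are not listed on this page.

```r
# Sketch of a tuning list keyed by algorithm name.
# 'sigma' and 'C' are assumed parameter names, not confirmed by this page.
custom_tuning <- list(
  random_forest = expand.grid(mtry = 1:4),
  svm_radial    = expand.grid(sigma = c(0.01, 0.1), C = c(1, 10))
)

# expand.grid() forms the Cartesian product of the supplied values
print(custom_tuning$svm_radial)
print(sapply(custom_tuning, nrow))
```

The second grid has four rows because two values of sigma are crossed with two values of C.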

metric

The performance metric to optimize during training. Default is "Accuracy".

n_cores

An integer specifying the number of CPU cores to use for parallel processing. Default is 1.

stratify

Logical indicating whether to use stratified sampling when splitting the data. Default is TRUE.
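To show what stratified sampling means in combination with test_size, here is a minimal base R sketch that draws the test rows separately within each class, so class proportions are preserved in both parts. The helper stratified_split() is hypothetical and is not part of fastml; the package's own splitting code may differ.

```r
# Hypothetical helper (not part of fastml): stratified train/test split.
stratified_split <- function(data, label, test_size = 0.2, seed = 123) {
  set.seed(seed)
  # Row indices grouped by class of the target variable
  idx_by_class <- split(seq_len(nrow(data)), data[[label]])
  # Sample the requested proportion of rows within each class
  test_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_test <- round(length(idx) * test_size)
    idx[sample.int(length(idx), n_test)]
  }), use.names = FALSE)
  train_idx <- setdiff(seq_len(nrow(data)), test_idx)
  list(train = data[train_idx, , drop = FALSE],
       test  = data[test_idx, , drop = FALSE])
}

data(iris)
parts <- stratified_split(iris, "Species", test_size = 0.2)
print(table(parts$test$Species))
```

Because iris has 50 rows per species, the test set receives 10 rows from each class and the training set keeps 40.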

impute_method

Method for missing value imputation. Default is NULL.
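The accepted values of impute_method are not listed on this page. As a generic illustration of what an imputation step does, the base R sketch below replaces missing numeric values with the column median. The helper impute_median() is hypothetical, and median imputation is only one possible strategy.

```r
# Hypothetical helper (not part of fastml): median imputation for numeric columns.
impute_median <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  }
  x
}

data(airquality)
aq_imputed <- as.data.frame(lapply(airquality, impute_median))

# Missing-value counts per column, before and after imputation
print(colSums(is.na(airquality)))
print(colSums(is.na(aq_imputed)))
```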

encode_categoricals

Logical indicating whether to encode categorical variables. Default is TRUE.
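One common way to encode a categorical variable is one-hot (indicator) encoding. The base R sketch below shows the idea with model.matrix(); the exact encoding fastml applies is not stated on this page, and the data frame df is made up for illustration.

```r
# Illustrative data: one numeric and one categorical feature (made up).
df <- data.frame(
  size  = c(1.2, 3.4, 2.2),
  color = factor(c("red", "blue", "red"))
)

# Dropping the intercept (- 1) yields one indicator column per factor level
dummies <- model.matrix(~ color - 1, data = df)
encoded <- cbind(df["size"], dummies)
print(encoded)
```

Each row has exactly one indicator column set to 1, marking that row's category.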

scaling_methods

Vector of scaling methods to apply. Default is c("center", "scale").
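The default combination of "center" and "scale" corresponds to standardizing each numeric feature to mean 0 and standard deviation 1. The base R sketch below reproduces that transformation with scale(); it illustrates the arithmetic only and is not a description of how fastml applies it internally.

```r
data(iris)

# Select the numeric feature columns
num_cols <- sapply(iris, is.numeric)

# Subtract each column's mean (center) and divide by its standard deviation (scale)
scaled <- scale(iris[, num_cols], center = TRUE, scale = TRUE)

print(round(colMeans(scaled), 10))  # column means are numerically zero
print(apply(scaled, 2, sd))         # column standard deviations are one
```

In a train/test workflow, the means and standard deviations are normally estimated on the training rows and then reused for the test rows, so that no information leaks from the test set.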

summaryFunction

A custom summary function for model evaluation. Default is NULL.

seed

An integer value specifying the random seed for reproducibility. Default is 123.
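The effect of fixing a seed can be seen in base R: resetting the random number generator to the same seed reproduces the same random draws, which is what makes data splits and resampling repeatable. This sketch assumes nothing about how fastml uses the seed beyond what is stated above.

```r
# Same seed, same random draws
set.seed(123)
first_draw <- sample(10)

set.seed(123)
second_draw <- sample(10)

print(identical(first_draw, second_draw))
```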

Value

An object of class fastml_model containing the best model, performance metrics, and other information.

Examples


# Example 1: Using the iris dataset for binary classification (excluding 'setosa')
data(iris)
iris <- iris[iris$Species != "setosa", ]  # Binary classification
iris$Species <- factor(iris$Species)

# Train models
model <- fastml(
  data = iris,
  label = "Species"
)

# View model summary
summary(model)


# Example 2: Using the mtcars dataset for binary classification
data(mtcars)
mtcars$am <- factor(mtcars$am)  # Convert transmission (0 = automatic, 1 = manual) to a factor

# Train models with a different resampling method and specific algorithms
model2 <- fastml(
  data = mtcars,
  label = "am",
  algorithms = c("random_forest", "svm_radial"),
  resampling_method = "repeatedcv",
  folds = 3,
  test_size = 0.25
)

# View model performance
summary(model2)


# Example 3: Using the airquality dataset (multiclass target, rows with missing values dropped)
data(airquality)
airquality <- na.omit(airquality)  # Drop rows with missing values to keep the example simple
airquality$Month <- factor(airquality$Month)

# Train models with categorical encoding and scaling
model3 <- fastml(
  data = airquality,
  label = "Month",
  encode_categoricals = TRUE,
  scaling_methods = c("center", "scale")
)

# Evaluate and compare models
summary(model3)


# Example 4: Custom hyperparameter tuning for a random forest
data(iris)
iris <- iris[iris$Species != "setosa", ]  # Filter out 'setosa' for binary classification
iris$Species <- factor(iris$Species)
custom_tuning <- list(
  random_forest = expand.grid(mtry = 1:4)  # iris has 4 predictors, so mtry cannot exceed 4
)

model4 <- fastml(
  data = iris,
  label = "Species",
  algorithms = c("random_forest"),
  tune_params = custom_tuning,
  metric = "Accuracy"
)

# View the results
summary(model4)


[Package fastml version 0.1.0 Index]