knn_domain_score {viraldomain}    R Documentation

Calculate the K-Nearest Neighbor model domain applicability score

Description

This function fits a K-Nearest Neighbor (KNN) model to the training data and computes a domain applicability score, based on PCA distances, for each observation in the test data.

Usage

knn_domain_score(
  featured_col,
  train_data,
  knn_hyperparameters,
  test_data,
  threshold_value
)

Arguments

featured_col

The name of the response variable to predict, given as a character string (for example, "cd_2022").

train_data

The training dataset containing predictor variables and the response variable.

knn_hyperparameters

A list of hyperparameters for the KNN model, including:

  • neighbors: The number of neighbors to consider.

  • weight_func: The weight function to use.

  • dist_power: The distance power parameter.
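
As a minimal sketch, such a list could be built and sanity-checked as shown below. The helper check_knn_hyperparameters is hypothetical and not part of viraldomain, and the value ranges it enforces are assumptions rather than documented constraints. The entry names match the arguments of parsnip::nearest_neighbor(), which suggests, though this page does not confirm it, that they are passed through to that function.

```r
# Hypothetical helper (NOT part of viraldomain): checks that a
# hyperparameter list contains the three documented entries.
check_knn_hyperparameters <- function(hp) {
  required <- c("neighbors", "weight_func", "dist_power")
  missing_names <- setdiff(required, names(hp))
  if (length(missing_names) > 0) {
    stop("Missing hyperparameters: ", paste(missing_names, collapse = ", "))
  }
  # The ranges below are assumptions made for this sketch only.
  stopifnot(
    is.numeric(hp$neighbors), hp$neighbors >= 1,
    is.character(hp$weight_func), length(hp$weight_func) == 1,
    is.numeric(hp$dist_power), hp$dist_power > 0
  )
  invisible(hp)
}

# Same values as in the Examples section of this page.
knn_hyperparameters <- list(
  neighbors = 5,
  weight_func = "optimal",
  dist_power = 0.3304783
)
check_knn_hyperparameters(knn_hyperparameters)
```

Checking the list before the call gives a clearer error message than a failure deep inside model fitting.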

test_data

The test dataset for making predictions.

threshold_value

The threshold value used for computing domain scores.

Value

A data frame containing the computed domain scores for each observation in the test dataset.
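
To make the idea of a PCA-distance score concrete, the base-R sketch below shows one way such a score could be computed. It is an illustration under stated assumptions, not the implementation used by knn_domain_score: the helper pca_distance_score, the synthetic data, the reading of the threshold as a share of explained variance, and the output column names are all assumptions introduced here.

```r
set.seed(1)

# Synthetic predictors, made up purely for illustration.
train_x <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
test_x  <- data.frame(x1 = c(0, 6), x2 = c(0, -6), x3 = c(0, 6))

# Hypothetical helper (NOT the viraldomain implementation).
pca_distance_score <- function(train_x, test_x, threshold = 0.99) {
  # PCA on centred and scaled training predictors.
  pca <- prcomp(train_x, center = TRUE, scale. = TRUE)

  # Assumption: keep the fewest components whose cumulative share of
  # variance reaches the threshold (fall back to all components).
  var_share <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  n_comp <- min(which(var_share >= threshold), length(var_share))

  # Training scores are centred at zero, so the distance to the
  # training centre is the Euclidean norm of the retained scores.
  train_dist <- sqrt(rowSums(pca$x[, seq_len(n_comp), drop = FALSE]^2))

  # Project the test data with the training rotation and measure
  # the same distance.
  test_pcs <- predict(pca, newdata = test_x)[, seq_len(n_comp), drop = FALSE]
  test_dist <- sqrt(rowSums(test_pcs^2))

  # Percentile of each test distance among the training distances.
  data.frame(
    distance = test_dist,
    distance_pctl = 100 * ecdf(train_dist)(test_dist)
  )
}

scores <- pca_distance_score(train_x, test_x, threshold = 0.99)
scores
```

In this sketch, a test observation whose distance percentile is close to 100 lies farther from the training data than almost every training point, so predictions for it would be extrapolations.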

Examples

set.seed(123)
library(dplyr)
library(viraldomain)
# Name of the response variable
featured_col <- "cd_2022"
# Specifying features for training and testing procedures
train_data <- viral |>
  dplyr::select(cd_2022, vl_2022)
test_data <- sero
knn_hyperparameters <- list(neighbors = 5, weight_func = "optimal", dist_power = 0.3304783)
threshold_value <- 0.99
# Call the function
knn_domain_score(featured_col, train_data, knn_hyperparameters, test_data, threshold_value)

[Package viraldomain version 0.0.6 Index]