generate_demo_data {immunaut}    R Documentation

Generate a Demo Dataset with Specified Number of Clusters and Overlap

Description

This function generates a demo dataset with a specified number of subjects and features and an approximate number of clusters. The clusters are kept close enough together to overlap to some degree, which simulates real-world data. The dataset includes an outcome column, demographic information (age and gender), and numeric features in which values are missing with a specified probability.

Usage

generate_demo_data(
  n_subjects = 1000,
  n_features = 200,
  missing_prob = 0.1,
  desired_number_clusters = 3,
  cluster_overlap_sd = 15
)

Arguments

n_subjects

Integer. The number of subjects (rows) to generate. Defaults to 1000.

n_features

Integer. The number of features (columns) to generate. Defaults to 200.

missing_prob

Numeric. The probability of introducing missing values (NA) in the feature columns. Defaults to 0.1.

desired_number_clusters

Integer. The approximate number of clusters to generate in the feature space. Defaults to 3.

cluster_overlap_sd

Numeric. The standard deviation used to control cluster overlap; larger values produce more overlap. Defaults to 15.

Details

The function generates n_features numeric columns from Gaussian clusters that overlap to some degree, which makes the data more realistic. Missing values are introduced in each feature column with probability missing_prob.
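
The approach described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: the helper name sketch_clustered_features, the cluster centers drawn uniformly from 0 to 100, the uniform assignment of subjects to clusters, and the feature_ column names are all assumptions made for the example.

```r
# Hypothetical helper (not part of immunaut): Gaussian clusters plus random NAs.
sketch_clustered_features <- function(n_subjects = 100, n_features = 5,
                                      n_clusters = 3, overlap_sd = 15,
                                      missing_prob = 0.1) {
  # Assign each subject to a cluster uniformly at random (assumption).
  cluster <- sample(seq_len(n_clusters), n_subjects, replace = TRUE)
  # One center per cluster and feature; the 0-100 range is an assumption.
  centers <- matrix(runif(n_clusters * n_features, min = 0, max = 100),
                    nrow = n_clusters, ncol = n_features)
  # Each value is its cluster's center plus Gaussian noise with sd = overlap_sd.
  noise <- matrix(rnorm(n_subjects * n_features, mean = 0, sd = overlap_sd),
                  nrow = n_subjects, ncol = n_features)
  values <- centers[cluster, , drop = FALSE] + noise
  # Replace each cell with NA independently with probability missing_prob.
  mask <- matrix(runif(n_subjects * n_features) < missing_prob,
                 nrow = n_subjects, ncol = n_features)
  values[mask] <- NA
  features <- as.data.frame(values)
  names(features) <- paste0("feature_", seq_len(n_features))
  features
}

set.seed(42)
sketch <- sketch_clustered_features(n_subjects = 200, n_features = 4)
dim(sketch)          # 200 rows, 4 columns
mean(is.na(sketch))  # close to missing_prob on average
```

In this sketch, a larger overlap_sd relative to the spacing of the centers blurs the cluster boundaries, which is the role the cluster_overlap_sd argument is described as playing.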

Value

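A data frame containing the generated demo dataset, with columns for outcome, age, and gender, plus n_features numeric feature columns that may contain missing values (NA).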

Examples


# Generate a demo dataset with 1000 subjects, 200 features, and 3 clusters
demo_data <- generate_demo_data(n_subjects = 1000, n_features = 200, 
                                desired_number_clusters = 3, 
                                cluster_overlap_sd = 15, missing_prob = 0.1)

# View the first few rows of the dataset
head(demo_data)
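
To check how many values are missing, the per-column NA proportion can be computed with base R. The helper na_proportion below is a hypothetical name introduced for this sketch, not part of immunaut. It is shown on a small stand-in data frame so that it runs without the package; it should apply to demo_data in the same way.

```r
# Hypothetical helper (not part of immunaut): share of NA values per numeric column.
na_proportion <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  colMeans(is.na(df[, numeric_cols, drop = FALSE]))
}

# Stand-in data frame; the column names here are illustrative only.
toy <- data.frame(
  group = c("a", "b", "a", "b"),
  x = c(1.5, NA, 3.0, NA),
  y = c(NA, 2.0, 2.5, 4.0)
)
na_proportion(toy)  # x: 0.5, y: 0.25

# With the package loaded, the same call applies to the generated data:
# summary(na_proportion(demo_data))
```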



[Package immunaut version 1.0.1 Index]