cat_lmm_initialization {catalytic} | R Documentation |
Initialization for Catalytic Linear Mixed Model (LMM)
Description
This function prepares and initializes a catalytic linear mixed model by processing the input data, extracting the necessary variables, generating synthetic datasets, and fitting a model. Only a single random-effect variance is considered.
Usage
cat_lmm_initialization(
  formula,
  data,
  x_cols,
  y_col,
  z_cols,
  group_col = NULL,
  syn_size = NULL,
  resample_by_group = FALSE,
  resample_only = FALSE,
  na_replace = mean
)
Arguments
formula | A formula specifying the model. It should include the response and predictor variables. |
data | A data frame containing the data for modeling. |
x_cols | A character vector of column names for the fixed effects (predictors). |
y_col | A character string giving the name of the response variable. |
z_cols | A character vector of column names for the random effects. |
group_col | A character string giving the name of the grouping variable (optional). If NULL (the default), it is extracted from the formula. |
syn_size | An integer specifying the size of the synthetic dataset to be generated. If NULL (the default), length(x_cols) * 4 is used. |
resample_by_group | A logical indicating whether to resample by group. Default is FALSE. |
resample_only | A logical indicating whether to perform resampling only. Default is FALSE. |
na_replace | A function used to replace NA values in the data. Default is mean. |
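The na_replace argument takes a function rather than a value. The sketch below illustrates one way such a function could be applied column by column. It is an illustration only, not the package's internal code: fill_na and the toy data frame are hypothetical names introduced here.

```r
# Hypothetical helper (not part of catalytic): fill NA entries in each
# numeric column with a summary of that column's non-missing values.
fill_na <- function(df, na_replace = mean) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    if (is.numeric(df[[col]]) && any(missing)) {
      df[[col]][missing] <- na_replace(df[[col]][!missing])
    }
  }
  df
}

# Toy data with one missing value per column (assumed for illustration).
toy <- data.frame(wt = c(2.5, NA, 3.5), mpg = c(21, 19, NA))
filled <- fill_na(toy, na_replace = mean)
filled
```

Any function that maps a numeric vector to a single value, such as median, could be passed in the same way.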
Value
A list containing the values of all the input arguments and the following components:
- Function Information:
  - function_name: A character string representing the name of the function, "cat_lmm_initialization".
  - simple_model: An object of class lme4::lmer or stats::lm, representing the fitted model for generating synthetic response from the original data.
- Observation Data Information:
  - obs_size: An integer representing the number of observations in the original dataset.
  - obs_data: The original data used for fitting the model, returned as a data frame.
  - obs_x: A data frame containing the standardized predictor variables from the original dataset.
  - obs_y: A numeric vector of the standardized response variable from the original dataset.
  - obs_z: A data frame containing the standardized random effect variables from the original dataset.
  - obs_group: A numeric vector representing the grouping variable for the original observations.
- Synthetic Data Information:
  - syn_size: An integer representing the number of synthetic observations generated.
  - syn_data: A data frame containing the synthetic dataset, combining synthetic predictor and response variables.
  - syn_x: A data frame containing the synthetic predictor variables.
  - syn_y: A numeric vector of the synthetic response variable values.
  - syn_z: A data frame containing the synthetic random effect variables.
  - syn_group: A numeric vector representing the grouping variable for the synthetic observations.
  - syn_x_resample_inform: A data frame containing information about the resampling process for synthetic predictors:
    - Coordinate: Preserves the original data values as reference coordinates during processing.
    - Deskewing: Adjusts the data distribution to reduce skewness and enhance symmetry.
    - Smoothing: Reduces noise in the data to stabilize the dataset and prevent overfitting.
    - Flattening: Creates a more uniform distribution by modifying low-frequency categories in categorical variables.
    - Symmetrizing: Balances the data around its mean to improve statistical properties for model fitting.
  - syn_z_resample_inform: A data frame containing information about the resampling process for synthetic random effects. The resampling methods are the same as those for syn_x_resample_inform.
- Whole Data Information:
  - size: An integer representing the total size of the combined original and synthetic datasets.
  - data: A combined data frame of the original and synthetic datasets.
  - x: A combined data frame of the original and synthetic predictor variables.
  - y: A combined numeric vector of the original and synthetic response variables.
  - z: A combined data frame of the original and synthetic random effect variables.
  - group: A combined numeric vector representing the grouping variable for both original and synthetic datasets.
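The relationship between the observation, synthetic, and whole-data components can be sketched in base R. This is a rough illustration under stated assumptions, not the package's implementation: it assumes that "Coordinate" resampling draws each predictor column independently from its observed values, and that the synthetic response is predicted from a simple stats::lm fit. The object names and the synthetic size of 8 are chosen here for illustration.

```r
set.seed(1)

# Observed data: response plus two predictors from mtcars.
obs <- mtcars[, c("mpg", "wt", "hp")]
syn_size <- 8  # assumed value, for illustration only

# Assumed coordinate resampling: each column is sampled independently,
# with replacement, from its own observed values.
syn_x <- as.data.frame(lapply(obs[, c("wt", "hp")], function(col) {
  sample(col, size = syn_size, replace = TRUE)
}))

# A simple model fitted to the observed data generates the synthetic response.
simple_model <- lm(mpg ~ wt, data = obs)
syn_y <- unname(predict(simple_model, newdata = syn_x))

# Combine synthetic predictors and response, then stack with the observed data.
syn_data <- cbind(syn_x, mpg = syn_y)
whole <- rbind(obs[, names(syn_data)], syn_data)
nrow(whole)
```

Because each column is resampled on its own, the synthetic rows need not correspond to any single observed row, while every synthetic value still appears somewhere in the observed data.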
Examples
library(catalytic)

data(mtcars)
cat_init <- cat_lmm_initialization(
  formula = mpg ~ wt + (1 | cyl), # Formula for the simple model
  data = mtcars,
  x_cols = c("wt"), # Fixed effects
  y_col = "mpg", # Response variable
  z_cols = c("disp", "hp", "drat", "qsec", "vs", "am", "gear", "carb"), # Random effects
  group_col = "cyl", # Grouping column
  syn_size = 100, # Synthetic data size
  resample_by_group = FALSE, # Whether to resample by group
  resample_only = FALSE, # Whether to perform resampling only
  na_replace = mean # NA replacement function
)
cat_init
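As a follow-up, the components listed under Value can be read from the returned list by name. The sketch below repeats the call above with the same arguments and prints a few of them. It assumes the catalytic package is installed and skips the call otherwise.

```r
# Runs only if the catalytic package is available.
if (requireNamespace("catalytic", quietly = TRUE)) {
  cat_init <- catalytic::cat_lmm_initialization(
    formula = mpg ~ wt + (1 | cyl),
    data = mtcars,
    x_cols = c("wt"),
    y_col = "mpg",
    z_cols = c("disp", "hp", "drat", "qsec", "vs", "am", "gear", "carb"),
    group_col = "cyl",
    syn_size = 100
  )

  # Sizes of the observed, synthetic, and combined datasets.
  print(cat_init$obs_size)
  print(cat_init$syn_size)
  print(cat_init$size)

  # First rows of the synthetic dataset.
  print(head(cat_init$syn_data))
}
```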