simdat {svycdiff}    R Documentation

Simulate data with varying degrees of selection and confounding bias

Description

Function to simulate data based on specified relationships between the generated (continuous) outcome, variable of interest, confounder, and selection mechanism.

Usage

simdat(
  N,
  X_dist = "continuous",
  S_known = FALSE,
  tau_0 = 0,
  tau_X = 1,
  beta_0 = 0,
  beta_A = 1,
  beta_X = 1,
  hetero = TRUE,
  alpha_0 = 0,
  alpha_X = 1,
  alpha_A = 1,
  alpha_AX = 0.1
)

Arguments

N

int - Number of observations to be generated

X_dist

string - Distribution of the confounding variable, X. Either "continuous" (the default) for a N(1, 1) variable, or "binary" for a Bernoulli(0.5) variable

S_known

boolean - Logical for whether the selection mechanism is treated as known (TRUE; deterministic) or needs to be estimated (FALSE; simulated with Gaussian error). Defaults to FALSE

tau_0

double - Intercept for propensity model (defaults to 0)

tau_X

double - Coefficient for X in propensity model (defaults to 1)

beta_0

double - Intercept for selection model (defaults to 0)

beta_A

double - Coefficient for A in selection model (defaults to 1)

beta_X

double - Coefficient for X in selection model (defaults to 1)

hetero

boolean - Logical for heterogeneous treatment effect in the outcome model (defaults to TRUE)

alpha_0

double - Intercept for outcome model (defaults to 0)

alpha_X

double - Coefficient for X in outcome model (defaults to 1)

alpha_A

double - Coefficient for A in outcome model (defaults to 1)

alpha_AX

double - Coefficient for interaction between A and X in outcome model (only used if hetero == TRUE; defaults to 0.1)

Details

The data are generated as follows. For a user-specified number, N, of observations in our so-called super population, we first generate a confounding variable, X, which relates to our outcome, Y, our variable of interest, A, and our selection indicator, S. We generate population-level data with X ~ N(1,1) or X ~ Bern(0.5), depending on whether the distribution of X is chosen to be X_dist = "continuous" or X_dist = "binary", respectively.

We then generate the remaining data from three models:

1. Propensity Model
2. Selection Model
3. Outcome Model
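
The three steps above can be sketched in base R. This is only an illustration of how such a data-generating process might look, not the package's implementation: the function name sketch_dgp is hypothetical, and the logistic links for the propensity and selection models, the linear outcome model with N(0, 1) error, and the 0.1 standard deviation of the Gaussian selection error are all assumptions. CDIFF is left out because its definition is not given here.

```r
# Hypothetical sketch of a simdat-like data-generating process.
# Link functions, error scales, and the function name are assumptions,
# not taken from the svycdiff source.
sketch_dgp <- function(N, X_dist = "continuous", S_known = FALSE,
                       tau_0 = 0, tau_X = 1,
                       beta_0 = 0, beta_A = 1, beta_X = 1,
                       hetero = TRUE,
                       alpha_0 = 0, alpha_X = 1, alpha_A = 1,
                       alpha_AX = 0.1) {
  # Confounder: N(1, 1) if continuous, Bernoulli(0.5) if binary
  X <- if (X_dist == "continuous") {
    rnorm(N, mean = 1, sd = 1)
  } else {
    rbinom(N, size = 1, prob = 0.5)
  }

  # 1. Propensity model (assumed logistic): P(A = 1 | X)
  P_A_cond_X <- plogis(tau_0 + tau_X * X)
  A <- rbinom(N, size = 1, prob = P_A_cond_X)

  # 2. Selection model (assumed logistic): P(S = 1 | A, X).
  #    If S_known is FALSE, Gaussian noise (assumed sd = 0.1) is added
  #    to the linear predictor; otherwise the model is deterministic.
  eps <- if (S_known) 0 else rnorm(N, mean = 0, sd = 0.1)
  P_S_cond_A1X <- plogis(beta_0 + beta_A + beta_X * X + eps)
  P_S_cond_A0X <- plogis(beta_0 + beta_X * X + eps)
  P_S_cond_AX <- ifelse(A == 1, P_S_cond_A1X, P_S_cond_A0X)

  # 3. Outcome model (assumed linear with N(0, 1) error);
  #    the A-by-X interaction is included only when hetero is TRUE
  ax <- if (hetero) alpha_AX else 0
  Y <- alpha_0 + alpha_X * X + alpha_A * A + ax * A * X + rnorm(N)

  data.frame(Y, A, X, P_A_cond_X, P_S_cond_AX, P_S_cond_A1X, P_S_cond_A0X)
}

set.seed(42)
sim <- sketch_dgp(1000)
head(sim)
```

Setting hetero = FALSE in this sketch drops the interaction term, so the assumed effect of A on Y no longer varies with X.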

Value

A data.frame with N observations of 8 variables:

Y

Observed outcome (continuous)

A

Comparison group variable of interest (binary)

X

Confounding variable (continuous or binary)

P_A_cond_X

True probability of A = 1 conditional on X (continuous)

P_S_cond_AX

True probability of selection (S = 1) conditional on A and X (continuous)

P_S_cond_A1X

True probability of selection (S = 1) conditional on A = 1 and X (continuous)

P_S_cond_A0X

True probability of selection (S = 1) conditional on A = 0 and X (continuous)

CDIFF

True controlled difference in outcomes by comparison group (double)

Examples


N <- 100000

dat <- simdat(N)

head(dat)
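
A further call with non-default settings may help show how the arguments combine. The sketch below uses only the arguments listed under Usage. The object name dat_bin and the seed are arbitrary choices, and the call is wrapped in a check so that it is skipped when svycdiff is not installed.

```r
# Illustrative only: binary confounder, known selection mechanism,
# and no A-by-X interaction in the outcome model.
if (requireNamespace("svycdiff", quietly = TRUE)) {
  set.seed(2024)  # arbitrary seed, for reproducibility

  dat_bin <- svycdiff::simdat(
    N = 1000,
    X_dist = "binary",
    S_known = TRUE,
    hetero = FALSE
  )

  # Inspect the structure of the simulated data
  str(dat_bin)

  # With X_dist = "binary", X is described as Bernoulli(0.5)
  table(dat_bin$X)
}
```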


[Package svycdiff version 0.1.1 Index]