simdat {svycdiff} | R Documentation |
Simulate data with varying degrees of selection and confounding bias
Description
Simulates data based on specified relationships among the generated (continuous) outcome, the variable of interest, the confounder, and the selection mechanism.
Usage
simdat(
N,
X_dist = "continuous",
S_known = FALSE,
tau_0 = 0,
tau_X = 1,
beta_0 = 0,
beta_A = 1,
beta_X = 1,
hetero = TRUE,
alpha_0 = 0,
alpha_X = 1,
alpha_A = 1,
alpha_AX = 0.1
)
Arguments
N |
int - Number of observations to be generated |
X_dist |
string - Distribution of the confounding variable, X. Either "continuous" (the default) for a N(1, 1) variable, or "binary" for a Bernoulli(0.5) variable |
S_known |
boolean - Logical for whether the selection mechanism should be treated as known (deterministic) or as needing to be estimated (simulated with Gaussian error). Defaults to FALSE |
tau_0 |
double - Intercept for propensity model (defaults to 0) |
tau_X |
double - Coefficient for X in propensity model (defaults to 1) |
beta_0 |
double - Intercept for selection model (defaults to 0) |
beta_A |
double - Coefficient for A in selection model (defaults to 1) |
beta_X |
double - Coefficient for X in selection model (defaults to 1) |
hetero |
boolean - Logical for heterogeneous treatment effect in the outcome model (defaults to TRUE) |
alpha_0 |
double - Intercept for outcome model (defaults to 0) |
alpha_X |
double - Coefficient for X in outcome model (defaults to 1) |
alpha_A |
double - Coefficient for A in outcome model (defaults to 1) |
alpha_AX |
double - Coefficient for interaction between A and X in outcome model (only used if hetero = TRUE; defaults to 0.1) |
Details
The data are generated as follows. For a user-given number, N, of observations in our so-called super population, we first generate a confounding variable, X, which relates to our outcome, Y, our variable of interest, A, and our selection indicator, S.
We generate population-level data with X ~ N(1,1) or X ~ Bern(0.5), depending on whether the distribution of X is chosen to be X_dist = "continuous" or X_dist = "binary", respectively.
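We then generate the remaining data from three models:
1. Propensity Model, for A given X (coefficients tau_0 and tau_X)
2. Selection Model, for S given A and X (coefficients beta_0, beta_A, and beta_X)
3. Outcome Model, for Y given A and X (coefficients alpha_0, alpha_X, alpha_A, and alpha_AX)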
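This page names the three models but does not give their functional forms. The sketch below shows one plausible way the steps could fit together. It is an illustration under stated assumptions, not the package's implementation: the logistic links, the standard normal outcome error, and the object name sketch are all assumptions, and the S_known behaviour is not reproduced. Coefficient values are the defaults from the Usage section.

```r
set.seed(42)
N <- 1000

# Default coefficients, copied from the Usage section
tau_0 <- 0; tau_X <- 1
beta_0 <- 0; beta_A <- 1; beta_X <- 1
alpha_0 <- 0; alpha_X <- 1; alpha_A <- 1; alpha_AX <- 0.1

# Confounder: N(1, 1), as documented for X_dist = "continuous"
X <- rnorm(N, mean = 1, sd = 1)

# 1. Propensity model (ASSUMPTION: logistic link)
P_A_cond_X <- plogis(tau_0 + tau_X * X)
A <- rbinom(N, size = 1, prob = P_A_cond_X)

# 2. Selection model (ASSUMPTION: logistic link, no added Gaussian error)
P_S_cond_AX <- plogis(beta_0 + beta_A * A + beta_X * X)
S <- rbinom(N, size = 1, prob = P_S_cond_AX)

# 3. Outcome model (ASSUMPTION: linear mean with N(0, 1) error);
#    the A * X term corresponds to the hetero = TRUE case
Y <- alpha_0 + alpha_X * X + alpha_A * A + alpha_AX * A * X + rnorm(N)

# 'sketch' is a hypothetical name; its columns only loosely mirror simdat()
sketch <- data.frame(Y, A, X, S, P_A_cond_X, P_S_cond_AX)
head(sketch)
```

Because X enters all three linear predictors, it affects A, S, and Y at once, which is what makes it a confounder and a driver of selection in this setup.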
Value
A data.frame with N observations of 8 variables:
- Y
Observed outcome (continuous)
- A
Comparison group variable of interest (binary)
- X
Confounding variable (continuous or binary)
- P_A_cond_X
True probability of A = 1 conditional on X (continuous)
- P_S_cond_AX
True probability of selection (S = 1) conditional on A and X (continuous)
- P_S_cond_A1X
True probability of selection (S = 1) conditional on A = 1 and X (continuous)
- P_S_cond_A0X
True probability of selection (S = 1) conditional on A = 0 and X (continuous)
- CDIFF
True controlled difference in outcomes by comparison group (double)
Examples
N <- 100000
dat <- simdat(N)
head(dat)
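A further example, offered as a sketch rather than as part of the package's own examples: it changes two documented arguments and compares the unadjusted difference in mean Y between groups with the CDIFF column. The object names dat_bin, naive_diff, and true_cdiff are illustrative, and the call is guarded so that nothing runs if svycdiff is not installed.

```r
# Guarded so the snippet is a no-op when svycdiff is not installed
if (requireNamespace("svycdiff", quietly = TRUE)) {
  set.seed(1)

  # Binary confounder and no A-by-X interaction in the outcome model
  dat_bin <- svycdiff::simdat(N = 10000, X_dist = "binary", hetero = FALSE)

  # Unadjusted difference in mean outcome between comparison groups
  naive_diff <- mean(dat_bin$Y[dat_bin$A == 1]) - mean(dat_bin$Y[dat_bin$A == 0])

  # Reference value taken from the CDIFF column of the returned data frame
  true_cdiff <- mean(dat_bin$CDIFF)

  print(c(naive = naive_diff, truth = true_cdiff))
}
```

Any gap between the two printed values reflects the confounding built into the simulated population.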