seqaddNA {seqimpute}    R Documentation

Generation of missing data on longitudinal categorical data.

Description

Generates missing data in the form of gaps, which is the typical pattern of missing data in longitudinal data. The simulated missing data are either MCAR or MAR.

Usage

seqaddNA(
  data,
  var = NULL,
  states.high = NULL,
  propdata = 1,
  pstart.high = 0.1,
  pstart.low = 0.005,
  maxgap = 3,
  maxprop = 0.75,
  only.traj = FALSE
)

Arguments

data

A data frame containing sequences of a categorical (multinomial) variable, where missing data are coded as NA.

var

A vector specifying the columns of the dataset that contain the trajectories. Default is NULL, meaning all columns are used.

states.high

A list of states with a higher probability of initiating a subsequent missing data gap.

propdata

Proportion of observations for which missing data is simulated, as a decimal between 0 and 1.

pstart.high

Probability of starting a missing data gap for the states specified in the states.high argument.

pstart.low

Probability of starting a missing data gap for all other states.

maxgap

Maximum length of a missing data gap.

maxprop

Maximum proportion of missing data allowed in a sequence, as a decimal between 0 and 1. If the proportion exceeds this value, the simulation is rerun for the sequence.

only.traj

Logical. If TRUE, only the trajectories (specified in var) are returned. If FALSE, the entire data frame is returned.
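
To illustrate how states.high, pstart.high, pstart.low, maxgap and maxprop could interact, here is a minimal base R sketch for a single sequence. It is not the package's implementation: the helper simulate_gaps() is hypothetical, and the choices that the first position never starts a gap and that gap lengths are drawn uniformly between 1 and maxgap are assumptions made for the example.

```r
# Hypothetical helper, not part of seqimpute: a sketch of gap simulation
# for one sequence under the assumptions stated above.
simulate_gaps <- function(s, states.high = NULL, pstart.high = 0.1,
                          pstart.low = 0.005, maxgap = 3, maxprop = 0.75) {
  n <- length(s)
  repeat {
    out <- s
    j <- 2  # assumption: the first position never starts a gap
    while (j <= n) {
      # The start probability depends on the state observed just before
      p <- if (s[j - 1] %in% states.high) pstart.high else pstart.low
      if (runif(1) < p) {
        # Assumption: gap length drawn uniformly from 1..maxgap
        len <- sample.int(maxgap, 1)
        end <- min(j + len - 1, n)
        out[j:end] <- NA
        j <- end + 1
      } else {
        j <- j + 1
      }
    }
    # Rerun the simulation if too much of the sequence is missing
    if (mean(is.na(out)) <= maxprop) return(out)
  }
}

set.seed(1)
s <- rep(c("school", "joblessness", "employment"), times = c(4, 3, 5))
s.miss <- simulate_gaps(s, states.high = "joblessness", pstart.high = 0.5)
```

With states.high = NULL every position uses pstart.low, so missingness does not depend on the observed states (MCAR). Supplying states.high makes the start of a gap depend on the preceding observed state (MAR).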

Value

A data frame with simulated missing data.

Author(s)

Kevin Emery

Examples

# Generate MCAR missing data on the mvad dataset 
# from the TraMineR package

## Not run: 
data(mvad, package = "TraMineR")
mvad.miss <- seqaddNA(mvad, var = 17:86)


# Generate missing data on mvad where joblessness is more likely to trigger 
# a missing data gap
mvad.miss2 <- seqaddNA(mvad, var = 17:86, states.high = "joblessness")

## End(Not run)
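
After simulating missing data, it can be useful to check the result against maxprop and maxgap. The base R sketch below does this on a small hand-made data frame that stands in for the output of seqaddNA; the names traj and gap_lengths are hypothetical and chosen for illustration only.

```r
# Toy trajectories standing in for the output of seqaddNA()
traj <- data.frame(
  t1 = c("a", "b", "a"),
  t2 = c(NA,  "b", "a"),
  t3 = c(NA,  "b", NA),
  t4 = c("a", NA,  "c"),
  stringsAsFactors = FALSE
)

# Proportion of missing values in each sequence (compare with maxprop)
prop.miss <- rowMeans(is.na(traj))

# Lengths of the runs of consecutive NAs in one sequence
gap_lengths <- function(x) {
  r <- rle(is.na(x))
  r$lengths[r$values]
}

# Longest gap in each sequence (compare with maxgap); 0 if no gap
longest.gap <- apply(traj, 1, function(x) max(c(0, gap_lengths(x))))
```

For real output, the same two summaries can be computed on the trajectory columns only, for example mvad.miss[, 17:86].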


[Package seqimpute version 2.1.0 Index]