rake_wt {AuxSurvey}    R Documentation

Weighted or Unweighted Raking Estimator

Description

This function computes a weighted or unweighted raking estimate for survey data. Raking adjusts the sample weights so that the weighted sample margins of the auxiliary variables match their marginal distributions in the population. Both Gaussian (continuous) and binomial (binary) outcome variables are supported.
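To illustrate the idea of raking described above, here is a minimal base-R sketch of iterative proportional fitting. It is not the AuxSurvey implementation: the function rake_sketch, its tol argument, and the toy data and margins are all hypothetical and serve only to show how weights are rescaled until the weighted margins match known population totals.

```r
# Minimal base-R sketch of raking (iterative proportional fitting).
# All names here are illustrative; this is NOT the AuxSurvey implementation.
rake_sketch <- function(smpl, margins, w = rep(1, nrow(smpl)),
                        maxiter = 50, tol = 1e-8) {
  for (iter in seq_len(maxiter)) {
    w_old <- w
    for (v in names(margins)) {
      # Current weighted total in each level of auxiliary variable v
      cur <- tapply(w, smpl[[v]], sum)
      # Ratio of the population total to the current weighted total, per level
      ratio <- margins[[v]][names(cur)] / as.numeric(cur)
      # Rescale each case's weight by the ratio for its level of v
      w <- w * as.numeric(ratio[as.character(smpl[[v]])])
    }
    # Stop early once a full pass no longer changes the weights
    if (max(abs(w - w_old)) < tol) break
  }
  w
}

# Toy sample of 6 cases with two binary auxiliary variables (made-up values)
smpl <- data.frame(Z1 = c(1, 1, 1, 2, 2, 2),
                   Z2 = c(1, 2, 2, 1, 1, 2),
                   Y  = c(3, 5, 4, 8, 7, 9))

# Hypothetical population counts for each level (each margin sums to 100)
margins <- list(Z1 = c("1" = 40, "2" = 60),
                Z2 = c("1" = 50, "2" = 50))

w <- rake_sketch(smpl, margins)
tapply(w, smpl$Z1, sum)     # weighted totals by Z1 after raking
tapply(w, smpl$Z2, sum)     # weighted totals by Z2 after raking
weighted.mean(smpl$Y, w)    # raked estimate of the mean of Y
```

Passing a vector of case weights as w gives the weighted variant of the same procedure; the default of equal starting weights corresponds to the unweighted one.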

Usage

rake_wt(
  svysmpl,
  svypopu,
  auxVars,
  svyVar,
  subset = NULL,
  family = gaussian(),
  invlvls,
  weights = NULL,
  maxiter = 50
)

Arguments

svysmpl

A dataframe or tibble representing the sample data (samples). This should contain the outcome variable and any auxiliary variables.

svypopu

A dataframe or tibble representing the population data (population). This is used to compute the finite population correction (FPC) for raking.

auxVars

A character vector containing the names of auxiliary variables to be used for raking. These variables will be used to adjust the weights.

svyVar

The name of the outcome variable for which the raking estimate is calculated, given as a character string (for example, "Y1").

subset

A character vector representing filtering conditions to select subsets of the sample and population. Default is NULL, in which case the analysis is performed on the entire dataset. If subsets are specified, estimates for both the whole data and the subsets will be calculated.
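As a rough illustration of what a character vector of filtering conditions can look like, the base-R sketch below evaluates each condition string against a small data frame. The data, the condition strings, and the use of eval(parse()) are assumptions made for this example; the package may parse and apply the conditions differently.

```r
# Toy data frame standing in for a sample (values are made up)
smpl <- data.frame(Z1 = c(1, 1, 2, 2),
                   Z2 = c(1, 2, 1, 2),
                   Y1 = c(0.5, 1.2, 2.1, 3.3))

# Hypothetical filtering conditions written as character strings
conds <- c("Z1 == 1", "Z1 == 2 & Z2 == 1")

# Evaluate each condition inside the data frame and keep the matching rows
subsets <- lapply(conds, function(cond) {
  keep <- eval(parse(text = cond), envir = smpl)
  smpl[keep, , drop = FALSE]
})
names(subsets) <- conds

sapply(subsets, nrow)   # number of cases selected by each condition
```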

family

The distribution family of the outcome variable. Supported options are gaussian for continuous outcomes and binomial for binary outcomes. Default is gaussian().

invlvls

A numeric vector specifying the confidence levels of the confidence intervals (CIs) for the raking estimates, for example c(0.95). If more than one value is provided, a CI is calculated for each level.
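The sketch below shows how several confidence levels translate into several intervals, using a standard t-based interval in base R. The estimate, standard error, sample size, and the choice of n - 1 degrees of freedom are placeholder assumptions for illustration, not values or formulas taken from the package.

```r
# Placeholder estimate, standard error and sample size (not real output)
est <- 2.4
se  <- 0.3
n   <- 600

# Two confidence levels, as might be passed through invlvls
invlvls <- c(0.90, 0.95)

# One t-based interval per level; df = n - 1 is an assumption of this sketch
tCI <- sapply(invlvls, function(lvl) {
  crit <- qt(1 - (1 - lvl) / 2, df = n - 1)
  c(lower = est - crit * se, upper = est + crit * se)
})
colnames(tCI) <- paste0(invlvls * 100, "%")

tCI   # rows: lower/upper bounds; columns: one per confidence level
```

A higher confidence level gives a wider interval around the same estimate.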

weights

A numeric vector of case weights. Its length should match the number of cases in svysmpl. These weights are used in the weighted raking adjustment. Default is NULL, in which case unweighted raking is performed.

maxiter

An integer specifying the maximum number of iterations for the raking algorithm. Default is 50.

Value

A list where each element contains the raking estimate and confidence intervals (CIs) for a subset or the entire dataset. The list includes:

- est: The raking estimate for the outcome variable.
- se: The standard error of the estimate.
- tCI: Confidence intervals for the estimate.
- sample_size: The sample size for the subset or entire dataset.
- population_size: The population size, if provided, including the finite population correction (FPC).

Examples

## Simulate data with nonlinear association (setting 3).
data = simulate(N = 3000, discretize = 3, setting = 3, seed = 123)
population = data$population  # Population data (3000 cases)
samples = data$samples        # Sample data (600 cases)
ipw = 1 / samples$true_pi    # Compute inverse probability weights

## Perform weighted raking with auxiliary variables
auxVars = c("Z1", "Z2", "Z3")
Weighted_rake = rake_wt(svysmpl = samples, svypopu = population, auxVars = auxVars,
                        svyVar = "Y1", subset = NULL, family = gaussian(),
                        invlvls = c(0.95), weights = ipw, maxiter = 50)
Weighted_rake

## Perform unweighted raking
Unweighted_rake = rake_wt(svysmpl = samples, svypopu = population, auxVars = auxVars,
                          svyVar = "Y1", subset = NULL, family = gaussian(),
                          invlvls = c(0.95), weights = NULL, maxiter = 50)
Unweighted_rake


[Package AuxSurvey version 0.9 Index]