GEcalib {GECal} R Documentation

Generalized Entropy Calibration

Description

GEcalib computes the calibration weights. Generalized entropy calibration weights maximize the generalized entropy:

H(\bm{\omega}) = -\sum_{i \in A} G(\omega_i),

subject to the calibration constraints \sum_{i \in A} \omega_i \bm{z}_i = \sum_{i \in U} \bm{z}_i, where A denotes the index set of the sample and U the index set of the population. The auxiliary variables, whose population totals are known, are defined as \bm{z}_i^T = (\bm{x}_i^T, g(d_i)), where g is the first-order derivative of the generalized entropy G, and d_i is the design weight for each sampled unit i \in A.
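As a brief sketch of why g appears in the constraints, assume G is strictly convex and differentiable, so that g is invertible. Introducing a Lagrange multiplier vector \bm{\lambda} (a symbol used only in this sketch), the first-order condition gives the form of the calibration weights:

```latex
\mathcal{L}(\bm{\omega}, \bm{\lambda})
  = \sum_{i \in A} G(\omega_i)
  - \bm{\lambda}^T \left( \sum_{i \in A} \omega_i \bm{z}_i - \sum_{i \in U} \bm{z}_i \right),
\qquad
\frac{\partial \mathcal{L}}{\partial \omega_i}
  = g(\omega_i) - \bm{\lambda}^T \bm{z}_i = 0
\;\Longrightarrow\;
\hat{\omega}_i = g^{-1}\left(\bm{z}_i^T \hat{\bm{\lambda}}\right),
```

where \hat{\bm{\lambda}} is chosen so that the calibration constraints hold.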

Usage

GEcalib(
  formula,
  dweight,
  data = NULL,
  const,
  method = c("GEC", "GEC0", "DS"),
  entropy = c("SL", "EL", "ET", "CE", "HD", "PH"),
  weight.scale = 1,
  G.scale = 1,
  K_alpha = NULL,
  is.total = TRUE,
  del = NULL
)

Arguments

formula

An object of class "formula" specifying the calibration model.

dweight

A vector of sampling weights.

data

An optional data frame containing the variables in the model (specified by formula).

const

A vector of population totals (or means) used in the calibration constraint.

method

The method to be used in calibration. See "Details" for more information.

entropy

The generalized entropy used in calibration, which can be either a numeric value or a string. If numeric, entropy represents the order of Renyi's entropy, where G(\omega) = r^{-1}(r+1)^{-1}\omega^{r+1} if r \neq 0, -1. If a string, valid options include: "SL" (Squared-loss), "EL" (Empirical Likelihood), "ET" (Exponential Tilting), "CE" (Cross-Entropy), "HD" (Hellinger Distance), and "PH" (Pseudo-Huber). See "Summary" for details.

weight.scale

Positive scaling factor for the calibration weights \omega_i. Asymptotics justify setting weight.scale to the finite population correction (fpc = n / N).

G.scale

Positive scaling factor for the generalized entropy function G. Asymptotics justify setting G.scale to the variance of the error term in a linear super-population model.

K_alpha

The K function used in joint optimization when the element of const corresponding to the debiasing covariate g(d_i) is not available. K_alpha can be NULL, "log", or a custom function. See "Details".

is.total

Logical, TRUE if sum(const[1]) equals the population size.

del

The optional threshold (\delta) used when Pseudo-Huber (PH) entropy is selected. Defaults to quantile(dweight, 0.75) if not specified.

Details

GEcalib returns an object containing the calibration weights and the information needed to estimate population totals (or means).

The terms to the right of the ~ symbol in the formula argument define the calibration constraints. When method == "GEC", the debiasing covariate g(dweight) must be included in the formula. If the population total (or mean) of g(dweight) is unavailable, the element of const that corresponds to g(dweight) can be set to NA. In this case, GEcalib performs joint optimization over both the calibration weights \omega_i and the missing value of const.

The length of the const vector should match the number of columns in the model.matrix generated by formula. Additionally, the reciprocal condition number of the model.matrix must exceed .Machine$double.eps to ensure its invertibility.
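The two requirements above can be checked before calling GEcalib. The following is a minimal base-R sketch under stated assumptions: the data frame dat and the totals in const are made-up illustrative values, and applying rcond() to the Gram matrix is only one plausible way to test invertibility, not necessarily the check GEcalib performs internally.

```r
# Toy sample data (hypothetical values, for illustration only)
dat <- data.frame(x1 = c(1.2, 0.7, 3.1, 2.4),
                  x2 = c(0.5, 1.9, 2.2, 3.3))

# Design matrix implied by the formula ~ . : (Intercept), x1, x2
Xs <- model.matrix(~ ., data = dat)

# Hypothetical population totals, one entry per column of Xs
const <- c(100, 210, 195)

# const must have one element per column of the model matrix
stopifnot(length(const) == ncol(Xs))

# Reciprocal condition number of X'X; values near zero signal collinearity
rc <- rcond(crossprod(Xs))
stopifnot(rc > .Machine$double.eps)
```

If the first check fails, the usual cause is a factor or interaction term in formula that expands into more columns than expected.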

Both weight.scale and G.scale are positive scaling factors used for calibration. Note that weight.scale is not supported when method == "DS".

Let q_i be the scaling factor for the generalized entropy function G (G.scale), and \phi_i be the scaling factor for the calibration weights \omega_i (weight.scale).

If method == "GEC", GEcalib minimizes the negative entropy:

\sum_{i \in A} q_iG(\phi_i\omega_i),

with respect to \bm \omega subject to the calibration constraints \sum_{i \in A} \omega_i \bm{z}_i = \sum_{i \in U} \bm{z}_i, where \bm{z}_i^T = (\bm{x}_i^T, q_i \phi_i g(\phi_i d_i)), A denotes the sample index, and U represents the population index.

If method == "GEC", but an element of const corresponding to the debiasing covariate g(d_i) is NA, GEcalib minimizes the negative adjusted entropy:

\sum_{i \in A} q_iG(\phi_i\omega_i) - K(\alpha),

with respect to \bm \omega and \alpha subject to the calibration constraints \sum_{i \in A} \omega_i (\bm{x}_i^T, q_i \phi_i g(\phi_i d_i)) = \left(\sum_{i \in U} \bm{x}_i^T, \alpha \right), where the solution \hat \alpha is an estimate of the population total of g(d_i). Examples of K(\alpha) include K(\alpha) = \alpha when K_alpha == NULL, and

K(\alpha) = \left(\sum_{i \in A} d_i g(d_i) + N \right) \log \left| \frac{1}{N}\sum_{i \in A}q_i \phi_i \omega_i g(\phi_i d_i) + 1 \right|

when K_alpha == "log".

If method == "GEC0", GEcalib minimizes the negative adjusted entropy:

\sum_{i \in A} \left\{ q_iG(\phi_i\omega_i) - q_i\phi_i\omega_i g(\phi_i d_i) \right\}

with respect to \bm \omega subject to the calibration constraints \sum_{i \in A} \omega_i \bm{x}_i = \sum_{i \in U} \bm{x}_i.

If method == "DS", GEcalib minimizes the divergence between \bm \omega and \bm d:

\sum_{i \in A} q_id_i \tilde G(\omega_i / d_i)

with respect to \bm \omega subject to the calibration constraints \sum_{i \in A} \omega_i \bm{x}_i = \sum_{i \in U} \bm{x}_i. When method == "DS", weight.scale, the scaling factor for the calibration weights \phi_i, is not applicable.

Examples of G and \tilde G are given in "Summary".
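To illustrate how the DS and GEC0 formulations relate, the base-R sketch below works through the exponential tilting (ET) case with q_i = \phi_i = 1. It does not call GECal. The toy data, the perturbed totals, and the plain Newton iteration are all assumptions made for illustration, and the GEC0 penalty is written here as \omega_i g(d_i) so that both problems share the solution \omega_i = d_i \exp(\bm{x}_i^T \bm{\lambda}).

```r
set.seed(1)
n <- 200
X <- cbind(1, rnorm(n, 2, 1), runif(n, 0, 4))  # toy covariates with intercept
d <- runif(n, 5, 15)                           # toy design weights

# Hypothetical population totals: slightly perturbed Horvitz-Thompson totals
total <- colSums(d * X) * c(1.00, 1.02, 0.98)

# Newton iterations for lambda in w_i = d_i * exp(x_i' lambda),
# solving the calibration equations sum_i w_i x_i = total
lambda <- rep(0, ncol(X))
for (iter in 1:50) {
  w <- d * exp(drop(X %*% lambda))
  resid <- colSums(w * X) - total
  if (max(abs(resid)) < 1e-10 * max(abs(total))) break
  lambda <- lambda - solve(crossprod(X, w * X), resid)
}
w <- d * exp(drop(X %*% lambda))

# DS divergence and GEC0 objective for ET (G(w) = w log w - w, g = log)
ds_obj   <- function(w) sum(d * ((w / d) * log(w / d) - w / d + 1))
gec0_obj <- function(w) sum(w * log(w) - w - w * log(d))

# The two objectives differ only by the constant sum(d),
# so they share the same constrained minimizer
all.equal(ds_obj(w) - gec0_obj(w), sum(d))
```

Because the difference between the two objectives does not depend on \bm \omega, the same weights minimize both under the same constraints.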

Value

A list of class calibration including the calibration weights and data needed for estimation.

Summary

The table below provides a comparison between the GEC and DS methods.

GEC

Objective: \min_{\bm \omega} \left(-H(\bm \omega)\right) = \sum_{i \in A}G(\omega_i)

Constraint: s.t. \sum_{i \in A} \omega_i (\bm{x}_i^T, g(d_i)) = \sum_{i \in U} (\bm{x}_i^T, g(d_i))

Entropy: G(\omega) = \begin{cases} \frac{1}{r(r+1)} \omega^{r+1} & r \neq 0, -1 \\ \omega \log \omega - \omega & r = 0 \text{ (ET)} \\ -\log \omega & r = -1 \text{ (EL)} \end{cases}

DS

Objective: \min_{\bm \omega} D(\bm \omega, \bm d) = \sum_{i \in A}d_i \tilde G(\omega_i / d_i)

Constraint: s.t. \sum_{i \in A} \omega_i \bm{x}_i^T = \sum_{i \in U} \bm{x}_i^T

Divergence: \tilde G(\omega) = \begin{cases} \frac{1}{r(r+1)} \left(\omega^{r+1} - (r+1)\omega + r\right) & r \neq 0, -1 \\ \omega \log \omega - \omega + 1 & r = 0 \text{ (ET)} \\ -\log \omega + \omega - 1 & r = -1 \text{ (EL)} \end{cases}
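The G and \tilde G families above are linked by a recentering at \omega = 1: \tilde G(\omega) = G(\omega) - G(1) - g(1)(\omega - 1), which can be verified directly from the formulas. The sketch below checks this numerically in base R. The helper names G_fun, g_fun, and Gtilde_fun are illustrative and are not part of GECal.

```r
# Illustrative helpers (not GECal functions), indexed by the order r
G_fun <- function(w, r) {
  if (r == 0) w * log(w) - w               # ET
  else if (r == -1) -log(w)                # EL
  else w^(r + 1) / (r * (r + 1))
}
g_fun <- function(w, r) {                  # first derivative of G_fun
  if (r == 0) log(w)
  else if (r == -1) -1 / w
  else w^r / r
}
Gtilde_fun <- function(w, r) {
  if (r == 0) w * log(w) - w + 1
  else if (r == -1) -log(w) + w - 1
  else (w^(r + 1) - (r + 1) * w + r) / (r * (r + 1))
}

w <- c(0.5, 1, 2, 3.7)
for (r in c(-1, -0.5, 0, 1)) {
  # Subtract the tangent line of G at w = 1
  recentered <- G_fun(w, r) - G_fun(1, r) - g_fun(1, r) * (w - 1)
  stopifnot(isTRUE(all.equal(recentered, Gtilde_fun(w, r))))
}
```

The recentering makes \tilde G nonnegative with its minimum of zero at \omega = 1, which is why the DS objective measures how far \omega_i / d_i moves away from 1.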

If method == "GEC", further examples include

G(\omega) = (\omega - 1) \log (\omega-1) - \omega \log \omega

when entropy == "CE", and

G(\omega) = \delta^2 \left(1 + (\omega / \delta)^2 \right)^{1/2}

for a threshold \delta when entropy == "PH".
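For these two entropies the derivative g used in the debiasing covariate follows by differentiating the formulas above: g(\omega) = \log(1 - 1/\omega) for CE (defined for \omega > 1) and g(\omega) = \omega / (1 + (\omega/\delta)^2)^{1/2} for PH. The base-R sketch below checks both against a central finite difference. The function names and the value del = 4 are illustrative assumptions and are not taken from GECal.

```r
# Illustrative helpers (not GECal functions)
G_ph <- function(w, del) del^2 * sqrt(1 + (w / del)^2)   # Pseudo-Huber entropy
g_ph <- function(w, del) w / sqrt(1 + (w / del)^2)       # its derivative
G_ce <- function(w) (w - 1) * log(w - 1) - w * log(w)    # cross-entropy, needs w > 1
g_ce <- function(w) log(1 - 1 / w)                       # its derivative

# Central finite-difference approximation to f'(w)
num_deriv <- function(f, w, h = 1e-6) (f(w + h) - f(w - h)) / (2 * h)

w   <- c(1.5, 4, 25)
del <- 4   # hypothetical threshold

ok_ph <- all.equal(num_deriv(function(u) G_ph(u, del), w), g_ph(w, del),
                   tolerance = 1e-5)
ok_ce <- all.equal(num_deriv(G_ce, w), g_ce(w), tolerance = 1e-5)
stopifnot(isTRUE(ok_ph), isTRUE(ok_ce))
```

Note that g_ph is bounded above by del, so under PH entropy very large weights contribute almost the same value to the debiasing covariate.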

Author(s)

Yonghyun Kwon

References

Kwon, Y., Kim, J., and Qiu, Y. (2024). Debiased calibration estimation using generalized entropy in survey sampling. arXiv preprint <https://arxiv.org/abs/2404.01076>

Deville, J. C., and Särndal, C. E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418), 376-382.

Examples

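set.seed(11)
N <- 10000  # population size
x <- data.frame(x1 = rnorm(N, 2, 1), x2 = runif(N, 0, 4))
# First-order inclusion probabilities, capped at 0.7
# (note: this masks the built-in constant `pi` for the rest of the example)
pi <- pt((-x[, 1] / 2 - x[, 2] / 2), 3)
pi <- ifelse(pi > .7, .7, pi)

# Poisson sampling: unit i is selected with probability pi[i]
delta <- rbinom(N, 1, pi)
Index_S <- (delta == 1)
pi_S <- pi[Index_S]; d_S <- 1 / pi_S  # design weights of the sampled units
x_S <- x[Index_S, ]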

# Deville and Sarndal (1992) calibration using divergence
w1 <- GECal::GEcalib(~ ., dweight = d_S, data = x_S,
                    const = colSums(cbind(1, x)),
                    entropy = "ET", method = "DS")$w

# Generalized entropy calibration without debiasing covariate 
w2 <- GECal::GEcalib(~ ., dweight = d_S, data = x_S,
                    const = colSums(cbind(1, x)),
                    entropy = "ET", method = "GEC0")$w
all.equal(w1, w2)

# Generalized entropy calibration with debiasing covariate
w3 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
                    const = colSums(cbind(1, x, log(1 / pi))),
                    entropy = "ET", method = "GEC")$w
                    
# Generalized entropy calibration with debiasing covariate
# when its population total is unknown
w4 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
                    const = colSums(cbind(1, x, NA)),
                    entropy = "ET", method = "GEC")$w
all.equal(w1, w4)

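# Generalized entropy calibration with debiasing covariate
# when its population total is unknown, using K_alpha = "log"
w5 <- GECal::GEcalib(~ . + g(d_S), dweight = d_S, data = x_S,
                    const = colSums(cbind(1, x, NA)),
                    entropy = "ET", method = "GEC", K_alpha = "log")$w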

[Package GECal version 0.1.5 Index]