CalDF {Frames2} | R Documentation |
Produces estimates for population totals and means using the DF calibration estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
CalDF(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL,
N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL,
xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of length |
domains_B |
A character vector of length |
N_A |
(Optional) A numeric value indicating the size of frame A. |
N_B |
(Optional) A numeric value indicating the size of frame B. |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain. |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
xsT |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
X |
(Optional) A numeric value or vector of length |
met |
(Optional) A character vector indicating the distance function to use in the calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
The DF calibration estimator of the population total is given by
\hat{Y}_{CalDF} = \hat{Y}_a + \hat{\eta}\hat{Y}_{ab} + \hat{Y}_b + (1 - \hat{\eta})\hat{Y}_{ba}
where \hat{Y}_a = \sum_{i \in s_a}\tilde{d}_i y_i, \hat{Y}_{ab} = \sum_{i \in s_{ab}}\tilde{d}_i y_i, \hat{Y}_b = \sum_{i \in s_b}\tilde{d}_i y_i and \hat{Y}_{ba} = \sum_{i \in s_{ba}}\tilde{d}_i y_i, with \tilde{d}_i the calibration weights, which are computed taking into account a different set of constraints depending on the case. For instance, if N_A, N_B and N_{ab} are all known and no other auxiliary information is available, the calibration constraints are
\sum_{i \in s_a}\tilde{d}_i = N_a, \sum_{i \in s_{ab}}\tilde{d}_i = N_{ab}, \sum_{i \in s_{ba}}\tilde{d}_i = N_{ba}, \sum_{i \in s_b}\tilde{d}_i = N_b
Here N_a = N_A - N_{ab} and N_b = N_B - N_{ab} are the sizes of the domains covered only by frame A and only by frame B, respectively, and N_{ba} = N_{ab}.
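The following is a minimal sketch of what constraints on domain sizes mean in practice, not the code CalDF runs. It takes made-up inclusion probabilities and domain labels for a small sample from frame A, assumes the domain sizes are known, and rescales the design weights within each domain so that the weights add up to the known size. With only domain-count constraints and the linear distance, calibration is known to reduce to this kind of within-domain rescaling; all object names below are hypothetical.

```r
# Made-up inclusion probabilities and domain labels for a tiny frame-A sample.
pik    <- c(0.10, 0.20, 0.25, 0.50, 0.40)
domain <- c("a", "a", "ab", "ab", "ab")

# Assumed known domain sizes (hypothetical values).
N_dom <- c(a = 20, ab = 12)

# Design weights are the inverse inclusion probabilities.
d <- 1 / pik

# Estimated size of each unit's domain, repeated per unit.
N_hat_unit <- ave(d, domain, FUN = sum)

# Rescale the design weights within each domain to hit the known size.
w <- d * unname(N_dom[domain]) / N_hat_unit

# The rescaled weights satisfy the constraints: they sum to N_dom by domain.
cal_sums <- tapply(w, domain, sum)
print(cal_sums)
```

When auxiliary totals such as XA or XB are also supplied, the constraint set is larger and no closed form of this kind applies in general.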
The optimal value of \hat{\eta} to minimize the variance of the estimator is given by \hat{V}(\hat{N}_{ba})/(\hat{V}(\hat{N}_{ab}) + \hat{V}(\hat{N}_{ba})). If both first and second order inclusion probabilities are known, variances are estimated using function VarHT. If only first order inclusion probabilities are known, variances are estimated using Deville's method.
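As an illustration of how the value of \hat{\eta} and the four domain totals fit together, the sketch below evaluates both formulas on made-up numbers. The helpers optimal_eta and combine_df are hypothetical and are not part of Frames2; the variance estimates are taken as given rather than computed from data.

```r
# Hypothetical helper: value of eta from the two estimated variances
# V(N_ab) and V(N_ba), following the ratio given in the text.
optimal_eta <- function(var_N_ab, var_N_ba) {
  stopifnot(var_N_ab >= 0, var_N_ba >= 0, var_N_ab + var_N_ba > 0)
  var_N_ba / (var_N_ab + var_N_ba)
}

# Hypothetical helper: combine the four domain totals into one estimate,
# Y_a + eta * Y_ab + Y_b + (1 - eta) * Y_ba.
combine_df <- function(Y_a, Y_ab, Y_b, Y_ba, eta) {
  Y_a + eta * Y_ab + Y_b + (1 - eta) * Y_ba
}

# Made-up variance estimates and domain totals.
eta   <- optimal_eta(var_N_ab = 400, var_N_ba = 100)
total <- combine_df(Y_a = 1000, Y_ab = 500, Y_b = 800, Y_ba = 450, eta = eta)

print(eta)    # 0.2
print(total)  # 2260
```

The overlap estimate with the smaller estimated variance receives the larger weight, which is why \hat{\eta} is driven by the variance of the estimate from the other frame.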
The function covers the following scenarios:
No additional auxiliary variable is available:
  N_A, N_B and N_{ab} unknown
  N_A and N_B known and N_{ab} unknown
  N_{ab} known and N_A and N_B unknown
  N_A, N_B and N_{ab} known
Information about at least one additional auxiliary variable is available:
  N_A and N_B known and N_{ab} unknown
  N_{ab} known and N_A and N_B unknown
  N_A, N_B and N_{ab} known
To estimate the variance of this estimator, one can use Deville's expression
\hat{V}(\hat{Y}_{CalDF}) = \frac{1}{1-\sum_{k\in s} a_k^2}\sum_{k\in s}(1-\pi_k)\left(\frac{e_k}{\pi_k} - \sum_{l\in s} a_{l} \frac{e_l}{\pi_l}\right)^2
where a_k=(1-\pi_k)/\sum_{l\in s} (1-\pi_l) and e_k are the residuals of the regression with auxiliary variables as regressors.
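Deville's expression can be transcribed almost literally into R. The sketch below is a hypothetical helper, deville_var, and not the internal Frames2 implementation; it assumes the residuals e_k have already been obtained and that only first order inclusion probabilities are available.

```r
# Hypothetical helper: Deville's variance approximation.
# e   : residuals e_k from the regression on the auxiliary variables
# pik : first order inclusion probabilities pi_k
deville_var <- function(e, pik) {
  stopifnot(length(e) == length(pik), all(pik > 0), all(pik <= 1))
  stopifnot(sum(1 - pik) > 0)            # at least one unit with pi_k < 1
  a <- (1 - pik) / sum(1 - pik)          # a_k
  stopifnot(sum(a^2) < 1)                # avoids a zero denominator
  z <- e / pik                           # e_k / pi_k
  centered <- z - sum(a * z)             # subtract the a-weighted mean
  sum((1 - pik) * centered^2) / (1 - sum(a^2))
}

# Tiny made-up example with two sampled units.
v <- deville_var(e = c(1, 3), pik = c(0.5, 0.5))
print(v)  # 8
```

The two guards reflect where the expression is undefined: when every unit has inclusion probability 1, and when a single unit carries all of the a_k weight.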
CalDF returns an object of class "EstimatorDF", which is a list with at least the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, the object also includes the component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
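This page does not state how the interval limits are derived from the estimate and its variance. As a rough illustration only, the sketch below assumes a normal approximation, so the limits are the estimate plus or minus a normal quantile times the estimated standard error. The helper normal_ci is hypothetical and the numbers are made up.

```r
# Hypothetical helper: two-sided interval under an assumed normal approximation.
normal_ci <- function(est, var_est, conf_level) {
  stopifnot(conf_level > 0, conf_level < 1, var_est >= 0)
  z <- qnorm(1 - (1 - conf_level) / 2)   # e.g. about 1.645 for conf_level = 0.90
  half_width <- z * sqrt(var_est)
  c(lower = est - half_width, upper = est + half_width)
}

# Made-up estimate and variance estimate.
ci <- normal_ci(est = 2260, var_est = 900, conf_level = 0.90)
print(ci)
```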
In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any).
By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All components of the object can be accessed by using function summary.
Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]
Deville, J. C., Särndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)
#Let us calculate the DF calibration estimator for variable Feeding, without
#considering any auxiliary information
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)
#Now, let us calculate the DF calibration estimator for variable Clothing when the frame
#sizes and the overlap domain size are known
CalDF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain,
N_A = 1735, N_B = 1191, N_ab = 601)
#Finally, let us calculate the DF calibration estimator and a 90% confidence interval
#for the population total for variable Feeding, considering Income as auxiliary variable in
#frame A and Metres2 as auxiliary variable in frame B, with frame sizes and overlap
#domain size known.
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain,
N_A = 1735, N_B = 1191, N_ab = 601, xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc,
xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, XA = 4300260, XB = 176553,
conf_level = 0.90)
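The calls above only print the default view of the result. The sketch below stores the result instead, so that the components described in the Value section can be inspected. It assumes the Frames2 package and its example data sets are installed and does nothing otherwise; the object name res and the choice of a 95% level are illustrative.

```r
# Runs only if Frames2 is available; otherwise the block is skipped.
if (requireNamespace("Frames2", quietly = TRUE)) {
  library(Frames2)
  data(DatA)
  data(DatB)
  data(PiklA)
  data(PiklB)

  # Same kind of call as in the examples above, with the result kept in 'res'.
  res <- CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain,
               N_A = 1735, N_B = 1191, N_ab = 601, conf_level = 0.95)

  print(res)       # default view of the result
  print(res$Est)   # total and mean estimates as a list component
  summary(res)     # all components of the object
}
```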