allocations {sps} | R Documentation |
Generate a proportional-to-size allocation for stratified sampling.
prop_allocation(x, N, strata, initial = 0, divisor = function(a) a + 1)
expected_coverage(x, N, strata, alpha = 1e-4)
x |
A positive and finite numeric vector of sizes for units in the population (e.g., revenue for drawing a sample of businesses). |
N |
A positive integer giving the total sample size across all strata. Non-integers are truncated towards 0. |
strata |
A factor, or something that can be coerced into one, giving the strata associated with units in the population. The default is to place all units into a single stratum. |
initial |
A positive integer vector giving the initial (or minimal) allocation for each stratum, ordered according to the levels of |
divisor |
A divisor function for the divisor (highest-averages) apportionment method. The default uses the Jefferson (D'Hondt) method. See details for other possible functions. |
alpha |
A number between 0 and 1 such that units with inclusion probabilities greater than or equal to 1 - |
The prop_allocation() function gives a sample size for each stratum that is proportional to the sum of x in that stratum, with the stratum sample sizes adding up to N. This is done using the divisor (highest-averages) apportionment method (Balinski and Young, 1982, Appendix A), for which there are a number of different divisor functions:
Jefferson/D'Hondt | \(a) a + 1 |
Webster/Sainte-Laguë | \(a) a + 0.5 |
Imperiali | \(a) a + 2 |
Huntington-Hill | \(a) sqrt(a * (a + 1)) |
Danish | \(a) a + 1 / 3 |
Adams | \(a) a |
Dean | \(a) a * (a + 1) / (a + 0.5) |
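The highest-averages method can be sketched in a few lines of base R. This is a minimal illustration of the general technique, not the code used by prop_allocation(): the helper highest_averages() is a hypothetical name, it places no cap on the allocation for a stratum, and it breaks ties by taking the first stratum with the largest priority.

```r
# Hypothetical helper, not part of sps: bare-bones highest-averages
# apportionment with no cap on the allocation for a stratum.
highest_averages <- function(p, n, initial = 0, divisor = function(a) a + 1) {
  a <- rep_len(initial, length(p))  # current allocation for each stratum
  while (sum(a) < n) {
    # Priority of each stratum for the next unit; which.max() returns the
    # first maximum, so ties go to the earlier stratum.
    i <- which.max(p / divisor(a))
    a[i] <- a[i] + 1
  }
  setNames(a, names(p))
}

# Total size by stratum for the population used in the examples
x <- c(rep(1:9, each = 3), 100, 100, 100)
s <- rep(letters[1:10], each = 3)
p <- tapply(x, s, sum)

jefferson <- highest_averages(p, 15)
webster <- highest_averages(p, 15, divisor = function(a) a + 0.5)
```

Swapping the divisor changes how strongly the method favours large strata; comparing jefferson and webster on the same population shows the effect.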
Note that a divisor function with d(0) = 0 (i.e., Huntington-Hill, Adams, Dean) should have an initial allocation of at least 1 for all strata. In all cases, ties are broken according to the levels of strata; reordering the levels of strata can therefore result in a different allocation.
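A short base R illustration of why d(0) = 0 calls for a positive initial allocation: with a zero divisor, the priority of every empty stratum is infinite, so the first round of units no longer reflects the relative sizes of the strata. The stratum totals p below are made up for illustration, and how prop_allocation() treats this case internally is not shown here.

```r
# Two of the divisor functions from the table above
jefferson <- function(a) a + 1
huntington_hill <- function(a) sqrt(a * (a + 1))

# Hypothetical stratum totals
p <- c(a = 3, b = 300)

# Priority for the first unit when every stratum starts at 0
first_jefferson <- p / jefferson(c(0, 0))  # finite, proportional to p
first_hh <- p / huntington_hill(c(0, 0))   # Inf for both strata

# Starting from an initial allocation of 1 gives finite priorities again
second_hh <- p / huntington_hill(c(1, 1))
```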
In cases where the number of units in a stratum is smaller than its allocation, the allocation for that stratum is set to the number of available units, with the remaining sample size reallocated to other strata proportional to x. This is similar to PROC SURVEYSELECT in SAS with ALLOC = PROPORTIONAL.
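One way to picture the cap is to stop a stratum from receiving further units once its allocation reaches the number of units it contains. The sketch below does this inside a highest-averages loop; capped_allocation() is a hypothetical helper written for illustration, and the package may handle the reallocation differently.

```r
# Hypothetical helper, not part of sps: highest-averages apportionment where
# a stratum stops receiving units once it is fully sampled.
capped_allocation <- function(p, n, size, divisor = function(a) a + 1) {
  stopifnot(n <= sum(size))
  a <- rep_len(0, length(p))
  while (sum(a) < n) {
    priority <- p / divisor(a)
    priority[a >= size] <- -Inf  # full strata cannot receive more units
    i <- which.max(priority)
    a[i] <- a[i] + 1
  }
  setNames(a, names(p))
}

x <- c(rep(1:9, each = 3), 100, 100, 100)
s <- rep(letters[1:10], each = 3)
p <- tapply(x, s, sum)        # total size by stratum
size <- tapply(x, s, length)  # number of units by stratum

capped <- capped_allocation(p, 15, size)
```

Without the cap, the stratum holding the three largest units would receive most of the sample; with it, that stratum is limited to its three units and the remainder goes to the other strata.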
Passing a single integer for the initial allocation first checks that recycling this value for each stratum does not result in an allocation larger than the sample size. If it does, then the value is reduced so that recycling does not exceed the sample size. This recycled vector can be further reduced in cases where it exceeds the number of units in a stratum, the result of which is the initial allocation. This special recycling ensures that the initial allocation is feasible.
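The recycling rule for a single initial value can be sketched as follows. The helper feasible_initial() is hypothetical, and it assumes that "reduced so that recycling does not exceed the sample size" means taking the largest integer that fits in every stratum, i.e., the sample size divided by the number of strata, rounded down.

```r
# Hypothetical helper, not part of sps. Assumes the reduced value is
# floor(n / number of strata), which is one reading of the rule above.
feasible_initial <- function(initial, n, size) {
  k <- length(size)
  initial <- min(initial, n %/% k)  # recycled value must fit within n
  pmin(initial, size)               # and within each stratum
}

# Four strata with 3, 3, 1, and 10 units; sample size of 15
init <- feasible_initial(5, 15, c(3, 3, 1, 10))
```

Here the requested value of 5 is first reduced to 3, since 4 strata times 5 exceeds 15, and then to 1 in the stratum that has only one unit.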
The expected_coverage() function gives the average number of strata covered by ordinary Poisson sampling without stratification. As sequential and ordinary Poisson sampling have the same sample size on average, this gives an approximation for the coverage under sequential Poisson sampling. This function can also be used to calculate, e.g., the expected number of enterprises covered within a stratum when sampling business establishments.
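Under Poisson sampling units are selected independently, so a stratum is missed with probability equal to the product of 1 minus the inclusion probability over its units, and the expected coverage is the sum over strata of 1 minus that product. The base R sketch below follows this reasoning. Both helpers are hypothetical, the inclusion probabilities use a simple take-all rule (units whose probability reaches 1 are included with certainty), and the alpha argument is ignored, so results need not match expected_coverage() exactly.

```r
# Hypothetical helper: inclusion probabilities proportional to size, with
# units whose probability reaches 1 treated as take-all.
incl_prob <- function(x, n) {
  ta <- rep(FALSE, length(x))
  repeat {
    p_incl <- (n - sum(ta)) * x / sum(x[!ta])
    new_ta <- !ta & p_incl >= 1
    if (!any(new_ta)) break
    ta <- ta | new_ta
  }
  p_incl[ta] <- 1
  p_incl
}

# Hypothetical helper: expected number of strata with at least one
# sampled unit under Poisson sampling.
coverage_sketch <- function(x, n, s) {
  p_incl <- incl_prob(x, n)
  sum(1 - tapply(1 - p_incl, s, prod))
}

x <- c(rep(1:9, each = 3), 100, 100, 100)
s <- rep(letters[1:10], each = 3)
cov <- coverage_sketch(x, 15, s)
```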
prop_allocation() returns a named integer vector of sample sizes for each stratum in strata.
expected_coverage() returns the expected number of strata covered by the sample design.
Balinski, M. L. and Young, H. P. (1982). Fair Representation: Meeting the Ideal of One Man, One Vote. Yale University Press.
sps for stratified sequential Poisson sampling.
strAlloc in the PracTools package for other allocation methods.
# Make a population with units of different size
x <- c(rep(1:9, each = 3), 100, 100, 100)
# ... and 10 strata
s <- rep(letters[1:10], each = 3)
# Should get about 7 to 8 strata in a sample on average
expected_coverage(x, 15, s)
# Generate an allocation with all 10
prop_allocation(x, 15, s, initial = 1)