surveycc {SurveyCC}    R Documentation

Canonical correlation analysis for complex survey data

Description

This function extends the functionality of candisc::cancor by calculating the test statistics, degrees of freedom and p-values needed to assess the statistical significance of the secondary canonical correlations using Wilks' lambda, Pillai's trace and the Hotelling-Lawley trace (Caliński et al., 2006), and Roy's largest root (Johnstone, 2009). The function surveycc can also draw the units and variables graphs (Gittins, 1986), further complementing the output of the existing cancor.
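For reference, the standard textbook forms of the four statistics are sketched below in terms of the sample canonical correlations r_1 >= r_2 >= ... >= r_s, with s = min(p, q), p = length(var.x) and q = length(var.y), for testing that the k-th and all subsequent population canonical correlations are zero. The exact scaling and approximations applied by surveycc are not stated on this page, so treat these as an assumption; some software, for example, reports Roy's largest root as r^2 / (1 - r^2) rather than r^2.

```latex
% Test of H_0: rho_k = rho_{k+1} = ... = rho_s = 0
\begin{aligned}
\Lambda_k &= \prod_{i=k}^{s} \left(1 - r_i^2\right) && \text{(Wilks' lambda)} \\
V_k &= \sum_{i=k}^{s} r_i^2 && \text{(Pillai's trace)} \\
T_k &= \sum_{i=k}^{s} \frac{r_i^2}{1 - r_i^2} && \text{(Hotelling-Lawley trace)} \\
\theta_k &= r_k^2 && \text{(Roy's largest root of the remaining roots)}
\end{aligned}
```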

Moreover, surveycc implements an algorithm (Cruz-Cano, Cohen, and Mead-Morse, 2024) that incorporates complex survey design elements, e.g. strata, clusters and replicate weights, into the estimation of the statistical significance of the canonical correlations. The core idea of the algorithm is to reduce the problem of finding the correlations between the canonical variates, and their statistical significance, to fitting an equivalent sequence of univariate linear regressions. This reformulation lets the user take advantage of the existing theoretical and computational resources that integrate complex survey design elements into regression models (Valliant and Dever, 2018). Hence, the algorithm can accommodate the same complex design elements that the survey package supports.
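The equivalence behind this reduction can be illustrated for the first pair of canonical variates. The sketch below is hypothetical and is not SurveyCC's internal code: it uses base R's stats::cancor (in place of candisc::cancor) on the api example data shipped with the survey package, with an arbitrary choice of variables. It checks that the OLS slope between the standardised canonical variates equals the first canonical correlation, then refits the same regression with survey::svyglm to obtain design-based standard errors.

```r
# Hypothetical illustration (not SurveyCC's internal code): a canonical
# correlation recovered as the slope of a univariate regression, which
# svyglm can then fit under a complex survey design.
library(survey)

data(api)  # example school data shipped with the survey package
vars <- c("api00", "api99", "ell", "meals", "mobility")
dat <- apiclus1[complete.cases(apiclus1[, vars]), ]  # guard against missing values

X <- as.matrix(dat[, c("api00", "api99")])
Y <- as.matrix(dat[, c("ell", "meals", "mobility")])

# Ordinary (unweighted) canonical correlation analysis from base R
cc <- stats::cancor(X, Y)

# First pair of canonical variates, rescaled to mean 0 and sd 1
u1 <- as.vector(scale(X %*% cc$xcoef[, 1]))
v1 <- as.vector(scale(Y %*% cc$ycoef[, 1]))

# With both variates standardised, the OLS slope equals their correlation,
# i.e. the first canonical correlation
ols <- lm(v1 ~ u1)
print(c(slope = unname(coef(ols)["u1"]), canonical_cor = cc$cor[1]))

# The same regression under the one-stage cluster design; svyglm supplies
# design-based standard errors and p-values for the slope
dat$u1 <- u1
dat$v1 <- v1
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = dat, fpc = ~fpc)
fit <- svyglm(v1 ~ u1, design = dclus1)
print(summary(fit)$coefficients["u1", ])
```

Because svyglm applies the sampling weights, its slope estimate will generally differ from the unweighted value; the benefit of the switch is that design-based inference comes from existing regression machinery.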

Usage

surveycc(
  design_object,
  var.x,
  var.y,
  howmany = NA,
  dim1 = NA,
  dim2 = NA,
  selection = "FREQ"
)

Arguments

design_object

a survey design object created with the survey package, e.g. by survey::svydesign or survey::svrepdesign

var.x

the first set of variables; a character vector of variable names

var.y

the second set of variables; a character vector of variable names

howmany

positive integer; the number of canonical correlations for which significance statistics are displayed. Defaults to min(length(var.x), length(var.y)), which is also the maximum allowed value.

dim1, dim2

integers selecting which canonical variates serve as the horizontal (dim1) and vertical (dim2) axes of the optional plot. NOTE: if dim1 and dim2 are not provided, no graph is displayed.

selection

chooses whether the sample size is taken to be the number of rows in the data set or the sum of the survey weights (default "FREQ").
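The two candidate sample sizes can be computed directly from a design object, as in the hypothetical sketch below using the api example data from the survey package. This page does not state which value of selection maps to which quantity, so the sketch only shows the two numbers, not how surveycc chooses between them.

```r
# Hypothetical illustration: the two sample sizes that `selection` chooses between
library(survey)

data(api)  # example school data shipped with the survey package
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

n_rows <- nrow(dclus1$variables)    # number of rows (sampled units) in the data
n_weighted <- sum(weights(dclus1))  # sum of the sampling weights
print(c(n_rows = n_rows, n_weighted = n_weighted))

# For replicate-weight designs built with svrepdesign, weights() returns the
# replicate weights by default; weights(design, type = "sampling") returns the
# full-sample weights instead.
```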

Value

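A list containing the canonical correlation object, as well as tables of the various tests of significance. These include the test statistics, degrees of freedom and p-values for Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace and Roy's largest root, and for the Weighted/Complex Survey Design regression.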

NOTE: For more information on the statistics presented (test statistic, df1, df2, Chi-Sq/F and p-val), please see the documentation of candisc::cancor for Wilks' Lambda, Pillai's Trace and the Hotelling-Lawley Trace (although the present package uses a chi-squared approximation in place of the F-distribution), and the documentation of survey::svyglm for the Weighted/Complex Survey Design regression.
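As a point of reference for the chi-squared approximation, Bartlett's classical approximation for Wilks' lambda is sketched below. It is a textbook formula and is not confirmed here as the exact one surveycc implements. The r_i are the sample canonical correlations, s = min(p, q), p = length(var.x), q = length(var.y), and n is the sample size, which per the selection argument may be the number of rows or the sum of the survey weights.

```latex
% Test of H_0: rho_k = ... = rho_s = 0 (Bartlett's approximation)
\chi^2_k = -\left(n - 1 - \frac{p + q + 1}{2}\right)
           \ln \prod_{i=k}^{s} \left(1 - r_i^2\right),
\qquad
\mathrm{df}_k = (p - k + 1)(q - k + 1)
```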

References

Examples

# PATH Study example: replicate-weight design using Fay's method
design_object <-
  survey::svrepdesign(
    id = ~PERSONID,
    weights = ~R01_A_PWGT,
    repweights = "R01_A_PWGT[1-9]+",
    type = "Fay",
    rho = 0.3,
    data = reducedPATHdata,
    mse = TRUE
  )
var.x <- c("R01_AC1022", "R01_AE1022", "R01_AG1022CG")
var.y <- c("R01_AX0075", "R01_AX0076")
howmany <- 2
dim1 <- 1
dim2 <- 2
surveycc(design_object, var.x, var.y, howmany = howmany,
  dim1 = dim1, dim2 = dim2, selection = "x")

# NYTS 2021 example: stratified, clustered design
design_object <-
  survey::svydesign(
    ids = ~psu2,
    nest = TRUE,
    strata = ~v_stratum2,
    weights = ~finwgt,
    data = reducedNYTS2021data
  )
var.x <- c("qn9", "qn38", "qn40", "qn53", "qn54", "qn64", "qn69", "qn74",
           "qn76", "qn78", "qn80", "qn82", "qn85", "qn88", "qn89")
var.y <- c("qn128", "qn129", "qn130", "qn131", "qn132", "qn134")
howmany <- 3
surveycc(design_object = design_object, var.x = var.x,
  var.y = var.y, howmany = howmany, selection = "x")


[Package SurveyCC version 0.1.1 Index]