biv_compare {sampcompR}    R Documentation
Compare multiple data frames on a bivariate level and plot them together.
biv_compare(
dfs,
benchmarks,
variables = NULL,
corrtype = "r",
data = TRUE,
id = NULL,
weight = NULL,
strata = NULL,
id_bench = NULL,
weight_bench = NULL,
strata_bench = NULL,
p_value = NULL,
p_adjust = NULL,
varlabels = NULL,
plot_title = NULL,
plots_label = NULL,
diff_perc = TRUE,
diff_perc_size = 4.5,
perc_diff_transparance = 0,
note = FALSE,
order = NULL,
breaks = NULL,
colors = NULL,
mar = c(0, 0, 0, 0),
grid = "white",
gradient = FALSE,
sum_weights = NULL,
missings_x = TRUE,
remove_nas = "pairwise",
ncol_facet = 3,
nboots = 0,
boot_all = FALSE,
parallel = FALSE,
adjustment_weighting = "raking",
adjustment_vars = NULL,
raking_targets = NULL,
post_targets = NULL,
percentile_ci = TRUE
)
dfs
A character vector containing the names of the data frames to compare against the benchmarks.
benchmarks
A character vector containing the names of the benchmarks to compare the dfs against.
variables
A character vector that contains the names of the variables for the comparison. If it is
corrtype
A character string indicating the type of bivariate correlation. It can be either "r" for Pearson's r or "rho" for Spearman's rho. At the moment, "rho" is only applicable to unweighted data.
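For intuition about the two corrtype options, the base R sketch below contrasts the unweighted coefficients on a monotone but non-linear relationship. It only illustrates the statistics themselves and does not call sampcompR, which may compute them differently (for example, with survey weights). Spearman's rho depends only on ranks, so it reaches 1 here, while Pearson's r stays below 1.

```r
# Unweighted Pearson's r versus Spearman's rho with stats::cor
x <- 1:10
y <- x^3  # monotone, but not linear

r_pearson <- cor(x, y, method = "pearson")
rho_spearman <- cor(x, y, method = "spearman")

# rho only uses the ranks of x and y, which are identical here
print(c(r = r_pearson, rho = rho_spearman))
```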
data
If TRUE, a list containing information on the analyses is returned instead of the plot. Otherwise, the plot is returned.
strata, strata_bench
A character vector that determines the strata variables used to weight the
id_bench, id
A character vector that determines the id variables used to weight the
weight_bench, weight
A character vector that determines the variables used to weight the
p_value
A number between zero and one that determines the maximum significance level.
p_adjust
Can be either
varlabels
A character string or vector of character strings containing the new variable names used in the plot.
plot_title
A character string containing the title of the plot.
plots_label
A character string or vector of character strings containing the new names of the data frames used in the plot.
diff_perc
If
diff_perc_size
A number that determines the size of the displayed percentage difference between surveys in the plot.
perc_diff_transparance
A number that determines the transparency of the displayed percentage difference between surveys in the plot.
note
If
order
A character vector that determines the order in which the variables are displayed in the plot.
breaks
A vector of labels for the color scheme in the legend.
colors
A vector that determines the colors in the plot.
mar
A vector that determines the margins of the plot.
grid
A color string that determines the color of the lines between the tiles of the heatmap.
gradient
If
sum_weights
A vector with one value per variable, used to weight the variables in the displayed percentage-difference calculation. It can be used if some variables are over- or underrepresented in the analysis.
missings_x
If
remove_nas
A character string indicating how missing values should be removed. It can be either
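The default remove_nas = "pairwise" suggests pairwise deletion of missing values; the full list of accepted values is cut off on this page, so the following is an assumption about the concept rather than a description of sampcompR's options. The base R sketch contrasts pairwise deletion, where each correlation uses every row that is complete for that pair of variables, with listwise deletion, where any row with a missing value is dropped before all correlations are computed. The toy data are hypothetical.

```r
# Toy data with scattered missing values (hypothetical example)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, 7),
  b = c(2, 1, 4, 3, NA, 5, 7),
  c = c(NA, 3, 2, 5, 4, 6, 8)
)

# Pairwise deletion: each coefficient uses all rows complete for that pair
r_pairwise <- cor(toy, use = "pairwise.complete.obs")

# Listwise deletion: rows with any missing value are dropped first
r_listwise <- cor(toy, use = "complete.obs")

# Rows available for the pair (a, b) versus for all three variables
n_ab <- sum(complete.cases(toy[, c("a", "b")]))  # 6
n_all <- sum(complete.cases(toy))                # 5

# The a-b correlation differs because it rests on different rows
round(c(pairwise = r_pairwise["a", "b"], listwise = r_listwise["a", "b"]), 3)
```

Pairwise deletion keeps more data per coefficient, at the cost that different cells of the matrix may rest on different subsets of respondents.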
ncol_facet
The number of columns used in facet_wrap() for the plots.
nboots
A numeric value indicating the number of bootstrap replications.
If
boot_all
If TRUE, both the dfs and the benchmarks are bootstrapped. Otherwise, the benchmark estimate is assumed to be constant.
parallel
Can be either
adjustment_weighting
A character vector indicating whether adjustment weighting should be used. It can be either
adjustment_vars
Variables used to adjust the survey when using raking or post-stratification.
raking_targets
A list of raking targets that can be given to the rake function of
post_targets
A list of post-stratification targets that can be given to the
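Raking adjusts the weights until the weighted sample margins match known population margins. As a rough illustration of what adjustment_weighting = "raking" with raking_targets aims at, the sketch below implements plain iterative proportional fitting in base R. The helper rake_weights(), its list-of-named-totals target format, and the example data are all hypothetical; they are not sampcompR's interface, which passes its targets on to an external rake function.

```r
# Hypothetical helper (not part of sampcompR): rake weights so that the
# weighted counts of each variable in `targets` match the given totals.
rake_weights <- function(data, targets, weights = rep(1, nrow(data)),
                         max_iter = 100, tol = 1e-8) {
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (var in names(targets)) {
      groups <- as.character(data[[var]])
      # Current weighted total of every category of this variable
      current <- tapply(weights, groups, sum)
      # Ratio of target to current total, looked up for every row
      adj <- targets[[var]][names(current)] / current
      new_weights <- weights * as.numeric(adj[groups])
      max_change <- max(max_change, abs(new_weights - weights))
      weights <- new_weights
    }
    # Stop once a full pass over all variables barely changes the weights
    if (max_change < tol) break
  }
  weights
}

# Hypothetical sample and population totals (each margin sums to 8)
sample_df <- data.frame(
  sex = c("m", "m", "m", "f", "f", "m", "f", "m"),
  edu = c("low", "high", "low", "low", "high", "high", "low", "low")
)
targets <- list(sex = c(f = 4, m = 4), edu = c(high = 3, low = 5))

w <- rake_weights(sample_df, targets)
tapply(w, sample_df$sex, sum)
tapply(w, sample_df$edu, sum)
```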
percentile_ci
If TRUE, confidence intervals are calculated using the percentile method. If FALSE, they are calculated using the normal method.
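The sketch below illustrates, in base R only, how nboots, boot_all = FALSE, and percentile_ci relate: the difference between a data frame's correlation and a benchmark correlation that is held constant is bootstrapped, and both a percentile interval and a normal-approximation interval (estimate plus or minus a normal quantile times the bootstrap standard error) are built from the replicates. The simulated data and all variable names are hypothetical, and sampcompR's own procedure (for example, with weights) may differ in detail.

```r
set.seed(1)

# Hypothetical data: a sample to assess and a benchmark
n <- 200
df_sample <- data.frame(x = rnorm(n))
df_sample$y <- 0.5 * df_sample$x + rnorm(n)
df_bench <- data.frame(x = rnorm(n))
df_bench$y <- 0.3 * df_bench$x + rnorm(n)

# Benchmark correlation treated as fixed (analogous to boot_all = FALSE)
r_bench <- cor(df_bench$x, df_bench$y)
diff_hat <- cor(df_sample$x, df_sample$y) - r_bench

# Resample rows of the assessed sample only
nboots <- 500
boot_diffs <- replicate(nboots, {
  idx <- sample.int(n, n, replace = TRUE)
  cor(df_sample$x[idx], df_sample$y[idx]) - r_bench
})

alpha <- 0.05
# Percentile method: empirical quantiles of the bootstrap replicates
ci_percentile <- quantile(boot_diffs, c(alpha / 2, 1 - alpha / 2), names = FALSE)
# Normal method: point estimate +/- z * bootstrap standard error
ci_normal <- diff_hat + c(-1, 1) * qnorm(1 - alpha / 2) * sd(boot_diffs)

rbind(percentile = ci_percentile, normal = ci_normal)
```

The percentile interval can be asymmetric around the estimate, while the normal interval is symmetric by construction.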
The plot shows a heatmap of a correlation matrix, where the colors are determined by the similarity of the Pearson's r values in both sets of respondents. With the default breaks and colors:
Same (green) indicates that the Pearson's r correlation is significantly different from zero in neither the data frame nor the benchmark, or that the Pearson's r correlations of the data frame and the benchmark are not significantly different.
Small Diff (yellow) indicates that the Pearson's r correlation is significantly different from zero in the data frame or the benchmark, and that the Pearson's r correlations of the data frame and the benchmark are significantly different.
Large Diff (red) indicates that the conditions for yellow are fulfilled and that, in addition, the correlations point in opposite directions or one is at least twice the size of the other.
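The color rule described above can be sketched as a small function. This is a hypothetical helper, not sampcompR's code: it assumes two independent samples and uses Fisher's z transformation both to test each correlation against zero and to test the difference between them, whereas the package may rely on other tests (for example, design-based or bootstrap ones). "Double the size" is read here as a ratio of absolute values of at least 2.

```r
# Hypothetical helper (not sampcompR's implementation) applying the
# Same / Small Diff / Large Diff rule to one pair of Pearson correlations,
# assuming two independent samples and Fisher's z approximation.
classify_pair <- function(r_df, n_df, r_bench, n_bench, alpha = 0.05) {
  # Two-sided p-value for H0: rho = 0, using z = atanh(r) * sqrt(n - 3)
  p_zero <- function(r, n) 2 * pnorm(-abs(atanh(r) * sqrt(n - 3)))
  sig_df <- p_zero(r_df, n_df) < alpha
  sig_bench <- p_zero(r_bench, n_bench) < alpha

  # Two-sided p-value for H0: rho_df = rho_bench (independent samples)
  z_diff <- (atanh(r_df) - atanh(r_bench)) /
    sqrt(1 / (n_df - 3) + 1 / (n_bench - 3))
  sig_diff <- 2 * pnorm(-abs(z_diff)) < alpha

  # Same: significant in neither sample, or no significant difference
  if (!(sig_df || sig_bench) || !sig_diff) return("Same")

  # Large Diff: opposite signs, or one at least twice the other in size
  opposite <- sign(r_df) != sign(r_bench)
  ratio <- max(abs(r_df), abs(r_bench)) / min(abs(r_df), abs(r_bench))
  if (opposite || ratio >= 2) "Large Diff" else "Small Diff"
}

classify_pair(0.30, 1000, 0.40, 1000)   # "Small Diff"
classify_pair(0.30, 1000, -0.20, 1000)  # "Large Diff"
classify_pair(0.02, 100, 0.03, 100)     # "Same"
```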
An object generated with ggplot2::ggplot() that visualizes the differences between the data frames and benchmarks. If data = TRUE, a list containing information on the analyses is returned instead of the plot. This biv_compare object can be used in plot_biv_compare to build a plot, or in biv_compare_table to get a table.
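## Get data for comparison
library(sampcompR)
data("card")
# Subsamples of card: rows with south == 0 and rows with black == 0
north <- card[card$south == 0, ]
white <- card[card$black == 0, ]
## Use the function to plot the data
# dfs and benchmarks are the names (character strings) of data frames
bivar_comp <- sampcompR::biv_compare(dfs = c("north", "white"),
                                     benchmarks = c("card", "card"),
                                     variables = c("age", "educ", "fatheduc",
                                                   "motheduc", "wage", "IQ"),
                                     data = FALSE)
# With data = FALSE the plot is returned; printing it draws the heatmap
bivar_comp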