topicsTest {topics}    R Documentation

Statistically test topics

Description

Statistically tests the topics of an LDA model; multiple dimensions (e.g., two) can be tested.

Usage

topicsTest(
  data,
  model = NULL,
  preds = NULL,
  ngrams = NULL,
  pred_var_x = NULL,
  pred_var_y = NULL,
  group_var = NULL,
  control_vars = c(),
  test_method = "linear_regression",
  p_alpha = 0.05,
  p_adjust_method = "fdr",
  seed = 42,
  load_dir = NULL,
  save_dir
)

Arguments

data

(tibble) The data to test on

model

(list) The trained model

preds

(tibble) The predictions

ngrams

(list) The output of the ngram function

pred_var_x

(string) The name of the x variable to be predicted and plotted (only needed for regression or correlation)

pred_var_y

(string) The name of the y variable to be predicted and plotted (only needed for regression or correlation)

group_var

(string) The variable to group by (only needed for t-test)

control_vars

(vector) The control variables (not supported yet)

test_method

(string) The test method to use: "correlation", "t-test", "linear_regression", "logistic_regression", or "ridge_regression"

p_alpha

(numeric) The p-value threshold used for visualising significant topics

p_adjust_method

(character) Method to adjust/correct p-values for multiple comparisons (default = "fdr", as shown under Usage; other options are "none", "holm", "hochberg", "hommel", "bonferroni", "BH", and "BY").

seed

(integer) The seed to set for reproducibility

load_dir

(string) The directory to load the test from; if NULL, the test will not be loaded

save_dir

(string) The directory to save the test to; if NULL, the test will not be saved

Value

A list of the test results, test method, and prediction variable

Examples


# Test the topic-document distribution with respect to a variable
save_dir_temp <- tempfile()

dtm <- topicsDtm(
  data = dep_wor_data$Depphrase,
  save_dir = save_dir_temp)

model <- topicsModel(
  dtm = dtm, # output of topicsDtm()
  num_topics = 20,
  num_top_words = 10,
  num_iterations = 1000,
  seed = 42,
  save_dir = save_dir_temp)
                     
preds <- topicsPreds(
  model = model, # output of topicsModel()
  data = dep_wor_data$Depphrase,
  save_dir = save_dir_temp)
                     
test <- topicsTest(
  model = model, # output of topicsModel()
  data = dep_wor_data,
  preds = preds, # output of topicsPreds()
  test_method = "linear_regression",
  pred_var_x = "Age",
  save_dir = save_dir_temp)
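
# A minimal sketch (base R only) of what the p_adjust_method options do to a
# vector of p-values. It assumes the option names correspond to the methods
# accepted by stats::p.adjust(); this page does not state how topicsTest()
# applies the correction internally. The p-values below are made up for
# illustration and are not output of topicsTest().
p_raw <- c(0.001, 0.012, 0.020, 0.045, 0.200)

# In stats::p.adjust(), "fdr" is an alias for "BH" (Benjamini-Hochberg)
p_fdr <- p.adjust(p_raw, method = "fdr")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Compare raw and adjusted values side by side
data.frame(p_raw, p_fdr, p_bonf)

# Indices of the hypothetical tests that fall below a threshold such as
# p_alpha = 0.05 after the "fdr" adjustment
p_alpha <- 0.05
which(p_fdr < p_alpha)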
                 

[Package topics version 0.21.0 Index]