simulate_data {CooRTweet} | R Documentation |
simulate_data
Description
Create a simulated input and output of the detect_groups function.
Usage
simulate_data(
approx_size = 200,
n_accounts_coord = 5,
n_accounts_noncoord = 4,
n_objects = 5,
min_participation = 3,
time_window = 10,
lambda_coord = NULL,
lambda_noncoord = NULL
)
Arguments
approx_size |
the approximate size of the desired dataset.
It automatically calculates the lambdas passed to |
n_accounts_coord |
the desired number of coordinated accounts. |
n_accounts_noncoord |
the desired number of non-coordinated accounts. |
n_objects |
the desired number of objects. |
min_participation |
the minimum number of repeated coordinated actions required to define two accounts as coordinated. |
time_window |
the time window of coordination. |
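lambda_coord |
the lambda of the rpois Poisson distribution used to populate the coordination matrix (see Details). |
lambda_noncoord |
the lambda of the rpois Poisson distribution used to populate the coordination matrix (see Details). |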
Details
This function generates a simulated dataset with fixed numbers for coordinated accounts, uncoordinated accounts, and shared objects. The user can set minimum participation and time window parameters and the coordinated accounts will "act" randomly within these restrictions.
The size of the resulting dataset can be adjusted using the approx_size parameter; the function returns a dataset of approximately the requested size. The size can also be adjusted with the lambda_coord and lambda_noncoord parameters, which set the lambda of the Poisson distribution (rpois) used to populate the coordination matrix. Lambdas between 0.0 and 1.0 produce a smaller dataset than lambdas greater than 1. The approx_size parameter offers a more intuitive way to set the lambda of the rpois function.
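The effect of lambda on dataset size can be illustrated with rpois alone. The following is a minimal sketch, not CooRTweet's internal code: the vector length n_cells and the two lambda values are hypothetical and chosen only to show that larger lambdas yield more simulated events.

```r
# Illustrative sketch only; this is NOT how simulate_data is implemented.
set.seed(42)

# Hypothetical number of cells to fill with event counts
n_cells <- 1000

# Draw Poisson counts with a lambda below 1 and a lambda above 1
counts_small <- rpois(n_cells, lambda = 0.5)
counts_large <- rpois(n_cells, lambda = 3)

# The total number of simulated events grows with lambda
total_small <- sum(counts_small)
total_large <- sum(counts_large)
c(lambda_0.5 = total_small, lambda_3 = total_large)
```

Because the mean of a Poisson distribution equals its lambda, the expected total is roughly n_cells times lambda, which is why lambdas below 1 produce smaller datasets.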
Value
a list with two data frames: a data frame with the columns required by the function detect_coordinated_groups (object_id, account_id, content_id, timestamp_share), and the output table of the same detect_groups function, with the columns object_id, account_id, account_id_y, content_id, content_id_y, time_delta.
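To make the two table shapes concrete, here is a conceptual sketch in base R. It is not the package's algorithm: the toy values are invented, timestamp_share is assumed to be numeric, and time_delta is assumed to be the absolute difference between the two share times. It builds a small input table and derives a pairwise table with the output columns listed above.

```r
# Conceptual sketch only; NOT the CooRTweet implementation.
# Toy input with the four input columns (all values are invented).
toy_input <- data.frame(
  object_id = c("url_1", "url_1", "url_1"),
  account_id = c("acc_a", "acc_b", "acc_c"),
  content_id = c("post_1", "post_2", "post_3"),
  timestamp_share = c(100, 105, 500), # assumed numeric timestamps
  stringsAsFactors = FALSE
)

time_window <- 10

# Pair every share with every other share of the same object
pairs <- merge(toy_input, toy_input, by = "object_id", suffixes = c("", "_y"))

# Keep each unordered pair of distinct accounts once
pairs <- pairs[pairs$account_id < pairs$account_id_y, ]

# Assumed definition: absolute time difference between the two shares
pairs$time_delta <- abs(pairs$timestamp_share - pairs$timestamp_share_y)

# Keep pairs inside the time window and select the output columns
pairs <- pairs[
  pairs$time_delta <= time_window,
  c("object_id", "account_id", "account_id_y",
    "content_id", "content_id_y", "time_delta")
]
pairs
```

In this toy data only acc_a and acc_b share the same object within the window, so they are the only pair that remains.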
Examples
# Example usage of simulate_data
## Not run:
set.seed(123) # For reproducibility
simulated_data <- simulate_data(
  n_accounts_coord = 100,
  n_accounts_noncoord = 50,
  n_objects = 20,
  min_participation = 2,
  time_window = 10
)
# Extract input
input_data <- simulated_data[[1]]
# Extract output and keep coordinated actors.
# This is expected to correspond to the CooRTweet results from `detect_groups`
simulated_results <- simulated_data[[2]]
simulated_results <- simulated_results[simulated_results$coordinated == TRUE, ]
simulated_results$coordinated <- NULL
# Run CooRTweet using the input_data and the parameters used for simulation
results <- detect_groups(
  x = input_data,
  time_window = 10,
  min_participation = 2
)
# Sort data tables and check whether they are identical
data.table::setkeyv(simulated_results, names(simulated_results))
data.table::setkeyv(results, names(simulated_results))
identical(results, simulated_results)
## End(Not run)