CooRTweet (version 2.1.2)

simulate_data: simulate_data

Description

Creates a simulated input dataset and the corresponding expected output of the detect_groups function.

Usage

simulate_data(
  approx_size = 200,
  n_accounts_coord = 5,
  n_accounts_noncoord = 4,
  n_objects = 5,
  min_participation = 3,
  time_window = 10,
  lambda_coord = NULL,
  lambda_noncoord = NULL
)

Value

a list with two data frames: (1) an input data frame with the columns required by the function detect_coordinated_groups (object_id, account_id, content_id, timestamp_share), and (2) the expected output table of the detect_groups function, with the columns object_id, account_id, account_id_y, content_id, content_id_y, time_delta.
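For orientation, a hand-built data frame with the four input columns listed above might look like the sketch below. The values are invented for illustration, and the column types (character IDs, numeric timestamps) are assumptions rather than something this page specifies.

```r
# Hypothetical toy input with the four column names listed in the Value section.
# The IDs and timestamps are made up; the column types are an assumption.
toy_input <- data.frame(
  object_id = c("obj_1", "obj_1", "obj_2"),
  account_id = c("acc_a", "acc_b", "acc_a"),
  content_id = c("post_1", "post_2", "post_3"),
  timestamp_share = c(1700000000, 1700000005, 1700000100),
  stringsAsFactors = FALSE
)

# Inspect the structure of the toy data frame
str(toy_input)
```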

Arguments

approx_size

the approximate size (number of rows) of the desired dataset. It is used to calculate automatically the lambdas passed to rpois(), i.e. the expected rate of occurrences. It only takes effect when lambda_coord and lambda_noncoord are NULL (the default).

n_accounts_coord

the desired number of coordinated accounts.

n_accounts_noncoord

the desired number of non-coordinated accounts.

n_objects

the desired number of objects.

min_participation

the minimum number of repeated coordinated actions required to define two accounts as coordinated.

time_window

the time window of coordination.

lambda_coord

lambda parameter for coordinated accounts passed to rpois(), which is the expected rate of occurrences (higher lambda means more coordinated shares).

lambda_noncoord

lambda parameter for non-coordinated accounts passed to rpois(), which is the expected rate of occurrences (higher lambda means more non-coordinated shares).

Details

This function generates a simulated dataset with fixed numbers for coordinated accounts, uncoordinated accounts, and shared objects. The user can set minimum participation and time window parameters and the coordinated accounts will "act" randomly within these restrictions.
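To make the two restrictions concrete, the following base R sketch flags a pair of accounts as coordinated when they share the same object within time_window of each other at least min_participation times. It is an illustration of the rule as described on this page, not the package's implementation; the toy data, the n_coshares column, and the choice of an inclusive comparison (<=) are assumptions.

```r
# Toy shares: accounts "a" and "b" co-share two objects within a few time units,
# while "a" and "c" share one object far apart in time.
shares <- data.frame(
  object_id = c("o1", "o1", "o2", "o2", "o3", "o3"),
  account_id = c("a", "b", "a", "b", "a", "c"),
  timestamp_share = c(0, 4, 100, 107, 200, 250),
  stringsAsFactors = FALSE
)
time_window <- 10
min_participation <- 2

# Pair every share with every other share of the same object
pairs <- merge(shares, shares, by = "object_id", suffixes = c("", "_y"))

# Keep each unordered pair of distinct accounts once
pairs <- pairs[pairs$account_id < pairs$account_id_y, ]

# Keep only pairs of shares that fall within the time window (inclusive: an assumption)
pairs$time_delta <- abs(pairs$timestamp_share - pairs$timestamp_share_y)
pairs <- pairs[pairs$time_delta <= time_window, ]

# Count co-shares per account pair and apply the participation threshold
counts <- aggregate(object_id ~ account_id + account_id_y, data = pairs, FUN = length)
names(counts)[3] <- "n_coshares"
coordinated <- counts[counts$n_coshares >= min_participation, ]
coordinated
```

In this toy data only the pair a and b meets both conditions; the pair a and c shares an object, but 50 time units apart, so it falls outside the window.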

The size of the resulting dataset can be adjusted using the approx_size parameter, in which case the function returns a dataset of approximately the requested size. Alternatively, the size can be adjusted with the lambda_coord and lambda_noncoord parameters. These are the lambda values of the Poisson distribution (rpois) used to populate the coordination matrix. Lambdas between 0.0 and 1.0 produce a smaller dataset than lambdas greater than 1. The approx_size parameter is a more intuitive way of setting these lambdas.
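The effect of lambda on dataset size can be seen with rpois() alone. The sketch below fills a small account-by-object matrix with Poisson counts for two lambda values; it only illustrates the general relationship and makes no claim about how simulate_data builds its matrix internally. The helper draw_counts and the matrix dimensions are hypothetical.

```r
set.seed(1) # For reproducibility

n_accounts <- 5
n_objects <- 5

# Hypothetical helper: one Poisson count per (account, object) cell
draw_counts <- function(lambda) {
  matrix(
    rpois(n_accounts * n_objects, lambda = lambda),
    nrow = n_accounts,
    ncol = n_objects
  )
}

sparse <- draw_counts(0.5) # lambda below 1: many cells are zero
dense <- draw_counts(3)    # lambda above 1: most cells hold several events

# Total number of simulated events under each lambda
c(sparse = sum(sparse), dense = sum(dense))
```

The expected total is the number of cells times lambda (here 25 * 0.5 = 12.5 versus 25 * 3 = 75), which is why larger lambdas yield larger datasets.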

Examples

# Example usage of simulate_data
if (FALSE) {
set.seed(123) # For reproducibility
simulated_data <- simulate_data(
  n_accounts_coord = 100,
  n_accounts_noncoord = 50,
  n_objects = 20,
  min_participation = 2,
  time_window = 10
)

# Extract input
input_data <- simulated_data[[1]]

# Extract output and keep coordinated actors.
# This is expected to correspond to the CooRTweet results from `detect_groups`
simulated_results <- simulated_data[[2]]
simulated_results <- simulated_results[simulated_results$coordinated == TRUE, ]
simulated_results$coordinated <- NULL

# Run CooRTweet using the input_data and the parameters used for simulation
results <- detect_groups(
  x = input_data,
  time_window = 10,
  min_participation = 2
)

# Sort data tables and check whether they are identical
data.table::setkeyv(simulated_results, names(simulated_results))
data.table::setkeyv(results, names(simulated_results))

identical(results, simulated_results)
}