This function detects coordinated cotweets, i.e. pairs of social media posts that are similar in terms of their text and were posted within a short time window.
detect_similar_text(
  x,
  min_repetition = 2,
  time_window = 10,
  min_similarity = 0.8,
  similarity_function = textreuse::jaccard_similarity,
  tokenizer = textreuse::tokenize_ngrams,
  minhash_seed = NULL,
  minhash_n = 200
)
Returns a data.table with the following columns:
content_id: The ID of the first post
content_id_y: The ID of the second post
id_user: The ID of the user who shared the first post
id_user_y: The ID of the user who shared the second post
timestamp_share: The timestamp when the first post was shared
timestamp_share_y: The timestamp when the second post was shared
similarity_score: The similarity score between the two posts
time_delta: The time difference between the two posts
x: A data.table with the following four columns:
content_id: The ID of the content (e.g. a tweet ID)
object_id: The text of the social media post
id_user: The ID of the user who shared the content
timestamp_share: The timestamp when the content was shared
min_repetition: The minimum number of repeated coordinated actions a user has to perform (defaults to 2).
time_window: The maximum time difference between two posts for them to be considered coordinated cotweets (defaults to 10 seconds).
min_similarity: The minimum similarity score between two posts for them to be considered coordinated cotweets (defaults to 0.8).
similarity_function: The function used to calculate the similarity between two posts. Defaults to Jaccard similarity (see textreuse::jaccard_similarity).
tokenizer: The function used to tokenize the text of the posts. Defaults to textreuse::tokenize_ngrams.
minhash_seed: The seed used to generate the minhash signatures. If NULL, a random seed will be used.
minhash_n: The number of minhash signatures that are used (see the textreuse package for details).
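The similarity and time-window thresholds can be illustrated with a small base R sketch. It is not the textreuse implementation: `word_ngrams()` and `jaccard()` are hypothetical helpers written for this example, the trigram size of 3 is an assumption, and the thresholds are assumed to be inclusive.

```r
# Hypothetical helper: split a text into lowercase word n-grams (base R only).
word_ngrams <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Hypothetical helper: Jaccard similarity, i.e. the size of the
# intersection of two token sets divided by the size of their union.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

post_a <- "Breaking news: the new policy takes effect on Monday"
post_b <- "Breaking news: the new policy takes effect on Friday"

# Six of the eight distinct trigrams are shared, so the score is 0.75.
score <- jaccard(word_ngrams(post_a), word_ngrams(post_b))

# Time difference between the two posts in seconds.
time_a <- as.POSIXct("2023-01-01 12:00:00", tz = "UTC")
time_b <- as.POSIXct("2023-01-01 12:00:07", tz = "UTC")
time_delta <- abs(as.numeric(difftime(time_b, time_a, units = "secs")))

# Apply the documented defaults (assumed inclusive): both must hold.
is_candidate <- score >= 0.8 && time_delta <= 10
```

In this toy case the posts fall inside the 10-second window but differ in one word, which lowers the trigram-based score to 0.75, so the pair would not pass the default min_similarity of 0.8.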
Uses the textreuse package to compare each post with every other post and determine their text similarity. Use the reshape_tweets() function with the intent = "cotweet" parameter to prepare your data.
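A call might look like the following sketch. It builds the input by hand with the four documented columns instead of going through reshape_tweets(). Several things here are assumptions: that the function is exported by the CooRTweet package, that timestamps are numeric UNIX seconds, and that the toy posts are long enough for the default tokenizer. The guard skips the call if the package is missing or the installed version does not export the function.

```r
# Toy input with the four columns described for `x`.
# Assumption: timestamps are numeric UNIX seconds.
posts <- data.frame(
  content_id = c("p1", "p2", "p3", "p4"),
  object_id = c(
    "Join the rally this Saturday at noon in the city park",
    "Join the rally this Saturday at noon in the city park",
    "Join the rally this Saturday at noon in the city park",
    "Completely unrelated post about the weather forecast for tomorrow"
  ),
  id_user = c("u1", "u2", "u1", "u3"),
  timestamp_share = c(1700000000, 1700000004, 1700000008, 1700000009),
  stringsAsFactors = FALSE
)

# Assumption: the function lives in the CooRTweet package.
pkg <- "CooRTweet"
can_run <- requireNamespace(pkg, quietly = TRUE) &&
  requireNamespace("data.table", quietly = TRUE) &&
  "detect_similar_text" %in% getNamespaceExports(pkg)

if (can_run) {
  detect_similar_text <- getExportedValue(pkg, "detect_similar_text")
  result <- detect_similar_text(
    data.table::as.data.table(posts),
    min_repetition = 2,
    time_window = 10,
    min_similarity = 0.8,
    minhash_seed = 42 # fixed seed so the minhash signatures are reproducible
  )
  print(result)
}
```

Setting minhash_seed to a fixed value is a design choice for reproducibility; with the default NULL, a random seed is used and results may vary between runs.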