CooRTweet (version 1.5.0)

detect_similar_text: detect_similar_text

Description

This function detects coordinated cotweets, i.e. pairs of social media posts that are similar in terms of their text and were posted within a short time window.

Usage

detect_similar_text(
  x,
  min_repetition = 2,
  time_window = 10,
  min_similarity = 0.8,
  similarity_function = textreuse::jaccard_similarity,
  tokenizer = textreuse::tokenize_ngrams,
  minhash_seed = NULL,
  minhash_n = 200
)

Value

A data.table with the following columns:

  • content_id: The ID of the first post

  • content_id_y: The ID of the second post

  • id_user: The ID of the user who shared the first post

  • id_user_y: The ID of the user who shared the second post

  • timestamp_share: The timestamp when the first post was shared

  • timestamp_share_y: The timestamp when the second post was shared

  • similarity_score: The similarity score between the two posts

  • time_delta: The time difference between the two posts

Arguments

x

A data.table with the following columns:

  • content_id: The ID of the content (e.g. a tweet ID)

  • object_id: The text of the social media post

  • id_user: The ID of the user who shared the content

  • timestamp_share: The timestamp when the content was shared
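
As a rough illustration of the expected input, the sketch below builds a toy table with the four required columns. All IDs, texts, and timestamps are invented, and the use of numeric UNIX-second timestamps is an assumption. The conversion to a data.table only runs if that package is installed.

```r
# Toy input with the four columns listed above (all values are invented).
x <- data.frame(
  content_id = c("t1", "t2", "t3"),
  object_id = c(
    "Breaking news from the capital tonight",
    "Breaking news from the capital tonight!",
    "A completely unrelated post about the weather"
  ),
  id_user = c("u1", "u2", "u3"),
  # Assumption: timestamps are numeric seconds since the UNIX epoch.
  timestamp_share = c(1700000000, 1700000004, 1700003600),
  stringsAsFactors = FALSE
)

# Convert to a data.table when the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  x <- data.table::as.data.table(x)
}
```

A table shaped like this could then be passed as x, for example detect_similar_text(x, time_window = 10, min_similarity = 0.8).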

min_repetition

The minimum number of repeated coordinated actions a user has to perform (defaults to 2).
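
One way to picture this threshold is as a count of how often each user appears in the matched pairs. The base R sketch below is only an interpretation of the description above, not the package's actual filtering logic; the pairs table and user IDs are hypothetical.

```r
# Hypothetical matched pairs (invented user IDs).
pairs <- data.frame(
  id_user   = c("u1", "u1", "u3"),
  id_user_y = c("u2", "u2", "u4"),
  stringsAsFactors = FALSE
)
min_repetition <- 2

# Count how many pairs each user takes part in, on either side.
counts <- table(c(pairs$id_user, pairs$id_user_y))

# Users that reach the threshold.
active <- names(counts)[counts >= min_repetition]

# Assumption: a pair is kept only if both users reach the threshold.
kept <- pairs[pairs$id_user %in% active & pairs$id_user_y %in% active, ]
```

Here u1 and u2 each appear in two pairs and are kept, while u3 and u4 appear only once and are dropped.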

time_window

The maximum time difference between two posts in order for them to be considered coordinated cotweets (defaults to 10 seconds).
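
The time-window check can be sketched in base R as follows. This is a simplified illustration with invented data; it assumes timestamps are numeric seconds and that the time difference is taken as an absolute value, neither of which is confirmed by this page.

```r
# Three hypothetical posts with timestamps in seconds.
posts <- data.frame(
  content_id = c("a", "b", "c"),
  timestamp_share = c(100, 105, 130),
  stringsAsFactors = FALSE
)
time_window <- 10

# All unordered pairs of posts, as row indices.
idx <- t(utils::combn(nrow(posts), 2))

pairs <- data.frame(
  content_id   = posts$content_id[idx[, 1]],
  content_id_y = posts$content_id[idx[, 2]],
  # Assumption: the difference is measured as an absolute value.
  time_delta   = abs(posts$timestamp_share[idx[, 1]] -
                     posts$timestamp_share[idx[, 2]]),
  stringsAsFactors = FALSE
)

# Keep only pairs posted within the window.
within_window <- pairs[pairs$time_delta <= time_window, ]
```

Only the pair (a, b), posted 5 seconds apart, falls inside the 10-second window.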

min_similarity

The minimum similarity score between two posts in order for them to be considered coordinated cotweets (defaults to 0.8).

similarity_function

The function used to calculate the similarity between two posts. The default is Jaccard similarity (see textreuse::jaccard_similarity).
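
For reference, Jaccard similarity is the size of the intersection of two token sets divided by the size of their union. The helper below is a minimal base R sketch of that definition; the name jaccard is hypothetical and the token vectors are invented.

```r
# Jaccard similarity of two token vectors, treated as sets.
# Note: two empty inputs give 0 / 0, i.e. NaN.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

a <- c("breaking", "news", "from", "the", "capital")
b <- c("breaking", "news", "from", "the", "city")

# 4 shared tokens out of 6 distinct tokens overall.
score <- jaccard(a, b)

# With min_similarity = 0.8 this pair would not count as similar.
is_similar <- score >= 0.8
```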

tokenizer

The function used to tokenize the text of the posts. The default is textreuse::tokenize_ngrams.
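
To show what n-gram tokenization produces, here is a small base R sketch that splits text into overlapping word n-grams. The function word_ngrams is a hypothetical stand-in written for illustration, not the textreuse tokenizer; its lowercasing, punctuation handling, and default n are assumptions.

```r
# Split text into overlapping word n-grams (illustrative only).
word_ngrams <- function(text, n = 3) {
  # Lowercase, then split on runs of non-alphanumeric characters.
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]
  # Texts shorter than n words yield no n-grams.
  if (length(words) < n) {
    return(character(0))
  }
  vapply(
    seq_len(length(words) - n + 1),
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

tokens <- word_ngrams("Vote early, vote often!", n = 2)
# tokens is c("vote early", "early vote", "vote often")
```

Any replacement passed as tokenizer would need to match the calling convention that textreuse expects, which this sketch does not attempt to reproduce.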

minhash_seed

The seed that is used to generate the minhash signatures. If NULL, a random seed will be used.

minhash_n

The number of minhash signatures that are used (see textreuse package for details).
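
The roles of the seed and the number of hashes can be illustrated with a toy minhash in base R. This is a sketch of the general technique, not the textreuse implementation: the hash family, the prime p, and all helper names are assumptions chosen for illustration.

```r
# Illustrative minhash in base R; NOT the textreuse implementation.
set.seed(42)       # plays the role of minhash_seed
n_hashes <- 200    # plays the role of minhash_n
p <- 2147483647    # a large prime (2^31 - 1)

# Random coefficients for hash functions h_k(v) = (a_k * v + b_k) mod p.
# Stored as doubles so the products below do not overflow R integers.
a <- as.numeric(sample.int(p - 1, n_hashes))
b <- as.numeric(sample.int(p - 1, n_hashes))

# Signature: for each hash function, the minimum hash over all token IDs.
minhash_signature <- function(token_ids) {
  vapply(
    seq_len(n_hashes),
    function(k) min((a[k] * token_ids + b[k]) %% p),
    numeric(1)
  )
}

tokens_a <- c("breaking news from", "news from the", "from the capital")
tokens_b <- c("breaking news from", "news from the", "from the city")

# Map tokens to integer IDs through a shared vocabulary.
vocab <- union(tokens_a, tokens_b)
sig_a <- minhash_signature(match(tokens_a, vocab))
sig_b <- minhash_signature(match(tokens_b, vocab))

# The share of matching signature positions estimates Jaccard similarity.
estimate <- mean(sig_a == sig_b)
```

The true Jaccard similarity of the two token sets is 2 / 4 = 0.5, and the estimate should land near that value up to sampling error. Fixing the seed makes the signatures reproducible, while more hashes reduce the variance of the estimate at the cost of extra computation.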

Details

Uses the textreuse package to compare every post with every other post and determine their text similarity. Use the reshape_tweets() function with the parameter intent = "cotweet" to prepare your data.