eventreport (version 0.1.1)

mean_dscore: Calculate mean divergence scores across event reports

Description

This function calculates the mean divergence score for one or more variables grouped by an event identifier. The divergence score captures how often values for a given variable differ across event reports describing the same event.

Usage

mean_dscore(data, group_var, variables, normalize = FALSE, plot = FALSE)

Value

Either a tibble or a ggplot object, depending on the value of plot. If plot = FALSE, returns a tibble with two columns:

variable

The name of each variable.

dscore

The mean divergence score or normalized score.

If plot = TRUE, returns a lollipop-style plot showing divergence scores by variable.

Arguments

data

A data frame containing event report level data.

group_var

A character string naming the column that uniquely identifies events (e.g., "event_id").

variables

A character vector of column names to compute divergence scores for.

normalize

Logical, indicating whether to normalize each score by the total number of unique values observed for that variable. Defaults to FALSE.

plot

Logical, indicating whether to return a ggplot object visualizing the scores instead of a tibble. Defaults to FALSE.

Details

For each variable and event, the function counts the unique values reported and subtracts one, so an event whose reports all agree scores zero. These per-event counts are then averaged across all events, which reflects how much inconsistency exists across sources for that variable. Optionally, the scores can be normalized by the total number of unique values observed for each variable across the dataset. The result is a long-format data frame showing which variables are most sensitive to aggregation. A plotting option is also available.
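
The calculation described above can be sketched in base R. This is an illustration only, not the package's implementation: the helper name sketch_mean_dscore is hypothetical, and the normalization step assumes that "normalized by" means dividing the mean score by the number of unique values of the variable.

```r
# Toy event-report data: one row per report, several reports per event
df <- data.frame(
  event_id = c(1, 1, 2, 2, 3),
  country = c("US", "US", "UK", "UK", "CA"),
  actor1 = c("Actor A", "Actor B", "Actor B", "Actor C", "Actor D"),
  deaths_best = c(10, 20, 5, 15, 10)
)

# Hypothetical helper illustrating the description in Details
sketch_mean_dscore <- function(data, group_var, variables, normalize = FALSE) {
  scores <- vapply(variables, function(v) {
    # Per event: number of unique reported values minus one
    per_event <- tapply(
      data[[v]], data[[group_var]],
      function(x) length(unique(x)) - 1
    )
    # Average the per-event scores across all events
    score <- mean(per_event)
    # Assumption: normalizing means dividing by the variable's unique-value count
    if (normalize) score <- score / length(unique(data[[v]]))
    score
  }, numeric(1))
  data.frame(variable = variables, dscore = unname(scores))
}

raw <- sketch_mean_dscore(df, "event_id", c("country", "actor1", "deaths_best"))
norm <- sketch_mean_dscore(df, "event_id", c("country", "actor1", "deaths_best"),
                           normalize = TRUE)
```

In this toy data, country never varies within an event, so its raw score is 0. Both actor1 and deaths_best differ within events 1 and 2 but not within the single-report event 3, so each scores (1 + 1 + 0) / 3 = 2/3 before normalization and, with four unique values each, 1/6 after it under the division assumption.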

Examples

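# Example data: one row per event report, with several reports per event
df <- data.frame(
  event_id = c(1, 1, 2, 2, 3),
  country = c("US", "US", "UK", "UK", "CA"),
  actor1 = c("Actor A", "Actor B", "Actor B", "Actor C", "Actor D"),
  deaths_best = c(10, 20, 5, 15, 10)
)

# Normalized mean divergence scores, returned as a lollipop-style ggplot
mean_dscore(df, "event_id", c("country", "actor1", "deaths_best"), normalize = TRUE, plot = TRUE)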
