add_by_n: Add Counts and Percentages of Missing Data by Group

Description

This function calculates and summarizes the counts and percentages of missing and non-missing values for a specified variable, grouped by another variable. It provides formatted output for integration into summary tables.

Usage

add_by_n(data, variable, by, tbl, ...)

Value

A data frame in wide format, where each row represents a group (as defined by by), and columns include statistics for the target variable (variable) in a formatted string.

Arguments

data: A data frame containing the dataset to analyze.
variable: A character string specifying the target variable for which missing data statistics will be computed.
by: A character string specifying the grouping variable. The data will be grouped by this variable before calculating the statistics.
tbl: Not used in the current implementation but retained for compatibility with the gtsummary framework.
...: Additional arguments (not used in the current implementation).

Details

The function performs the following steps:

1. Groups the data by the variable specified in by.
2. Computes the number of non-missing values (nb), the number of missing values (nb_NA), and the percentage of missing values (nb_percent) for the specified variable.
3. Renames and formats the output columns for clarity and readability.
4. Converts the data into a wide format suitable for integration into summary tables, with calculated statistics included in formatted strings (e.g., "value (missing_count ; missing_percent%)").

The output is designed for use with summary tools, such as gtsummary, to display detailed missing data statistics alongside descriptive statistics.
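
The steps described in this section can be approximated with a short dplyr/tidyr pipeline. The sketch below is illustrative only and is not the package's implementation: the helper name sketch_missing_by_group, the rounding to one decimal place, and the assumption that the leading "value" in the formatted string is the non-missing count (nb) are all guesses made for the example. The exact column layout of the real output may differ.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of any package): mirrors the documented steps.
sketch_missing_by_group <- function(data, variable, by) {
  target_name <- variable  # label used for the output column

  data %>%
    # Step 1: group by the variable named in `by`
    group_by(across(all_of(by))) %>%
    # Step 2: non-missing count, missing count, percentage missing
    summarise(
      nb = sum(!is.na(.data[[variable]])),
      nb_NA = sum(is.na(.data[[variable]])),
      nb_percent = round(100 * nb_NA / n(), 1),  # rounding is an assumption
      .groups = "drop"
    ) %>%
    # Step 3: build the formatted string "nb (nb_NA ; nb_percent%)"
    mutate(
      name = target_name,
      stat = paste0(nb, " (", nb_NA, " ; ", nb_percent, "%)")
    ) %>%
    select(all_of(by), name, stat) %>%
    # Step 4: widen so the target variable becomes a column, one row per group
    pivot_wider(names_from = name, values_from = stat)
}

# mtcars has no missing values, so every missing count is 0
res_mtcars <- sketch_missing_by_group(mtcars, variable = "mpg", by = "cyl")

# airquality$Ozone contains NAs, which makes the statistics more informative
res_air <- sketch_missing_by_group(airquality, variable = "Ozone", by = "Month")
```

A dataset with real missing values, such as the built-in airquality, is a more useful check than mtcars, where every group reports zero missing observations.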

Examples


# Example usage:
library(dplyr)
library(tidyr)
data(mtcars)

# Add missing data statistics for 'mpg', grouped by 'cyl'
# (mtcars contains no missing values, so all missing counts are 0)
add_by_n(
  data = mtcars,
  variable = "mpg",
  by = "cyl"
)
