add_by_n: Add Counts and Percentages of Missing Data by Group
Description
This function calculates and summarizes the counts and percentages of missing and non-missing values
for a specified variable, grouped by another variable. It provides formatted output for integration
into summary tables.
Usage
add_by_n(data, variable, by, tbl, ...)
Value
A data frame in wide format, where each row represents a group (as defined by the by argument) and
the columns hold the statistics for the target variable (variable) as formatted strings.
Arguments
data
A data frame containing the dataset to analyze.
variable
A character string specifying the target variable for which missing data statistics
will be computed.
by
A character string specifying the grouping variable. The data will be grouped by this variable
before calculating the statistics.
tbl
Not used in the current implementation but retained for compatibility with the gtsummary framework.
...
Additional arguments (not used in the current implementation).
Details
The function performs the following steps:
1. Groups the data by the variable specified in by.
2. Computes the number of non-missing values (nb), the number of missing values (nb_NA), and the
   percentage of missing values (nb_percent) for the specified variable.
3. Renames and formats the output columns for clarity and readability.
4. Converts the data into a wide format suitable for integration into summary tables, with the
   calculated statistics combined into formatted strings (e.g., "value (missing_count ; missing_percent%)").
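The steps above can be sketched in R with dplyr and tidyr. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name add_by_n_sketch is hypothetical, nb_percent is assumed to be the share of missing values in each group's total row count (rounded to one decimal), and the wide-format step is assumed to spread the groups into columns, giving one row per call.

```r
library(dplyr)
library(tidyr)

# Hypothetical sketch of the documented steps; not the real add_by_n().
# Assumptions: nb_percent = 100 * nb_NA / group size, rounded to 1 decimal,
# and the wide-format step spreads the groups into columns.
add_by_n_sketch <- function(data, variable, by, tbl = NULL, ...) {
  data %>%
    # Keep only the two columns of interest, under fixed names
    select(group = all_of(by), value = all_of(variable)) %>%
    # Step 1: group by the levels of `by`
    group_by(group) %>%
    # Step 2: non-missing count, missing count and missing percentage
    summarise(
      nb = sum(!is.na(value)),
      nb_NA = sum(is.na(value)),
      nb_percent = round(100 * nb_NA / n(), 1),
      .groups = "drop"
    ) %>%
    # Step 3: format each group as "nb (nb_NA ; nb_percent%)"
    mutate(stat = paste0(nb, " (", nb_NA, " ; ", nb_percent, "%)")) %>%
    select(group, stat) %>%
    # Step 4: one column per group, one row per call
    pivot_wider(names_from = group, values_from = stat)
}

# Usage: mtcars has no missing values, so inject two NAs into mpg first
df <- mtcars
df$mpg[c(1, 3)] <- NA  # row 1 has cyl = 6, row 3 has cyl = 4
add_by_n_sketch(df, variable = "mpg", by = "cyl")
# columns "4", "6", "8": "10 (1 ; 9.1%)", "6 (1 ; 14.3%)", "14 (0 ; 0%)"
```

If the package's output instead has one row per group, as the Value section describes, dropping the final pivot_wider() call yields that long layout.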
The output is designed for use with summary tools, such as gtsummary, to display detailed missing
data statistics alongside descriptive statistics.
Examples
# Example usage:
library(dplyr)
library(tidyr)
data(mtcars)
# Add missing data statistics grouped by 'cyl'
# (mtcars has no missing values, so all missing counts will be 0)
add_by_n(
  data = mtcars,
  variable = "mpg",
  by = "cyl"
)