RastaRocket (version 1.0.1)

add_by_n: Add Counts and Percentages of Missing Data by Group

Description

This function calculates and summarizes the counts and percentages of missing and non-missing values for a specified variable, grouped by another variable. It provides formatted output for integration into summary tables.

Usage

add_by_n(data, variable, by, tbl, ...)

Value

A data frame in wide format, where each row represents a group (as defined by by), and columns include statistics for the target variable (variable) in a formatted string.

Arguments

data

A data frame containing the dataset to analyze.

variable

A character string specifying the target variable for which missing data statistics will be computed.

by

A character string specifying the grouping variable. The data will be grouped by this variable before calculating the statistics.

tbl

Not used in the current implementation but retained for compatibility with the gtsummary framework.

...

Additional arguments (not used in the current implementation).

Details

The function performs the following steps:

  1. Groups the data by the variable specified in by.

  2. Computes the number of non-missing values (nb), the number of missing values (nb_NA), and the percentage of missing values (nb_percent) for the specified variable.

  3. Renames and formats the output columns for clarity and readability.

  4. Converts the data into a wide format suitable for integration into summary tables, with calculated statistics included in formatted strings (e.g., "value (missing_count ; missing_percent%)").

The output is designed for use with summary tools, such as gtsummary, to display detailed missing data statistics alongside descriptive statistics.
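
The four steps above can be approximated with dplyr and tidyr. The following is a minimal sketch, not the package source: the helper name sketch_by_n is hypothetical, rounding the percentage to one decimal is an assumption, and spreading the groups into one column per level is only one reading of step 4. The layout returned by add_by_n itself may differ.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper for illustration only; not the RastaRocket implementation.
sketch_by_n <- function(data, variable, by) {
  data %>%
    # Step 1: group by the variable named in `by`
    group_by(across(all_of(by))) %>%
    # Step 2: count non-missing and missing values of `variable`
    summarise(
      nb = sum(!is.na(.data[[variable]])),
      nb_NA = sum(is.na(.data[[variable]])),
      .groups = "drop"
    ) %>%
    # Steps 2-3: percentage of missing values (rounding is an assumption),
    # then a formatted string "nb (nb_NA ; nb_percent%)"
    mutate(
      nb_percent = round(100 * nb_NA / (nb + nb_NA), 1),
      stat = paste0(nb, " (", nb_NA, " ; ", nb_percent, "%)")
    ) %>%
    select(all_of(c(by, "stat"))) %>%
    # Step 4: wide format, here assumed to mean one column per group level
    pivot_wider(names_from = all_of(by), values_from = "stat")
}

# mtcars has no missing values, so introduce a few for demonstration
cars <- mtcars
cars$mpg[1:3] <- NA

res <- sketch_by_n(cars, variable = "mpg", by = "cyl")
print(res)
```

Introducing NA values into a copy of mtcars makes the missing counts non-zero, which is easier to read than the all-zero result the unmodified dataset would give.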

Examples

# Example usage:
library(RastaRocket)
library(dplyr)
library(tidyr)
data(mtcars)

# Add missing data statistics for 'mpg', grouped by 'cyl'
# (mtcars contains no missing values, so the missing counts are 0)
add_by_n(
  data = mtcars,
  variable = "mpg",
  by = "cyl"
)