Learn R Programming

inspectdf (version 0.0.2)

inspect_na: Summarise and compare the rate of missingness in one or two dataframes.

Description

Summarise and compare the rate of missingness in one or two dataframes.

Usage

inspect_na(df1, df2 = NULL, show_plot = FALSE)

Arguments

df1

A data frame

df2

An optional second data frame for making columnwise comparison of missingness. Defaults to NULL.

show_plot

(Deprecated) Logical flag indicating whether a plot should be shown. Superseded by the function show_plot() and will be dropped in a future version.

Value

A tibble summarising the count and percentage of columnwise missingness for one or a pair of data frames.

Details

When a single data frame is specified, the a tibble is returned which contains the count and percentage of missing values, with column names

  • col_name character vector containing column names of df1.

  • cnt integer vector containing the number of missing values by column.

  • pcnt the percentage of each column with missing values

When both df1 and df2 are specified, missingness is compared across columns occurring in both data frames. A test of the null hypothesis that the rate of missingness is the same across the same column in either dataframe.

  • col_name the name of the columns occurring in either df1

  • cnt_1, cnt_2 pair of integer vectors containing counts of missing entries for each column in df1 and df2.

  • pcnt_1, pcnt_2 pair of columns containing percentage of missing entries for each column in df1 and df2.

  • p_value p-value associated with test of the rates of missingness. Small values indicate evidence that the rate of missingness differs for a column occurring in both df1 and df2.

Examples

Run this code
# NOT RUN {
data("starwars", package = "dplyr")
# inspect missingness in starwars data
inspect_na(starwars)
# compare two dataframes
inspect_na(starwars, starwars[1:30, ])
# }

Run the code above in your browser using DataLab