visdat (version 0.5.3)

vis_miss: Visualise a data.frame to display missingness.

Description

vis_miss provides an at-a-glance ggplot of the missingness inside a data.frame. Cells are coloured according to missingness: black indicates a missing cell and grey indicates a present cell. Because it returns a ggplot object, labels and other plot elements can be customised with the usual ggplot2 functions.
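Since the return value is a ggplot object, further layers can be added with `+`. The following is a minimal sketch, assuming the visdat and ggplot2 packages are installed; the title and axis label text are illustrative choices and not part of the original documentation.

```r
library(visdat)
library(ggplot2)

# vis_miss() returns a ggplot object, so it can be stored and extended
p <- vis_miss(airquality)

# Add a title and relabel the y axis (label text is illustrative)
p_labelled <- p +
  labs(title = "Missingness in airquality", y = "Row")

p_labelled
```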

Usage

vis_miss(x, cluster = FALSE, sort_miss = FALSE, show_perc = TRUE,
  show_perc_col = TRUE, large_data_size = 9e+05,
  warn_large_data = TRUE)

Value

ggplot2 object displaying the position of missing values in the dataframe, and the percentage of values missing and present.
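The percentages shown on the plot can be cross-checked by hand. The sketch below uses only base R on the built-in airquality data. It illustrates the quantities involved and is not visdat's internal code; the variable names are chosen here for illustration.

```r
# Logical matrix: TRUE where a value is missing
miss <- is.na(airquality)

# Overall percentage of missing cells (44 of 918 cells)
pct_missing_overall <- 100 * mean(miss)
round(pct_missing_overall, 1)
# 4.8

# Percentage of missing values in each column
pct_missing_by_col <- 100 * colMeans(miss)
round(pct_missing_by_col, 1)
```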

Arguments

x

a data.frame

cluster

logical. TRUE specifies that you want to use hierarchical clustering (mcquitty method) to arrange rows according to missingness. FALSE specifies that you want to leave it as is. Default value is FALSE.

sort_miss

logical. TRUE arranges the columns in order of missingness. Default value is FALSE.

show_perc

logical. TRUE adds the percentage of missing and complete data in the whole dataset to the legend. Default value is TRUE.

show_perc_col

logical. TRUE adds the percentage of missing data in each column to the x axis labels. Can be disabled with FALSE. Default value is TRUE.

large_data_size

integer. The size above which the data is treated as large. Default is 900000; this can be changed. See note for more details.

warn_large_data

logical. Should a warning be given when the data is large? Default is TRUE. See note for more details.
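To judge whether a dataset is near the large_data_size threshold, it can help to compute its size directly. The sketch below assumes that size is measured as the number of cells (rows times columns); that reading is an assumption here and should be checked against the package note.

```r
# Assumption: "size" means number of cells, i.e. rows * columns
n_cells <- nrow(airquality) * ncol(airquality)   # 153 rows * 6 columns
n_cells
# 918

# Compare against the default large_data_size of 9e+05
exceeds_default <- n_cells > 9e+05
exceeds_default
# FALSE
```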

See Also

vis_dat() vis_guess() vis_expect() vis_cor() vis_compare()

Examples

vis_miss(airquality)

if (FALSE) {
vis_miss(airquality, cluster = TRUE)

vis_miss(airquality, sort_miss = TRUE)

# if you have a large dataset, you might want to try downsampling:
library(nycflights13)
library(dplyr)
flights %>%
  sample_n(1000) %>%
  vis_miss()

flights %>%
  slice(1:1000) %>%
  vis_miss()

}
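The reordering requested by cluster = TRUE and sort_miss = TRUE can be approximated in base R. This is an illustrative sketch only and not visdat's internal code: the distance measure, the most-missing-first column order, and the variable names are assumptions.

```r
# Logical matrix: TRUE where a value is missing
miss <- is.na(airquality)

# Column order by decreasing number of missing values (cf. sort_miss = TRUE);
# most-missing-first is an assumption
col_order <- names(sort(colSums(miss), decreasing = TRUE))

# Row order from hierarchical clustering with the McQuitty method
# (cf. cluster = TRUE); Euclidean distance on 0/1 values is an assumption
row_order <- hclust(dist(miss * 1), method = "mcquitty")$order

# Reordered missingness matrix, rows clustered and columns sorted
miss_reordered <- miss[row_order, col_order]
dim(miss_reordered)
```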
