This is a data.table method for the S3 generic stats::na.omit. The internals are written in C for speed. See examples for benchmark timings.
The bit64::integer64 type is also supported.
# S3 method for data.table
na.omit(object, cols=seq_along(object), invert=FALSE, ...)
object: A data.table.
cols: A vector of column names (or numbers) on which to check for missing values. Default is all the columns.
invert: logical. If FALSE (default), omits all rows with any missing values. If TRUE, returns just those rows with missing values instead.
...: Further arguments special methods could require.
A data.table with just the rows where the specified columns have no missing value in any of them.
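As an illustration of the invert argument described above, the following is a minimal sketch on a small made-up table (the variable names complete, incomplete and missing_x are chosen here for illustration only). It assumes NaN is counted as missing, as in the examples.

```r
library(data.table)

# Toy table: rows 1-3 each contain at least one missing value, row 4 is complete.
DT <- data.table(x = c(1, NaN, NA, 3),
                 y = c(NA_integer_, 1:3),
                 z = c("a", NA_character_, "b", "c"))

complete   <- na.omit(DT)                            # rows with no missing values
incomplete <- na.omit(DT, invert = TRUE)             # rows with a missing value in any column
missing_x  <- na.omit(DT, cols = "x", invert = TRUE) # rows where 'x' itself is missing

nrow(complete)
nrow(incomplete)
nrow(missing_x)
```

Combining cols with invert=TRUE is a convenient way to inspect the rows that would be dropped before actually dropping them.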
The data.table method takes an additional argument, cols, which restricts the check for missing values to the specified columns. The default value for cols is all the columns, to be consistent with the default behaviour of stats::na.omit.
It does not add the attribute na.action as stats::na.omit does.
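The difference in attributes can be checked directly. The sketch below uses a small made-up table and compares the data.table method against the data.frame method on the same data; the names dt_res and df_res are illustrative only.

```r
library(data.table)

DT <- data.table(x = c(1, NA, 3), y = c("a", "b", NA))

dt_res <- na.omit(DT)                        # data.table method
df_res <- stats::na.omit(as.data.frame(DT))  # data.frame method on the same data

# Per the note above, the data.table result should carry no "na.action" attribute,
# while the data.frame result records which rows were dropped.
is.null(attr(dt_res, "na.action"))
attr(df_res, "na.action")
```

Code that relies on na.action (for example, to report which rows were removed) would need to compute that information separately when working with a data.table.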
# NOT RUN {
library(data.table)

DT = data.table(x=c(1,NaN,NA,3), y=c(NA_integer_, 1:3), z=c("a", NA_character_, "b", "c"))
# default behaviour
na.omit(DT)
# omit rows where 'x' has a missing value
na.omit(DT, cols="x")
# omit rows where either 'x' or 'y' have missing values
na.omit(DT, cols=c("x", "y"))
# }
# NOT RUN {
# Timings on relatively large data
set.seed(1L)
DT = data.table(x = sample(c(1:100, NA_integer_), 5e7L, TRUE),
                y = sample(c(rnorm(100), NA), 5e7L, TRUE))
system.time(ans1 <- na.omit(DT)) ## 2.6 seconds
system.time(ans2 <- stats:::na.omit.data.frame(DT)) ## 29 seconds
# identical? check each column separately, as ans2 will have additional attribute
all(sapply(1:2, function(i) identical(ans1[[i]], ans2[[i]]))) ## TRUE
# }