Efficiently select and replace (or add) a subset of columns from (to) a data frame. This can be done by data type, or using expressions, column names, indices, logical vectors, selector functions or regular expressions matching column names.
## Select and replace variables, analogous to dplyr::select but significantly faster
fselect(x, …, return = "data")
fselect(x, …) <- value
slt(x, …, return = "data") # Shortcut for fselect
slt(x, …) <- value # Shortcut for fselect<-
## Select and replace columns by names, indices, logical vectors,
## regular expressions or using functions to identify columns
get_vars(x, vars, return = "data", regex = FALSE, …)
gv(x, vars, return = "data", …) # Shortcut for get_vars
gvr(x, vars, return = "data", …) # Shortcut for get_vars(…, regex = TRUE)
get_vars(x, vars, regex = FALSE, …) <- value
gv(x, vars, …) <- value # Shortcut for get_vars<-
gvr(x, vars, …) <- value # Shortcut for get_vars<-(…, regex = TRUE)
## Add columns at any position within a data.frame
add_vars(x, …, pos = "end")
add_vars(x, pos = "end") <- value
av(x, …, pos = "end") # Shortcut for add_vars
av(x, pos = "end") <- value # Shortcut for add_vars<-
## Select and replace columns by data type
num_vars(x, return = "data")
num_vars(x) <- value
nv(x, return = "data") # Shortcut for num_vars
nv(x) <- value # Shortcut for num_vars<-
cat_vars(x, return = "data") # Categorical variables, see is_categorical
cat_vars(x) <- value
char_vars(x, return = "data")
char_vars(x) <- value
fact_vars(x, return = "data")
fact_vars(x) <- value
logi_vars(x, return = "data")
logi_vars(x) <- value
date_vars(x, return = "data") # See is_date
date_vars(x) <- value
x: a data frame.
value: a data frame or list of columns whose dimensions exactly match those of the extracted subset of x. If only 1 variable is in the subset of x, value can also be an atomic vector or matrix, provided that NROW(value) == nrow(x).
vars: a vector of column names, indices (can be negative), a suitable logical vector, or a vector of regular expressions matching column names (if regex = TRUE). It is also possible to pass a function returning TRUE or FALSE when applied to the columns of x.
return: an integer or string specifying what the selector function should return. The options are:
Int. | String | Description
1 | "data" | subset of data frame (default)
2 | "names" | column names
3 | "indices" | column indices
4 | "named_indices" | named column indices
5 | "logical" | logical selection vector
6 | "named_logical" | named logical vector
Note: replacement functions only replace data; however, column names are replaced together with the data (if available).
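As a brief illustration of the return options listed above, the following sketch queries the same two columns in several ways. It assumes the collapse package is installed and uses the built-in mtcars data; the variable names (cols, sub, nms, idx, lgl) are chosen here for illustration only.

```r
library(collapse)  # assumption: the collapse package is installed

cols <- c("mpg", "hp")

sub <- get_vars(mtcars, cols)                      # return = "data" (default): a data frame
nms <- get_vars(mtcars, cols, return = "names")    # column names (same as return = 2)
idx <- get_vars(mtcars, cols, return = "indices")  # column indices (same as return = 3)
lgl <- get_vars(mtcars, cols, return = "logical")  # logical selection vector (same as return = 5)

# The logical vector has one element per column of the input
length(lgl) == ncol(mtcars)
```

The non-default options are mainly useful when the selection itself, rather than the data, is needed in later code.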
regex: logical. TRUE will do regular expression search on the column names of x using a (vector of) regular expression(s) passed to vars. Matching is done using grep.
pos: the position where columns are added in the data frame. "end" (default) appends the columns at the end (right side) of the data frame. "front" adds the columns in front (left). Alternatively, one can pass a vector of positions (matching length(value) if value is a list). In that case the other columns are shifted around the new ones while maintaining their order.
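The positional behaviour described above can be sketched in base R. The helper insert_cols below is hypothetical and is not part of collapse; it only illustrates how a vector of positions places the new columns while the existing columns fill the remaining slots in their original order.

```r
# Hypothetical helper (not collapse's implementation): place the columns in
# `value` at the output positions `pos`; existing columns keep their order.
insert_cols <- function(x, value, pos) {
  stopifnot(is.list(x), is.list(value), length(pos) == length(value))
  n_out <- length(x) + length(value)
  stopifnot(all(pos >= 1), all(pos <= n_out), !anyDuplicated(pos))

  out <- vector("list", n_out)
  out[pos]  <- unclass(value)   # new columns go to the requested positions
  out[-pos] <- unclass(x)       # old columns fill the remaining slots in order

  nms <- character(n_out)
  nms[pos]  <- names(value)
  nms[-pos] <- names(x)
  names(out) <- nms

  as.data.frame(out)
}

x   <- data.frame(a = 1:3, b = 4:6, c = 7:9)
res <- insert_cols(x, list(n1 = 10:12, n2 = 13:15), pos = c(2, 4))
names(res)  # "a" "n1" "b" "n2" "c"
```

Unlike add_vars, this sketch always returns a plain data frame and does not preserve other attributes of x.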
…: for fselect: column names and expressions, e.g. fselect(mtcars, newname = mpg, hp, carb:vs). For get_vars: further arguments passed to grep, if regex = TRUE. For add_vars: same as value; a single argument passed may also be a vector or matrix, while multiple arguments must each be a list (they are combined using c(…)).
get_vars(<-) is around 2x faster than `[.data.frame` and 8x faster than `[<-.data.frame`, so the common operation data[cols] <- someFUN(data[cols]) can be made 10x more efficient (abstracting from computations performed by someFUN) using get_vars(data, cols) <- someFUN(get_vars(data, cols)) or the shorthand gv(data, cols) <- someFUN(gv(data, cols)).
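A small sketch of this replacement pattern, assuming the collapse package is installed. The function rescale stands in for someFUN and is hypothetical; it returns a list of columns with the same dimensions as the selected subset, as the value argument requires.

```r
library(collapse)  # assumption: the collapse package is installed

cols <- c("mpg", "hp")

# Hypothetical stand-in for someFUN: divide each column by its maximum
rescale <- function(df) lapply(df, function(v) v / max(v))

# Base R version
base <- mtcars
base[cols] <- rescale(base[cols])

# Same operation using the gv shorthand
fast <- mtcars
gv(fast, cols) <- rescale(gv(fast, cols))

# Both approaches should yield the same data frame
all.equal(fast, base)
```

The speed factors quoted above are the documentation's own figures; this sketch only shows that the two forms are interchangeable and does not benchmark them.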
Similarly, type-wise operations like data[sapply(data, is.numeric)] or data[sapply(data, is.numeric)] <- value are facilitated and more efficient using num_vars(data) and num_vars(data) <- value, or the shortcuts nv and nv<-, etc.
fselect provides an efficient alternative to dplyr::select, allowing the selection of variables based on expressions evaluated within the data frame, see Examples. It is about 100x faster than dplyr::select, but also simpler, as it does not provide special methods for grouped tibbles.
Finally, add_vars(data1, data2, data3, …) is a lot faster than cbind(data1, data2, data3, …), and preserves the attributes of data1 (i.e. it is like adding columns to data1). The replacement function add_vars(data) <- someFUN(get_vars(data, cols)) efficiently appends data with computed columns. The pos argument allows adding columns at positions other than the end (right) of the data frame, see Examples.
All functions introduced here perform their operations class-independently. They all basically work like this: (1) save the attributes of x, (2) unclass x, (3) subset, replace or append x as a list, (4) modify the "names" component of the attributes of x accordingly and (5) efficiently attach the attributes again to the result from step (3).
Thus they can freely be applied to data.tables, grouped tibbles, panel data frames and other classes, and will return an object of exactly the same class with the same attributes.
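The five steps can be sketched in base R for the case of column selection. The helper select_cols is hypothetical and is not the code used by collapse (which the text describes as more efficient); it only mirrors the described sequence so that the class-independence becomes visible.

```r
# Hypothetical helper mirroring steps (1)-(5); not collapse's actual code
select_cols <- function(x, ind) {
  ax  <- attributes(x)          # (1) save the attributes of x
  xl  <- unclass(x)             # (2) unclass x, leaving a plain list
  res <- xl[ind]                # (3) subset as a list
  ax[["names"]] <- names(res)   # (4) update the "names" attribute
  attributes(res) <- ax         # (5) reattach the attributes
  res
}

# A data frame with an extra class and a custom attribute (both made up here)
d <- structure(mtcars, class = c("myclass", "data.frame"), note = "kept")

res <- select_cols(d, c("mpg", "hp"))
class(res)         # "myclass" "data.frame"
attr(res, "note")  # "kept"
```

Because the subsetting happens on the unclassed list, no class-specific `[` method is dispatched, which is why the result carries the class and attributes of the input unchanged.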
fsubset, ftransform, Data Frame Manipulation, Collapse Overview
# NOT RUN {
## World Development Data
head(fselect(wlddev, Country = country, Year = year, ODA)) # Fast dplyr-like selecting
head(fselect(wlddev, -country, -year, -PCGDP))
head(fselect(wlddev, country, year, PCGDP:ODA))
head(fselect(wlddev, -(PCGDP:ODA)))
fselect(wlddev, country, year, PCGDP:ODA) <- NULL # Efficient deleting
head(wlddev)
rm(wlddev) # Remove the modified copy again
head(num_vars(wlddev)) # Select numeric variables
head(cat_vars(wlddev)) # Select categorical (non-numeric) vars
head(get_vars(wlddev, is_categorical)) # Same thing
num_vars(wlddev) <- num_vars(wlddev) # Replace Numeric Variables by themselves
get_vars(wlddev,is.numeric) <- get_vars(wlddev,is.numeric) # Same thing
head(get_vars(wlddev, 9:12)) # Select columns 9 through 12, 2x faster
head(get_vars(wlddev, -(9:12))) # All except columns 9 through 12
head(get_vars(wlddev, c("PCGDP","LIFEEX","GINI","ODA"))) # Select using column names
head(get_vars(wlddev, "[[:upper:]]", regex = TRUE)) # Same thing: match upper-case var. names
head(gvr(wlddev, "[[:upper:]]")) # Same thing
get_vars(wlddev, 9:12) <- get_vars(wlddev, 9:12) # 9x faster than wlddev[9:12] <- wlddev[9:12]
add_vars(wlddev) <- STD(gv(wlddev,9:12), wlddev$iso3c) # Add Standardized columns 9 through 12
head(wlddev) # gv and av are shortcuts
get_vars(wlddev, 14:17) <- NULL # Efficient Deleting added columns again
av(wlddev, "front") <- STD(gv(wlddev,9:12), wlddev$iso3c) # Again adding in Front
head(wlddev)
get_vars(wlddev, 1:4) <- NULL # Deleting
av(wlddev,c(10,12,14,16)) <- W(wlddev,~iso3c, cols = 9:12, # Adding next to original variables
keep.by = FALSE)
head(wlddev)
get_vars(wlddev, c(10,12,14,16)) <- NULL # Deleting
# }