setorder: Fast reordering of a data.table by reference

Description

In data.table parlance, all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column. The only other data.table operator that modifies input by reference is :=. See the See Also section below for the other set* functions that data.table provides.

setcolorder reorders the columns of a data.table, by reference, to the new order provided.

setorder (and setorderv) reorders the rows of a data.table by reference, based on the columns provided. It can sort in both ascending and descending order. The functionality is identical to using ?order on a data.frame, except that it is much faster, very memory efficient and much more user-friendly.

x[order(.)] is now optimised internally to use data.table's fast order by default. data.table always sorts in the C locale by default. If it is essential to sort by the session locale instead, one can always revert to base's order with x[base::order(.)]. The bit64::integer64 type is also supported for reordering rows of a data.table.
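As a rough illustration of the difference between the subset form and the by-reference form, the sketch below uses a small made-up table (the column names `id` and `w` are purely illustrative). It also shows the C-locale ordering of character data, where uppercase letters sort before lowercase ones.

```r
library(data.table)

DT <- data.table(id = c(3L, 1L, 2L, 4L), w = c("b", "A", "a", "B"))

# x[order(.)] returns a new, sorted data.table; DT itself is left untouched
sorted_copy <- DT[order(id)]
original_ids <- DT$id

# setorder() rearranges the rows of DT itself, so the object address is unchanged
addr_before <- address(DT)
setorder(DT, id)
addr_after <- address(DT)
same_object <- identical(addr_before, addr_after)

# Character columns are sorted in the C locale: "A" "B" "a" "b"
setorder(DT, w)
c_locale_order <- DT$w
```

If the session locale matters, `DT[base::order(w)]` falls back to base R's ordering, at the cost of speed; its result depends on the locale in use.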

Usage

setcolorder(x, neworder)
setorder(x, ..., na.last=FALSE)
setorderv(x, cols, order=1L, na.last=FALSE)
# optimised to use data.table's internal fast order
# x[order(., na.last=TRUE)]

Arguments

x

A data.table.

neworder

Character vector of the new column name ordering. May also be column numbers.

...

The columns to sort by. Do not quote column names. If ... is missing (e.g. setorder(x)), x is rearranged based on all columns in ascending order by default. To sort by a column in descending order, prefix it with "-", e.g. setorder(x, a, -b).

cols

A character vector of column names of x to order by. Do not add "-" here; use the order argument to set the direction.

order

An integer vector with only possible values of 1 and -1, corresponding to ascending and descending order. The length of order must be either 1 or equal to that of cols. If length(order

na.last

logical. If TRUE, missing values in the data are placed last; if FALSE, they are placed first; if NA, they are removed. na.last=NA is valid only for x[order(., na.last)], where the default is TRUE (see Details).
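The cols and order arguments of setorderv can be sketched as follows. The table and the variable `sort_cols` are made up for illustration; the point is that column names are passed as strings and the direction is given separately through order.

```r
library(data.table)

DT <- data.table(a = c(1L, 2L, 1L, 2L), b = c("u", "v", "w", "x"))

# Column names are character strings; direction goes in `order`, not as "-"
sort_cols <- c("a", "b")

# One entry per column: a ascending, b descending
setorderv(DT, sort_cols, order = c(1L, -1L))
asc_desc <- copy(DT)

# A single value is applied to every column in `cols`: both descending here
setorderv(DT, sort_cols, order = -1L)
```

This form is convenient when the sort columns are only known at run time, for example when they arrive as a character vector from another function.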

Value

For all set* functions, the input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., setorder(DT,a,-b)[, cumsum(c), by=list(a,b)]. If you require a copy, take a copy first (using DT2 = copy(DT)). See ?copy.
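A minimal sketch of both points, using a small made-up table: copy() preserves the original row order, and the invisible return value lets setorder feed directly into a `[` call.

```r
library(data.table)

DT <- data.table(a = c(2L, 1L, 2L, 1L), b = c("x", "y", "z", "w"), c = 1:4)

# Take a copy first if the original row order must be preserved
DT2 <- copy(DT)

# setorder() returns DT invisibly, so it can be chained with [
res <- setorder(DT, a, -b)[, .(total = sum(c)), by = a]

# DT is now sorted; DT2 still has the original order
DT
DT2
```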

Details

To reorder the columns of a data.table, the idiomatic way is to use setcolorder(x, neworder) instead of x <- x[, neworder, with=FALSE]. The latter makes an entire copy of the data.table, which may be unnecessary in most situations. setcolorder also accepts column numbers instead of column names for the neworder argument, although using column numbers is not good programming practice. We recommend using column names.
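The two approaches can be compared side by side. This is only a sketch on a toy table; both give the same column order, but only the first assignment creates a new object.

```r
library(data.table)

DT <- data.table(A = 1:3, B = letters[1:3], C = c(2.5, 1.5, 0.5))

# Copying approach: builds a whole new data.table
DT_copy <- DT[, c("C", "A", "B"), with = FALSE]

# By-reference approach: the columns of DT itself are rearranged
setcolorder(DT, c("C", "A", "B"))

names(DT)       # "C" "A" "B"
names(DT_copy)  # "C" "A" "B"
```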
  
  data.table internally implements extremely fast radix-based ordering. However, in versions <= 1.9.2, fast ordering was only capable of increasing order (ascending). In versions > 1.9.2, the functionality has been extended to decreasing order (descending) as well. Note that setkey still requires and will only sort in ascending order, and is not related to setorder.
  
  Because forder now handles decreasing order as well, we no longer have to rely on base::order. It is now possible to reorder the rows of a data.table based on columns by reference, e.g. setorder(x, a, -b, c). Note that -b also works with columns of type character, unlike base::order, which requires -xtfrm(y) instead (and is slow).
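To make the contrast concrete, here is a small sketch with a character column sorted in descending order. The data are made up, and the base R version is shown only for comparison.

```r
library(data.table)

x <- data.table(a = c(1L, 1L, 2L, 2L), b = c("p", "q", "p", "q"), c = 4:1)

# Ascending a, descending b (a character column), ascending c, all by reference
setorder(x, a, -b, c)

# The base equivalent needs -xtfrm() for the character column and returns a copy
df <- data.frame(a = c(1L, 1L, 2L, 2L), b = c("p", "q", "p", "q"), c = 4:1,
                 stringsAsFactors = FALSE)
df_sorted <- df[order(df$a, -xtfrm(df$b), df$c), ]
```

Both `x` and `df_sorted` end up with the same row order here; the difference is that `x` was rearranged in place while `df_sorted` is a new object.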
  
  The na.last argument is FALSE by default for setorder and setorderv, to be consistent with data.table's setkey, and TRUE by default for x[order(.)], to be consistent with base::order. Only x[order(.)] can have na.last = NA, because it is a subset operation, whereas setorder and setorderv reorder the data.table by reference.
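The different defaults are easy to see on a single column containing a missing value. The sketch below uses a made-up column `v`.

```r
library(data.table)

DT <- data.table(v = c(2L, NA, 1L))

# Subset form: NAs go last by default, as in base::order
subset_default <- DT[order(v)]$v

# Subset form with na.last = NA: rows with NA are dropped
subset_dropped <- DT[order(v, na.last = NA)]$v

# By-reference form: NAs go first by default, matching setkey
setorder(DT, v)
setorder_default <- DT$v

# na.last = TRUE moves them to the end; NA is not accepted here
setorder(DT, v, na.last = TRUE)
setorder_last <- DT$v
```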
  
  Note that if setorder results in reordering of the rows of a keyed data.table, then its key will be set to NULL.
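A short sketch of this behaviour on a toy keyed table:

```r
library(data.table)

DT <- data.table(a = c(2L, 1L, 3L), b = c("x", "y", "z"))

# setkey sorts by a in ascending order and marks a as the key
setkey(DT, a)
key_before <- key(DT)

# Sorting in descending order moves rows, so the key no longer holds
setorder(DT, -a)
key_after <- key(DT)
```

After the call to setorder, `key_after` is NULL, so any code that relies on the key needs to call setkey again.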

See Also

setkey, setattr, setnames, set, :=, setDT, setDF, copy

Examples

library(data.table)

set.seed(45L)
DT = data.table(A=sample(3, 10, TRUE), 
         B=sample(letters[1:3], 10, TRUE), C=sample(10))

# setorder
setorder(DT, A, -B)
# same as above but using 'setorderv'
# setorderv(DT, c("A", "B"), c(1,-1))

# setcolorder
setcolorder(DT, c("C", "A", "B"))