In data.table parlance, all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column. The only other data.table operator that modifies input by reference is :=. Check out the See Also section below for the other set* functions data.table provides.
setcolorder reorders the columns of a data.table, by reference, to the new order provided.
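Here is a minimal sketch of what "by reference" means in practice. The table and the names DT, DT2 and DT3 are illustrative only. A second name bound by plain assignment sees the reordering done by setorder, but a copy() taken beforehand does not.

```r
library(data.table)

DT  <- data.table(a = c(3L, 1L, 2L), b = c("x", "y", "z"))
DT2 <- DT          # plain assignment: DT2 and DT are the same object, no copy
DT3 <- copy(DT)    # an independent deep copy, taken before sorting

setorder(DT, a)    # reorders rows in place; nothing is reassigned

DT2$a   # also sorted, because DT2 refers to the same object as DT
DT3$a   # still in the original order
```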
setorder (and setorderv) reorders the rows of a data.table by reference, based on the columns provided. It can sort in both ascending and descending order. The functionality is identical to using ?order on a data.frame, except that it is much faster, very memory efficient and much more user-friendly.
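The following sketch compares setorder with base R's order on a data.frame. It assumes an illustrative table with columns A and C. Both approaches give the same row order, but only the base version builds a new object.

```r
library(data.table)

df <- data.frame(A = c(2L, 1L, 2L, 1L), C = c(10, 20, 30, 40))
DT <- as.data.table(df)

# Base R: compute an index with order() and subset; this creates a new object.
df_sorted <- df[order(df$A, -df$C), ]

# data.table: same ordering (A ascending, C descending), done in place.
setorder(DT, A, -C)

DT$C          # 40 20 30 10
df_sorted$C   # same values in the same order
```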
x[order(.)] is now optimised internally to use data.table's fast order by default. data.table always sorts character columns in the C locale. If it is essential to sort by the session locale instead, one can always revert to base's order by doing: x[base:::order(.)].
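The locale mostly matters for character columns. This sketch uses a made-up column x that mixes upper- and lower-case letters. In the C locale, every upper-case letter sorts before every lower-case one. Base order() follows the session locale instead, so its output is deliberately not asserted here. base::order is written instead of base:::order; the two are equivalent because order is exported.

```r
library(data.table)

DT <- data.table(x = c("b", "B", "a", "A"))

# data.table's internal ordering compares strings in the C locale.
setorder(DT, x)
DT$x                    # "A" "B" "a" "b"

# The optimised x[order(.)] form uses the same C-locale ordering.
DT2 <- data.table(x = c("b", "B", "a", "A"))
DT2[order(x)]$x         # "A" "B" "a" "b"

# base::order() respects the session locale, so its result may differ
# (in a typical English locale it is often "a" "A" "b" "B").
DT2[base::order(x)]$x
```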
The bit64::integer64 type is also supported for reordering the rows of a data.table.
setcolorder(x, neworder)
setorder(x, ..., na.last=FALSE)
setorderv(x, cols, order=1L, na.last=FALSE)
# optimised to use data.table's internal fast order
# x[order(., na.last=TRUE)]
x: A data.table.
...: The columns to sort by. If ... is missing (e.g., setorder(x)), x is rearranged based on all columns in ascending order by default. To sort by a column in descending order, prefix its name with a "-", e.g., setorder(x, a, -b, c).
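This minimal sketch of the ... argument uses an illustrative two-column table. When no columns are given, all columns are used in ascending order. A "-" prefix switches one column to descending order. A copy() snapshot is taken between the two calls, because the second call modifies DT in place.

```r
library(data.table)

DT <- data.table(a = c(2L, 1L, 2L), b = c(5L, 9L, 1L))

# No columns given: sort by all columns (a, then b), ascending.
setorder(DT)
sorted_all <- copy(DT)   # snapshot; the next call changes DT in place

# "-" prefix: a descending, then b ascending within ties of a.
setorder(DT, -a, b)
DT
```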
cols: A character vector of column names of x to order by. Do not add "-" here; use the order argument instead.
order: An integer vector whose values are 1 or -1, corresponding to ascending and descending order. The length of order must be either 1 or equal to that of cols. If length(order) is 1, it is recycled to length(cols).
na.last: If TRUE, missing values in the data are placed last; if FALSE, they are placed first; if NA, they are removed. na.last=NA is valid only for x[order(., na.last)], where the default is TRUE.
As with all set* functions, the input is modified by reference and returned (invisibly), so it can be used in compound statements; e.g., setorder(DT,a,-b)[, cumsum(c), by=list(a,b)]. If you require a copy, take a copy first (using DT2 = copy(DT)). See ?copy.
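The sketch below uses illustrative data, with columns a, b and c matching the compound-statement example above. It first calls setorderv with a length-1 order, which is recycled over cols. It then chains the invisible return value of setorder into a grouped query.

```r
library(data.table)

DT <- data.table(a = c(1L, 2L, 1L, 2L), b = c(1L, 1L, 2L, 2L), c = 1:4)

# setorderv() takes column names as a character vector (handy in functions);
# a single 'order' value is recycled, so both a and b are sorted descending.
cols <- c("a", "b")
setorderv(DT, cols, order = -1L)
DT$c   # 4 2 3 1

# setorder() returns DT invisibly, so a query can be chained onto it.
res <- setorder(DT, a, -b)[, cumsum(c), by = list(a, b)]
res
```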
To reorder the columns of a data.table, the idiomatic way is to use setcolorder(x, neworder) instead of doing x <- x[, neworder, with=FALSE]. This is because the latter makes an entire copy of the data.table, which is often unnecessary. setcolorder also accepts column numbers instead of column names for the neworder argument, although using column numbers isn't good programming practice. We recommend using column names.
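This sketch contrasts the two approaches on an illustrative table. The with=FALSE subset returns a new data.table, while setcolorder rearranges DT itself. The second setcolorder call shows that numeric positions refer to the current column order, which is one reason names are safer.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6, c = 7:9)

# Copying approach: builds a new data.table with the columns rearranged.
DT_copy <- DT[, c("c", "a", "b"), with = FALSE]

# Idiomatic approach: rearranges DT's own columns in place.
setcolorder(DT, c("c", "a", "b"))
names(DT)   # "c" "a" "b"

# Positions are interpreted against the *current* order (c, a, b).
setcolorder(DT, c(2L, 3L, 1L))
names(DT)   # "a" "b" "c"
```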
data.table internally implements extremely fast radix-based ordering. However, in versions <= 1.9.2, fast ordering was only capable of increasing (ascending) order. In versions > 1.9.2, the functionality has been extended to decreasing (descending) order as well. Note that setkey still requires, and will only sort in, ascending order, and is not related to setorder.
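This minimal sketch shows that setkey and setorder are separate operations. It uses a made-up column g. setkey sorts in ascending order and records a key, while setorder can sort in descending order and sets no key.

```r
library(data.table)

DT1 <- data.table(g = c(2L, 3L, 1L))
DT2 <- copy(DT1)

# setkey(): ascending sort plus a recorded key.
setkey(DT1, g)
key(DT1)   # "g"

# setorder(): descending sort is allowed, and no key is set.
setorder(DT2, -g)
key(DT2)   # NULL
DT2$g      # 3 2 1
```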
By implementing forder to handle decreasing order as well, we no longer have to rely on base:::order. It is now possible to reorder the rows of a data.table based on columns by reference, e.g., setorder(x, a, -b, c). Note that -b also works with columns of type character, unlike base:::order, which requires -xtfrm(y) instead (and is slow).
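This sketch sorts a character column in descending order. The fruit names are illustrative. data.table accepts -name directly, whereas base R needs xtfrm() to map the strings to numbers before they can be negated. All names are lower-case, so the result does not depend on the locale.

```r
library(data.table)

df <- data.frame(id = 1:4, name = c("pear", "apple", "fig", "kiwi"),
                 stringsAsFactors = FALSE)
DT <- as.data.table(df)

# data.table: unary minus works directly on a character column.
setorder(DT, -name)

# Base R: -df$name would be an error, so rank the strings with xtfrm() first.
df_sorted <- df[order(-xtfrm(df$name)), ]

DT$name   # "pear" "kiwi" "fig" "apple"
```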
The na.last argument is FALSE by default for setorder and setorderv, to be consistent with data.table's setkey, and TRUE by default for x[order(.)], to be consistent with base:::order. Only x[order(.)] can have na.last = NA, because it is a subset operation, as opposed to setorder or setorderv, which reorder the data.table by reference.
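The na.last defaults can be compared side by side with a minimal sketch on a made-up integer column that contains one NA. copy() is used so that each setorder call starts from the original row order.

```r
library(data.table)

DT <- data.table(x = c(3L, NA, 1L, 2L))

# setorder(): na.last defaults to FALSE, so NAs are placed first.
a <- copy(DT)
setorder(a, x)
a$x                               # NA 1 2 3

# x[order(.)]: na.last defaults to TRUE, as in base R, so NAs come last.
DT[order(x)]$x                    # 1 2 3 NA

# Only the subset form accepts na.last = NA; it drops the NA row.
DT[order(x, na.last = NA)]$x

# setorder() can still put NAs last when asked explicitly.
b <- copy(DT)
setorder(b, x, na.last = TRUE)
b$x
```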
Note that if setorder reorders the rows of a keyed data.table, its key is set to NULL.
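This minimal sketch shows the key being dropped. The columns A and B are illustrative. After setkey(DT, A), the rows are sorted ascending by A, so sorting by -A has to move rows. The table is then no longer ordered by its key, and key(DT) becomes NULL.

```r
library(data.table)

DT <- data.table(A = c(3L, 1L, 2L), B = c("c", "a", "b"))
setkey(DT, A)
key(DT)    # "A"

# Descending order on A moves rows, so the ascending key no longer holds.
setorder(DT, -A)
key(DT)    # NULL
DT$A       # 3 2 1
```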
See Also: setkey, setattr, setnames, set, :=, setDT, setDF, copy
set.seed(45L)
DT = data.table(A=sample(3, 10, TRUE),
B=sample(letters[1:3], 10, TRUE), C=sample(10))
# setorder
setorder(DT, A, -B)
# same as above but using 'setorderv'
# setorderv(DT, c("A", "B"), c(1,-1))
# setcolorder
setcolorder(DT, c("C", "A", "B"))
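The comment above says that setorderv(DT, c("A", "B"), c(1,-1)) is the same as setorder(DT, A, -B). One way to check this is to sort two copies of the same data, one with each interface, and compare them. This is only a sketch: because C is a permutation of 1:10, matching C columns mean the row orders match.

```r
library(data.table)

set.seed(45L)
DT <- data.table(A = sample(3, 10, TRUE),
                 B = sample(letters[1:3], 10, TRUE), C = sample(10))

# Sort independent copies, one per interface.
d1 <- copy(DT)
setorder(d1, A, -B)                         # column names unquoted, "-" prefix
d2 <- copy(DT)
setorderv(d2, c("A", "B"), c(1L, -1L))      # names as strings, explicit order

identical(d1$C, d2$C)
```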