double is also implemented. This gives a speed-up of about 5-8x compared to 1.8.10 on `setkey` and all internal `order`/`sort` operations. Fast radix sorting is also implemented for `character` and `bit64::integer64` types. The sort is stable; i.e., the order of ties (if any) is preserved, in both versions: `<= 1.8.10` and `>= 1.9.0`.
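
The stability claim can be checked with a small sketch. The table and its column names (`grp`, `id`) are made up for illustration; `id` simply records the original row order so that the order of ties is visible after sorting.

```r
library(data.table)

# Hypothetical data: 'grp' contains ties, 'id' records the original row order.
DT <- data.table(grp = c(2L, 1L, 2L, 1L, 2L), id = 1:5)

# Sort the table by reference on 'grp'.
setkey(DT, grp)

# Within each value of 'grp', the rows keep their original relative order.
print(DT)
stopifnot(identical(DT$id, c(2L, 4L, 1L, 3L, 5L)))
```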
In `data.table` versions `<= 1.8.10`, for columns of type `integer`, the sort is attempted with the very fast `"radix"` method in `sort.list`. If that fails, the sort reverts to the default method in `order`. For character vectors, `data.table` takes advantage of R's internal global string cache and implements a very efficient order, also exported as `chorder`.
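
As a rough illustration of the two building blocks named above, the sketch below calls base R's `sort.list` with `method = "radix"` on an integer vector and `chorder` on a character vector. The input vectors are invented for the example, and this only shows the calls themselves, not how `data.table` wires them together internally.

```r
library(data.table)

# Integer vector: the radix method and the default order() agree here,
# including on the order of the tied values.
x <- c(3L, 1L, 2L, 1L)
o_radix   <- sort.list(x, method = "radix")
o_default <- order(x)
stopifnot(identical(o_radix, o_default))

# Character vector: chorder() returns the ordering permutation.
s <- c("b", "a", "c")
stopifnot(identical(s[chorder(s)], c("a", "b", "c")))
```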
In v1.7.8, the `key<-` syntax was deprecated. The `<-` method copies the whole table and we know of no way to avoid that copy without a change in R itself. Please use the `set*` functions instead, which make no copy at all. `setkey` accepts unquoted column names for convenience, whilst `setkeyv` accepts one vector of column names.
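
A minimal sketch of the two calling conventions, using a made-up table with columns `a`, `b` and `v`:

```r
library(data.table)

DT <- data.table(a = c(2L, 1L, 2L), b = c("y", "x", "x"), v = 1:3)

# setkey: unquoted column names, convenient for interactive use.
setkey(DT, a, b)
stopifnot(identical(key(DT), c("a", "b")))

# setkeyv: a single character vector, convenient when the
# column names are held in a variable.
cols <- c("b", "a")
setkeyv(DT, cols)
stopifnot(identical(key(DT), c("b", "a")))
```

Both calls sort `DT` by reference, so there is no need to assign the result back to `DT`.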
The problem (for `data.table`) with the copy by `key<-` (other than being slower) is that R doesn't maintain the over-allocated truelength, but it looks as though it has. Adding a column by reference using `:=` after a `key<-` was therefore a memory overwrite and eventually a segfault; the over-allocated memory wasn't really there after `key<-`'s copy. `data.table`s now have an attribute `.internal.selfref` to catch and warn about such copies. This attribute has been implemented in a way that is friendly with `identical()` and `object.size()`.
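
The over-allocation and the self-reference attribute can be inspected directly. The sketch below assumes a recent `data.table` release with default options; the exact value returned by `truelength()` depends on the version and on the over-allocation setting, so only the comparison is checked.

```r
library(data.table)

DT <- data.table(a = 1:3)

# The column pointer vector is over-allocated: truelength() reports more
# slots than there are columns, which is what lets := add columns in place.
stopifnot(truelength(DT) > length(DT))

# The self-reference attribute is present on the table.
stopifnot(!is.null(attr(DT, ".internal.selfref")))

# A proper copy made with copy() still compares as identical().
DT2 <- copy(DT)
stopifnot(identical(DT, DT2))

# Adding a column by reference uses one of the spare slots.
DT[, b := a * 2L]
stopifnot(identical(names(DT), c("a", "b")))
```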
For the same reason, please use the other `set*` functions which modify objects by reference, rather than using the `<-` operator which results in copying the entire object.
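
A short sketch of a few `set*` functions at work, on a made-up table. `address()` is used here only to show that the object is the same one before and after the calls.

```r
library(data.table)

DT <- data.table(a = 1:3, b = letters[1:3])
before <- address(DT)

setnames(DT, "a", "id")                  # rename a column by reference
setcolorder(DT, c("b", "id"))            # reorder columns by reference
set(DT, i = 1L, j = "id", value = 10L)   # update one cell by reference

# The table was modified in place, so its address is unchanged.
stopifnot(identical(address(DT), before))
stopifnot(identical(names(DT), c("b", "id")))
```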
It isn't good programming practice, in general, to use column numbers rather than names. This is why `setkey` and `setkeyv` only accept column names. If you use column numbers, then bugs (possibly silent) can more easily creep into your code as changes are made elsewhere; e.g., if you add, remove or reorder columns in a few months' time, a `setkey` by column number will then refer to a different column, possibly returning incorrect results with no warning. (A similar concept exists in SQL, where `"select * from ..."` is considered poor programming style when a robust, maintainable system is required.) If you really wish to use column numbers, it's possible but deliberately a little harder; e.g., `setkeyv(DT, colnames(DT)[1:2])`.
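
The risk described above can be seen in a small sketch. The table and the later `setcolorder` call are invented to stand in for a change made elsewhere in the code.

```r
library(data.table)

DT <- data.table(a = c(2L, 1L), b = c("y", "x"), v = c(0.5, 1.5))

# Keying by position has to go through the column names explicitly.
setkeyv(DT, colnames(DT)[1:2])
stopifnot(identical(key(DT), c("a", "b")))

# Later, someone reorders the columns. The key is stored by name,
# so it still refers to 'a' and 'b' ...
setcolorder(DT, c("v", "a", "b"))
stopifnot(identical(key(DT), c("a", "b")))

# ... but positions 1:2 now point at different columns.
stopifnot(identical(colnames(DT)[1:2], c("v", "a")))
```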