These functions are experimental and somewhat advanced. By experimental we mean their names might change and perhaps the syntax, argument names and types (so if you write a lot of code using them, you have been warned!). They should work and be stable, though, so please report problems with them.
truelength(x)
alloc.col(DT, n=getOption("datatable.alloccol",quote(max(100,2*ncol(DT)))))
- x: Any type of vector, including data.table, which is a list vector of column pointers.
- n: The number of column pointer slots to reserve in memory, including existing columns. May be a numeric, or a quote()-ed expression (see default). If DT is a 10 column data.table, n=1000 means grow the spare slots from 90 to 990, assuming the default of 100 allocated slots has not been changed.
When adding columns by reference using :=, we could simply create a new column list vector (one longer) and memcpy over the old vector, with no copy of the column vectors themselves. That requires negligible space and time, and is what v1.7.2 did. However, that copy of the list vector of column pointers only (but not the columns themselves), a shallow copy, resulted in inconsistent behaviour in some circumstances. So, as from v1.7.3, data.table over-allocates the list vector of column pointers so that columns can be added fully by reference, consistently.
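As a rough illustration of what "fully by reference" means here, the sketch below binds two names to the same data.table and adds a column through one of them. The variable names are made up for the example, and it assumes the data.table package is installed and that spare column pointer slots are available (the usual case for a freshly created table).

```r
library(data.table)

DT  <- data.table(a = 1:3, b = 4:6)
DT2 <- DT                  # second name bound to the same data.table; nothing is copied

DT[, c := 7L]              # new column goes into a spare column pointer slot

seen_by_DT2 <- "c" %in% names(DT2)   # DT2 refers to the same object, so it has column c too

DT3 <- copy(DT)            # deep copy: independent of DT from here on
DT[, d := 8L]
seen_by_DT3 <- "d" %in% names(DT3)   # the deep copy does not pick up column d
```

If the two names should not share later column additions, copy() is the explicit way to separate them.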
When the allocated column pointer slots are used up, data.table must reallocate that vector to add a new column. If two or more variables are bound to the same data.table this shallow copy may or may not be desirable, but we don't think this will be a problem very often (more discussion may be required on datatable-help). To be warned whenever this happens, set options(datatable.allocwarn=TRUE). If warnings are on, there are several ways to avoid them: use copy to make a deep copy, use alloc.col to reallocate in advance, wrap with suppressWarnings to indicate you anticipated the warning, or change the default allocation rule (perhaps in your .Rprofile) through the datatable.alloccol option.
Please note: over-allocation of the column pointer vector is not for efficiency per se; it is so that := can add columns by reference without a shallow copy.
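The ways of anticipating a reallocation described above might look roughly like the following. This is a sketch only: the slot counts 200L and 500L are arbitrary illustrative values, the variable names are hypothetical, and it assumes a data.table version that still exports alloc.col under that name. The comparisons use >= because the exact slot count reported can differ between data.table versions.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)

# Reallocate in advance, so that later := calls find spare slots.
alloc.col(DT, 200L)                        # 200L is an illustrative value
spare <- truelength(DT) - length(DT)       # slots still free for new columns

# Take a deep copy, so that additions cannot affect the original.
DT_copy <- copy(DT)
DT_copy[, z := 0L]
z_in_original <- "z" %in% names(DT)        # the original is left untouched

# Mark an addition as anticipated (only matters when warnings are enabled).
suppressWarnings(DT[, w := 1L])

# Change the default allocation rule, e.g. from .Rprofile.
old <- options(datatable.alloccol = 500L)  # 500L is an illustrative value
DT_new <- data.table(x = 1:3)
slots_new <- truelength(DT_new)            # allocated under the new rule
options(old)                               # restore the previous setting
```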
truelength(x) returns the length of the vector allocated in memory. length(x) of those items are in use. Currently, it's just the list vector of column pointers that is over-allocated (i.e. truelength(DT)), not the column vectors themselves, which would in future allow fast row insert(). For tables loaded from disk, however, truelength is 0 in R 2.14.0 and random in R <= 2.13.2; i.e., in both cases perhaps unexpected. data.table detects this state and over-allocates the loaded data.table when the next column addition or deletion occurs. All other operations on data.table (such as fast grouping and joins) do not need truelength.

alloc.col reallocates DT by reference. This may be useful for efficiency if you know you are about to add a lot of columns in a loop. It also returns the new DT, for convenience in compound queries.
DT = data.table(a=1:3,b=4:6)
length(DT)                 # 2 column pointer slots used
truelength(DT)             # 100 column pointer slots allocated
alloc.col(DT,200)
length(DT)                 # 2 used
truelength(DT)             # 200 allocated
DT[,c:=7L]                 # add new column
truelength(DT)-length(DT)  # 197 slots spare
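For the loop case mentioned in the description of alloc.col, a minimal sketch could reserve the slots once and then add the columns by reference. The column names v1, v2, ... and the count of 150 are hypothetical, and the parenthesised left-hand side of := assumes a reasonably recent data.table.

```r
library(data.table)

DT <- data.table(id = 1:3)
n_new <- 150L                          # hypothetical number of columns to add

# Reserve enough column pointer slots up front (existing plus new columns).
alloc.col(DT, ncol(DT) + n_new)

for (j in seq_len(n_new)) {
  nm <- paste0("v", j)                 # hypothetical column name
  DT[, (nm) := j]                      # added by reference; j is recycled over the rows
}

n_cols <- ncol(DT)                     # original column plus the added ones
still_room <- truelength(DT) >= length(DT)
```

Reserving once before the loop means no iteration should need to reallocate the column pointer vector part-way through.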