When adding columns by reference using :=, we could simply create a new column list vector (one longer) and memcpy over the old vector, with no copy of the column vectors themselves. That requires negligible space and time, and is what v1.7.2 did. However, that shallow copy (a copy of the list vector of column pointers only, not of the columns themselves) resulted in inconsistent behaviour in some circumstances. So, as from v1.7.3, data.table over-allocates the list vector of column pointers so that columns can be added fully by reference, consistently. When the allocated column pointer slots are used up, data.table must reallocate that vector to add a new column, and it warns if two or more variables are bound to the same data.table (since the shallow copy can result in unexpected behaviour). To avoid this warning, there are several options: use copy, use alloc.col to reallocate in advance, wrap with suppressWarnings to indicate you anticipated the warning, or change the default allocation rule (perhaps in your .Rprofile); e.g., options(datatable.alloc.col=quote(1000)).
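The options for avoiding the warning might look roughly like the following. This is a minimal sketch, not a definitive recipe: the table DT and its columns are made up for illustration, the slot count 2000 is arbitrary, and the option name is copied from the text above and may be spelled differently in other data.table versions.

```r
library(data.table)

# Hypothetical example table; the column names are illustrative only.
DT <- data.table(a = 1:3, b = 4:6)

# Option 1: take a copy so the two variables no longer share one table.
DT2 <- copy(DT)
DT2[, c := 7L]          # adds column c to DT2 only
"c" %in% names(DT)      # DT is not affected by the change to DT2

# Option 2: reallocate in advance, before adding many columns.
# The result is assigned back so DT refers to the reallocated table.
DT <- alloc.col(DT, 2000)
truelength(DT)          # number of column pointer slots now allocated

# Option 3: change the default allocation rule, e.g. in .Rprofile.
# Option name as given in the text; check the spelling for your version.
options(datatable.alloc.col = quote(1000))
```

suppressWarnings is left out of the sketch because it changes nothing about allocation; it only records that the warning was expected.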
Please note: over-allocation of the list vector of column pointers is not for efficiency; it is for consistency.
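The consistency that over-allocation buys can be seen by comparing length with truelength and then adding a column while two variables are bound to the same data.table. The snippet below is an illustrative sketch; the names DT, DT2 and total are assumptions, and the exact number of spare slots depends on the data.table version and settings.

```r
library(data.table)

# Hypothetical example table; the column names are illustrative only.
DT <- data.table(a = 1:3, b = 4:6)
length(DT)       # column pointer slots in use
truelength(DT)   # slots allocated, including the spare ones

DT2 <- DT                # a second variable bound to the same data.table
DT[, total := a + b]     # fills a spare slot, so no reallocation is needed
names(DT2)               # the new column is visible through DT2 as well
```

Because the new column goes into a slot that already exists, both variables keep referring to the same table, and no shallow copy is made.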