`data.table` inherits from `data.frame`. It offers fast subset, fast grouping, fast update, fast ordered joins and list columns in a short and flexible syntax, for faster development. It is inspired by `A[B]` syntax in R where `A` is a matrix and `B` is a 2-column matrix. Since a `data.table` is a `data.frame`, it is compatible with R functions and packages that accept `data.frame`.
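The compatibility described above can be sketched as follows. This is a minimal illustration, assuming the `data.table` package is installed; the table contents and the choice of `lm` as the base function are only for demonstration.

```r
library(data.table)

# A small illustrative table (values are arbitrary)
DT <- data.table(x = rep(c("a", "b", "c"), each = 3), v = 1:9)

# A data.table carries both classes, so data.frame checks succeed
dt_classes <- class(DT)
is_df <- is.data.frame(DT)

# A base R function written for data.frame input accepts it directly
fit <- lm(v ~ x, data = DT)
n_coef <- length(coef(fit))
```

Here `lm` knows nothing about `data.table`; it works because the object also inherits from `data.frame`.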
The 10 minute quick start guide to `data.table` may be a good place to start: `vignette("datatable-intro")`. Or, the first section of FAQs is intended to be read from start to finish and is considered core documentation: `vignette("datatable-faq")`. If you have read and searched these documents and the help page below, please feel free to ask questions; to report a bug, use `bug.report(package="data.table")`.
Please run `example(data.table)` and study the output at the prompt.
*NEW* : help page for `:=` and the `keyby` argument.

`data.table(..., keep.rownames=FALSE, check.names=FALSE, key=NULL)`

## S3 method for class 'data.table':
[(x, i, j, by, keyby, with=TRUE,
nomatch = getOption("datatable.nomatch"), # default: NA_integer_
mult = "all", roll = FALSE, rolltolast = FALSE,
which = FALSE, .SDcols,
verbose=getOption("datatable.verbose"), # default: FALSE
drop=NULL)

...

Just as

`...`

in `data.frame`

. Usual recycling rules are applied to vectors of different lengths to create a list of equal length vectors.

keep.rownames

If

`...`

is a `matrix`

or `data.frame`

, `TRUE`

will retain the rownames of that object in a column named `rn`

.

check.names

Just as

`check.names`

in `data.frame`

.

key

Character vector of one or more column names which is passed to

`setkey`

. It may be a single comma separated string such as `key="x,y,z"`

, or a vector of names such as `key=c("x","y","z")`.
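The two spellings of `key` described above can be compared directly. This is a small sketch assuming the `data.table` package is installed; the column names and values are made up for illustration.

```r
library(data.table)

# Key given as a single comma separated string
A <- data.table(x = c("b", "a"), y = 2:1, z = c(TRUE, FALSE), key = "x,y")

# Key given as a character vector of column names
B <- data.table(x = c("b", "a"), y = 2:1, z = c(TRUE, FALSE), key = c("x", "y"))

key_A <- key(A)
same_key <- identical(key(A), key(B))

# Setting a key also sorts the rows by the key columns
sorted_x <- A$x
```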

x

A

`data.table`

.

i

Integer, logical or character vector, expression of column names,

`list`

or `data.table`

. Integer and logical vectors work the same way they do in `[.data.frame`

. Other than

j

A single column name, single expression of column names,

`list()`

of expressions of column names, an expression or function call that evaluates to `list`

(including `data.frame`

and `data.table`

which are `l`

by

A single unquoted column name,

`list()`

of expressions of column names, or a single character string containing comma separated column names, or a character vector of column names. The `list()`

of expressions is evaluated within t

keyby

An *ad hoc by* just as `by` but with an additional `setkey()` on the `by` columns of the result, for convenience. Not to be confused with a *keyed by* as defined above.

with

By default

`with=TRUE`

and `j`

is evaluated within the frame of `x`

. The column names can be used as variables. When `with=FALSE`

, `j`

works as it does in `[.data.frame`

.

nomatch

Same as

`nomatch`

in `match`

. When a row in `i`

has no match to `x`

's key, `nomatch=NA`

(default) means `NA`

is returned for `x`

's non-join columns.

mult

When *multiple* rows in

`x`

match to the row in `i`

, `mult`

controls which are returned: `"all"`

(default), `"first"`

or `"last"`

.

roll

Applies to the last join column, generally a date but can be any ordered variable, irregular and including gaps. If

`roll=TRUE`

and `i`

's row matches to all but the last `x`

join column, and its value in the last `i`

`rolltolast`

Like `roll`

but the data is not rolled forward past the *last* observation. The value of `i`

must fall in a gap in `x`

but not after the end of the data for that group defined by all but the last join column.

`which`

`TRUE`

returns the integer row numbers of `x`

that `i`

matches to.
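The `mult` and `which` arguments can be tried on a small keyed table. The sketch below assumes the `data.table` package is installed and uses a character `i` to join to the one-column key; the data is illustrative only.

```r
library(data.table)

# Keyed table: three rows per value of x
DT <- data.table(x = rep(c("a", "b", "c"), each = 3), v = 1:9, key = "x")

# which=TRUE returns the row numbers of DT that "b" matches to
rows_b <- DT["b", which = TRUE]

# mult picks one of several matching rows
first_b <- DT["b", mult = "first"]
last_b  <- DT["b", mult = "last"]
```

With `mult` left at its default of `"all"`, the same query returns all three rows for `"b"`.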

`.SDcols`

Advanced. Specifies the columns of `x`

included in `.SD`

. May be character column names or numeric positions. This is useful for speed when applying a function through a subset of (possibly very many) columns; e.g., `DT[,lapply(`

`verbose`

`TRUE`

turns on status and information messages to the console. Turn this on by default using `options(datatable.verbose=TRUE)`

. The quantity and types of verbosity may be expanded in future.

`drop`

Never used by `data.table`

. Do not use. It needs to be here because `data.table`

inherits from `data.frame`

. See `vignette("datatable-faq")`

.
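Two of the arguments above, `keyby` and `with`, are easiest to see side by side with their defaults. This is a minimal sketch assuming the `data.table` package is installed; the table is made up for illustration.

```r
library(data.table)

# Unkeyed table with groups deliberately out of order
DT <- data.table(x = c("b", "a", "b", "a"), v = 1:4)

# by: groups appear in order of first appearance, result has no key
adhoc <- DT[, sum(v), by = x]

# keyby: same aggregation, but the result is sorted and keyed by x
keyed <- DT[, sum(v), keyby = x]

# with=FALSE: j is treated as a column selection, as in [.data.frame
cols <- DT[, "v", with = FALSE]
```

The unnamed aggregate column is called `V1` in both results.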

`Details`

`data.table`

builds on base R functionality to reduce two types of time:
- programming time (easier to write, read, debug and maintain)
- compute time

It combines database like operations such as `subset`

, `with`

and `by`

and provides joins similar to those of `merge`

, but faster. This is achieved by using R's column based ordered in-memory `data.frame`

structure, `eval`

within the environment of a `list`

, the `[.data.table`

mechanism to condense the features, and compiled C to make certain operations fast.
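As a rough illustration of how `[.data.table` condenses `subset`, `with` and `by` into one call, the sketch below computes the same grouped sum both ways. It assumes the `data.table` package is installed; the data and the use of `tapply` as the base R counterpart are choices made for this example.

```r
library(data.table)

DF <- data.frame(x = rep(c("a", "b", "c"), each = 3), v = 1:9)
DT <- as.data.table(DF)

# Base R: subset the rows, then evaluate within the frame, then group
base_res <- with(subset(DF, v > 2), tapply(v, x, sum))

# data.table: i (subset), j (expression on columns) and by in one call
dt_res <- DT[v > 2, sum(v), by = x]
```

Both give 3, 15 and 24 for groups `a`, `b` and `c`; the `data.table` result comes back as a table with columns `x` and `V1`.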

The package can be used just for rapid programming (compact syntax). The largest compute time benefits are on 64-bit platforms with plentiful RAM, or when smaller datasets are repeatedly queried within a loop, or when other methods use so much working memory that they fail with an out of memory error.

As with `[.data.frame`

, *compound queries* can be concatenated on one line; e.g.,
DT[,sum(v),by=colA][V1<300][tail(order(V1))]
# sum(v) by colA then return the 6 largest which are under 300
The `j` expression does not have to return data; e.g.,
DT[,plot(colB,colC),by=colA]
# produce a set of plots (likely to pdf) returning no data
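A runnable variant of the compound query and of a side-effect-only `j` might look like this. It is a sketch assuming the `data.table` package is installed; the table, the threshold and the use of `cat` in place of `plot` are illustrative choices.

```r
library(data.table)

DT <- data.table(colA = rep(c("a", "b", "c", "d"), each = 2),
                 v = c(10, 20, 100, 150, 200, 250, 5, 5))

# Step by step: group sums are a=30, b=250, c=450, d=10
totals <- DT[, sum(v), by = colA]

# Compound query: sum by colA, keep sums under 300, return the 2 largest
res <- DT[, sum(v), by = colA][V1 < 300][tail(order(V1), 2)]

# j used only for its side effect: print one line per group
invisible(DT[, cat(colA, sum(v), "\n"), by = colA])
```

Each `[...]` operates on the result of the previous one, which is why the second and third steps refer to the auto-named column `V1`.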
Multiple
`data.table`

s (e.g. `X`

, `Y`

and `Z`

) can be joined in many ways; e.g.,
X[Y][Z]
X[Z][Y]
X[Y[Z]]
X[Z[Y]]
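One of the forms above, `X[Y[Z]]`, is sketched here with three tiny tables. The sketch assumes the `data.table` package is installed; the tables, the shared `id` column and the keys on `X` and `Y` are assumptions made for the example.

```r
library(data.table)

# X and Y are keyed on id so that they can be joined to
X <- data.table(id = 1:3, a = c("x1", "x2", "x3"), key = "id")
Y <- data.table(id = 1:3, b = c(10, 20, 30), key = "id")
Z <- data.table(id = 2:3, cc = c(TRUE, FALSE))

# Inner step: look up the rows of Z in Y
inner <- Y[Z]

# Outer step: look up that result in X, i.e. X[Y[Z]]
res <- X[inner]
```

Only ids 2 and 3 survive, because `Z` drives the innermost lookup.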
A `data.table`

is a `list`

of vectors, just like a `data.frame`

. However :

- it never has rownames. Instead it may have one *key* of one or more columns. This key can be used for row indexing instead of rownames.
- it has enhanced functionality in `[.data.table` for fast joins of keyed tables, fast aggregation, and fast last observation carried forward (LOCF).
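The keyed join and LOCF behaviour mentioned in the list above can be sketched with a two-column key. This assumes the `data.table` package is installed; the `prices` table and its values are hypothetical.

```r
library(data.table)

# Irregular observations, keyed by id then day
prices <- data.table(id = c("A", "A", "A"),
                     day = c(1L, 5L, 10L),
                     price = c(100, 105, 110),
                     key = c("id", "day"))

# Exact match on both key columns
exact <- prices[J("A", 5L)]

# Day 7 falls in a gap: without roll there is no match
miss <- prices[J("A", 7L)]

# With roll=TRUE the day-5 observation is carried forward
rolled <- prices[J("A", 7L), roll = TRUE]
```

The roll applies to the last join column only (`day` here); `id` must still match exactly.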

Since a `list`

*is* a `vector`

, `data.table`

columns may be type `list`

. Columns of type `list`

can contain mixed types. Each item in a column of type `list`

may be a different length. This is true of `data.frame`

, too.
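A column of type `list` with mixed types and lengths can be built as below. This is a minimal sketch assuming the `data.table` package is installed; the column names and contents are illustrative.

```r
library(data.table)

# items is a list column: one list element per row
DT <- data.table(id = 1:3,
                 items = list(1:2, "a", c(TRUE, FALSE, NA)))

# Each item keeps its own length and type
item_lengths <- sapply(DT$items, length)
item_types   <- sapply(DT$items, class)
```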

Several *methods* are provided for `data.table`

, including `is.na`

, `na.omit`

,
`t`

, `rbind`

, `cbind`

, `merge`

and others.
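A few of the methods listed above are exercised in this sketch, which assumes the `data.table` package is installed. The tables `A`, `B` and `info` are made up for the example.

```r
library(data.table)

A <- data.table(id = 1:3, v = c(1.5, NA, 3))
B <- data.table(id = 4L, v = 4)

# rbind stacks tables with matching columns
stacked <- rbind(A, B)

# is.na gives a logical matrix; na.omit drops rows containing NA
n_missing <- sum(is.na(stacked))
clean <- na.omit(stacked)

# merge joins on a named column
info <- data.table(id = c(1L, 3L, 4L), label = c("one", "three", "four"))
merged <- merge(clean, info, by = "id")
```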

`References`

`data.table`

homepage: http://datatable.r-forge.r-project.org/
User reviews: http://crantastic.org/packages/data-table
http://en.wikipedia.org/wiki/Binary_search
http://en.wikipedia.org/wiki/Radix_sort

`See Also`

`data.frame`

, `[.data.frame`

, `as.data.table`

, `setkey`

, `J`

, `SJ`

, `CJ`

, `merge.data.table`

, `tables`

, `test.data.table`

, `IDateTime`

, `unique.data.table`

, `copy`

, `:=`

, `alloc.col`

, `truelength`


`Examples`

```
example(data.table) # to run these examples at the prompt
DF = data.frame(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
DF
DT
identical(dim(DT),dim(DF)) # TRUE
identical(DF$a, DT$a) # TRUE
is.list(DF) # TRUE
is.list(DT) # TRUE
is.data.frame(DT) # TRUE
tables()
DT[2] # 2nd row
DT[,v] # v column (as vector)
DT[,list(v)] # v column (as data.table)
DT[2:3,sum(v)] # sum(v) over rows 2 and 3
DT[2:5,cat(v,"")] # just for j's side effect
DT[c(FALSE,TRUE)] # even rows (usual recycling)
DT[,2,with=FALSE] # 2nd column
colNum = 2
DT[,colNum,with=FALSE] # same
setkey(DT,x) # set a 1-column key. No quotes, for convenience.
setkeyv(DT,"x") # same (v in setkeyv stands for vector)
v="x"
setkeyv(DT,v) # same
# key(DT)<-"x" # copies whole table, please use set* functions instead
DT["a"] # binary search (fast)
DT[x=="a"] # vector scan (slow)
DT[,sum(v),by=x] # keyed by
DT[,sum(v),by=key(DT)] # same
DT[,sum(v),by=y] # ad hoc by
DT["a",sum(v)] # j for one group
DT[c("a","b"),sum(v)] # j for two groups
X = data.table(c("b","c"),foo=c(4,2))
X
DT[X] # join
DT[X,sum(v)] # join and eval j for each row in i
DT[X,mult="first"] # first row of each group
DT[X,mult="last"] # last row of each group
DT[X,sum(v)*foo] # join inherited scope
J("a",2) # J() is alias for data.table()
data.table("a",2) # same
setkey(DT,x,y) # 2-column key
setkeyv(DT,c("x","y")) # same
DT["a"] # join to 1st column of key
DT[J("a")] # same
DT[J("a",3)] # join to 2 columns
DT[J("a",3:6)] # join 4 rows (2 missing)
DT[J("a",3:6),nomatch=0] # remove missing
DT[J("a",3:6),roll=TRUE] # rolling join (locf)
DT[,sum(v),by=list(y%%2)] # by expression
DT[,.SD[2],by=x] # 2nd row of each group
DT[,tail(.SD,2),by=x] # last 2 rows of each group
DT[,lapply(.SD,sum),by=x] # applying through columns by group
DT[,list(MySum=sum(v),
         MyMin=min(v),
         MyMax=max(v)),
    by=list(x,y%%2)] # by 2 expressions
DT[,sum(v),x][V1<20] # compound query
DT[,sum(v),x][order(-V1)] # ordering results
DT[,z:=42L] # add new column by reference
DT[,z:=NULL] # remove column
DT["a",v:=42L] # subassign v by reference
DT[,transform(.SD,m=mean(v)),by=x]
DT[,.SD[which.min(v)],by=x]
# Follow posting guide, support is here (not r-help) :
maintainer("data.table")
vignette("datatable-intro")
vignette("datatable-faq")
vignette("datatable-timings")
test.data.table() # over 300 low level tests
update.packages() # keep up to date
```