data.table

data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.

The data.table project uses a custom governance agreement and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.

Why data.table?

  • concise syntax: fast to type, fast to read
  • fast execution
  • memory efficient
  • careful API lifecycle management
  • community
  • feature rich

Features

  • fast and friendly delimited file reader: ?fread, see also convenience features for small data
  • fast and feature rich delimited file writer: ?fwrite
  • low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
  • fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
  • fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to IRanges::findOverlaps), non-equi joins (i.e. joins using operators >, >=, <, <=), aggregate on join (by=.EACHI), update on join
  • fast add/update/delete columns by reference by group using no copies at all
  • fast and feature rich reshaping data: ?dcast (pivot/wider/spread) and ?melt (unpivot/longer/gather)
  • any R function from any R package can be used in queries, not just the subset of functions made available by a database backend; columns of type list are also supported
  • has no dependencies at all other than base R itself, for simpler production/maintenance
  • the R dependency is as old as possible for as long as possible, dated April 2014, and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0
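Several of the features listed above can be sketched on toy data. The tables and column names below (sales, prices, quotes, trades, item_total) are hypothetical and exist only for illustration; this is a minimal sketch, not a benchmark or a recommended workflow.

```r
library(data.table)

# Hypothetical toy data; all names here are illustrative only.
sales = data.table(
  id   = c(1L, 2L, 3L, 4L),
  item = c("a", "b", "a", "c"),
  qty  = c(2L, 5L, 1L, 3L)
)

# Write to a temporary CSV with fwrite and read it back with fread.
tmp = tempfile(fileext = ".csv")
fwrite(sales, tmp)
sales2 = fread(tmp)
unlink(tmp)

# Add a column by reference, computed within each group.
sales2[, item_total := sum(qty), by = item]

# Update on join: look up a price per item from a second table.
# Rows of sales2 with no match in prices (item "c") get NA.
prices = data.table(item = c("a", "b"), price = c(1.5, 4.0))
sales2[prices, on = "item", price := i.price]

# Rolling join: for each trade time, take the most recent quote.
quotes = data.table(time = c(1, 5, 10), price = c(100, 101, 102))
trades = data.table(time = c(2, 7, 12))
rolled = quotes[trades, on = "time", roll = TRUE]

# Reshape: wide to long with melt, then back to wide with dcast.
long = melt(sales2, id.vars = c("id", "item"),
            measure.vars = c("qty", "item_total"))
wide = dcast(long, id + item ~ variable, value.var = "value")
```

Note that := modifies sales2 in place, so no assignment with = or <- is needed for those two steps.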

Installation

install.packages("data.table")

# latest development version (only if newer available)
data.table::update_dev_pkg()

# latest development version (force install)
install.packages("data.table", repos="https://rdatatable.gitlab.io/data.table")

See the Installation wiki for more details.

Usage

Use data.table's subset operator [ the same way you would use data.frame's, but...

  • there is no need to prefix each column with DT$ (like subset() and with(), but built in)
  • any R expression using any package is allowed in the j argument, not just a list of columns
  • an extra argument, by, computes the j expression by group

library(data.table)
DT = as.data.table(iris)

# FROM[WHERE, SELECT, GROUP BY]
# DT  [i,     j,      by]

DT[Petal.Width > 1.0, mean(Petal.Length), by = Species]
#      Species       V1
#1: versicolor 4.362791
#2:  virginica 5.552000
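
The i, j and by arguments can also be used separately or with named results. The following is a small sketch on the same iris data; the result names (big, overall, counts, res, mean_petal_len and so on) are made up for this example.

```r
library(data.table)
DT = as.data.table(iris)

# i only: filter rows (WHERE)
big = DT[Sepal.Length > 7]

# j only: compute on columns (SELECT); .() is an alias for list()
overall = DT[, .(mean_len = mean(Sepal.Length), n = .N)]

# j with by: one row per group (GROUP BY); .N is the row count of each group
counts = DT[, .(n = .N, max_width = max(Petal.Width)), by = Species]

# i, j and by together, naming the result column instead of the default V1
res = DT[Petal.Width > 1.0, .(mean_petal_len = mean(Petal.Length)), by = Species]
```

Wrapping j in .() always returns a data.table, and naming the elements avoids the auto-generated V1 column seen in the output above.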

Getting started

Cheatsheets

Community

data.table is widely used by the R community. It is used directly by hundreds of CRAN and Bioconductor packages, and indirectly by thousands. It is one of the most-starred R packages on GitHub, and was highly rated by the Depsy project. If you need help, the data.table community is active on Stack Overflow.

A list of packages that significantly support, extend, or make use of data.table can be found in the Seal of Approval document.

Stay up-to-date

Contributing

Guidelines for filing issues / pull requests: Contribution Guidelines.

Monthly Downloads: 979,843
Version: 1.18.2.1
License: MPL-2.0 | file LICENSE
Maintainer: Tyson S. Barrett
Last Published: January 27th, 2026

Functions in data.table (1.18.2.1)

  • fdroplevels: Fast droplevels
  • frev: Fast reverse
  • fcase: fcase
  • frank: Fast rank
  • foverlaps: Fast overlap joins
  • fread: Fast and friendly file finagler
  • fifelse: Fast ifelse
  • fctr: Create a factor retaining original ordering
  • duplicated: Determine Duplicate Rows
  • dcast.data.table: Fast dcast for data.table
  • last: First/last item of an object
  • like: Convenience function for calling grep
  • .Last.updated: Number of rows affected by last update
  • frollapply: Rolling user-defined function
  • fsort: Fast parallel sort
  • fwrite: Fast CSV writer
  • frolladapt: Adapt rolling window to irregularly spaced time series
  • groupingsets: Grouping Set aggregation for data tables
  • measure: Specify measure.vars via regex or separator
  • froll: Rolling functions
  • mergelist: Merge multiple data.tables
  • merge: Merge two data.tables
  • nafill: Fill missing values
  • na.omit.data.table: Remove rows with missing values on columns specified
  • melt.data.table: Fast melt for data.table
  • notin: Convenience operator for checking if an example is not in a set of elements
  • print.data.table: data.table Printing Options
  • setDTthreads: Set or get number of threads that data.table should use
  • patterns: Obtain matching indices corresponding to patterns
  • rbindlist: Makes one data.table from a list of many
  • .selfref.ok: Tests self reference of a data.table
  • rowwiseDT: Create a data.table row-wise
  • setcolorder: Fast column reordering of a data.table by reference
  • rowid: Generate unique row ids within each group
  • rleid: Generate run-length type group id
  • setDF: Coerce a data.table to data.frame by reference
  • setDT: Coerce lists and data.frames to data.table by reference
  • setattr: Set attributes of objects by reference
  • setNumericRounding: Change or turn off numeric rounding
  • setkey: Create key on a data.table
  • tables: Display 'data.table' metadata
  • shouldPrint: For use by packages that mimic/divert auto printing e.g. IRkernel and knitr
  • substitute2: Substitute expression
  • shift: Fast lead/lag for vectors and lists
  • test: Test assertions for equality, exceptions and console output
  • split: Split data.table into chunks in a list
  • subset.data.table: Subsetting data.tables
  • special-symbols: Special symbols
  • setops: Set operations for data tables
  • setorder: Fast row reordering of a data.table by reference
  • tstrsplit: strsplit and transpose the resulting list efficiently
  • timetaken: Pretty print of time taken
  • transpose: Efficient transpose of list
  • update_dev_pkg: Perform update of development version of a package
  • test.data.table: Runs a set of tests
  • transform.data.table: Data table utilities
  • truelength: Over-allocation access
  • J: Creates a join data.table
  • as.data.table.xts: Efficient xts to as.data.table conversion
  • all.equal: Equality Test Between Two Data Tables
  • as.xts.data.table: Efficient data.table to xts conversion
  • data.table-condition-classes: Condition Handling with Classed Conditions
  • address: Address in RAM of a variable
  • between: Convenience functions for range subsets
  • IDateTime: Integer based date class
  • data.table-class: S4 Definition for data.table
  • as.data.table: Coerce to data.table
  • cdt: data.table exported C routines
  • chmatch: Faster match of character vectors
  • := : Assignment by reference
  • data.table-package: Enhanced data.frame
  • cbindlist: Column bind multiple data.tables
  • as.matrix: Convert a data.table to a matrix
  • copy: Copy an entire object
  • fcoalesce: Coalescing missing values
  • data.table-options: Global Options for the data.table Package
  • datatable.optimize: Optimisations in data.table
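
A few of the helpers listed above can be tried together on a small table. This is only a sketch: the table DT and the column names (flag, size, prev, run, val_filled, in_range) are hypothetical, and each call shows just one common way to use the function.

```r
library(data.table)

# Hypothetical toy table for illustration.
DT = data.table(
  grp = c("x", "x", "y", "y", "y"),
  val = c(1, NA, 3, 3, 10)
)

# fifelse: vectorised if/else
DT[, flag := fifelse(is.na(val), "missing", "present")]

# fcase: SQL-style CASE WHEN, evaluated in order. An explicit is.na() case
# comes first because NA conditions would otherwise fall through to `default`.
DT[, size := fcase(
  is.na(val), "unknown",
  val < 2,    "small",
  val < 5,    "medium",
  default = "large"
)]

# shift: previous value within each group (lag of 1)
DT[, prev := shift(val, n = 1L, type = "lag"), by = grp]

# rleid: run id that increments whenever grp changes
DT[, run := rleid(grp)]

# nafill: last observation carried forward
DT[, val_filled := nafill(val, type = "locf")]

# %between%: range check with inclusive bounds
DT[, in_range := val %between% c(2, 5)]
```

Each := call adds its column to DT by reference, so the table accumulates all six new columns without being copied.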