
Read in a data.frame from a file. Exceptions to this rule are Rdata, RDS, and JSON input file formats, which return the originally saved object without changing its class.
import(file, format, setclass, which, ...)
A data frame. If setclass is used, this data frame may have additional class attribute values, such as “tibble” or “data.table”.
file: A character string naming a file, URL, or single-file .zip or .tar archive.
format: An optional character string code of file format, which can be used to override the format inferred from file. Shortcuts include: “,” (for comma-separated values), “;” (for semicolon-separated values), and “|” (for pipe-separated values).
setclass: An optional character vector specifying one or more classes to set on the import. By default, the return object is always a “data.frame”. Allowed values include “tbl_df”, “tbl”, or “tibble” (if using dplyr) or “data.table” (if using data.table). Other values are ignored, such that a data.frame is returned.
which: This argument is used to control import from multi-object files; as a rule import only ever returns a single data frame (use import_list to import multiple data frames from a multi-object file). If file is a compressed directory, the which argument can be either a character string specifying a filename or an integer specifying which file (in locale sort order) to extract from the compressed directory. For Excel spreadsheets, this can be used to specify a sheet name or number. For .Rdata files, this can be an object name. For HTML files, it identifies which table to extract (in document order). Ignored otherwise. A character string value will be used as a regular expression, such that the extracted file is the first match of the regular expression against the file names in the archive.
...: Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below.
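As a hedged illustration of the which argument, the sketch below builds a two-object .RData file with base R's save() and then extracts one object by name. The variable names (rdata_file, iris_only, all_objects) are assumptions made for this example, and newer rio releases may warn about, or ask for, an explicit trust setting when loading .RData files.

```r
library(rio)

# build a multi-object .RData file with base R (file name is illustrative)
rdata_file <- tempfile(fileext = ".RData")
save(mtcars, iris, file = rdata_file)

# `which` selects a single object by name
iris_only <- import(rdata_file, which = "iris")

# import_list() returns every object in the file as a named list
all_objects <- import_list(rdata_file)

# cleanup
unlink(rdata_file)
```

Without which, import would return only one of the stored objects, so naming the object explicitly is the safer choice for multi-object files.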
This function imports a data frame or matrix from a data file with the file format based on the file extension (or the manually specified format, if format is specified).
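A minimal sketch of overriding the inferred format with one of the shortcut codes follows. It assumes a pipe-delimited file whose extension does not indicate its delimiter; the file is created here with base R's write.table(), and the names txt_file and piped are illustrative only.

```r
library(rio)

# write a pipe-delimited file whose extension does not indicate the delimiter
txt_file <- tempfile(fileext = ".txt")
write.table(iris, txt_file, sep = "|", row.names = FALSE, quote = FALSE)

# "|" is the shortcut code for pipe-separated values
piped <- import(txt_file, format = "|")

# cleanup
unlink(txt_file)
```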
import supports the following file formats:
Comma-separated data (.csv), using fread or, if fread = FALSE, read.table with row.names = FALSE and stringsAsFactors = FALSE.
Pipe-separated data (.psv), using fread or, if fread = FALSE, read.table with sep = '|', row.names = FALSE and stringsAsFactors = FALSE.
Tab-separated data (.tsv), using fread or, if fread = FALSE, read.table with row.names = FALSE and stringsAsFactors = FALSE.
SAS (.sas7bdat), using read_sas.
SAS XPORT (.xpt), using read_xpt or, if haven = FALSE, read.xport.
SPSS (.sav), using read_sav. If haven = FALSE, read.spss can be used.
SPSS compressed (.zsav), using read_sav.
Stata (.dta), using read_dta. If haven = FALSE, read.dta can be used.
SPSS Portable Files (.por), using read_por.
Excel (.xls and .xlsx), using read_excel. Use which to specify a sheet number. For .xlsx files, it is possible to set readxl = FALSE, so that read.xlsx can be used instead of readxl (the default).
R syntax object (.R), using dget.
Saved R objects (.RData, .rda), using load for single-object .Rdata files. Use which to specify an object name for multi-object .Rdata files. This can be any R object (not just a data frame).
Serialized R objects (.rds), using readRDS. This can be any R object (not just a data frame).
Epiinfo (.rec), using read.epiinfo.
Minitab (.mtp), using read.mtp.
Systat (.syd), using read.systat.
"XBASE" database files (.dbf), using read.dbf.
Weka Attribute-Relation File Format (.arff), using read.arff.
Data Interchange Format (.dif), using read.DIF.
Fortran data (no recognized extension), using read.fortran.
Fixed-width format data (.fwf), using a faster version of read.fwf that requires a widths argument and by default in rio has stringsAsFactors = FALSE. If readr = TRUE, import will be performed using read_fwf, where widths should be: NULL, a vector of column widths, or the output of fwf_empty, fwf_widths, or fwf_positions.
gzip comma-separated data (.csv.gz), using read.table with row.names = FALSE and stringsAsFactors = FALSE.
Apache Arrow Parquet (.parquet), using read_parquet.
Feather R/Python interchange format (.feather), using read_feather.
Fast storage (.fst), using read.fst.
JSON (.json), using fromJSON.
Matlab (.mat), using read.mat.
EViews (.wf1), using readEViews.
OpenDocument Spreadsheet (.ods), using read_ods. Use which to specify a sheet number.
Single-table HTML documents (.html), using read_html. The data structure will only be read correctly if the HTML file can be converted to a list via as_list.
Shallow XML documents (.xml), using read_xml. The data structure will only be read correctly if the XML file can be converted to a list via as_list.
YAML (.yml), using yaml.load.
Clipboard import, using read.table with row.names = FALSE.
Google Sheets, as comma-separated data (.csv).
GraphPad Prism (.pzfx), using read_pzfx.
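Because delimited files are read with fread by default, extra arguments given to import are forwarded to it. The sketch below assumes that default reader is in use and passes two fread arguments, nrows and select; the object names csv_file and subset_df are illustrative.

```r
library(rio)

# create a CSV file to read back
csv_file <- tempfile(fileext = ".csv")
export(iris, csv_file)

# `nrows` and `select` are fread() arguments, forwarded through `...`
subset_df <- import(csv_file, nrows = 10, select = c("Sepal.Length", "Species"))
dim(subset_df)

# cleanup
unlink(csv_file)
```

Arguments that the underlying reader does not recognise may be ignored or rejected depending on the rio version, so it is worth checking the reader's own documentation before relying on them.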
import attempts to standardize the return value from the various import functions to the extent possible, thus providing a uniform data structure regardless of what import package or function is used. It achieves this by storing any optional variable-related attributes at the variable level (i.e., an attribute for mtcars$mpg is stored in attributes(mtcars$mpg) rather than attributes(mtcars)). If you would prefer these attributes to be stored at the data.frame-level (i.e., in attributes(mtcars)), see gather_attrs.
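The difference between variable-level and data.frame-level attributes can be sketched as follows. The data frame and its "label" attributes are made up for illustration; they stand in for the metadata that import would attach when reading a metadata-rich format.

```r
library(rio)

# attach variable-level attributes by hand (illustrative stand-in for imported metadata)
df <- data.frame(x = 1:3, y = 4:6)
attr(df$x, "label") <- "An illustrative label for x"
attr(df$y, "label") <- "An illustrative label for y"

# before gathering, the attribute lives on the variable
attributes(df$x)

# after gathering, it is stored on the data frame, keyed by variable name
gathered <- gather_attrs(df)
attributes(gathered)$label
```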
After importing metadata-rich file formats (e.g., from Stata or SPSS), it may be helpful to recode labelled variables to character or factor using characterize or factorize respectively.
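A small sketch of that recoding step, using a hand-built labelled vector rather than a real Stata or SPSS import (the values and labels are invented for the example):

```r
library(rio)

# a numeric vector with value labels, similar to what a Stata or SPSS import carries
x <- structure(c(1, 0, 0, 1), labels = c(foo = 0, bar = 1))

as_text   <- characterize(x)  # value labels as character strings
as_factor <- factorize(x)     # value labels as factor levels

str(as_text)
str(as_factor)
```

Both functions also accept a whole data frame, in which case each labelled column is recoded.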
See also: import_list, .import, characterize, gather_attrs, export, convert
library(rio)
# create CSV to import
export(iris, csv_file <- tempfile(fileext = ".csv"))
# specify `format` to override default format
export(iris, tsv_file <- tempfile(fileext = ".tsv"), format = "csv")
stopifnot(identical(import(csv_file), import(tsv_file, format = "csv")))
# import CSV as a `data.table`
stopifnot(inherits(import(csv_file, setclass = "data.table"), "data.table"))
# import with defaults and compare column names
iris1 <- import(csv_file)
identical(names(iris), names(iris1))
# export without a header row (`col.names = FALSE` is passed to the underlying writer), then compare again
export(iris, csv_file2 <- tempfile(fileext = ".csv"), col.names = FALSE)
iris2 <- import(csv_file2)
identical(names(iris), names(iris2))
# set class for the response data.frame as "tbl_df" (from dplyr)
stopifnot(inherits(import(csv_file, setclass = "tbl_df"), "tbl_df"))
# non-data frame formats supported for RDS, Rdata, and JSON
export(list(mtcars, iris), rds_file <- tempfile(fileext = ".rds"))
li <- import(rds_file)
identical(names(mtcars), names(li[[1]]))
# cleanup
unlink(csv_file)
unlink(csv_file2)
unlink(tsv_file)
unlink(rds_file)