import: Import

Description

Read data.frame from a file

Usage

import(file, format, setclass, which, ...)

Arguments

file

A character string naming a file, URL, or single-file .zip or .tar archive.

format

An optional character string code of file format, which can be used to override the format inferred from file. Shortcuts include: “,” (for comma-separated values), “;” (for semicolon-separated values), and “|” (for pipe-separated values).
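
As a minimal sketch of the shortcut codes, the snippet below writes a small pipe-delimited file with an uninformative extension and reads it by passing format = "|". The file name and its contents are made up for illustration, and the sketch assumes the rio package is installed.

```r
library(rio)

# Hypothetical pipe-delimited file; the ".dat" extension says nothing about its format
psv_file <- file.path(tempdir(), "example_data.dat")
writeLines(c("id|value", "1|a", "2|b"), psv_file)

# "|" is the shortcut for pipe-separated values and overrides
# the format that would otherwise be inferred from the extension
dat <- import(psv_file, format = "|")
print(dat)

# cleanup
unlink(psv_file)
```

Spelling out the format this way is mainly useful when a file's extension is missing or misleading.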

setclass

An optional character vector specifying one or more classes to set on the import. By default, the return object is always a “data.frame”. Reasonable values for this might be “tbl_df” (if using dplyr) or “data.table” (if using data.table). Warnings will be produced if a class is used from a package that is not loaded and/or available.

which

This argument controls import from multi-object files. If file is a compressed directory, which can be either a character string specifying a filename or an integer specifying which file (in locale sort order) to extract from the compressed directory. A character string value is used as a regular expression, such that the extracted file is the first match of the regular expression against the file names in the archive. For Excel spreadsheets, which can be used to specify a sheet number. For .Rdata files, it can be an object name. It is ignored otherwise.
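
The following sketch illustrates which for a multi-object .RData file. The object names (cars_df, iris_df) and the file name are hypothetical, and the sketch assumes rio is installed. Depending on the installed rio version, loading serialized R files may also produce a warning about trusting the file.

```r
library(rio)

# Save two hypothetical objects into one .RData file
rdata_file <- file.path(tempdir(), "two_objects.RData")
cars_df <- head(mtcars)
iris_df <- head(iris)
save(cars_df, iris_df, file = rdata_file)

# Select one object from the file by name
picked <- import(rdata_file, which = "iris_df")
names(picked)

# cleanup
unlink(rdata_file)
```

Naming the object explicitly avoids depending on the order in which objects happen to be listed in the file.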

...

Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below.
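
A small sketch of how an argument travels through ... to the underlying reader: header is not a formal argument of import, but it is understood by the delimited-file readers listed in the details (read.csv and fread). The file name and contents are invented for illustration; rio is assumed to be installed.

```r
library(rio)

# Hypothetical CSV file without a header row
headerless_file <- file.path(tempdir(), "no_header.csv")
writeLines(c("5.1,setosa", "7.0,versicolor"), headerless_file)

# `header = FALSE` is passed via `...` to the delimited-file reader,
# so the first line is treated as data rather than as column names
no_header <- import(headerless_file, header = FALSE)
str(no_header)

# cleanup
unlink(headerless_file)
```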

Value

A data.frame. If setclass is used, this data.frame may have additional class attribute values.

Details

This function imports a data frame or matrix from a data file with the file format based on the file extension (or the manually specified format, if format is specified).

import supports the following file formats:

Tab-separated data (.tsv), using read.table with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Comma-separated data (.csv), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
CSVY (CSV with a YAML metadata header), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread). Note: the YAML header can be “raw” or commented with hash symbols at the beginning of each line. The latter is the default format when using export to write to CSVY.
Feather R/Python interchange format (.feather), using read_feather
Pipe-separated data (.psv), using read.table with sep = '|', row.names = FALSE, and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Fixed-width format data (.fwf), using a faster version of read.fwf that requires a widths argument and by default in rio has stringsAsFactors = FALSE. If readr = TRUE, import will be performed using read_fwf, where widths should be: NULL, a vector of column widths, or the output of fwf_empty, fwf_widths, or fwf_positions.
Serialized R objects (.rds), using readRDS
Saved R objects (.RData), using load for single-object .Rdata files. Use which to specify an object name for multi-object .Rdata files.
JSON (.json), using fromJSON
YAML (.yml), using yaml.load
Stata (.dta), using read_dta. If haven = FALSE, read.dta can be used.
SPSS (.sav), using read_sav. If haven = FALSE, read.spss can be used.
"XBASE" database files (.dbf), using read.dbf
Weka Attribute-Relation File Format (.arff), using read.arff
R syntax object (.R), using dget
Excel (.xls and .xlsx), using read_excel. If readxl = FALSE, read.xlsx can be used. Use which to specify a sheet number.
SAS (.sas7bdat) and SAS XPORT (.xpt), using read_sas and read.xport, respectively.
Minitab (.mtp), using read.mtp
Epiinfo (.rec), using read.epiinfo
Systat (.syd), using read.systat
Data Interchange Format (.dif), using read.DIF
OpenDocument Spreadsheet (.ods), using read.ods. Use which to specify a sheet number.
Shallow XML documents (.xml). The data structure will only be read correctly if the XML file can be converted to a list via as_list.
Single-table HTML documents (.html). The data structure will only be read correctly if the HTML file can be read with read_html and converted to a list via as_list.
Clipboard import (on Windows and Mac OS), using read.table with row.names = FALSE
Fortran data (no recognized extension), using read.fortran
Google Sheets, as Comma-separated data (.csv)

import attempts to standardize the return value from the various import functions to the extent possible, thus providing a uniform data structure regardless of what import package or function is used. It achieves this by storing any optional variable-related attributes at the variable level (i.e., an attribute for mtcars$mpg is stored in attributes(mtcars$mpg) rather than attributes(mtcars)). If you would prefer these attributes to be stored at the data.frame-level (i.e., in attributes(mtcars)), see gather_attrs.
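
To make the distinction between variable-level and data.frame-level attributes concrete, here is a base R sketch that does not call rio at all. The "label" attribute and the small data frame are illustrative assumptions, not something import is guaranteed to produce for a given file.

```r
# Small illustrative data frame
df <- data.frame(mpg = c(21, 22.8), cyl = c(6, 4))

# Store metadata at the variable level, i.e. on the column itself
attr(df$mpg, "label") <- "Miles per gallon"

# The attribute lives in attributes(df$mpg) ...
attributes(df$mpg)

# ... and not in attributes(df), which only holds names, class and row.names
names(attributes(df))
```

Keeping metadata on the column means it stays attached when the column is extracted as a vector, whereas data.frame-level attributes describe the object as a whole.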

Examples

library("rio")

# create CSV to import
export(iris, "iris1.csv")

# specify `format` to override default format
export(iris, "iris.tsv", format = "csv")
stopifnot(identical(import("iris1.csv"), import("iris.tsv", format = "csv")))

# import CSV as a `data.table`
stopifnot(inherits(import("iris1.csv", setclass = "data.table"), "data.table"))

# pass arguments to underlying import function
iris1 <- import("iris1.csv")
identical(names(iris), names(iris1))

export(iris, "iris2.csv", col.names = FALSE)
iris2 <- import("iris2.csv")
identical(names(iris), names(iris2))

# set class for the returned data.frame as "tbl_df" (from dplyr)
stopifnot(inherits(import("iris1.csv", setclass = "tbl_df"), "tbl_df"))

# cleanup
unlink("iris.tsv")
unlink("iris1.csv")
unlink("iris2.csv")