import: Import

Description

Read a data.frame from a file

Usage

import(file, format, setclass, which, ...)

Arguments

file

A character string naming a file, URL, or single-file .zip or .tar archive.

format

An optional character string code of file format, which can be used to override the format inferred from file. Shortcuts include: , (for comma-separated values), ; (for semicolon-separated values), and

setclass

An optional character vector specifying one or more classes to set on the import. By default, the return object is always a data.frame. Reasonable values for this might be tbl_df (if using dplyr) or data.table

which

This argument is used to control import from multi-object files. If file is a compressed directory, which can be either a character string specifying a filename or an integer specifying which file (in locale sort order) to extrac

...

Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below.
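As a minimal sketch of how `...` is forwarded, the snippet below passes nrows, which is not an argument of import itself but is accepted by the delimited-file readers named in the details (read.csv and fread). The temporary file path is a hypothetical one created only for this illustration, and the sketch assumes the rio package is installed.

```r
library(rio)

# Hypothetical temporary CSV file, written with base R for this illustration.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# `nrows` is not an import() argument; it travels through `...`
# to the underlying delimited-file reader.
first_rows <- import(csv_path, nrows = 5)
nrow(first_rows)

# cleanup
unlink(csv_path)
```

Which extra arguments are accepted depends on the reader that handles the format, so an argument that works for one file type may not apply to another.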

Value

An R data.frame. If setclass is used, this data.frame may have additional class attribute values.

Details

This function imports a data frame or matrix from a data file with the file format based on the file extension (or the manually specified format, if format is specified). import supports the following file formats:

Tab-separated data (.tsv), using read.table with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Comma-separated data (.csv), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
CSVY (CSV with a YAML metadata header; see http://csvy.org/), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread). Note: the YAML header can be raw or commented with hash symbols at the beginning of each line. The latter is the default format when using export to write to CSVY.
Feather R/Python interchange format (.feather), using read_feather
Pipe-separated data (.psv), using read.table with sep = '|', row.names = FALSE, and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Fixed-width format data (.fwf), using a faster version of read.fwf that requires a widths argument and by default in rio has stringsAsFactors = FALSE. If readr = TRUE, import will be performed using read_fwf, where widths should be: NULL, a vector of column widths, or the output of fwf_empty, fwf_widths, or fwf_positions.
Serialized R objects (.rds), using readRDS
Saved R objects (.RData), using load for single-object .Rdata files. Use which to specify an object name for multi-object .Rdata files.
JSON (.json), using fromJSON
YAML (.yml), using yaml.load
Stata (.dta), using read_dta. If haven = FALSE, read.dta can be used.
SPSS and SPSS portable (.sav and .por), using read_sav and read_por. If haven = FALSE, read.spss can be used for .sav files.
"XBASE" database files (.dbf), using read.dbf
Weka Attribute-Relation File Format (.arff), using read.arff
R syntax object (.R), using dget
Excel (.xls and .xlsx), using read_excel. If readxl = FALSE, read.xlsx can be used. Use which to specify a sheet number.
SAS (.sas7bdat) and SAS XPORT (.xpt), using read_sas and read.xport.
Minitab (.mtp), using read.mtp
Epiinfo (.rec), using read.epiinfo
Systat (.syd), using read.systat
Data Interchange Format (.dif), using read.DIF
OpenDocument Spreadsheet (.ods), using read.ods. Use which to specify a sheet number.
Shallow XML documents (.xml). The data structure will only be read correctly if the XML file can be converted to a list via as_list.
Single-table HTML documents (.html). The data structure will only be read correctly if the HTML file can be read with read_html and converted to a list via as_list.
Clipboard import (on Windows and Mac OS), using read.table with row.names = FALSE
Fortran data (no recognized extension), using read.fortran
Google Sheets, as comma-separated data (.csv)
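The use of which for multi-object saved R files, mentioned in the list above, might look like the following sketch. The object names first_df and second_df and the temporary file path are hypothetical, chosen only for this illustration; the sketch assumes the rio package is installed.

```r
library(rio)

# Two hypothetical data frames saved into a single .RData file.
first_df <- data.frame(x = 1:3)
second_df <- data.frame(y = c("a", "b"))

rdata_path <- tempfile(fileext = ".RData")
save(first_df, second_df, file = rdata_path)

# Select one object from the multi-object file by name.
picked <- import(rdata_path, which = "second_df")
names(picked)

# cleanup
unlink(rdata_path)
```

Without which, only one of the saved objects would be returned, so naming the object explicitly is the safer choice for multi-object files.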

import attempts to standardize the return value from the various import functions to the extent possible, thus providing a uniform data structure regardless of what import package or function is used. It achieves this by storing any optional variable-related attributes at the variable level (i.e., an attribute for mtcars$mpg is stored in attributes(mtcars$mpg) rather than attributes(mtcars)). If you would prefer these attributes to be stored at the data.frame-level (i.e., in attributes(mtcars)), see gather_attrs.
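The difference between variable-level and data.frame-level attributes can be illustrated with a small hand-built data frame. This is only a sketch: the data frame and its "label" attribute are hypothetical stand-ins for what an imported file with variable labels might carry, and the gather_attrs call is run only if rio is installed.

```r
# Hypothetical data frame with a label attached to one column,
# mimicking variable-level metadata from an imported file.
small_df <- data.frame(mpg = c(21.0, 22.8), cyl = c(6, 4))
attr(small_df$mpg, "label") <- "Miles per gallon"

# The label is stored on the column ...
attr(small_df$mpg, "label")

# ... and not on the data.frame itself.
is.null(attr(small_df, "label"))

# gather_attrs() moves variable-level attributes to the data.frame level.
if (requireNamespace("rio", quietly = TRUE)) {
  gathered <- rio::gather_attrs(small_df)
  str(attributes(gathered))
}
```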

Examples

# create CSV to import
export(iris, "iris1.csv")

# specify `format` to override default format
export(iris, "iris.tsv", format = "csv")
stopifnot(identical(import("iris1.csv"), import("iris.tsv", format = "csv")))

# import CSV as a `data.table`
stopifnot(inherits(import("iris1.csv", setclass = "data.table"), "data.table"))

# pass arguments to underlying import function
iris1 <- import("iris1.csv")
identical(names(iris), names(iris1))

export(iris, "iris2.csv", col.names = FALSE)
iris2 <- import("iris2.csv")
identical(names(iris), names(iris2))

# set class for the response data.frame as "tbl_df" (from dplyr)
stopifnot(inherits(import("iris1.csv", setclass = "tbl_df"), "tbl_df"))

# cleanup
unlink("iris.tsv")
unlink("iris1.csv")
unlink("iris2.csv")