import: Read data.frame or matrix from a file

Description

This function imports a data frame or matrix from a data file, inferring the file format from the file extension unless format is specified manually.

import supports the following file formats:

Tab-separated data (.tsv), using read.table with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Comma-separated data (.csv), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
CSVY (CSV with a YAML metadata header) using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Pipe-separated data (.psv), using read.table with sep = '|', row.names = FALSE, and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Fixed-width format data (.fwf), using a faster version of read.fwf that requires a widths argument and by default in rio has stringsAsFactors = FALSE
Serialized R objects (.rds), using readRDS
Saved R objects (.RData), using load for single-object .RData files
JSON (.json), using fromJSON
Stata (.dta), using read_dta. If haven = FALSE, read.dta can be used. If haven = TRUE metadata will be stored as object-level attributes by default; if column.labels = TRUE metadata will be stored as column-level attributes.
SPSS and SPSS portable (.sav and .por), using read_sav and read_por. If haven = FALSE, read.spss can be used for .sav files. If haven = TRUE metadata will be stored as object-level attributes by default; if column.labels = TRUE metadata will be stored as column-level attributes.
"XBASE" database files (.dbf), using read.dbf
Weka Attribute-Relation File Format (.arff), using read.arff
R syntax object (.R), using dget
Excel (.xls and .xlsx), using read_excel. If readxl = FALSE, read.xlsx can be used.
SAS (.sas7bdat) and SAS XPORT (.xpt), using read_sas and read.xport. If haven = TRUE metadata will be stored as object-level attributes by default; if column.labels = TRUE metadata will be stored as column-level attributes.
Minitab (.mtp), using read.mtp
Epiinfo (.rec), using read.epiinfo
Systat (.syd), using read.systat
Data Interchange Format (.dif), using read.DIF
OpenDocument Spreadsheet (.ods), using read.ods
Shallow XML documents (.xml), using xmlToDataFrame. Note: optional arguments not recognized by xmlToDataFrame are passed to xmlParse.
Clipboard import (on Windows and Mac OS), using read.table with row.names = FALSE
Fortran data (no recognized extension), using read.fortran
Google Sheets, as Comma-separated data (.csv)
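
The extension-based dispatch described above can be illustrated with a minimal base R sketch. The helper import_sketch is hypothetical and is not part of rio; it covers only four of the listed formats and only shows how an extension, or an explicit format, might select a reader.

```r
# Hypothetical helper (not part of rio): choose a base R reader from the
# file extension, unless `format` is supplied explicitly.
import_sketch <- function(file, format = NULL) {
  if (is.null(format)) format <- tools::file_ext(file)
  switch(tolower(format),
    csv = utils::read.csv(file, stringsAsFactors = FALSE),
    tsv = utils::read.table(file, sep = "\t", header = TRUE,
                            stringsAsFactors = FALSE),
    psv = utils::read.table(file, sep = "|", header = TRUE,
                            stringsAsFactors = FALSE),
    rds = readRDS(file),
    stop("Unrecognized format: ", format)
  )
}

# The extension decides the reader
psv_file <- tempfile(fileext = ".psv")
write.table(head(iris), psv_file, sep = "|", row.names = FALSE, quote = FALSE)
from_psv <- import_sketch(psv_file)

# An explicit `format` wins over an uninformative extension
odd_file <- tempfile(fileext = ".dat")
write.csv(head(iris), odd_file, row.names = FALSE)
from_csv <- import_sketch(odd_file, format = "csv")

# cleanup
unlink(c(psv_file, odd_file))
```

The real function handles many more formats and options; the sketch only mirrors the idea that format, when given, takes precedence over the extension.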

Usage

import(file, format, setclass, ...)

Arguments

file

A character string naming a file, URL, or single-file .zip or .tar archive.

format

An optional character string code of file format, which can be used to override the format inferred from file. Shortcuts include: “,” (for comma-separated values), “;” (for semicolon-separated values), and “|” (for pipe-separated values).

setclass

An optional character vector specifying one or more classes to set on the import. By default, the returned object is always a “data.frame”. Reasonable values for this might be “tbl_df” (if using dplyr) or “data.table” (if using data.table). Warnings will be produced if a class is used from a package that is not loaded and/or available.

...

Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below.
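
As a hedged illustration of the format and ... arguments, the sketch below assumes that the rio package is installed and that the underlying CSV reader accepts an nrows argument (both read.csv and fread do). The file name and the choice of nrows are illustrative only.

```r
# Assumes the rio package is installed; the block is skipped otherwise.
if (requireNamespace("rio", quietly = TRUE)) {
  # A comma-separated file with an uninformative extension
  tmp <- tempfile(fileext = ".txt")
  write.csv(iris, tmp, row.names = FALSE)

  # The "," shortcut overrides the format inferred from the file name
  full <- rio::import(tmp, format = ",")

  # `nrows` is not an argument of import(); it travels through `...`
  # to the underlying CSV reader
  first_rows <- rio::import(tmp, format = ",", nrows = 5)

  print(dim(full))
  print(dim(first_rows))

  # cleanup
  unlink(tmp)
}
```

Which arguments are accepted through ... depends on the reader that import selects for the format, so the documentation of that reader is the place to check before relying on a particular option.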

Value

A data.frame. If setclass is used, this data.frame may have additional class attribute values.

Examples


library(rio)

# create CSV to import
export(iris, "iris1.csv")

# specify `format` to override default format
export(iris, "iris.tsv", format = "csv")
stopifnot(identical(import("iris1.csv"), import("iris.tsv", format = "csv")))

# import CSV as a `data.table`
stopifnot(inherits(import("iris1.csv", setclass = "data.table"), "data.table"))

# pass arguments to underlying import function
iris1 <- import("iris1.csv")
identical(names(iris), names(iris1))

# export without a header row, so the imported column names differ from names(iris)
export(iris, "iris2.csv", col.names = FALSE)
iris2 <- import("iris2.csv")
identical(names(iris), names(iris2))

# set class for the response data.frame as "tbl_df" (from dplyr)
stopifnot(inherits(import("iris1.csv", setclass = "tbl_df"), "tbl_df"))

# cleanup
unlink("iris.tsv")
unlink("iris1.csv")
unlink("iris2.csv")
