import: Read data.frame or matrix from a file

Description

This function imports a data frame or matrix from a data file with the file format based on the file extension (or the manually specified format, if format is specified). import supports the following file formats:

Tab-separated data (.tsv), using read.table with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Comma-separated data (.csv), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Pipe-separated data (.psv), using read.table with sep = '|', row.names = FALSE, and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
Fixed-width format data (.fwf), using a faster version of read.fwf that requires a widths argument and by default in rio has stringsAsFactors = FALSE
Serialized R objects (.rds), using readRDS
Saved R objects (.RData), using load for single-object .Rdata files
JSON (.json), using fromJSON
Stata (.dta), using read_dta. If haven = FALSE, read.dta can be used.
SPSS and SPSS portable (.sav and .por), using read_sav and read_por. If haven = FALSE, read.spss can be used for .sav files.
"XBASE" database files (.dbf), using read.dbf
Weka Attribute-Relation File Format (.arff), using read.arff
R syntax object (.R), using dget
Excel (.xls and .xlsx), using read_excel. If readxl = FALSE, read.xlsx can be used.
SAS (.sas7bdat) and SAS XPORT (.xpt), using read_sas and read.xport
Minitab (.mtp), using read.mtp
Epiinfo (.rec), using read.epiinfo
Systat (.syd), using read.systat
Data Interchange Format (.dif), using read.DIF
OpenDocument Spreadsheet (.ods), using read.ods
Shallow XML documents (.xml), using xmlToDataFrame. Note: optional arguments not recognized by xmlToDataFrame are passed to xmlParse.
Clipboard import (on Windows and Mac OS), using read.table with row.names = FALSE
Fortran data (no recognized extension), using read.fortran
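
The idea of choosing a reader from the file extension can be sketched with base R alone. The helper below, sketch_import, is a hypothetical name used only for illustration; it is not rio's implementation, covers just four of the formats listed above, and assumes delimited files carry a header row.

```r
# Hypothetical helper: extension-based dispatch using base R only.
# This is an illustrative sketch, NOT the code rio uses internally.
sketch_import <- function(file, format = tools::file_ext(file)) {
  switch(tolower(format),
    csv = utils::read.csv(file, stringsAsFactors = FALSE),
    tsv = utils::read.table(file, sep = "\t", header = TRUE,
                            stringsAsFactors = FALSE),
    psv = utils::read.table(file, sep = "|", header = TRUE,
                            stringsAsFactors = FALSE),
    rds = readRDS(file),
    stop("Unrecognized format: ", format)  # fall-through for anything else
  )
}

# Write a small pipe-separated file and read it back by extension
tmp <- tempfile(fileext = ".psv")
write.table(head(iris), tmp, sep = "|", row.names = FALSE)
x <- sketch_import(tmp)
stopifnot(is.data.frame(x), identical(names(x), names(iris)))
unlink(tmp)
```

Passing format explicitly to the sketch (for example format = "csv") bypasses the extension lookup, which mirrors the role of the format argument described under Arguments.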

Usage

import(file, format, setclass, ...)

Arguments

file

A character string naming a file, URL, or single-file .zip or .tar archive.

format

An optional character string code of file format, which can be used to override the format inferred from file. Shortcuts include: , (for comma-separated values), ; (for semicolon-separated values), and

setclass

An optional character vector specifying one or more classes to set on the imported object. By default, the returned object is always a data.frame. Reasonable values for this might be tbl_df (if using dplyr) or data.table.

...

Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below.
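
As a rough illustration of how extra arguments can reach an underlying reader, the sketch below forwards ... to read.csv. The wrapper name read_csv_sketch is hypothetical and the code is not taken from rio; colClasses is a standard read.csv argument, used here to control one column's class.

```r
# Hypothetical wrapper: forwards extra arguments to read.csv via `...`.
# Illustrative only; rio's own import methods may differ.
read_csv_sketch <- function(file, ...) {
  utils::read.csv(file, stringsAsFactors = FALSE, ...)
}

tmp <- tempfile(fileext = ".csv")
write.csv(head(iris), tmp, row.names = FALSE)

default <- read_csv_sketch(tmp)
# Named colClasses: only Sepal.Length is forced to character,
# the remaining columns are still type-detected.
as_char <- read_csv_sketch(tmp, colClasses = c(Sepal.Length = "character"))

stopifnot(is.numeric(default$Sepal.Length))
stopifnot(is.character(as_char$Sepal.Length))
unlink(tmp)
```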

Value

An R data.frame. If setclass is used, this data.frame may have additional class attribute values.
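
What "additional class attribute values" means can be shown with base R's class mechanism. This is a sketch of the general idea only: the class name my_class is made up for illustration, and how rio itself applies setclass is not shown here.

```r
# Prepend an extra class while keeping "data.frame" in the class vector.
# "my_class" is a hypothetical class name used only for this sketch.
df <- head(iris)
class(df) <- c("my_class", class(df))

# The object now inherits from both classes
stopifnot(inherits(df, "my_class"))
stopifnot(inherits(df, "data.frame"))
stopifnot(is.data.frame(df))
```

Because data.frame stays in the class vector, functions written for data frames continue to accept the object.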

Examples

# create CSV to import
export(iris, "iris1.csv")

# specify `format` to override default format
export(iris, "iris.tsv", format = "csv")
stopifnot(identical(import("iris1.csv"), import("iris.tsv", format = "csv")))

# import CSV as a `data.table`
stopifnot(inherits(import("iris1.csv", setclass = "data.table"), "data.table"))

# pass arguments to underlying import function
iris1 <- import("iris1.csv")
identical(names(iris), names(iris1))

export(iris, "iris2.csv", col.names = FALSE)
iris2 <- import("iris2.csv")
identical(names(iris), names(iris2))

# set class for the response data.frame as "tbl_df" (from dplyr)
stopifnot(inherits(import("iris1.csv", setclass = "tbl_df"), "tbl_df"))

# cleanup
unlink("iris.tsv")
unlink("iris1.csv")
unlink("iris2.csv")
