
rio (version 0.4.6)

import: Import

Description

Read data.frame from a file

Usage

import(file, format, setclass, which, ...)

Arguments

file
A character string naming a file, URL, or single-file .zip or .tar archive.
format
An optional character string code of file format, which can be used to override the format inferred from file. Shortcuts include: , (for comma-separated values), ; (for semicolon-separated values), and
setclass
An optional character vector specifying one or more classes to set on the import. By default, the return object is always a data.frame. Reasonable values for this might be tbl_df (if using dplyr) or data.table
which
This argument is used to control import from multi-object files. If file is a compressed directory, which can be either a character string specifying a filename or an integer specifying which file (in locale sort order) to extrac
...
Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below.
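As a brief illustration of the format argument, the sketch below writes comma-separated data to a file whose extension does not identify it as CSV, then reads it back with the "," shortcut. The temporary path and the .dat extension are assumptions chosen only for this example.

```r
library(rio)

# Hypothetical file: CSV content stored under a non-CSV extension
path <- tempfile(fileext = ".dat")
write.csv(iris, path, row.names = FALSE)

# The "," shortcut asks import() to treat the file as comma-separated,
# overriding whatever would be inferred from the extension
dat <- import(path, format = ",")
dim(dat)

# cleanup
unlink(path)
```

Spelling the format out as format = "csv" should behave the same way; the shortcut is only a convenience.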

Value

  • An R data.frame. If setclass is used, this data.frame may have additional class attribute values.

Details

This function imports a data frame or matrix from a data file with the file format based on the file extension (or the manually specified format, if format is specified). import supports the following file formats:
  • Tab-separated data (.tsv), using read.table with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
  • Comma-separated data (.csv), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
  • CSVY (CSV with a YAML metadata header; see http://csvy.org/), using read.csv with row.names = FALSE and stringsAsFactors = FALSE (or, if fread = TRUE, fread). Note: the YAML header can be raw or commented with hash symbols at the beginning of each line. The latter is the default format when using export to write to CSVY.
  • Feather R/Python interchange format (.feather), using read_feather
  • Pipe-separated data (.psv), using read.table with sep = '|', row.names = FALSE, and stringsAsFactors = FALSE (or, if fread = TRUE, fread)
  • Fixed-width format data (.fwf), using a faster version of read.fwf that requires a widths argument and by default in rio has stringsAsFactors = FALSE. If readr = TRUE, import will be performed using read_fwf, where widths should be: NULL, a vector of column widths, or the output of fwf_empty, fwf_widths, or fwf_positions.
  • Serialized R objects (.rds), using readRDS
  • Saved R objects (.RData), using load for single-object .Rdata files. Use which to specify an object name for multi-object .Rdata files.
  • JSON (.json), using fromJSON
  • YAML (.yml), using yaml.load
  • Stata (.dta), using read_dta. If haven = FALSE, read.dta can be used.
  • SPSS and SPSS portable (.sav and .por), using read_sav and read_por. If haven = FALSE, read.spss can be used for .sav files.
  • "XBASE" database files (.dbf), using read.dbf
  • Weka Attribute-Relation File Format (.arff), using read.arff
  • R syntax object (.R), using dget
  • Excel (.xls and .xlsx), using read_excel. If readxl = FALSE, read.xlsx can be used. Use which to specify a sheet number.
  • SAS (.sas7bdat) and SAS XPORT (.xpt), using read_sas and read.xport.
  • Minitab (.mtp), using read.mtp
  • Epiinfo (.rec), using read.epiinfo
  • Systat (.syd), using read.systat
  • Data Interchange Format (.dif), using read.DIF
  • OpenDocument Spreadsheet (.ods), using read.ods. Use which to specify a sheet number.
  • Shallow XML documents (.xml). The data structure will only be read correctly if the XML file can be converted to a list via as_list.
  • Single-table HTML documents (.html). The data structure will only be read correctly if the HTML file can be read with read_html and converted to a list via as_list.
  • Clipboard import (on Windows and Mac OS), using read.table with row.names = FALSE
  • Fortran data (no recognized extension), using read.fortran
  • Google Sheets, as Comma-separated data (.csv)
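The extension-based dispatch described in the list above can be sketched with a small round trip. This is a minimal illustration that assumes export can write the same three formats; the temporary paths are made up for the example.

```r
library(rio)

# Temporary paths that differ only in their extension
paths <- tempfile(fileext = c(".csv", ".tsv", ".rds"))

# export() and import() both choose the format from the extension
for (p in paths) export(iris, p)
imported <- lapply(paths, import)

# Each result is a data.frame with the same shape as iris
sapply(imported, dim)

# cleanup
unlink(paths)
```

Column types may differ between formats: delimited text is read with stringsAsFactors = FALSE, so Species would be expected to come back as character from the .csv and .tsv files but as a factor from the .rds file.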
import attempts to standardize the return value from the various import functions to the extent possible, thus providing a uniform data structure regardless of what import package or function is used. It achieves this by storing any optional variable-related attributes at the variable level (i.e., an attribute for mtcars$mpg is stored in attributes(mtcars$mpg) rather than attributes(mtcars)). If you would prefer these attributes to be stored at the data.frame-level (i.e., in attributes(mtcars)), see gather_attrs.
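To make the distinction between variable-level and data.frame-level attributes concrete, the sketch below attaches a label to one column by hand and then passes the data frame to gather_attrs. The label text is hypothetical, and the exact layout of the gathered attributes is not asserted here; inspect the result to confirm it.

```r
library(rio)

df <- mtcars

# Variable-level attribute: stored on the column itself
attr(df$mpg, "label") <- "Miles per gallon"  # hypothetical label
attributes(df$mpg)$label

# The data.frame carries no "label" attribute of its own at this point
"label" %in% names(attributes(df))

# gather_attrs() is intended to move such attributes to the data.frame level
g <- gather_attrs(df)
names(attributes(g))
```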

See Also

.import, gather_attrs

Examples

# create CSV to import
export(iris, "iris1.csv")

# specify `format` to override default format
export(iris, "iris.tsv", format = "csv")
stopifnot(identical(import("iris1.csv"), import("iris.tsv", format = "csv")))

# import CSV as a `data.table`
stopifnot(inherits(import("iris1.csv", setclass = "data.table"), "data.table"))

# pass arguments to underlying import function
iris1 <- import("iris1.csv")
identical(names(iris), names(iris1))

export(iris, "iris2.csv", col.names = FALSE)
iris2 <- import("iris2.csv")
identical(names(iris), names(iris2))

# set class for the response data.frame as "tbl_df" (from dplyr)
stopifnot(inherits(import("iris1.csv", setclass = "tbl_df"), "tbl_df"))

# cleanup
unlink("iris.tsv")
unlink("iris1.csv")
unlink("iris2.csv")
