read.DIF
Data Input from Spreadsheet
Reads a file in Data Interchange Format (DIF) and creates a data frame from it. DIF is a format for data matrices such as single spreadsheets.
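To make the file layout concrete, the following is a minimal sketch that writes a tiny DIF file by hand and reads it back. The table contents, the "sketch" title and the temporary path are made up for illustration; the layout (a header section, then one BOT group per row, closed by EOD) is assumed to follow the usual DIF structure.

```r
# Hand-built DIF content: each header item takes three lines, then the data
# section lists one BOT ("beginning of tuple") group per spreadsheet row.
dif_lines <- c(
  "TABLE",   "0,1", "\"sketch\"",   # title is an arbitrary placeholder
  "VECTORS", "0,2", "\"\"",         # 2 columns
  "TUPLES",  "0,3", "\"\"",         # 3 rows, counting the header row
  "DATA",    "0,0", "\"\"",
  "-1,0", "BOT", "1,0", "\"name\"", "1,0",   "\"score\"",
  "-1,0", "BOT", "1,0", "\"a\"",    "0,1.5", "V",
  "-1,0", "BOT", "1,0", "\"b\"",    "0,2",   "V",
  "-1,0", "EOD"
)

path <- tempfile(fileext = ".dif")  # hypothetical scratch file
writeLines(dif_lines, path)

# stringsAsFactors is set explicitly so the result does not depend on the
# session default.
df <- utils::read.DIF(path, header = TRUE, stringsAsFactors = FALSE)
str(df)

unlink(path)
```

Character cells are written as a "1,0" line followed by a quoted string, and numeric cells as "0,<value>" followed by V.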
- Keywords
- file, connection
Usage
read.DIF(file, header = FALSE,
         dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
         row.names, col.names, as.is = !stringsAsFactors,
         na.strings = "NA", colClasses = NA, nrows = -1,
         skip = 0, check.names = TRUE, blank.lines.skip = TRUE,
         stringsAsFactors = default.stringsAsFactors(),
         transpose = FALSE, fileEncoding = "")
Arguments
- file
the name of the file which the data are to be read from, or a connection, or a complete URL.
The name "clipboard" may also be used on Windows, in which case read.DIF("clipboard") will look for a DIF format entry in the Windows clipboard.
- header
a logical value indicating whether the spreadsheet contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains only character values and the top left cell is empty.
- dec
the character used in the file for decimal points.
- numerals
string indicating how to convert numbers whose conversion to double precision would lose accuracy; see type.convert.
- row.names
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or a character string giving the name of the table column containing the row names.
If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise, if row.names is missing, the rows are numbered.
Using row.names = NULL forces row numbering.
- col.names
a vector of optional names for the variables. The default is to use "V" followed by the column number.
- as.is
the default behavior of read.DIF is to convert character variables to factors. The variable as.is controls the conversion of columns not otherwise specified by colClasses. Its value is either a vector of logicals (values are recycled if necessary), or a vector of numeric or character indices which specify which columns should not be converted to factors.
Note: In releases prior to R 2.12.1, cells marked as being of character type were converted to logical, numeric or complex using type.convert as in read.table.
Note: to suppress all conversions including those of numeric columns, set colClasses = "character".
Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped.
- na.strings
a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.
- colClasses
character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA.
Possible values are NA (when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.
Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any).
- nrows
the maximum number of rows to read in. Negative values are ignored.
- skip
the number of lines of the data file to skip before beginning to read data.
- check.names
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.
- blank.lines.skip
logical: if TRUE, blank lines in the input are ignored.
- stringsAsFactors
logical: should character vectors be converted to factors?
- transpose
logical, indicating if the row and column interpretation should be transposed. Microsoft's Excel has been known to produce (non-standard conforming) DIF files which would need transpose = TRUE to be read correctly.
- fileEncoding
character string: if non-empty declares the encoding used on a file (not a connection or clipboard) so the character data can be re-encoded. See the ‘Encoding’ section of the help for file, the ‘R Data Import/Export Manual’ and ‘Note’.
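As a rough illustration of how header and col.names interact, the sketch below reads a header-less DIF file twice: once with the default "V" names and once with names supplied through col.names. The file contents, the "sketch" title and the names "id" and "score" are invented for this example.

```r
# A header-less table: two rows, two columns, no line of variable names.
dif_lines <- c(
  "TABLE",   "0,1", "\"sketch\"",   # title is an arbitrary placeholder
  "VECTORS", "0,2", "\"\"",         # 2 columns
  "TUPLES",  "0,2", "\"\"",         # 2 rows
  "DATA",    "0,0", "\"\"",
  "-1,0", "BOT", "1,0", "\"a\"", "0,1.5", "V",
  "-1,0", "BOT", "1,0", "\"b\"", "0,2",   "V",
  "-1,0", "EOD"
)

path <- tempfile(fileext = ".dif")  # hypothetical scratch file
writeLines(dif_lines, path)

# Without col.names the variables are named "V" plus the column number.
default_names <- utils::read.DIF(path, header = FALSE,
                                 stringsAsFactors = FALSE)

# Supplying col.names replaces those defaults.
custom_names <- utils::read.DIF(path, header = FALSE,
                                col.names = c("id", "score"),
                                stringsAsFactors = FALSE)

names(default_names)
names(custom_names)

unlink(path)
```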
Value
A data frame (data.frame) containing a representation of the data in the file. Empty input is an error unless col.names is specified, when a 0-row data frame is returned: similarly giving just a header line if header = TRUE results in a 0-row data frame.
Note
The columns referred to in as.is and colClasses include the column of row names (if any).
Less memory will be used if colClasses is specified as one of the six atomic vector classes.
References
The DIF format specification can be found by searching on http://www.wotsit.org/; the optional header fields are ignored. See also https://en.wikipedia.org/wiki/Data_Interchange_Format.
The term is likely to lead to confusion: Windows will have a ‘Windows Data Interchange Format (DIF) data format’ as part of its WinFX system, which may or may not be compatible.
See Also
The R Data Import/Export manual.
scan, type.convert, read.fwf for reading fixed width formatted input; read.table; data.frame.
Examples
library(utils)
# NOT RUN {
## read.DIF() may need transpose = TRUE for a file exported from Excel
udir <- system.file("misc", package = "utils")
dd <- read.DIF(file.path(udir, "exDIF.dif"), header = TRUE, transpose = TRUE)
dc <- read.csv(file.path(udir, "exDIF.csv"), header = TRUE)
stopifnot(identical(dd, dc), dim(dd) == c(4,2))
# }
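The effect of transpose can also be sketched without the bundled example files. The hand-built file below is an assumption about what a non-conforming export might look like: its VECTORS and TUPLES counts are swapped relative to how the data rows are grouped. Reading it as-is is expected to fail, while transpose = TRUE should recover the table; the contents and the temporary path are made up for illustration.

```r
# Three BOT groups of two cells each, but the header claims 3 "vectors"
# and 2 "tuples", i.e. the counts are swapped (assumed Excel-like layout).
dif_lines <- c(
  "TABLE",   "0,1", "\"sketch\"",   # title is an arbitrary placeholder
  "VECTORS", "0,3", "\"\"",
  "TUPLES",  "0,2", "\"\"",
  "DATA",    "0,0", "\"\"",
  "-1,0", "BOT", "1,0", "\"name\"", "1,0",   "\"score\"",
  "-1,0", "BOT", "1,0", "\"a\"",    "0,1.5", "V",
  "-1,0", "BOT", "1,0", "\"b\"",    "0,2",   "V",
  "-1,0", "EOD"
)

path <- tempfile(fileext = ".dif")  # hypothetical scratch file
writeLines(dif_lines, path)

# Capture the error message from the untransposed read instead of stopping.
plain <- tryCatch(
  utils::read.DIF(path, header = TRUE, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)

# Swapping the row/column interpretation lets the same file be read.
fixed <- utils::read.DIF(path, header = TRUE, transpose = TRUE,
                         stringsAsFactors = FALSE)

print(plain)
print(fixed)

unlink(path)
```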