read.spss
read.spss reads a file stored by the SPSS save or export commands. This was originally written in 2000 and has limited support for changes in SPSS formats since (which have not been many).
read.spss(file, use.value.labels = TRUE, to.data.frame = FALSE, max.value.labels = Inf, trim.factor.names = FALSE, trim_values = TRUE, reencode = NA, use.missings = to.data.frame)
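file: character string: the name of the file or URL to read.
use.value.labels: logical: convert variables with value labels into R factors with those levels? This is only done if there are at least as many labels as values of the variable (when values without a matching label are returned as NA).
to.data.frame: logical: return a data frame?
max.value.labels: only variables with value labels and at most this many unique values will be converted to factors if use.value.labels = TRUE.
trim.factor.names: logical: trim trailing spaces from factor levels?
trim_values: logical: should values and value labels have trailing spaces ignored when matching for use.value.labels = TRUE?
reencode: logical: should character strings be re-encoded to the current locale? The default, NA, means to do so in a UTF-8 locale, only. Alternatively a character string specifying an encoding to assume for the file.
use.missings: logical: should information on user-defined missing values be used to set the corresponding values to NA?

read.spss returns a list (or optionally a data frame) with one component for each variable in the saved data set. If what looks like a Windows codepage was recorded in the SPSS file, it is attached as attribute "codepage" to the result.

There may be attributes "label.table" and "variable.labels". Attribute "label.table" is a named list of value labels with one element per variable, either NULL or a named character vector. Attribute "variable.labels" is a named character vector with names the short variable names and elements the long names.

If there are user-defined missing values, there will be an attribute "Missings". This is a named list with one list element per variable. Each element has an element type, a length-one character vector giving the type of missingness, and may also have an element value with the values corresponding to missingness.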
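The attribute layout described above can be explored without an SPSS file by building a small object by hand. This is a minimal sketch: `res` and its contents are hypothetical stand-ins for a read.spss result, and it assumes that each "label.table" element holds the codes as values and the label text as names.

```r
# Hypothetical stand-in for a read.spss() result (built by hand, not by
# foreign): two variables with the attributes described above.
res <- list(SEX = c(1, 2, 2, 9), AGE = c(34, 51, 28, 99))

# Assumed layout: codes are the values, label text is in the names.
attr(res, "label.table") <- list(
  SEX = c(Male = "1", Female = "2", Unknown = "9"),
  AGE = NULL                      # no value labels for AGE
)
attr(res, "variable.labels") <- c(SEX = "Respondent sex", AGE = "Age in years")
attr(res, "Missings") <- list(
  SEX = list(type = "one", value = 9),
  AGE = list(type = "one", value = 99)
)

# Long name of one variable.
vlabs <- attr(res, "variable.labels")
vlabs[["SEX"]]                          # "Respondent sex"

# Which variables carry value labels at all?
has_labels <- !vapply(attr(res, "label.table"), is.null, logical(1))
names(has_labels)[has_labels]           # "SEX"

# Type of missingness recorded for one variable.
attr(res, "Missings")[["AGE"]]$type     # "one"
```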
This is a complex subject (where the R and C source code for read.spss is the main documentation), but the simplest cases are types "one", "two" and "three" with a corresponding number of (real or string) values whose labels can be found from the "label.table" attribute. Other possibilities are a finite or semi-infinite range, possibly plus a single value.
See also http://www.gnu.org/software/pspp/manual/html_node/Missing-Observations.html#Missing-Observations.
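To make the simplest cases concrete, the sketch below sets the user-defined missing values of types "one", "two" and "three" to NA. It is only an illustration of the idea behind use.missings: `apply_discrete_missings` is a hypothetical helper, not part of foreign, and it assumes each element of "Missings" looks like `list(type = ..., value = ...)` as described above. Range types are deliberately left alone.

```r
# Hypothetical helper (not foreign's code): replace discrete user-defined
# missing values with NA. Range and range-plus-value types are not handled
# and the vector is returned unchanged for them.
apply_discrete_missings <- function(x, miss) {
  if (is.null(miss) || !miss$type %in% c("one", "two", "three")) {
    return(x)
  }
  x[x %in% miss$value] <- NA
  x
}

age <- c(34, 51, 28, 99)
apply_discrete_missings(age, list(type = "one", value = 99))
# [1] 34 51 28 NA
```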
If the filename appears to be a URL (of schemes http:, ftp: or https:) the URL is first downloaded to a temporary file and then read. (https: is supported where supported by download.file with its current default method.)
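read.spss already performs this step itself; the wrapper below only spells out the download-then-read sequence. `looks_like_url` and `read_spss_from_url` are hypothetical names, the scheme test is an assumption about how "appears to be a URL" might be checked, and the example URL is made up.

```r
# Hypothetical sketch of the URL handling described above.
looks_like_url <- function(file) grepl("^(http|https|ftp):", file)

read_spss_from_url <- function(file, ...) {
  if (looks_like_url(file)) {
    tmp <- tempfile(fileext = ".sav")
    on.exit(unlink(tmp), add = TRUE)   # remove the temporary copy afterwards
    # mode = "wb" keeps the binary file intact on Windows
    utils::download.file(file, tmp, mode = "wb", quiet = TRUE)
    file <- tmp
  }
  foreign::read.spss(file, ...)
}

looks_like_url(c("https://example.org/survey.sav", "survey.sav"))
# [1]  TRUE FALSE
```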
Occasionally in SPSS, value labels will be added to some values of a continuous variable (e.g. to distinguish different types of missing data), and you will not want these variables converted to factors. By setting max.value.labels you can specify that variables with a large number of distinct values are not converted to factors even if they have value labels. In addition, variables will not be converted to factors if there are non-missing values that have no value label. The value labels are then returned in the "value.labels" attribute of the variable.
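The conversion rule in the previous paragraph can be written as a small decision function. This is a sketch of the stated rule, not foreign's implementation: `maybe_factor` is a hypothetical name, and it assumes the labels arrive as a named vector whose values are the codes and whose names are the label text.

```r
# Hypothetical helper mirroring the rule described above: convert to a
# factor only if there are at most max.value.labels distinct non-missing
# values and every one of them has a label; otherwise keep the values and
# attach the labels as a "value.labels" attribute.
maybe_factor <- function(x, labels, max.value.labels = Inf) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) <= max.value.labels && all(vals %in% labels)) {
    factor(x, levels = unname(labels), labels = names(labels))
  } else {
    attr(x, "value.labels") <- labels
    x
  }
}

sex <- maybe_factor(c(1, 2, 2, 1), c(Male = 1, Female = 2))
levels(sex)                   # "Male" "Female"

# A continuous score with one labelled code stays numeric.
score <- maybe_factor(c(12.5, 40.1, 99), c(Refused = 99))
attr(score, "value.labels")   # the label vector, kept as an attribute
```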
If SPSS variable labels are present, they are returned as the "variable.labels" attribute of the answer.
Fixed length strings (including value labels) are padded on the right with spaces by SPSS, and so are read that way by R. The default argument trim_values = TRUE causes trailing spaces to be ignored when matching to value labels, as examples have been seen where the strings and the value labels had different amounts of padding. See the examples for sub for ways to remove trailing spaces in character data.
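The effect of differing padding on label matching can be seen with plain base R. The strings below are made-up examples, and `trim_right` is a hypothetical helper using the `sub` pattern approach mentioned above; `trimws(s, which = "right")` is a base alternative that also strips tabs and newlines.

```r
# Made-up padded values and labels, as fixed-length SPSS strings may be stored.
x      <- c("yes   ", "no ", "yes")
labels <- c(Yes = "yes", No = "no  ")

match(x, labels)              # c(NA, NA, 1): padding differs, so no match

# Drop trailing spaces only, leaving leading and inner spaces alone.
trim_right <- function(s) sub(" +$", "", s)
match(trim_right(x), trim_right(labels))   # c(1, 2, 1)
```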
The URL http://msdn.microsoft.com/en-us/library/ms776446(VS.85).aspx provides a list of translations from Windows codepage numbers to encoding names that iconv is likely to know about, and so suitable values for reencode. Automatic re-encoding is attempted for apparent codepages of 200 or more in a UTF-8 locale: some other high-numbered codepages can be re-encoded on most systems, but the encoding names are platform-dependent (see iconvlist).
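The sketch below shows what re-encoding does to the bytes of a string and one way to probe whether an encoding name is usable before passing it as reencode. It relies only on the fact that "latin1" and "UTF-8" are accepted by iconv on all R platforms; `encoding_ok` is a hypothetical helper, and the byte string is a made-up example.

```r
# "cafe" with an e-acute, as the raw bytes a Latin-1 file would contain.
bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))
s <- rawToChar(bytes)

# Re-encode to UTF-8: the accented letter becomes two bytes.
u <- iconv(s, from = "latin1", to = "UTF-8")
charToRaw(u)                  # 63 61 66 c3 a9

# Hypothetical probe: is this encoding name supported on this platform?
encoding_ok <- function(enc) {
  res <- tryCatch(iconv("a", from = enc, to = "UTF-8"),
                  error = function(e) NA)
  !is.na(res)
}
encoding_ok("latin1")         # TRUE on every R platform

## Applied when reading (file name is hypothetical):
## read.spss("datafile", reencode = "latin1")
```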
See also spss.system.file in package memisc, a different interface also based on the PSPP codebase.
## Not run:
## if you have an SPSS file called 'datafile':
library(foreign)
read.spss("datafile")
## don't convert value labels to factor levels
read.spss("datafile", use.value.labels = FALSE)
## convert value labels to factors for variables with at most
## ten distinct values.
read.spss("datafile", max.value.labels = 10)
## End(Not run)