##### Read an SPSS Data File

read.spss reads a file stored by the SPSS save or export commands.

This was originally written in 2000 and has limited support for changes in SPSS formats since then (which have not been many).

Keywords
file
##### Usage
read.spss(file, use.value.labels = TRUE, to.data.frame = FALSE,
max.value.labels = Inf, trim.factor.names = FALSE,
trim_values = TRUE, reencode = NA, use.missings = to.data.frame,
sub = ".", add.undeclared.levels = c("sort", "append", "no"),
duplicated.value.labels = c("append", "condense"),
duplicated.value.labels.infix = "_duplicated_", ...)
##### Arguments
file

character string: the name of the file or URL to read.

use.value.labels

logical: convert variables with value labels into R factors with those levels? This is only done if there are at least as many labels as values of the variable (when values without a matching label are returned as NA).

to.data.frame

logical: return a data frame?

max.value.labels

numeric: only variables with value labels and at most this many unique values will be converted to factors if use.value.labels = TRUE. The default, Inf, imposes no limit.

trim.factor.names

logical: trim trailing spaces from factor levels?

trim_values

logical: should values and value labels have trailing spaces ignored when matching for use.value.labels = TRUE?

reencode

logical: should character strings be re-encoded to the current locale? The default, NA, means to do so only in UTF-8 or latin-1 locales. Alternatively, a character string specifying an encoding to assume for the file.

use.missings

logical: should information on user-defined missing values be used to set the corresponding values to NA?

sub

character string: If not NA it is used by iconv to replace any non-convertible bytes in character/factor input. Default is ".". For back compatibility with foreign versions <= 0.8-68 use sub=NA.

add.undeclared.levels

character: specify how to handle variables with at least one value label and further non-missing values that have no value label (like factor levels in R). For "sort" (the default) it adds undeclared factor levels to the already declared levels (and labels) and sorts them according to level, for "append" it appends undeclared factor levels to declared levels (and labels) without sorting, and for "no" this does not convert to factor in case of numeric SPSS levels (not labels), and still converts to factor if the SPSS levels are characters and to.data.frame=TRUE. For back compatibility with foreign versions <= 0.8-68 use add.undeclared.levels="no" (not recommended as this may convert some values with missing corresponding value labels to NA).

duplicated.value.labels

character: what to do with duplicated value labels for different levels. For "append" (the default), the first original value label is kept while further duplicated labels are renamed to paste0(label, duplicated.value.labels.infix, level), for "condense", all levels with identical labels are condensed into exactly the first of these levels in R. Back compatibility with foreign versions <= 0.8-68 is not given as R versions >= 3.4.0 no longer support duplicated factor labels.

duplicated.value.labels.infix

character: the infix used for labels of factor levels with duplicated value labels in SPSS (default "_duplicated_") if duplicated.value.labels="append".

...

passed to as.data.frame if to.data.frame = TRUE.
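The renaming rule described for duplicated.value.labels = "append" can be imitated in base R. This is only a sketch of the documented rule, not the package's internal code; the vectors spss_levels and spss_labels are made-up illustration data.

```r
# Hypothetical SPSS value labels: levels 2 and 3 share the label "high".
spss_levels <- c(1, 2, 3)
spss_labels <- c("low", "high", "high")
infix <- "_duplicated_"   # default of duplicated.value.labels.infix

# Keep the first occurrence of each label; rename later duplicates to
# paste0(label, infix, level), as the argument description states.
dup <- duplicated(spss_labels)
labels_append <- spss_labels
labels_append[dup] <- paste0(spss_labels[dup], infix, spss_levels[dup])

print(labels_append)
```

With these inputs the third label becomes "high_duplicated_3", so every factor level keeps a unique label.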

##### Details

This uses modified code from the PSPP project (http://www.gnu.org/software/pspp/) for reading the SPSS formats.

If the filename appears to be a URL (of schemes http:, ftp: or https:) the URL is first downloaded to a temporary file and then read. (https: is supported where supported by download.file with its current default method.)

Occasionally in SPSS, value labels will be added to some values of a continuous variable (e.g. to distinguish different types of missing data), and you will not want these variables converted to factors. By setting max.value.labels you can specify that variables with a large number of distinct values are not converted to factors even if they have value labels.
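A minimal sketch of limiting factor conversion with max.value.labels, using the electric.sav file that ships with package foreign. The threshold of 5 is an arbitrary value chosen for illustration; which variables end up as factors depends on the file.

```r
library(foreign)

sav <- system.file("files", "electric.sav", package = "foreign")

# Only labelled variables with at most 5 distinct values become factors;
# labelled variables with more distinct values stay as they are.
dat <- read.spss(sav, to.data.frame = TRUE, max.value.labels = 5)

# Inspect which columns were converted.
print(sapply(dat, is.factor))
```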

If SPSS variable labels are present, they are returned as the "variable.labels" attribute of the answer.

Fixed length strings (including value labels) are padded on the right with spaces by SPSS, and so are read that way by R. The default argument trim_values=TRUE causes trailing spaces to be ignored when matching to value labels, as examples have been seen where the strings and the value labels had different amounts of padding. See the examples for sub for ways to remove trailing spaces in character data.
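As a small illustration of removing the right-hand padding from character data after import, the following sketch uses sub with a regular expression. The vector padded is made-up data standing in for a fixed-length SPSS string variable.

```r
# Hypothetical fixed-length strings, padded on the right with spaces.
padded <- c("abc   ", "de ", "   ")

# Remove one or more trailing spaces from each element.
trimmed <- sub(" +$", "", padded)

print(nchar(padded))
print(nchar(trimmed))
```

For factor levels, the trim.factor.names argument described above serves the same purpose at read time.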

URL http://msdn.microsoft.com/en-us/library/ms776446(VS.85).aspx provides a list of translations from Windows codepage numbers to encoding names that iconv is likely to know about and so suitable values for reencode. Automatic re-encoding is attempted for apparent codepages of 200 or more in a UTF-8 or latin-1 locale: some other high-numbered codepages can be re-encoded on most systems, but the encoding names are platform-dependent (see iconvlist).

##### Value

A list (or optionally a data frame) with one component for each variable in the saved data set.

If what looks like a Windows codepage was recorded in the SPSS file, it is attached (as a number) as attribute "codepage" to the result.

There may be attributes "label.table" and "variable.labels". Attribute "label.table" is a named list of value labels with one element per variable, either NULL or a named character vector. Attribute "variable.labels" is a named character vector with names the short variable names and elements the long names.

If there are user-defined missing values, there will be an attribute "Missings". This is a named list with one list element per variable. Each element has an element type, a length-one character vector giving the type of missingness, and may also have an element value with the values corresponding to missingness. This is a complex subject (where the R and C source code for read.spss is the main documentation), but the simplest cases are types "one", "two" and "three" with a corresponding number of (real or string) values whose labels can be found from the "label.table" attribute. Other possibilities are a finite or semi-infinite range, possibly plus a single value. See also http://www.gnu.org/software/pspp/manual/html_node/Missing-Observations.html#Missing-Observations.
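The attributes described in this section can be inspected with attr. The sketch below uses the electric.sav file bundled with foreign; any of the attributes may be absent for a given file, in which case attr returns NULL, so the code does not assume they exist.

```r
library(foreign)

sav <- system.file("files", "electric.sav", package = "foreign")
dat <- read.spss(sav)   # a list, since to.data.frame = FALSE by default

# Each of these is NULL if the file did not provide the information.
var_labels  <- attr(dat, "variable.labels")
label_table <- attr(dat, "label.table")
missings    <- attr(dat, "Missings")
codepage    <- attr(dat, "codepage")

str(var_labels)
str(label_table)
```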

##### Note

If SPSS value labels are converted to factors the underlying numerical codes will not in general be the same as the SPSS numerical values, since the numerical codes in R are always $1,2,3,\dots$.
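The point about numerical codes can be seen with a base R factor. The values and labels below are invented for illustration and merely stand in for an SPSS variable coded 0, 5 and 9.

```r
# Hypothetical SPSS numeric values with value labels for 0, 5 and 9.
spss_values <- c(0, 5, 9, 5, 0)
f <- factor(spss_values,
            levels = c(0, 5, 9),
            labels = c("none", "some", "many"))

# The internal codes are positions in the level set, not the SPSS values.
print(as.integer(f))    # 1 2 3 2 1
print(as.character(f))
```

If the original SPSS values are needed, one option is to read the file with use.value.labels = FALSE and consult the "label.table" attribute for the labels.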

You may see warnings about the file encoding for SPSS save files: it is possible such files contain non-ASCII character data which need re-encoding. The most common occurrence is Windows codepage 1252, a superset of Latin-1. The encoding is recorded (as an integer) in attribute "codepage" of the result if it looks like a Windows codepage. Automatic re-encoding is done only in UTF-8 and latin-1 locales: see argument reencode.
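To illustrate what re-encoding from codepage 1252 involves, the sketch below converts a byte sequence with iconv. It assumes the platform's iconv knows the encoding name "CP1252", which is common but not guaranteed (see iconvlist). The bytes are constructed by hand rather than read from an SPSS file.

```r
# The word "café" as stored in Windows codepage 1252: é is the single byte 0xE9.
cp1252_bytes <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Convert to UTF-8; an analogous conversion is what reencode requests.
utf8 <- iconv(cp1252_bytes, from = "CP1252", to = "UTF-8")

print(utf8)
print(Encoding(utf8))
```

When reading a file, the corresponding step would be to pass the encoding name to read.spss, for example reencode = "CP1252".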

A different interface also based on the PSPP codebase is available in package memisc: see its help for spss.system.file.

##### Examples
# NOT RUN {
(sav <- system.file("files", "electric.sav", package = "foreign"))
dat <- read.spss(file=sav)
str(dat)   # list structure with attributes

dat <- read.spss(file=sav, to.data.frame=TRUE)
str(dat)   # now a data.frame

### Now we use an example file that is not very well structured and
### hence may need some special treatment with appropriate argument settings.
### Expect lots of warnings as value labels (corresponding to R factor labels) are incomplete,
### and an unsupported long string variable is present in the data
(sav <- system.file("files", "testdata.sav", package = "foreign"))

### Examples for add.undeclared.levels:
## add.undeclared.levels = "sort" (default):
x.sort <- read.spss(file=sav, to.data.frame = TRUE)
## add.undeclared.levels = "append":
x.append <- read.spss(file=sav, to.data.frame = TRUE,
    add.undeclared.levels = "append")
## add.undeclared.levels = "no":
x.no <- read.spss(file=sav, to.data.frame = TRUE,
    add.undeclared.levels = "no")

levels(x.sort$factor_n_undeclared)
levels(x.append$factor_n_undeclared)
str(x.no$factor_n_undeclared)

### Examples for duplicated.value.labels:
## duplicated.value.labels = "append" (default)
x.append <- read.spss(file=sav, to.data.frame=TRUE)
## duplicated.value.labels = "condense"
x.condense <- read.spss(file=sav, to.data.frame=TRUE,
    duplicated.value.labels = "condense")

levels(x.append$factor_n_duplicated)
levels(x.condense$factor_n_duplicated)

as.numeric(x.append$factor_n_duplicated)
as.numeric(x.condense$factor_n_duplicated)

## Long Strings (>255 chars) are imported in consecutive separate variables
## (see warning about subtype 14):
x <- read.spss(file=sav, to.data.frame=TRUE, stringsAsFactors=FALSE)

cat.long.string <- function(x, w=70) cat(paste(strwrap(x, width=w), "\n"))

## first part: x$string_500:
cat.long.string(x$string_500)
## second part: x$STRIN0:
cat.long.string(x$STRIN0)
## complete long string:
long.string <- apply(x[,c("string_500", "STRIN0")], 1, paste, collapse="")
cat.long.string(long.string)
# }

Documentation reproduced from package foreign, version 0.8-71, License: GPL (>= 2)

### Community examples

pereznebra@hotmail.com at Jan 28, 2019 foreign v0.8-71

Dear Angel,

First of all, thank you for any help. I'm trying to use read_spss or read.spss; maybe one of them is deprecated. I don't know what I'm doing wrong, because even the files that I could run before don't run anymore. I use R on a Mac (does that make any difference?) with the following script:

library(lavaan)
library(psych)
library(sem)
library(foreign)
OIT = read.spss("~/Documents/1 Projetos/Peiró/Banco de dados/Bases Bancos de outros países/2019_01_25_esp_br_ec_it_longitudinal t1_t2.sav", to.data.frame = TRUE)

The following appears, but that is not the problem:

re-encoding from latin1
Warning messages:
1: In read.spss("~/Documents/1 Projetos/Peiró/Banco de dados/Bases Bancos de outros países/2019_01_25_esp_br_ec_it_longitudinal t1_t2.sav", : Undeclared level(s) 0 added in variable: edad2_t2
2: In read.spss("~/Documents/1 Projetos/Peiró/Banco de dados/Bases Bancos de outros países/2019_01_25_esp_br_ec_it_longitudinal t1_t2.sav", : Undeclared level(s) 4 added in variable: ANTIGU2_t2

indx <- sapply(OIT, is.factor)
OIT[indx] <- lapply(OIT[indx], function(x) as.numeric(x))

The error that appears is:

Error in file(file, "r") : não é possível abrir a conexão
[translation: it is not possible to open the connection]
Além disso: Warning message:
[translation: In addition: Warning message:]
In file(file, "r") : não foi possível abrir o arquivo ' F1=~HR1+HR2+HR3
F2=~HR4+HR5+HR6
F3=~HR7+HR8+HR9
F4=~HR10+HR11+HR12
F5=~HR13+HR14+HR15
F6=~HR16+HR17+HR18
F7=~HR19+HR20+HR21
F8=~HR22+HR23+HR24
BAND=~F1+F2+F3+F4+F5
BANE=~F6+F7+F8': No such file or directory
[translation: it is not possible to open the file]

Thank you very much for any help. I don't know why it doesn't work.