Read an SPSS Data File
read.spss reads a file stored by the SPSS save or export commands.
This was originally written in 2000 and has limited support for changes in SPSS formats since (which have not been many).
read.spss(file, use.value.labels = TRUE, to.data.frame = FALSE, max.value.labels = Inf, trim.factor.names = FALSE, trim_values = TRUE, reencode = NA, use.missings = to.data.frame)
- file: character string: the name of the file or URL to read.
- use.value.labels: logical: convert variables with value labels
into R factors with those levels? This is only done if there are
at least as many labels as values of the variable (when values
without a matching label are returned as NA).
- to.data.frame: logical: return a data frame?
- max.value.labels: numeric: only variables with value labels and
at most this many unique values will be converted to factors if
use.value.labels = TRUE.
- trim.factor.names: logical: trim trailing spaces from factor levels?
- trim_values: logical: should values and value labels have
trailing spaces ignored when matching for
use.value.labels = TRUE?
- reencode: logical: should character strings be re-encoded to the
current locale? The default,
NA, means to do so only in a UTF-8 locale. Alternatively, a character string specifying an encoding to assume for the file.
- use.missings: logical: should information on user-defined
missing values be used to set the corresponding values to NA?
This uses modified code from the PSPP project (http://www.gnu.org/software/pspp/) for reading the SPSS formats.
If the filename appears to be a URL (of schemes http:,
ftp: or https:) the URL is first downloaded to a
temporary file and then read. (https: is supported where
download.file supports it with its current default method.)
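The URL handling described above can be approximated by hand, which may help when the download step needs separate control. The following is a minimal sketch and not the function's actual source: the helpers `looks_like_url` and `fetch_then_read` and the example URL are hypothetical, while `download.file`, `tempfile` and `read.spss` are the real functions.

```r
# Hypothetical helper: does the file name use one of the three URL
# schemes mentioned above (http:, ftp: or https:)?
looks_like_url <- function(file) {
  grepl("^(http|ftp|https)://", file)
}

# Hypothetical helper: download a URL to a temporary file, then read it.
# Local paths are passed to read.spss() unchanged.
fetch_then_read <- function(file, ...) {
  if (looks_like_url(file)) {
    tmp <- tempfile(fileext = ".sav")
    on.exit(unlink(tmp), add = TRUE)
    # mode = "wb" keeps the binary file intact on Windows
    download.file(file, tmp, mode = "wb", quiet = TRUE)
    file <- tmp
  }
  foreign::read.spss(file, ...)
}

looks_like_url("https://example.org/survey.sav")  # TRUE
looks_like_url("~/data/survey.sav")               # FALSE
```

Nothing is downloaded until `fetch_then_read` is called with a URL; the two calls at the end only exercise the scheme check.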
Occasionally in SPSS, value labels will be added to some values of a
continuous variable (e.g. to distinguish different types of missing
data), and you will not want these variables converted to factors. By
setting max.value.labels you can specify that variables with a
large number of distinct values are not converted to factors even if
they have value labels. In addition, variables will not be converted
to factors if there are non-missing values that have no value label.
The value labels are then returned in the "value.labels"
attribute of the variable.
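The conversion rule described above can be written out as a small predicate. This is a sketch of the documented behaviour only, assuming the test is a simple comparison of counts; `would_convert` is a hypothetical helper, not part of the foreign package, and the variables and labels are invented.

```r
# Hypothetical predicate: would a variable be turned into a factor?
would_convert <- function(x, value_labels,
                          use.value.labels = TRUE,
                          max.value.labels = Inf) {
  n_values <- length(unique(x[!is.na(x)]))  # distinct non-missing values
  n_labels <- length(value_labels)          # labels defined in SPSS
  use.value.labels &&
    n_values <= max.value.labels &&
    n_labels >= n_values
}

# A categorical variable with a label for every value
region <- c(1, 2, 3, 2, 1)
region_labels <- c(North = "1", Centre = "2", South = "3")

# A continuous variable where only two missing-data codes carry labels
income <- c(1200, 3400, 998, 2750, 999, 1810)
income_labels <- c("Refused" = "998", "Don't know" = "999")

would_convert(region, region_labels)                        # TRUE
would_convert(region, region_labels, max.value.labels = 2)  # FALSE
would_convert(income, income_labels)                        # FALSE
```

The last call illustrates the continuous-variable case: six distinct values but only two labels, so the variable stays numeric.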
If SPSS variable labels are present, they are returned as the
"variable.labels" attribute of the answer.
Fixed length strings (including value labels) are padded on the right
with spaces by SPSS, and so are read that way by R. The default
trim_values=TRUE causes trailing spaces to be ignored
when matching to value labels, as examples have been seen where the
strings and the value labels had different amounts of padding. See
the examples for sub for ways to remove trailing spaces
in character data.
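As an illustration of removing the padding after reading, the following sketch trims trailing spaces from a character vector. The values are made up; `sub` and `trimws` are base R functions.

```r
# Fixed-width strings as they might arrive from an SPSS file (made-up values)
city <- c("Lisbon    ", "Rome      ", "Quito     ")

# Remove one or more trailing spaces with a regular expression
trimmed <- sub(" +$", "", city)

# trimws() (R >= 3.2.0) is an alternative for trailing whitespace in general
trimmed2 <- trimws(city, which = "right")

identical(trimmed, trimmed2)  # TRUE
nchar(trimmed)                # 6 4 5
```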
Microsoft's published list of code page identifiers provides
translations from Windows codepage numbers to encoding names that
iconv is likely to know about, and so
suitable values for
reencode. Automatic re-encoding is
attempted for apparent codepages of 200 or more in a UTF-8 locale:
some other high-numbered codepages can be re-encoded on most systems,
but the encoding names are platform-dependent (see iconvlist).
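To make the re-encoding step concrete, here is a small sketch of what converting Latin-1 bytes to UTF-8 looks like with iconv. The sample string is invented, and the commented call at the end uses a hypothetical file name.

```r
# The bytes of "café" as stored in Latin-1 / Windows codepage 1252:
# the final byte 0xe9 is the accented "e"
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))
x <- rawToChar(latin1_bytes)

# Re-encode to UTF-8; "latin1" is an encoding name iconv() accepts
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

charToRaw(x_utf8)  # 63 61 66 c3 a9

# With an actual file, an encoding name can be passed to read.spss():
# read.spss("datafile", reencode = "latin1")
```

In UTF-8 the accented character occupies two bytes (c3 a9) instead of one, which is why unconverted strings tend to display incorrectly in a UTF-8 locale.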
A list (or optionally a data frame) with one component for each
variable in the saved data set.
If what looks like a Windows codepage was recorded in the SPSS file,
it is attached (as a number) as attribute "codepage" to the result.
There may be attributes "label.table" and "variable.labels". Attribute
"label.table" is a named list of value labels with one element per variable, either
NULL or a named character vector. Attribute
"variable.labels" is a named character vector with names the short variable names and elements the long names.
If there are user-defined missing values, there will be an attribute
"Missings". This is a named list with one list element per variable. Each element has an element
type, a length-one character vector giving the type of missingness, and may also have an element
value with the values corresponding to missingness. This is a complex subject (where the R and C source code for
read.spss is the main documentation), but the simplest cases are types
"one", "two" and "three" with a corresponding number of (real or string) values whose labels can be found from the
"label.table" attribute. Other possibilities are a finite or semi-infinite range, possibly plus a single value. See also http://www.gnu.org/software/pspp/manual/html_node/Missing-Observations.html#Missing-Observations.
If SPSS value labels are converted to factors the underlying numerical codes will not in general be the same as the SPSS numerical values, since the numerical codes in R are always $1,2,3,\dots$.
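The difference between factor codes and SPSS values can be seen in a small hand-built case. The values and labels below are hypothetical, and the label table only mimics the named character vector form described for "label.table"; it is not output from read.spss.

```r
# Hypothetical SPSS values and the value labels attached to them
spss_values <- c(0, 5, 9, 5, 0)
label_table <- c(Never = "0", Sometimes = "5", Always = "9")

# Build the factor by hand, roughly as a labelled variable would appear
f <- factor(spss_values,
            levels = unname(label_table),
            labels = names(label_table))

as.integer(f)  # 1 2 3 2 1: R's internal codes, not 0 5 9 5 0

# Recover the original SPSS values by looking the labels up again
original <- as.numeric(label_table[as.character(f)])
original       # 0 5 9 5 0
```

Calling as.numeric() directly on such a factor returns the codes 1, 2, 3, so a lookup of this kind (or reading with use.value.labels = FALSE) is needed when the original values matter.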
You may see warnings about the file encoding for SPSS save
files: it is possible such files contain non-ASCII character data
which need re-encoding. The most common occurrence is Windows codepage
1252, a superset of Latin-1. The encoding is recorded (as an integer)
in attribute "codepage" of the result if it looks like a
Windows codepage. Automatic re-encoding is done only in UTF-8
locales: see argument reencode.
A different interface also based on the PSPP codebase is available in
package memisc: see its help for spss.system.file.
## Not run:
## if you have an SPSS file called 'datafile':
# read.spss("datafile")
# ## don't convert value labels to factor levels
# read.spss("datafile", use.value.labels = FALSE)
# ## convert value labels to factors for variables with at most
# ## ten distinct values.
# read.spss("datafile", max.value.labels = 10)
# ## End(Not run)
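As a complement to the examples above, the following sketch shows how the "variable.labels" and "Missings" attributes might be inspected and used. It builds a small stand-in object by hand rather than reading a file, so every variable name, label, code and missingness type in it is invented, and the loop is only an approximation of what use.missings = TRUE does for the simplest discrete types.

```r
# A hand-built stand-in for the list returned by read.spss();
# the variable names, labels and codes below are invented.
dat <- structure(
  list(income = c(1200, 999, 3400, 998),
       region = c(1, 2, 2, 3)),
  variable.labels = c(income = "Monthly income",
                      region = "Region of residence"),
  Missings = list(
    income = list(type = "two", value = c(998, 999)),
    region = list(type = "none")
  )
)

# Long name for a short variable name
attr(dat, "variable.labels")[["income"]]  # "Monthly income"

# Set user-defined missing codes to NA for the discrete types only
miss <- attr(dat, "Missings")
for (v in names(miss)) {
  m <- miss[[v]]
  if (m$type %in% c("one", "two", "three")) {
    dat[[v]][dat[[v]] %in% m$value] <- NA
  }
}

dat$income  # 1200 NA 3400 NA
```

Range-type missing values are deliberately left out of this sketch, since their handling is more involved.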
Dear Angel,
First of all, thank you for any help. I'm trying to use read_spss or read.spss. Maybe one of them is deprecated; I don't know what I'm doing wrong, because even the files that I could read before don't work anymore. I use R on a Mac (any idea whether that makes a difference?) with the following script:
library(lavaan)
library(psych)
library(sem)
library(foreign)
'''OIT = read.spss("~/Documents/1 Projetos/Peiró/Banco de dados/Bases Bancos de outros países/2019_01_25_esp_br_ec_it_longitudinal t1_t2.sav", to.data.frame = TRUE)'''
The following output appears, but that is not the problem:
re-encoding from latin1
Warning messages:
1: In read.spss("~/Documents/1 Projetos/Peiró/Banco de dados/Bases Bancos de outros países/2019_01_25_esp_br_ec_it_longitudinal t1_t2.sav", :
  Undeclared level(s) 0 added in variable: edad2_t2
2: In read.spss("~/Documents/1 Projetos/Peiró/Banco de dados/Bases Bancos de outros países/2019_01_25_esp_br_ec_it_longitudinal t1_t2.sav", :
  Undeclared level(s) 4 added in variable: ANTIGU2_t2
''''indx <- sapply(OIT, is.factor)
OIT[indx] <- lapply(OIT[indx], function(x) as.numeric(x))''''
The error that appears is:
Error in file(file, "r") : não é possível abrir a conexão [translation: cannot open the connection]
Além disso: Warning message: [translation: In addition: Warning message]
In file(file, "r") : não foi possível abrir o arquivo ' [translation: cannot open file]
F1=~HR1+HR2+HR3
F2=~HR4+HR5+HR6
F3=~HR7+HR8+HR9
F4=~HR10+HR11+HR12
F5=~HR13+HR14+HR15
F6=~HR16+HR17+HR18
F7=~HR19+HR20+HR21
F8=~HR22+HR23+HR24
BAND=~F1+F2+F3+F4+F5
BANE=~F6+F7+F8': No such file or directory
Thank you very much for any help. I don't know why it doesn't work.