Data are actually imported by 'translating' an importer object into a data.set using as.data.set or subset. The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. Importers are also designed to load large data sets efficiently. Most importantly, importer objects support the labels, missing.values, and descriptions provided by this package.
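As a minimal sketch of this workflow, the importer can be created first and then converted in full with as.data.set. This assumes that memisc is installed and reuses the ANES 1948 portable file that memisc ships for its own examples; the object name nes1948.ds is illustrative only.

```r
# Sketch: import a whole file through an importer object.
# Assumes the memisc package and its bundled ANES 1948 example file are installed.
library(memisc)

# Extract the bundled SPSS portable file into a temporary directory
nes1948.por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")

# Creating the importer reads the file header, not the data
nes1948 <- spss.portable.file(nes1948.por)

# as.data.set() performs the actual import of all variables
nes1948.ds <- as.data.set(nes1948)

# The result is a data.set, which keeps labels and missing-value declarations
print(class(nes1948.ds))
```

To load only some variables, subset can be used on the importer instead, as in the examples at the end of this page.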
spss.fixed.file(file,
                columns.file,
                varlab.file=NULL,
                codes.file=NULL,
                missval.file=NULL,
                count.cases=TRUE,
                to.lower=TRUE)
spss.portable.file(file,
                   varlab.file=NULL,
                   codes.file=NULL,
                   missval.file=NULL,
                   count.cases=TRUE,
                   to.lower=TRUE)
spss.system.file(file,
                 varlab.file=NULL,
                 codes.file=NULL,
                 missval.file=NULL,
                 count.cases=TRUE,
                 to.lower=TRUE)
Stata.file(file)
## The most important methods for "importer" objects are:
## S3 method for class 'importer':
subset(x, subset, select, drop = FALSE, ...)
## S3 method for class 'importer':
as.data.set(x, row.names=NULL, optional=NULL,
            compress.storage.modes=FALSE, ...)

x: an object that inherits from class "importer".
columns.file: the name of the file that contains the DATA LIST FIXED statement.
varlab.file: the name of the file that contains the VARIABLE LABELS statement.
codes.file: the name of the file that contains the VALUE LABELS statement.
missval.file: the name of the file that contains the MISSING VALUES statement.
drop: if TRUE and a single variable is selected, subset returns only a single item object and not a data.set.

spss.fixed.file, spss.portable.file, spss.system.file, and Stata.file return, respectively, objects of class "spss.fixed.importer", "spss.portable.importer", "spss.system.importer", or "Stata.importer", which, by inheritance, are also objects of class "importer".

Objects of class "importer" have at least the following two slots: "item.vector", which provides a 'prototype' for the "data.set" objects returned by the as.data.set and subset methods for objects of class "importer".
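The class relationship described above can be checked directly. This is a sketch that assumes memisc is installed and again uses the bundled ANES 1948 portable file; methods::is() is used because importers are treated here as formal classes.

```r
# Sketch: a format-specific importer is also an "importer".
# Assumes memisc and its bundled ANES 1948 example file are installed.
library(memisc)

nes1948.por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")
nes1948 <- spss.portable.file(nes1948.por)

# The constructor returns the subclass for SPSS portable files ...
print(class(nes1948))

# ... which, by inheritance, is also of class "importer"
print(methods::is(nes1948, "importer"))
```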
as.data.frame for importer objects does the actual data import and returns a data frame. Note that, in contrast to read.spss, the variable names of the resulting data frame will be lower case. If long variable names are defined (in the case of a PSPP/SPSS system file), they take precedence and are not coerced to lower case.

Calling spss.fixed.file, spss.portable.file, spss.system.file, or Stata.file causes R to read in the header of the data file and/or the syntax files that contain information about the variables, such as the columns that they occupy (in the case of spss.fixed.file), variable labels, value labels, and missing values. The information in the file header and/or the accompanying files is then processed to prepare the file for importing. Thus the inner structure of an importer object may well vary according to what type of file is to be imported and what additional information is given.
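A short sketch of the data-frame route follows, assuming memisc is installed and using the bundled ANES 1948 portable file; the object name nes1948.df is illustrative. Since a portable file does not define long variable names, all names in the result are expected to be lower case.

```r
# Sketch: import all data from an importer into a plain data frame.
# Assumes memisc and its bundled ANES 1948 example file are installed.
library(memisc)

nes1948.por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")

# Only the header is read here; the data are not yet loaded
nes1948 <- spss.portable.file(nes1948.por)

# as.data.frame() does the actual data import
nes1948.df <- as.data.frame(nes1948)

# Inspect the first few columns of the imported data
str(nes1948.df[1:3])

# Variable names should be lower case, unlike with read.spss
print(all(names(nes1948.df) == tolower(names(nes1948.df))))
```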
The as.data.set and subset methods for "importer" objects internally use the generic functions seekData, readData, and readSubset, which have methods for the subclasses of "importer". These functions are not callable from outside the package, however.
See also: codebook, description, read.spss.
library(memisc)
# Extract American National Election Study of 1948
nes1948.por <- UnZip("anes/NES1948.ZIP", "NES1948.POR",
                     package="memisc")
# Get information about the variables contained.
nes1948 <- spss.portable.file(nes1948.por)
# The data are not yet loaded:
show(nes1948)
# ... but one can see what variables are present:
description(nes1948)
# Now a subset of the data is loaded:
vote.socdem.48 <- subset(nes1948,
                         select=c(
                           v480018,
                           v480029,
                           v480030,
                           v480045,
                           v480046,
                           v480047,
                           v480048,
                           v480049,
                           v480050
                         ))
# Let's make the names more descriptive:
vote.socdem.48 <- rename(vote.socdem.48,
                         v480018 = "vote",
                         v480029 = "occupation.hh",
                         v480030 = "unionized.hh",
                         v480045 = "gender",
                         v480046 = "race",
                         v480047 = "age",
                         v480048 = "education",
                         v480049 = "total.income",
                         v480050 = "religious.pref"
                         )
# It is also possible to do both
# in one step:
# vote.socdem.48 <- subset(nes1948,
#                          select=c(
#                            vote = v480018,
#                            occupation.hh = v480029,
#                            unionized.hh = v480030,
#                            gender = v480045,
#                            race = v480046,
#                            age = v480047,
#                            education = v480048,
#                            total.income = v480049,
#                            religious.pref = v480050
#                          ))
# We examine the data more closely:
codebook(vote.socdem.48)
# ... and conduct some analyses.
t(genTable(percent(vote)~occupation.hh, data=vote.socdem.48))
# We consider only the two main candidates.
vote.socdem.48 <- within(vote.socdem.48, {
  truman.dewey <- vote
  valid.values(truman.dewey) <- 1:2
  truman.dewey <- relabel(truman.dewey,
                          "VOTED - FOR TRUMAN" = "Truman",
                          "VOTED - FOR DEWEY" = "Dewey")
})
summary(truman.relig.glm <- glm((truman.dewey=="Truman") ~ religious.pref,
                                data=vote.socdem.48,
                                family="binomial"))