
sas.codes and these may be added back to the levels of a factor variable using the code.levels function.
Information about special missing values may be captured in an attribute of each variable having special missing values. This attribute is called special.miss, and such variables are given class special.miss. There are print, [], format, and is.special.miss methods for such variables.
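
As a rough illustration of how an attribute can carry special missing codes alongside ordinary NA values, here is a minimal base R sketch. The attribute name special_codes, its codes/obs layout, and the helper is_special are hypothetical; they are not the structures sas.get creates. The helper only mimics the idea of flagging observations that carry a given code, or any code when none is supplied.

```r
# Hypothetical layout: which observations were special missing, and with which code.
y <- c(4.2, NA, 7.1, NA, NA)
attr(y, "special_codes") <- list(codes = c("D", "R"), obs = c(2, 5))

# Hypothetical helper (not Hmisc's is.special.miss): TRUE where an observation
# carries the requested code, or any special code when 'code' is omitted.
is_special <- function(x, code = NULL) {
  sm <- attr(x, "special_codes")
  out <- rep(FALSE, length(x))
  if (is.null(sm)) return(out)          # no special missing values recorded
  hit <- if (is.null(code)) sm$obs else sm$obs[sm$codes == code]
  out[hit] <- TRUE
  out
}

is_special(y)        # observations with any special missing code
is_special(y, "D")   # observations coded .D only
```

Observation 4 is an ordinary NA in this sketch, so it is not flagged by either call.
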
The chron function is used to set up date, time, and date-time variables. If using S-Plus 5 or 6 or later, the timeDate function is used instead. Under R, POSIXct is used for dates and date-times. Times without dates still need to be stored in POSIX date-time format. Such SAS time variables are given a major class of timePOSIXt and a format.timePOSIXt function so that the date portion (which will always be 1/1/1970) will not print by default.
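
The idea of storing a time of day as a date-time anchored at 1/1/1970, and hiding the date when formatting, can be sketched in base R as follows. The class name timeOnly, the constructor as_time_only, and the choice of UTC are assumptions made for this example; this is not the timePOSIXt code itself.

```r
# Hypothetical constructor: seconds since midnight -> POSIXct on 1970-01-01 (UTC assumed).
as_time_only <- function(secs) {
  x <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
  class(x) <- c("timeOnly", class(x))
  x
}

# Format method that suppresses the (always identical) date portion.
format.timeOnly <- function(x, ...) {
  class(x) <- setdiff(class(x), "timeOnly")   # fall back to plain POSIXct
  format(x, format = "%H:%M:%S", tz = "UTC")
}

t1 <- as_time_only(11 * 3600 + 13 * 60 + 45)  # 11:13:45
format(t1)            # time of day only
format(as.Date(t1))   # the hidden date portion
```

The underlying value is still an ordinary date-time, so arithmetic and comparisons work as for any POSIXct object; only the default formatting changes.
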
If a date variable represents a partial date (.5 added if month missing, .25 added if day missing, .75 if both), an attribute partial.date is added to the variable, and the variable also becomes a class imputed variable. The describe function uses information about partial dates and special missing values.
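
The fractional coding of partial dates described above can be decoded with a few lines of base R. This is only a sketch: the function name decode_partial and the status labels are hypothetical, and it assumes the date is held as a plain number whose fractional part carries the flag.

```r
# Hypothetical decoder for the .25 / .5 / .75 partial-date flags.
decode_partial <- function(d) {
  frac <- d - floor(d)                      # .25, .5 and .75 are exact in binary
  status <- rep("complete", length(d))
  status[which(frac == 0.5)]  <- "month missing"
  status[which(frac == 0.25)] <- "day missing"
  status[which(frac == 0.75)] <- "month and day missing"
  data.frame(day = floor(d), status = status, stringsAsFactors = FALSE)
}

decode_partial(c(15400, 15400.25, 15400.5, 15400.75))
```
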
There is an option to automatically uncompress (or gunzip) compressed SAS datasets.

sas.get(library, member, variables, ifs,
        format.library=library, id, dates.,
        keep.log=TRUE, log.file="_temp_.log", macro=sas.get.macro,
        data.frame.out=TRUE, clean.up=TRUE, quiet=FALSE,
        temp=tempfile("SaS"), formats=TRUE, recode=formats,
        special.miss=FALSE, sasprog="sas",
        as.is=.5, check.unique.id=TRUE, force.single=FALSE,
        where, uncompress=FALSE)

is.special.miss(x, code)

x[...]

## S3 method for class 'special.miss':
print(x, ...)

## S3 method for class 'special.miss':
format(x, ...)

sas.codes(object)

code.levels(object)

... sas.get with special.miss=T or with recode in effect.
formats: set formats to F to keep sas.get from telling the SAS macro to retrieve value label formats from format.library. When you do not specify formats or recode, sas.get ...
recode: defaults to TRUE if formats is TRUE. If it is TRUE, variables that have an appropriate format (see above) are recoded as factor objects, which map the values to the value lab...
special.miss: set special.miss to TRUE. This will cause the special.miss attribute and the special.miss c...
id: the id variable becomes the row.names attribute of a data frame, but the id variable is still retained as a variable in the data frame. (if data.frame.out is ...
as.is: if data.frame.out=T, SAS character variables are converted to S factor objects if as.is=F or if as.is is a number between 0 and 1 inclusive and the number of unique values of the variable is less than the number of o...
check.unique.id: if id is specified, the row names are checked for uniqueness if check.unique.id=T. If any are duplicated, a warning is printed. Note that if a data frame is being created with duplicate row names, statements such as my.da...
force.single: SAS numeric variables with LENGTHs > 4 are stored as S double precision numerics, which allow for the same precision as a SAS LENGTH 8 variable. Set force.single=T to store every numeric variable in si...
dates.: one of "sas", "yearfrac", "yearfrac2", "yymmdd". If a SAS variable has a date format (one of "DATE", "MMDDYY", "YYMMDD", "DDMMYY", "YYQ", "MONYY", "JULIAN"), it will be converted to ...
keep.log: if FALSE, delete the SAS log file upon completion.
data.frame.out: if TRUE, the return value will be an S data frame, otherwise it will be a list.
clean.up: if TRUE, remove all temporary files when finished. You may want to keep these while debugging the SAS macro.
quiet: if FALSE, print the contents of the SAS log file if there has been an error.
uncompress: set to T to automatically invoke the UNIX gunzip command (if member.ssd01.gz exists) or the uncompress command (if member.ssd01.Z exists) to uncompress the SAS dataset before proceeding. T...
where: if where is specified, each individual variable is placed into a separate object (whose name is the name of the variable) using the assign function wit...
code: if code is omitted, is.special.miss will return a T for each observation that has any special missing value.
sas.get
If data.frame.out is TRUE, the output will be a data frame resembling the SAS dataset. If id was specified, that column of the data frame will be used as the row names of the data frame. Each variable in the data frame or vector in the list will have the attributes label and format containing SAS labels and formats. Underscores in formats are converted to periods. Formats for character variables have $ placed in front of their names.
If formats is TRUE and there are any appropriate format definitions in format.library, the returned object will have attribute formats containing lists named the same as the format names (with periods substituted for underscores and character formats prefixed by $). Each of these lists has a vector called values and one called labels with the PROC FORMAT; VALUE ... definitions.
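
To make the values/labels structure concrete, the following base R sketch builds one such list by hand and uses it to recode a numeric vector into a factor. The format contents and the data are invented for illustration, and the recoding shown is an assumption about how such a definition could be applied, not the code sas.get runs. The two alternative label styles follow the "1:good" and "good(1)" forms used in the examples on this page.

```r
# Hypothetical format definition with the documented 'values' and 'labels' vectors.
fmt <- list(values = c(1, 2, 3), labels = c("good", "better", "best"))

x <- c(2, 1, 3, 3, 9)   # 9 has no label in the format

# Plain labels; values without a defined label become NA.
f <- factor(x, levels = fmt$values, labels = fmt$labels)
as.character(f)

# Labels of the form value:label
f2 <- factor(x, levels = fmt$values,
             labels = paste(fmt$values, fmt$labels, sep = ":"))
levels(f2)

# Labels of the form label(value)
f3 <- factor(x, levels = fmt$values,
             labels = paste0(fmt$labels, "(", fmt$values, ")"))
levels(f3)
```
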
If data.frame.out is FALSE, the output will be a list of vectors, each containing a variable from the SAS dataset. If id was specified, that element of the list will be used as the id attribute of the entire list.
If quiet is FALSE and SAS reports an error, the SAS log file will be printed under the control of the less pager.
If special.miss=T and there are no special missing values in the SAS dataset, the SAS step will fail.
For variables having a PROC FORMAT VALUE format with some of the levels undefined, sas.get will interpret those values as NA if you are using recode.
The SAS macro sas_get uses record lengths of up to 4096 in two places. If you are exporting records that are very long (because of a large number of variables and/or long character variables), you may want to edit these LRECLs to quadruple them, for example.
SAS Institute Inc. (1988). SAS Technical Report P-176, Using the SAS System, Release 6.03, under UNIX Operating Systems and Derivatives. SAS Institute Inc., Cary, North Carolina.
SAS Institute Inc. (1985). SAS Introductory Guide. Third Edition. SAS Institute Inc., Cary, North Carolina.
data.frame, describe, label, upData, cleanup.import
sas.contents("saslib", "mice")
# [1] "dose" "ld50" "strain" "lab_no"
# attr(, "n"):
# [1] 117
mice <- sas.get("saslib", mem="mice", var=c("dose", "strain", "ld50"))
plot(mice$dose, mice$ld50)
# unix() is an S-Plus function; under R, build the path with file.path()
saslib <- file.path(Sys.getenv("HOME"), "saslib")
nude.mice <- sas.get(lib=saslib, mem="mice",
                     ifs="if strain='nude'")
nude.mice.dl <- sas.get(lib=saslib, mem="mice",
                        var=c("dose", "ld50"), ifs="if strain='nude'")
# Get a dataset from current directory, recode PROC FORMAT; VALUE ...
# variables into factors with labels of the form "good(1)" "better(2)",
# get special missing values, recode missing codes .D and .R into new
# factor levels "Don't know" and "Refused to answer" for variable q1
d <- sas.get(".", "mydata", recode=2, special.miss=TRUE)
attach(d)
nl <- length(levels(q1))
lev <- c(levels(q1), "Don't know", "Refused")
q1.new <- as.integer(q1)
q1.new[is.special.miss(q1,"D")] <- nl+1
q1.new[is.special.miss(q1,"R")] <- nl+2
q1.new <- factor(q1.new, 1:(nl+2), lev)
# Note: would like to use factor() in place of as.integer ... but
# factor in this case adds "NA" as a category level
d <- sas.get(".", "mydata")
sas.codes(d$x) # for PROC FORMATted variables returns original data codes
d$x <- code.levels(d$x) # or attach(d); x <- code.levels(x)
# This makes levels such as "good" "better" "best" into e.g.
# "1:good" "2:better" "3:best", if the original SAS values were 1,2,3
# Retrieve the same variables from another dataset (or an update of
# the original dataset)
mydata2 <- sas.get(".", "mydata2", var=names(d))
# This only works if none of the original SAS variable names contained _
mydata2 <- cleanup.import(mydata2) # will make true integer variables
# Code from Don MacQueen to generate SAS dataset to test import of
# date, time, date-time variables
# data ssd.test;
# d1='3mar2002'd ;
# dt1='3mar2002 9:31:02'dt;
# t1='11:13:45't;
# output;
#
# d1='3jun2002'd ;
# dt1='3jun2002 9:42:07'dt;
# t1='11:14:13't;
# output;
# format d1 mmddyy10. dt1 datetime. t1 time.;
# run;