Uses the read.xport and lookup.xport functions in the foreign package to
import SAS datasets. SAS date, time, and date/time variables are converted
respectively to Date, POSIX, or POSIXct objects in R, variable names are
converted to lower case, SAS labels are associated with variables, and (by
default) integer-valued variables are converted from storage mode double to
integer. If the user ran PROC FORMAT CNTLOUT= in SAS and included the
resulting dataset in the SAS version 5 transport file, variables having
customized formats that do not include any ranges (i.e., variables having
standard PROC FORMAT; VALUE label formats) will have their format labels
looked up, and these variables are converted to factors.
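As a rough illustration of the conversions described above, the following base R sketch shows how raw SAS date and date/time values map to Date and POSIXct objects, and how an integer-valued double vector can be stored as integer. It assumes the standard SAS epoch of 1 January 1960 (dates count days and date/times count seconds from it). The variable names and values are made up for the example, and this is not the code that sasxport.get itself runs.

```r
# SAS stores dates as days, and date/times as seconds, since 1960-01-01.
sas_origin <- "1960-01-01"

# Hypothetical raw values as they might appear in a transport file
sas_date     <- 15402                                   # days since the origin
sas_datetime <- 15402 * 86400 + 9 * 3600 + 31 * 60 + 2  # seconds since the origin

d1  <- as.Date(sas_date, origin = sas_origin)
dt1 <- as.POSIXct(sas_datetime, origin = sas_origin, tz = "GMT")

format(d1)                         # "2002-03-03"
format(dt1, "%Y-%m-%d %H:%M:%S")   # "2002-03-03 09:31:02"

# Integer-valued doubles can be stored as integer when every non-missing
# value is whole and fits in R's integer range (an assumed rule for this sketch).
age  <- c(30, 31, NA)
fits <- all(is.na(age) | (age == round(age) & abs(age) <= .Machine$integer.max))
if (fits) storage.mode(age) <- "integer"
storage.mode(age)                  # "integer"
```

The time zone is fixed to GMT here only so that the printed clock time does not depend on the local machine.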
For those users having access to SAS, method='csv' is preferred when
importing several SAS datasets. Run SAS macro exportlib.sas available from
https://github.com/harrelfe/Hmisc/blob/master/src/sas/exportlib.sas
to convert all SAS datasets in a SAS data library (from any engine
supported by your system) into CSV files. If any customized formats are
used, it is assumed that the PROC FORMAT CNTLOUT= dataset is in the data
library as a regular SAS dataset, as above.
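The format-label lookup mentioned above amounts to mapping coded values to their labels. The sketch below imitates it in base R with a small hand-built table standing in for a PROC FORMAT CNTLOUT= dataset. The column names fmtname, start, and label are assumed to mirror the usual CNTLOUT layout in lower case, and the race format is borrowed from the example SAS code on this page. The table is typed in by hand, and sasxport.get's own handling (for instance of codes that have no label) may differ.

```r
# Hand-built stand-in for a PROC FORMAT CNTLOUT= dataset (hypothetical data)
fmt <- data.frame(fmtname = "RACE",
                  start   = c(1, 2, 3),
                  label   = c("green", "blue", "purple"),
                  stringsAsFactors = FALSE)

# Coded variable as read from the data; the code 4 has no label in the format
race <- c(2, 4, 1)

# Look up the rows for this format and use them as factor levels and labels
lk     <- fmt[fmt$fmtname == "RACE", ]
race_f <- factor(race, levels = lk$start, labels = lk$label)

levels(race_f)        # "green" "blue" "purple"
as.character(race_f)  # "blue" NA "green"
```

In this base R sketch a code without a matching label becomes NA, which is simply how factor() treats values outside its levels.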
sasdsLabels reads a file containing PROC CONTENTS printed output to parse
dataset labels, assuming that PROC CONTENTS was run on an entire library.
sasxport.get(file, lowernames=TRUE, force.single = TRUE,
method=c('read.xport','dataload','csv'), formats=NULL, allow=NULL,
out=NULL, keep=NULL, drop=NULL, as.is=0.5, FUN=NULL)
sasdsLabels(file)
If there is more than one dataset in the transport file other than the
PROC FORMAT file, the result is a list of data frames containing all the
non-PROC FORMAT datasets. Otherwise the result is the single data frame.
There is an exception if out is specified; that causes separate R save
files to be written and the returned value to be a list corresponding to
the SAS datasets, with key PROC CONTENTS information in a data frame making
up each part of the list. sasdsLabels returns a named vector of dataset
labels, with names equal to the dataset names.
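Because the result may be either a single data frame or a list of data frames, code that processes it often has to handle both shapes. One possible way to do that is sketched below in base R. The helper as_dataset_list and the stand-in objects are hypothetical and exist only for this illustration, so no SAS file is needed to run it.

```r
# Stand-ins for the two shapes of result (hypothetical data)
single <- data.frame(x = 1:3)
multi  <- list(test = data.frame(age = c(30, 31)),
               z    = data.frame(x3 = c(0.1, 0.2, 0.3)))

# Wrap a lone data frame in a named list so downstream code can always iterate
as_dataset_list <- function(w, name = "data") {
  if (is.data.frame(w)) stats::setNames(list(w), name) else w
}

sapply(as_dataset_list(single), nrow)  # data = 3
sapply(as_dataset_list(multi),  nrow)  # test = 2, z = 3
```

The check uses is.data.frame rather than is.list because a data frame is itself a list in R.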
file: name of a file containing the SAS transport file. file may be a URL
beginning with https://. For sasdsLabels, file is the name of a file
containing a PROC CONTENTS output listing. For method='csv', file is the
name of the directory containing all the CSV files created by running the
exportlib SAS macro.
lowernames: set to FALSE to keep from converting SAS variable names to
lower case
force.single: set to FALSE to keep integer-valued variables not exceeding
2^31-1 in value from being converted to integer storage mode
method: set to "dataload" if you have the dataload executable installed and
want to use it instead of read.xport. This seems to correct rare errors in
which read.xport reads some factor variables as entirely missing when in
fact they have some non-missing values.
formats: a data frame or list (like that created by read.xport) containing
PROC FORMAT output, if such output is not stored in the main transport file.
allow: a vector of characters allowed by R that should not be converted to
periods in variable names. By default, underscores in variable names are
converted to periods as with R before version 1.9.
out: a character string specifying a directory in which to write separate
R save files (.rda files) for each regular dataset. Each file and the data
frame inside it is named with the SAS dataset name translated to lower case
and with underscores changed to periods. The default NULL value of out
results in a data frame or a list of data frames being returned. When out
is given, sasxport.get returns only metadata (see the description of the
returned value), invisibly. out only works with method='csv'. out should
not have a trailing slash.
keep: a vector of names of SAS datasets to process (original SAS upper case
names). Must include the PROC FORMAT dataset if it exists, and if the kept
datasets use any of its value label formats.
drop: a vector of names of SAS datasets to ignore (original SAS upper case
names)
as.is: SAS character variables are converted to factor objects if
as.is=FALSE or if as.is is a number between 0 and 1 inclusive and the
number of unique values of the variable is less than the number of
observations (n) times as.is. The default for as.is is 0.5, so character
variables are converted to factors only if they have fewer than n/2 unique
values. The primary purpose of this is to keep unique identification
variables as character values in the data frame instead of using more space
to store both the integer factor codes and the factor labels.
FUN: an optional function that will be run on each data frame created, when
method='csv' and out are specified. The results of all the FUN calls are
collected into a list corresponding to the SAS datasets that are read. This
list is the FUN attribute of the result returned by sasxport.get.
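The as.is rule described among the arguments above can be sketched in base R as follows. convert_char is a hypothetical helper written only for this illustration and is not an Hmisc function. It assumes that n counts every element of the vector and that as.is=TRUE leaves the variable as character, neither of which is stated explicitly here.

```r
# Convert a character vector to a factor following the as.is rule.
# convert_char is a hypothetical helper, not part of Hmisc.
convert_char <- function(x, as.is = 0.5) {
  if (is.logical(as.is)) {
    # as.is = FALSE always converts; as.is = TRUE is assumed to keep character
    return(if (as.is) x else factor(x))
  }
  n <- length(x)
  # Numeric as.is: convert only when unique values are fewer than n * as.is
  if (length(unique(x)) < n * as.is) factor(x) else x
}

id    <- c("a01", "a02", "a03", "a04")                 # 4 unique values out of 4
group <- c("trt", "ctl", "trt", "trt", "ctl", "trt")   # 2 unique values out of 6

class(convert_char(id))          # "character": 4 is not less than 4 * 0.5
class(convert_char(group))       # "factor":    2 is less than 6 * 0.5
class(convert_char(id, FALSE))   # "factor":    as.is = FALSE always converts
```

The logical case is tested first because FALSE would otherwise be treated as the number 0, for which no variable could ever qualify for conversion.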
Frank E Harrell Jr
See contents.list for a way to print the directory of SAS datasets when
more than one was imported.
read.xport, label, sas.get, Dates, DateTimeClasses, lookup.xport, contents,
describe
if (FALSE) {
# SAS code to generate test dataset:
# libname y SASV5XPT "test2.xpt";
#
# PROC FORMAT; VALUE race 1=green 2=blue 3=purple; RUN;
# PROC FORMAT CNTLOUT=format;RUN; * Name, e.g. 'format', unimportant;
# data test;
# LENGTH race 3 age 4;
# age=30; label age="Age at Beginning of Study";
# race=2;
# d1='3mar2002'd ;
# dt1='3mar2002 9:31:02'dt;
# t1='11:13:45't;
# output;
#
# age=31;
# race=4;
# d1='3jun2002'd ;
# dt1='3jun2002 9:42:07'dt;
# t1='11:14:13't;
# output;
# format d1 mmddyy10. dt1 datetime. t1 time. race race.;
# run;
# data z; LENGTH x3 3 x4 4 x5 5 x6 6 x7 7 x8 8;
# DO i=1 TO 100;
# x3=ranuni(3);
# x4=ranuni(5);
# x5=ranuni(7);
# x6=ranuni(9);
# x7=ranuni(11);
# x8=ranuni(13);
# output;
# END;
# DROP i;
# RUN;
# PROC MEANS; RUN;
# PROC COPY IN=work OUT=y;SELECT test format z;RUN; *Creates test2.xpt;
w <- sasxport.get('test2.xpt')
# To use an existing copy of test2.xpt available on the web:
w <- sasxport.get('https://github.com/harrelfe/Hmisc/raw/master/inst/tests/test2.xpt')
describe(w$test) # see labels, format names for dataset test
# Note: if only one dataset (other than format) had been exported,
# just do describe(w) as sasxport.get would not create a list for that
lapply(w, describe)# see descriptive stats for both datasets
contents(w$test) # another way to see variable attributes
lapply(w, contents)# show contents of both datasets
options(digits=7) # compare the following matrix with PROC MEANS output
t(sapply(w$z, function(x)
c(Mean=mean(x),SD=sqrt(var(x)),Min=min(x),Max=max(x))))
}