This function converts (or transfers) between R, Stata, SPSS, SAS, Excel and DDI XML files. Unlike the regular import / export functions from packages haven or rio, this function uses the DDI standard as an exchange platform and facilitates a consistent conversion of the missing values.
convert(from, to = NULL, declared = TRUE, chartonum = FALSE, recode = TRUE,
encoding = "UTF-8", csv = NULL, ...)
from: A path to a file, or a data.frame object
to: Character, the name of a software package or a path to a specific file
declared: Logical, return the resulting dataset as a declared object
chartonum: Logical, recode character categorical variables to numerical categorical variables
recode: Logical, recode missing values
encoding: The character encoding used to read a file
csv: Path to the CSV file, if not embedded in the XML file containing the DDI Codebook
...: Additional parameters passed to exporting functions, see the Details section
Adrian Dusa
When the argument to specifies a certain statistical package ("R", "Stata", "SPSS", "SAS", "XPT") or "Excel", the name of the destination file will be identical to the one in the argument from, with an automatically added software-specific extension.
SPSS portable files (with the extension ".por") can only be read, and SAS Transport files (with the extension ".xpt") can be both read and written.
Alternatively, the argument to can be specified as a path to a specific file, in which case the software package is determined from its file extension. The following extensions are currently recognized: .xml for DDI, .rds for R, .dta for Stata, .sav for SPSS, .sas7bdat for SAS, and .xlsx for Excel.
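As an illustration of the extension rule described above, the lookup can be sketched in base R. The names ext_map and guess_software() are hypothetical and invented for this sketch; they are not part of DDIwR, and the package's internal logic may differ.

```r
# Hypothetical lookup table (extension -> software), copied from the list above
ext_map <- c(
  xml = "DDI", rds = "R", dta = "Stata",
  sav = "SPSS", sas7bdat = "SAS", xlsx = "Excel"
)

# Hypothetical helper, not a DDIwR function: guess the software from a path
guess_software <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% names(ext_map)) {
    stop("Unrecognized file extension: ", ext)
  }
  unname(ext_map[ext])
}

guess_software("results/test.dta")
guess_software("test.XML")
```

The extension is lowercased before the lookup, which is a choice made for this sketch only; whether convert() treats extensions case-insensitively is not stated here.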
Additional parameters can be specified via the three dots argument ..., which are passed to the respective functions from packages haven and readxl. For instance, the function write_dta() has an additional argument called version when writing a Stata file.
The most important argument to consider is called user_na, part of the function read_sav(). Although it defaults to FALSE in package haven, package DDIwR uses it with the value TRUE. Users who really want to deactivate it should explicitly specify user_na = FALSE in function convert().
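The two uses of the three dots argument mentioned above might look as follows. This is a minimal sketch that assumes an SPSS file named test.sav in the working directory (the same hypothetical file as in the Examples), and it is guarded so that nothing runs when the package or the file is missing.

```r
# Guarded so the sketch is harmless when DDIwR or the example file is absent
if (requireNamespace("DDIwR", quietly = TRUE) && file.exists("test.sav")) {
  # 'version' travels through ... to haven::write_dta()
  DDIwR::convert("test.sav", to = "Stata", version = 13)

  # Explicitly deactivate user-defined missing values when reading the SPSS file
  DDIwR::convert("test.sav", to = "Stata", user_na = FALSE)
}
```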
The same three dots argument is used to pass additional parameters to other functions in this package, for instance exportDDI() when converting to a DDI file. Its argument embed (activated by default) can be used to control embedding the data in the XML file. Deactivating it will create a CSV file in the same directory, using the same file name as the XML file.
When converting from DDI, if the dataset is not embedded in the XML file, the CSV file is expected to be found in the same directory as the DDI Codebook, and it should have the same file name as the XML file. Alternatively, the path to the CSV file can be provided via the csv argument. Additional formal parameters of the function read.csv() can be passed via the same three dots ... argument.
The argument chartonum signals recoding character categorical variables, and employs the function recodeCharcat(). This only makes sense when recoding to Stata, which does not allow allocating labels for anything but integer variables.
If the argument to is left as NULL, the data is (invisibly) returned to the R environment. Conversion to R, either in the working space or as a data file, will result (by default) in a data frame containing declared labelled variables, as defined in package declared.
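Because the result is returned invisibly when the argument to is NULL, it has to be assigned to be inspected. A minimal sketch, again assuming a hypothetical test.sav file in the working directory and guarded against a missing package or file:

```r
if (requireNamespace("DDIwR", quietly = TRUE) && file.exists("test.sav")) {
  # With 'to' left as NULL, the data is returned invisibly,
  # so assign it to an object in order to work with it
  dataset <- DDIwR::convert("test.sav")

  # Inspect the structure of the returned data frame
  str(dataset)
}
```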
The current version reads and creates DDI Codebook version 2.5. Future versions are expected to extend the functionality to DDI Lifecycle versions 3.x and to link to the future package DDI4R for the UML model-based version 4. It extends the standard DDI Codebook by offering the possibility to embed a CSV version of the raw data into the XML file containing the Codebook, in a notes child of the fileDscr component. This type of codebook is unique to this package and is automatically detected when converting to another statistical software.
Converting the missing values to SAS is not tested, but it relies on the same package haven using the ReadStat C library. Should it not work, it is also possible to use a setup file produced by function setupfile() and run the commands manually.
The argument recode controls how missing values are treated. If the input file has SPSS-like numeric codes, they will be recoded to extended (a-z) missing types when converting to Stata or SAS. If the input has Stata-like extended codes, they will be recoded to SPSS-like numeric codes.
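The idea behind the two recoding directions can be illustrated in base R. This sketch is not DDIwR's implementation: the data, the missing codes, and the rule pairing codes with letters in sorted order are all assumptions made for illustration.

```r
# Illustrative only: this is not how DDIwR performs the recoding.
# SPSS-like data in which -9 and -8 are user-defined missing codes
x <- c(1, 2, -9, 3, -8, -9)
missing_codes <- c(-9, -8)

# Assumption for this sketch: codes are paired with letters in sorted order
code_to_letter <- setNames(letters[seq_along(missing_codes)], sort(missing_codes))

# Forward: numeric codes -> extended (a-z) missing types (NA for valid values)
missing_type <- unname(code_to_letter[as.character(x)])
valid_values <- ifelse(is.na(missing_type), x, NA)

# Reverse: extended missing types -> SPSS-like numeric codes
letter_to_code <- setNames(as.numeric(names(code_to_letter)), code_to_letter)
restored <- ifelse(is.na(missing_type), valid_values, letter_to_code[missing_type])
```

Keeping the missing type in a separate vector makes the round trip lossless in this toy case: restored holds the same values as x.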
The character encoding is usually passed to the corresponding functions from package haven. It can be set to NULL to fall back on the default in that package.
DDI - Data Documentation Initiative, see https://ddialliance.org/
setupfile, getMetadata, declared, labelled
if (FALSE) {
# Assuming an SPSS file called test.sav is located in the working directory
# the following command will extract the metadata in a DDI Codebook and
# produce a test.xml file in the same directory
convert("test.sav", to = "DDI")
# It is possible to include the data in the XML file, using:
convert("test.sav", to = "DDI", embed = TRUE)
# To produce a Stata file:
convert("test.sav", to = "Stata")
# To produce an R file:
convert("test.sav", to = "R")
# To produce an Excel file:
convert("test.sav", to = "Excel")
}