Learn R Programming

DDIwR (version 0.5)

convert: Convert a dataset from one statistical software to another

Description

This function converts (or transfers) between R, Stata, SPSS, SAS, and DDI XML files. Unlike the regular import / export functions from packages haven or rio, this function uses the DDI standard as an exchange platform and makes the first steps towards converting the user defined missing values.

Usage

convert(from, to, embed = FALSE, binpath = "", ...)

Arguments

from

A path to a file, or a tibble object

to

Character, the name of a software package or a path to a specific file

embed

Boolean, embed the data when generating a DDI XML file

binpath

Path to the binary executable file, to run a recoding script

...

Additional parameters passed to exporting functions, see the Details section

Details

When the argument to specifies a certain statistical package ("R", "Stata", "SPSS" or "SAS"), the name of the destination file will be the same as the name of the input file from the argument to, with an automatically added software specific extension.

Alternatively, the argument to can be specified as a path to a specific file, in which case the software package is determined from its file extension. The following extentions are currently recognized: .xml for DDI, .rds for R, .dta for Stata, .sav for SPSS and .sas7bdat for SAS.

The argument binpath is used only for Stata (if installed on the local machine), to coerce regular missing values to their specific missing values using letters from a to z, given that package haven does not convert Stata missing values by default. Specifying the path to the binary executable file is also a Boolean signal to attempt converting the missing values via an automatic script that recodes all unique missing values to the same letters, the lowest numerical value being assigned to the letter a.

Additional parameters can be specified via the three dots argument ..., that are passed to the respective functions from package haven. For instance the function write_dta() has an additional argument called version when writing a Stata file.

Note that this function creates a target file in the same directory as the source file, which is different from importing the source file into R. To import a file, users should refer to the specific functions from package haven, such as read_sav() or read_dta() etc., and be aware the result object is a tibble.

The current version reads and creates DDI Codebook version 2.5, with future versions to extend the functionality for DDI Lifecycle versions 3.x and link to the future package DDI4R for the UML model based version 4. It extends the standard DDI Codebook by offering the possibility to embed a CSV version of the raw data into the XML file containing the Codebook, into a notes child of the fileDscr component. This type of Codebook is unique to this package and automatically detected when converting to another statistical software.

Future versions will attempt to extend converting the missing values to SAS types, but otherwise users can also use a setup file produced by function setupfile() and run the commands manually.

When importing a file, the R object of choice is a tibble because is the only type of object in R that allows specifying multiple (coded) missing values. It also plays nicely with the SPSS types of variables, which are the most commonly used in the social sciences.

References

DDI - Data Documentation Initiative, see https://www.ddialliance.org/

See Also

setupfile, getMetadata, tibble, labelled, labelled_spss

Examples

Run this code
# NOT RUN {
# Assuming an SPSS file called test.sav is located in the working directory
# the following command will extract the metadata in a DDI Codebook and
# produce a test.xml file in the same directory
convert("test.sav", to = "DDI")

# It is possible to include the data in the XML file, using:
convert("test.sav", to = "DDI", embed = TRUE)

# To produce a Stata file:
convert("test.sav", to = "Stata")

# Since Stata has different types of missing values than SPSS, it is
# possible to transform these missing values via an automatically run script
# using the argument "binpath", assuming that Stata is installed

# The paths to the binaries differ in various operating systems. A possible
# path for Windows, for Stata version 12 could be:
binpath <- "C:/Progra~1/Stata12/Stata.exe"

# For MacOS, the path could be:
binpath <- "/Applications/Stata/Stata.app/Contents/MacOS/Stata"

# The final command, which also converts to Stata types of missing values
convert("test.sav", to = "Stata", binpath = binpath)
# }

Run the code above in your browser using DataLab