When the argument to specifies a certain statistical package ("R", "Stata", "SPSS" or "SAS"), the
destination file takes the same name as the input file, with an automatically added
software-specific extension.
Alternatively, the argument to can be specified as a path to a specific file, in which case the
software package is determined from its file extension. The following extensions are currently
recognized: .xml for DDI, .rds for R, .dta for Stata, .sav for SPSS and .sas7bdat for SAS.
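The extension lookup described above can be illustrated with a minimal base R sketch. The helper guess_software() is hypothetical and not part of the package; it only mirrors the list of recognized extensions and is not the package's actual implementation.

```r
# Hypothetical helper (not part of the package): map a file extension
# to the software it denotes, using the extensions listed above.
guess_software <- function(path) {
  known <- c(xml = "DDI", rds = "R", dta = "Stata", sav = "SPSS", sas7bdat = "SAS")
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% names(known)) {
    stop("Unrecognized file extension: ", ext)
  }
  unname(known[ext])
}

guess_software("survey/wave1.sav")
guess_software("codebook.xml")
```

Any other extension, or a path without one, stops with an error in this sketch.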
The argument binpath is used only for Stata (if installed on the local machine), to coerce regular
missing values to their specific missing values using letters from a to z, given that package haven
does not convert Stata missing values by default. Specifying the path to the binary executable file
also acts as a Boolean signal to attempt converting the missing values via an automatic script that
recodes all unique missing values to the same letters, with the lowest numerical value assigned to
the letter a.
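The letter assignment rule (lowest numerical value first) can be sketched in base R. The helper letter_codes() is a hypothetical illustration of the ordering only; it is not the script that the package runs in Stata.

```r
# Hypothetical helper: pair each unique missing code with a letter,
# starting from the lowest numerical value.
letter_codes <- function(missing_values) {
  codes <- sort(unique(missing_values))
  if (length(codes) > length(letters)) {
    stop("Only the 26 letters from a to z are available")
  }
  setNames(letters[seq_along(codes)], codes)
}

# -9 is the lowest value, so it is paired with the letter a
letter_codes(c(-7, -9, -8, -9))
```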
Additional parameters can be specified via the three dots argument ..., and are passed to the
respective functions from package haven. For instance, the function write_dta() has an additional
argument called version when writing a Stata file.
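As an illustration of the kind of argument that can be forwarded, the sketch below calls write_dta() from package haven directly. The data frame, the temporary file and the choice of version 13 are assumptions made for the example.

```r
library(haven)

# Toy data and a temporary target file, both made up for this example
df <- data.frame(id = 1:3, score = c(2.5, NA, 4.1))
target <- tempfile(fileext = ".dta")

# version is an argument of haven::write_dta(); 13 is an arbitrary choice
write_dta(df, target, version = 13)
file.exists(target)
```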
Note that this function creates a target file in the same directory as the source file, which is
different from importing the source file into R. To import a file, users should refer to the
specific functions from package haven, such as read_sav() or read_dta(), and be aware that the
resulting object is a tibble.
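A minimal sketch of such an import, using a temporary SPSS file written on the fly so that the example is self-contained (the data are made up):

```r
library(haven)

# Create a small SPSS file to import; the contents are illustrative only
source_file <- tempfile(fileext = ".sav")
write_sav(data.frame(id = 1:3, age = c(34, 51, 27)), source_file)

# read_sav() imports the file into R and returns a tibble
imported <- read_sav(source_file)
inherits(imported, "tbl_df")
```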
The current version reads and creates DDI Codebook version 2.5, with future versions to extend
the functionality to DDI Lifecycle versions 3.x and link to the future package DDI4R for the UML
model based version 4. It extends the standard DDI Codebook by offering the possibility to embed a
CSV version of the raw data into the XML file containing the Codebook, in a notes child of the
fileDscr component. This type of Codebook is unique to this package and is automatically detected
when converting to another statistical software.
Future versions will attempt to extend converting the missing values to SAS types, but otherwise
users can also use a setup file produced by function setupfile() and run the commands manually.
When importing a file, the R object of choice is a tibble because it is the only type of object in R
that allows specifying multiple (coded) missing values. It also plays nicely with the SPSS types of
variables, which are the most commonly used in the social sciences.
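A short sketch of a tibble column carrying several coded missing values, using labelled_spss() from package haven. The variable name, codes and labels are invented for the example.

```r
library(haven)
library(tibble)

# -8 and -9 are declared as two distinct missing codes for the same variable
answers <- tibble(
  trust = labelled_spss(
    c(1, 5, -8, 3, -9),
    labels = c("Low" = 1, "High" = 5, "Don't know" = -8, "Refused" = -9),
    na_values = c(-9, -8)
  )
)

# Declared missing codes are reported as missing, yet the codes are kept
is.na(answers$trust)
attr(answers$trust, "na_values")
```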