importLimeSurveyData: importLimeSurveyData

Description

This function imports files exported by LimeSurvey, a powerful open source online survey application used, for example, for psychological experiments and other research.

Usage

importLimeSurveyData(datafile = NULL,
  dataPath = NULL,
  datafileRegEx = NULL,
  scriptfile = NULL,
  limeSurveyRegEx.varNames =
    "names\\(data\\)\\[\\d*\\] <- ",
  limeSurveyRegEx.toChar =
    "data\\[, \\d*\\] <- as.character\\(data\\[, \\d*\\]\\)",
  limeSurveyRegEx.varLabels =
    "attributes\\(data\\)\\$variable.labels\\[\\d*\\] <- \".*\"",
  limeSurveyRegEx.toFactor =
    paste0("data\\[, \\d*\\] <- factor\\(data\\[, \\d*\\], ",
           "levels=c\\(.*\\),.*labels=c\\(.*\\)\\)"),
  limeSurveyRegEx.varNameSanitizing =
    list(list(pattern = "#", replacement = "."),
         list(pattern = "\\$", replacement = ".")),
  setVarNames = TRUE,
  setLabels = TRUE,
  convertToCharacter = FALSE,
  convertToFactor = FALSE,
  categoricalQuestions = NULL,
  massConvertToNumeric = TRUE,
  dataHasVarNames = TRUE,
  encoding = "NULL",
  dataEncoding = "unknown",
  scriptEncoding = "ASCII")

Arguments

datafile

The path and filename of the file containing the data (comma separated values).

dataPath, datafileRegEx

The path containing the datafiles, and a regular expression indicating which files in that path should be read. Together, these can be used to read multiple datafiles when the data is split across them. This is useful when downloading the entire dataset at once isn't possible because of server restrictions, for example when the processing time for the LimeSurvey script that generates the datafiles is limited. In that case, the data can be downloaded in portions, and specifying a path here enables reading all datafiles in one go.
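
As a rough illustration of how a path and a regular expression can select several partial exports, the following sketch uses base R's list.files. The directory and file names are hypothetical, and this is not necessarily how importLimeSurveyData selects files internally.

```r
# Hypothetical directory holding partial LimeSurvey exports (names are made up).
exportDir <- tempfile("limesurvey-export-")
dir.create(exportDir)
file.create(file.path(exportDir,
                      c("survey_part1.csv", "survey_part2.csv", "notes.txt")))

# Select only the data files by matching a regular expression on the file names.
dataFiles <- list.files(exportDir, pattern = "^survey_part[0-9]+\\.csv$")
print(dataFiles)
```

A pattern of this kind is what one would pass as datafileRegEx, with the directory passed as dataPath.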

scriptfile

The path and filename of the file containing the R script to import the data.

limeSurveyRegEx.varNames

The regular expression used to extract the variable names from the script file. The first capture group (i.e. the first expression between parentheses) will be extracted as the variable name.
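
To give an idea of which script lines the default pattern picks out, here is a minimal sketch using base R's grep. The script lines are hypothetical examples written to resemble a LimeSurvey import script; they are not taken from a real export.

```r
# Default pattern, copied from the Usage section.
varNamesRegEx <- "names\\(data\\)\\[\\d*\\] <- "

# Hypothetical lines resembling a LimeSurvey R import script.
scriptLines <- c('data[, 1] <- as.character(data[, 1])',
                 'names(data)[1] <- "id"',
                 'names(data)[2] <- "gender"')

# Keep only the lines that set variable names.
varNameLines <- grep(varNamesRegEx, scriptLines, value = TRUE, perl = TRUE)
print(varNameLines)
```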

limeSurveyRegEx.toChar

The regular expression to detect the lines in the import script where variables are converted to the character type.

limeSurveyRegEx.varLabels

The regular expression used to detect the lines in the import script where variable labels are set.

limeSurveyRegEx.toFactor

The regular expression used to detect the lines in the import script where vectors are converted to factors.

limeSurveyRegEx.varNameSanitizing

A list of regular expression patterns and their replacements to sanitize the variable names (e.g. replace hashes/pound signs ('#') by something that is not considered the comment symbol by R).
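
The effect of the default sanitizing rules can be sketched with base R's gsub. The variable names below are hypothetical, and the loop only illustrates the idea of applying each pattern and replacement in turn; the function's internal implementation may differ.

```r
# Default sanitizing rules, copied from the Usage section.
varNameSanitizing <- list(list(pattern = "#", replacement = "."),
                          list(pattern = "\\$", replacement = "."))

# Hypothetical variable names containing problematic characters.
varNames <- c("Q1#other", "Q2$comment", "gender")

# Apply each pattern/replacement pair in turn.
for (rule in varNameSanitizing) {
  varNames <- gsub(rule$pattern, rule$replacement, varNames)
}
print(varNames)
```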

setVarNames, setLabels, convertToCharacter, convertToFactor

Whether to set variable names or labels, or convert to character or factor, using the code extracted from the script file with the corresponding regular expressions.

categoricalQuestions

Which variables (specified using LimeSurvey variable names) are considered categorical questions; for these, the script to convert the variables to factors, as extracted from the LimeSurvey import file, is applied.
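
For illustration, the kind of conversion applied to a categorical question might look like the sketch below. The data, the level codes, and the labels are all hypothetical; the conversion line is merely written in the style that the default limeSurveyRegEx.toFactor pattern matches.

```r
# Hypothetical data with one categorical and one numeric question.
data <- data.frame(gender = c(1, 2, 2, 1), age = c(34, 27, 45, 52))

# A conversion line in the style found in a LimeSurvey import script (made up here).
data[, 1] <- factor(data[, 1], levels=c(1, 2), labels=c("Female", "Male"))

# The categorical question is now a factor; the other variable stays numeric.
str(data)
```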

massConvertToNumeric

Whether to convert all variables to numeric using the massConvertToNumeric function.

dataHasVarNames

Whether the variable names are included as header (first line) in the comma separated values file (data file).

encoding, dataEncoding, scriptEncoding

The encoding of the files; encoding overrides dataEncoding and scriptEncoding, and so can be used to specify the same encoding for both.

Value

A dataframe containing the imported data.

Details

This function is intended to make importing data from LimeSurvey easier. The default settings used by LimeSurvey are not always convenient, and this function provides more control.

Examples

# NOT RUN {
### Of course, you need valid LimeSurvey files. This is an example of
### what you'd do if you have them, assuming you specified the path
### containing the data in 'dataPath', the name of the datafile in
### 'dataFileName', the name of the script file in 'dataLoadScriptName',
### and that you only want variables 'informedConsent', 'gender', 'hasJob',
### 'currentEducation', 'prevEducation', and 'country' to be converted to
### factors.
dat <- importLimeSurveyData(datafile = file.path(dataPath, dataFileName),
                            scriptfile = file.path(dataPath, dataLoadScriptName),
                            categoricalQuestions = c('informedConsent',
                                                     'gender',
                                                     'hasJob',
                                                     'currentEducation',
                                                     'prevEducation',
                                                     'country'))
# }