lsa.recode.vars: Recode variables in large-scale assessments' data sets

Description

Utility function to recode variables in objects or data sets containing objects of class lsa.data, taking care of user-defined missing values, if specified.

Usage

lsa.recode.vars(
  data.file,
  data.object,
  src.variables,
  new.variables,
  old.new,
  new.labels,
  missings.attr,
  variable.labels,
  out.file
)

Value

An lsa.data object in memory (if out.file is missing) or an .RData file containing an lsa.data object with the recoded values for the specified variables. The function also prints tables for the specified variables before and after recoding, so that the recodings can be checked against what was intended, and prints warnings if any issues were encountered.

Arguments

data.file: Full path to the .RData file containing an lsa.data object. Either this or data.object shall be specified, but not both. See details.
data.object: The object in memory containing an lsa.data object. Either this or data.file shall be specified, but not both. See details.
src.variables: Names of the source variables with the same class whose values shall be recoded. See details.
new.variables: Optional, vector of variable names to be created with the recoded values with the same length as src.variables. If missing, the src.variables will be overwritten.
old.new: String with the recoding instructions matching the length of the factor levels (or unique values in case of numeric or character variables) in the variables. See details and examples.
new.labels: The new labels if the src.variables are of class factor, or the labels to be assigned to the recoded values (i.e. turning variables of class numeric or character into factors). Must have the same length as the new desired values. See details.
missings.attr: Optional, list of character vectors to assign user-defined missing values for each recoded variable. See details and examples.
variable.labels: Optional, string vector with the new variable labels to be assigned. See details.
out.file: Full path to the .RData file to be written. If missing, the object will be written to memory. See examples.

Details

Before recoding the variables of interest, it is worth running lsa.vars.dict to check their properties.

Either data.file or data.object shall be provided as source of data. If both of them are provided, the function will stop with an error message.

The variables passed to src.variables must have the same class and structure, i.e. the same number of levels and the same labels in case of factor variables, or the same unique values in case of numeric or character variables. If the classes differ, the function will stop with an error. If the unique values and/or labels differ, the function will execute the recodings, but will issue a warning.

The new.variables argument is optional. If provided, the recoded values will be saved under the provided new variable names and the src.variables will remain unchanged. If missing, the variables passed in src.variables will be overwritten. Note that the number of names passed to src.variables and new.variables must be the same.

The old.new (old values to new values) argument is the recoding scheme to be evaluated and executed, provided as a character string in the form of "1=1;2=1;3=2;4=3". In this example it means "recode 1 into 1, 2 into 1, 3 into 2, and 4 into 3". Note that all available values have to be included in the recoding statement, even if they are not to be changed. In this example, if recoding 1 into 1 is omitted, 1 will be set to NA during the recoding. This recoding definition works with factor and numeric variables. For character variables the individual values have to be defined in full, e.g. "'No time'='30 minutes or less';'30 minutes or less'='30 minutes or less';'More than 30 minutes'='More than 30 minutes';'Omitted or invalid'='Omitted or invalid'", because these cannot be reliably referred to by position (as for factors) or actual number (as for numeric).
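The semantics of the recoding string can be illustrated with a small base R sketch. The helpers parse_old_new and recode_numeric are hypothetical names introduced only for this illustration; they are not part of the package and do not claim to reproduce its internal implementation. The sketch only shows how each "old=new" pair maps a value and why a value without a rule ends up as NA.

```r
# Hypothetical helper (not a package function): turns a string such as
# "1=1;2=1;3=2;4=3" into a named lookup vector (names = old, values = new).
parse_old_new <- function(old.new) {
  pairs <- strsplit(strsplit(old.new, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  old <- vapply(pairs, function(p) trimws(p[1]), character(1))
  new <- vapply(pairs, function(p) trimws(p[2]), character(1))
  setNames(new, old)
}

# Hypothetical helper: applies the lookup to a numeric vector.
# Values that have no rule in the string are returned as NA.
recode_numeric <- function(x, old.new) {
  lookup <- parse_old_new(old.new)
  as.numeric(unname(lookup[as.character(x)]))
}

x <- c(1, 2, 3, 4, 2, 1)

# Complete scheme: every value present in x has a rule
full <- recode_numeric(x, "1=1;2=1;3=2;4=3")
print(full)     # 1 1 2 3 1 1

# Incomplete scheme: the rule "1=1" is omitted, so 1 becomes NA
partial <- recode_numeric(x, "2=1;3=2;4=3")
print(partial)  # NA 1 2 3 1 NA
```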

The new.labels argument assigns new labels to factor variables. Its length must be the same as the number of newly recoded values. If the variables passed to src.variables are character or numeric, and new.labels are provided, the recoded variables will be converted to factors. If, on the other hand, the src.variables are factors and no new.labels are provided, the variables will be converted to numeric.
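The relationship between the recoded values and new.labels can be sketched in base R. This is an illustration only, assuming that labels are matched to the distinct new values in ascending order; the function itself may handle this differently. The object names recoded and labelled are hypothetical.

```r
# Values as they would look after applying "1=1;2=1;3=2;4=3"
recoded <- c(1, 1, 2, 3, 1, 1)

# One label per distinct new value (1, 2 and 3)
new.labels <- c("30 minutes or less", "More than 30 minutes", "Omitted or invalid")
stopifnot(length(new.labels) == length(unique(recoded)))

# With labels: the numeric values become a factor
labelled <- factor(recoded, levels = sort(unique(recoded)), labels = new.labels)
print(levels(labelled))

# Without labels: a factor falls back to its numeric codes
print(as.numeric(labelled))  # 1 1 2 3 1 1
```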

Note that lsa.convert.data has two options: keep the user-defined missing values (missing.to.NA = FALSE) or set the user-defined missing values to NA (missing.to.NA = TRUE). The former option attaches an attribute with the user-defined missing values to each variable they have been defined for; the latter does not (i.e. it sets all user-defined missing values to NA). When variables from data converted with the former option are recoded, the user-defined missing values have to be supplied to missings.attr; otherwise, if all available values are recoded, the user-defined missing values will appear as valid codes. If the user-defined missing codes available in the data are not recoded, they will automatically be set to NA. In either case, the function will issue a warning. On the other hand, if the data was exported with missing.to.NA = TRUE, there will be no attributes with user-defined missing codes and omitting missings.attr will issue no warning. User-defined missing codes can, however, be added in this case too, if necessary. The missings.attr argument has to be provided as a list where each component is a vector with the values for the missing codes. See the examples.

The variable.labels argument provides the variable labels to be assigned to the recoded variables. If omitted while new.variables are provided, the newly created variables will have no variable labels. If provided while new.variables are not provided, the labels will be ignored.

If the full path to an .RData file is provided to out.file, the data set will be written to that file. If not, the data will remain in memory.
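A complete call combining the arguments described above might look like the following sketch. The file paths and the variable names VAR_A and VAR_B are hypothetical placeholders, the variables are assumed to be factors with the four levels used in the old.new example, and the function is assumed to come from the RALSA package. The call is guarded so that it only runs when the package and the input file are actually available.

```r
# All paths and variable names below are hypothetical placeholders.
recode.args <- list(
  data.file       = "C:/temp/test.RData",
  src.variables   = c("VAR_A", "VAR_B"),
  new.variables   = c("VAR_A_rec", "VAR_B_rec"),   # same length as src.variables
  old.new         = "1=1;2=1;3=2;4=3",             # every existing value has a rule
  new.labels      = c("30 minutes or less", "More than 30 minutes",
                      "Omitted or invalid"),       # one label per new value
  missings.attr   = list("Omitted or invalid",     # assumed: one component
                         "Omitted or invalid"),    # per recoded variable
  variable.labels = c("Recoded VAR_A", "Recoded VAR_B"),
  out.file        = "C:/temp/test_recoded.RData"
)

# Run only if the package (assumed to be RALSA) and the input file exist
if (requireNamespace("RALSA", quietly = TRUE) && file.exists(recode.args$data.file)) {
  do.call(RALSA::lsa.recode.vars, recode.args)
}
```

Leaving out.file out of the list would keep the recoded object in memory instead of writing it to disk.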
