Utility function to recode variables in objects or data sets containing objects of class lsa.data, taking care of user-defined missing values, if specified.
lsa.recode.vars(
data.file,
data.object,
src.variables,
new.variables,
old.new,
new.labels,
missings.attr,
variable.labels,
out.file
)
An lsa.data object in memory (if out.file is missing) or an .RData file containing an lsa.data object with the recoded values for the specified variables.
In addition, the function prints tables for the specified variables before and after recoding them, so that it can be checked whether all recodings were done as intended. It also prints warnings if any issues have been encountered.
data.file: Full path to the .RData file containing the lsa.data object. Either this or data.object shall be specified, but not both. See details.
data.object: The object in memory containing the lsa.data object. Either this or data.file shall be specified, but not both. See details.
src.variables: Names of the source variables, all of the same class, whose values shall be recoded. See details.
new.variables: Optional, vector of variable names to be created with the recoded values, with the same length as src.variables. If missing, the src.variables will be overwritten.
old.new: String with the recoding instructions, covering all factor levels (or all unique values in case of numeric or character variables) in the source variables. See details and examples.
new.labels: The new labels if the src.variables are of class factor, or the labels to be assigned to the recoded values (i.e. turning variables of class numeric or character into factors). Must have the same length as the new desired values. See details.
missings.attr: Optional, list of character vectors to assign user-defined missing values for each recoded variable. See details and examples.
variable.labels: Optional, string vector with the new variable labels to be assigned. See details.
out.file: Full path to the .RData file to be written. If missing, the object will be written to memory. See examples.
Before recoding variables of interest, it is worth running lsa.vars.dict to check their properties.
Either data.file or data.object shall be provided as the source of data. If both of them are provided, the function will stop with an error message.
The variables passed to src.variables must have the same class and structure, i.e. the same number of levels and the same labels in case of factor variables, or the same unique values in case of numeric or character variables. If the classes differ, the function will stop with an error. If the unique values and/or labels differ, the function will execute the recodings, but will issue a warning.
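The kind of consistency described above can be checked by hand before recoding. The following is a minimal base-R sketch, not the package's own code: the data frame, the variable names VAR1 to VAR3, and the helpers same_class() and same_levels() are all hypothetical and exist only for illustration.

```r
# Hypothetical data: two factors with identical levels and one numeric variable
dat <- data.frame(
  VAR1 = factor(c("Yes", "No", "Yes"), levels = c("Yes", "No")),
  VAR2 = factor(c("No", "No", "Yes"), levels = c("Yes", "No")),
  VAR3 = c(1, 2, 1)
)

# TRUE if all selected variables share the same (first) class
same_class <- function(d, vars) {
  classes <- vapply(d[vars], function(v) class(v)[1], character(1))
  length(unique(classes)) == 1
}

# TRUE if all selected variables have the same factor levels as the first one
same_levels <- function(d, vars) {
  ref <- levels(d[[vars[1]]])
  all(vapply(d[vars], function(v) identical(levels(v), ref), logical(1)))
}

same_class(dat, c("VAR1", "VAR2"))   # TRUE
same_levels(dat, c("VAR1", "VAR2"))  # TRUE
same_class(dat, c("VAR1", "VAR3"))   # FALSE: factor and numeric are mixed
```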
The new.variables argument is optional. If provided, the recoded values will be saved under the provided new variable names and the src.variables will remain unchanged. If missing, the variables passed in src.variables will be overwritten. Note that the number of names passed to src.variables and new.variables must be the same.
The old.new (old values to new values) is the recoding scheme to be evaluated and executed, provided as a character string in the form of "1=1;2=1;3=2;4=3". In this example it means "recode 1 into 1, 2 into 1, 3 into 2, and 4 into 3". Note that all available values have to be included in the recoding statement, even if they are not to be changed. In this example, if we omit recoding 1 into 1, 1 will be set to NA during the recoding. This recoding definition works with factor and numeric variables. For character variables the individual values have to be defined in full, e.g. "'No time'='30 minutes or less';'30 minutes or less'='30 minutes or less';'More than 30 minutes'='More than 30 minutes';'Omitted or invalid'='Omitted or invalid'", because these cannot be reliably referred to by position (as for factors) or actual number (as for numeric).
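To make the semantics of the recoding string concrete, here is a small base-R sketch that mimics the described behavior on a plain numeric vector. It is an illustration under stated assumptions, not the function's actual implementation: apply_old_new() is a hypothetical helper, and it handles only the numeric case.

```r
# Hypothetical helper: applies an "old=new;old=new" scheme to a numeric vector
apply_old_new <- function(x, old.new) {
  # Split the scheme into "old=new" pairs, then each pair into old and new
  pairs <- strsplit(strsplit(old.new, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  old <- vapply(pairs, `[`, character(1), 1)
  new <- vapply(pairs, `[`, character(1), 2)
  # match() yields NA for values missing from the scheme, so they end up as NA
  as.numeric(new[match(as.character(x), old)])
}

x <- c(1, 2, 3, 4, 2, 1)

full <- apply_old_new(x, "1=1;2=1;3=2;4=3")  # every value is listed
partial <- apply_old_new(x, "2=1;3=2;4=3")   # "1=1" is omitted

full     # 1 1 2 3 1 1
partial  # NA 1 2 3 1 NA
```

The second call shows why unchanged values still need an entry in the scheme: anything not listed is lost.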
The new.labels argument assigns new labels to factor variables. Their number must be the same as the number of newly recoded values. If the variables passed to src.variables are character or numeric, and new.labels are provided, the recoded variables will be converted to factors. If, on the other hand, the src.variables are factors and no new.labels are provided, the variables will be converted to numeric.
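The length requirement for new.labels can be pictured with base R's factor(). This sketch only mirrors the rule stated above and is not taken from the package. The recoded values and the labels "Low", "Medium" and "High" are made up for illustration.

```r
# Values as they would look after applying "1=1;2=1;3=2;4=3"
recoded <- c(1, 1, 2, 3, 1, 1)

# Hypothetical labels: one per distinct new value (1, 2 and 3)
new.labels <- c("Low", "Medium", "High")
stopifnot(length(new.labels) == length(unique(recoded)))

# Turning the recoded numeric values into a labelled factor
recoded.factor <- factor(recoded, levels = sort(unique(recoded)), labels = new.labels)

levels(recoded.factor)        # "Low" "Medium" "High"
as.character(recoded.factor)  # "Low" "Low" "Medium" "High" "Low" "Low"
```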
Note that lsa.convert.data has two options: keep the user-defined missing values (missing.to.NA = FALSE) or set the user-defined missing values to NA (missing.to.NA = TRUE). The former option will provide an attribute with user-defined missing values attached to each variable they have been defined for, the latter will not (i.e. will assign all user-defined missing values to NA). If variables from data converted with the former option are recoded, the user-defined missing values have to be supplied to missings.attr, otherwise (if all available values are recoded) the user-defined missing values will appear as valid codes. Not recoding the user-defined missing codes available in the data will automatically set them to NA. In either case, the function will issue a warning. On the other hand, if the data was exported with missing.to.NA = TRUE, there will be no attributes with user-defined missing codes and omitting missings.attr will issue no warning. User-defined missing codes can, however, be added in this case too, if necessary. The missings.attr has to be provided as a list where each component is a vector with the values for the missing codes. See the examples.
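As a sketch of the list structure just described, the snippet below builds a missings.attr value for two recoded variables. It assumes one list component per recoded variable, which is one reading of "for each recoded variable". The variable names are hypothetical, and the code "Omitted or invalid" is borrowed from the character example given earlier in the details.

```r
# Hypothetical source variables to be recoded
src.variables <- c("VAR1", "VAR2")

# One component per recoded variable (assumption), each a character vector
# holding the user-defined missing codes for that variable
missings.attr <- list("Omitted or invalid", "Omitted or invalid")

# Sanity checks before passing the list on
stopifnot(is.list(missings.attr))
stopifnot(length(missings.attr) == length(src.variables))
stopifnot(all(vapply(missings.attr, is.character, logical(1))))
```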
The variable.labels argument provides the variable labels to be assigned to the recoded variables. If it is omitted and new.variables are provided, the newly created variables will have no variable labels. If it is provided and new.variables are not, the variable labels will be ignored.
If a full path to an .RData file is provided to out.file, the data set will be written to that file. If not, the data will remain in memory.
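Putting the arguments together, a call might look like the sketch below. It uses only the argument names from the usage above. The file paths, variable names and labels are placeholders, and the code assumes that the function is provided by the RALSA package. The call is guarded so that it runs only when that package is installed and the input file exists.

```r
# All values below are hypothetical placeholders
args <- list(
  data.file       = "path/to/data.RData",       # source file (not data.object)
  src.variables   = c("VAR1", "VAR2"),
  new.variables   = c("VAR1_rec", "VAR2_rec"),  # keep the originals untouched
  old.new         = "1=1;2=1;3=2;4=3",
  new.labels      = c("Low", "Medium", "High"), # one per distinct new value
  variable.labels = c("VAR1 recoded", "VAR2 recoded"),
  out.file        = "path/to/recoded.RData"
)

# Checks mirroring the rules described in the details
stopifnot(is.null(args$data.object))  # only one data source is given
stopifnot(length(args$new.variables) == length(args$src.variables))

# Run only if the package (assumed: RALSA) and the input file are available
if (requireNamespace("RALSA", quietly = TRUE) && file.exists(args$data.file)) {
  do.call(RALSA::lsa.recode.vars, args)
}
```

Dropping out.file from the list would leave the recoded object in memory instead of writing it to disk.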
lsa.convert.data, lsa.vars.dict