The function cuts continuous variables in large-scale assessment data into variables with discrete values. The resulting variables can be numeric or categorical (i.e. factors), depending on whether value labels for the new values are provided.
Either data.file or data.object must be provided as the source of data. If both are provided, the function will stop with an error message.
The src.variables argument specifies the variables to be cut. Only continuous variables are accepted. Multiple src.variables can be passed; all of them will be split at the same cut points (see below). Plausible values (PVs) are not accepted.
The new.variables argument is optional and specifies the names of the new discrete variables created from the src.variables. The names in new.variables must be given in the same order as the src.variables, and their number must equal the number of src.variables. If the new.variables argument is omitted, the function will create the names automatically by appending CUT to the end of each src.variables name, and will store the discrete variable data under these names.
The new.var.labels argument is optional. Regardless of whether new.variables are provided, if new.var.labels are provided, they will be assigned to the new variables generated from the discretization. If neither new.variables nor new.var.labels are provided, the function will automatically generate new.variables (see above) and copy the variable labels from src.variables to the newly generated variables, adding Cut at the beginning of each label. The argument takes a vector with the same number of elements as the number of variable names in src.variables.
The cut.points argument is mandatory. It specifies the boundaries of the ranges (from-to) in the original variables that are cut into discrete categories. Multiple cut.points can be passed; the new values will correspond to the ranges between them. For example, if the cut points 3.29309, 7.97028, 9.98618, and 10.99411 are passed, there will be five categories in the resulting discrete variables, as follows:
1 - from lowest up to 3.29309;
2 - from above 3.29309 up to 7.97028;
3 - from above 7.97028 up to 9.98618;
4 - from above 9.98618 up to 10.99411; and
5 - from above 10.99411 to the highest value.
The cut.points must be within the range of the src.variables; otherwise, the function will stop with an error.
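The category assignment in the example above can be approximated with base R. This is only a minimal sketch and not the function's actual implementation: cut_sketch and scores are hypothetical names, and it assumes that a value equal to a cut point falls into the lower category and that "within the range" includes the observed minimum and maximum.

```r
# Hypothetical helper, not part of the package: imitates the described behaviour with base R.
cut_sketch <- function(x, cut.points) {
  rng <- range(x, na.rm = TRUE)
  # Assumption: "within the range" is read as inclusive of the observed minimum and maximum.
  if (any(cut.points < rng[1] | cut.points > rng[2])) {
    stop("All cut points must lie within the range of the variable.")
  }
  # Right-closed intervals: a value equal to a cut point goes to the lower category.
  cut(x, breaks = c(-Inf, sort(cut.points), Inf), labels = FALSE, right = TRUE)
}

# Made-up values for illustration only.
scores <- c(1.5, 3.29309, 5, 7.97028, 8.2, 10.5, 12)
codes <- cut_sketch(scores, c(3.29309, 7.97028, 9.98618, 10.99411))
print(codes)
```

With labels = FALSE, cut() returns plain integer codes, which corresponds to the case where no value labels are supplied.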
The value.labels argument is optional. If omitted, the values in the new discrete variables will be numeric (integers). If the data was exported with missing.to.NA = FALSE (i.e. user-defined missing values are kept), the missing values will remain as they are. If value.labels are provided, the new values will be converted to factor levels. If the data was exported with missing.to.NA = FALSE, the names of the missing values will be assigned to factor levels too. Either way, the missing values will remain missing values and will be handled properly by the analysis functions. If the data was exported with missing.to.NA = TRUE (i.e. the user-defined missing values were set to NA), the NA values will remain NA in the resulting discrete new.variables.
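The effect of supplying value labels can be illustrated with base R. Again, this is only a sketch under stated assumptions: the codes and the labels are made up, and it covers only the missing.to.NA = TRUE case, where missing values are already NA. How the function handles user-defined missing values internally is not shown here.

```r
# Hypothetical integer codes, as they would look after cutting without value labels.
codes <- c(1L, 1L, 2L, 3L, NA, 5L, 4L)

# Hypothetical labels, one per category, ordered from the lowest to the highest range.
value.labels <- c("Very low", "Low", "Medium", "High", "Very high")

# Convert the integer codes to a factor; NA stays NA.
labelled <- factor(codes, levels = seq_along(value.labels), labels = value.labels)
print(labelled)
```

The number of labels has to match the number of categories, which is one more than the number of cut points.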
If a full path to an .RData file is provided to out.file, the data set will be written to that file. If not, the complemented data will remain in memory.