lsa.aggregate.vars: Aggregate variables in LSA datasets

Description

lsa.aggregate.vars aggregates continuous variables by group and appends them to the dataset.

Usage

lsa.aggregate.vars(
  data.file,
  data.object,
  group.vars,
  src.variables,
  new.variables,
  new.var.labels,
  aggr.fun,
  out.file
)

Value

An lsa.data object in memory (if out.file is missing) or an .RData file containing the lsa.data object with the new aggregated variables.

Arguments

data.file: The file containing the lsa.data object. Either this or data.object shall be specified, but not both. See details.
data.object: The object in memory containing the lsa.data object. Either this or data.file shall be specified, but not both. See details.
group.vars: Variable(s) to aggregate the src.variables by. If no grouping variables are provided, the src.variables will be aggregated on country level. See details.
src.variables: Names of the variables to aggregate. Accepts only continuous variables. No PV variables are accepted. See details.
new.variables: The names of the new, aggregated variables to append to the dataset. See details.
new.var.labels: Optional, vector of strings to add as variable labels for the new.variables. See details.
aggr.fun: Function to apply when aggregating the variable. Accepts mean (default), median, or mode. See details.
out.file: Full path to the .RData file to be written. If missing, the original object will be overwritten in the memory. See examples.

Details

The function aggregates continuous variables in large-scale assessments' data. The aggregation can be done by groups defined by the group.vars. Multiple grouping variables can be specified. All aggregations are done within each country separately.
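As a conceptual illustration only (this is base R on made-up data, not the package's implementation), aggregating "by school within each country" can be sketched with ave(). The column name IDCNTRY and all values are assumptions of the sketch. Note that school 10 exists in both countries and is still aggregated separately.

```r
# Toy data: two countries, two schools each (all values are made up).
toy <- data.frame(
  IDCNTRY  = c(1, 1, 1, 1, 2, 2, 2, 2),
  IDSCHOOL = c(10, 10, 20, 20, 10, 10, 20, 20),
  ASBGSLR  = c(8, 10, 9, 11, 7, 9, 12, 14)
)

# ave() returns one value per row: the mean of that row's group, where a
# group is a school within a country. Every student in a school gets the
# same aggregated value, appended here as a new column.
toy$ASBGSLRAGGR <- ave(toy$ASBGSLR, toy$IDCNTRY, toy$IDSCHOOL, FUN = mean)

print(toy)
```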

Either data.file or data.object shall be provided as source of data. If both of them are provided, the function will stop with an error message.
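The rule above can be sketched with missing(). The helper check_data_source is hypothetical and not part of the package. Stopping when neither argument is supplied is an assumption of this sketch, since the text only describes the case where both are given.

```r
# Hypothetical helper, not part of RALSA: enforce exactly one data source.
check_data_source <- function(data.file, data.object) {
  if (!missing(data.file) && !missing(data.object)) {
    stop("Specify either 'data.file' or 'data.object', not both.")
  }
  if (missing(data.file) && missing(data.object)) {
    # Assumption: the documentation does not describe this case.
    stop("Specify one of 'data.file' or 'data.object'.")
  }
  invisible(TRUE)
}

# One source: passes silently. The path is only a string here; nothing is read.
check_data_source(data.file = "C:/Data/PIRLS_2021_Student_Miss_to_NA.RData")

# Both sources: the error is caught so that the message can be inspected.
both.msg <- tryCatch(
  check_data_source(data.file = "C:/Data/PIRLS_2021_Student_Miss_to_NA.RData",
                    data.object = list()),
  error = function(e) conditionMessage(e)
)
print(both.msg)
```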

The src.variables specifies the variables that shall be aggregated. Only continuous variables are accepted. PVs are not accepted.

The new.variables argument is optional and specifies the names of the new variables aggregated from the src.variables. The names in new.variables must follow the same order as the src.variables, and their number must equal the number of src.variables. If new.variables is omitted, the function creates the names automatically by appending AGGR to the end of each src.variables name and stores the aggregated data under these names.

The new.var.labels argument is optional. It takes a vector with the same number of elements as the number of variable names in src.variables. If new.var.labels are provided, they will be assigned to the variables generated from the aggregation, regardless of whether new.variables are provided. If neither new.variables nor new.var.labels are provided, the function will automatically generate the new.variables (see above) and copy the variable labels from the src.variables to the newly generated variables, adding Aggregated at the beginning of each label.
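The naming and labelling rules from the two paragraphs above can be sketched as follows. The helper resolve_new_vars is hypothetical, and the sketch makes several assumptions: AGGR is appended without a separator, Aggregated is followed by a space, default labels are also generated when only new.variables is supplied, and the source labels shown are placeholders rather than the labels stored in the data.

```r
# Hypothetical helper, not part of RALSA: resolve the names and labels of the
# aggregated variables from the user's input.
resolve_new_vars <- function(src.variables, src.labels,
                             new.variables = NULL, new.var.labels = NULL) {
  n <- length(src.variables)
  if (is.null(new.variables)) {
    # Default names: source name plus the suffix AGGR (no separator assumed).
    new.variables <- paste0(src.variables, "AGGR")
  } else if (length(new.variables) != n) {
    stop("'new.variables' must have one name per source variable.")
  }
  if (is.null(new.var.labels)) {
    # Default labels: source label prefixed with "Aggregated" (space assumed).
    new.var.labels <- paste("Aggregated", src.labels)
  } else if (length(new.var.labels) != n) {
    stop("'new.var.labels' must have one label per source variable.")
  }
  # Names and labels pair up in the order of src.variables.
  setNames(new.var.labels, new.variables)
}

src <- c("ASBGSLR", "ASBGHRL")
lab <- c("Students Like Reading", "Home Resources for Learning")  # placeholders

auto <- resolve_new_vars(src, lab)
custom <- resolve_new_vars(
  src, lab,
  new.variables = c("LIKEREAD", "LRNRES"),
  new.var.labels = c("Aggregated like reading", "Aggregated learning resources")
)
print(auto)
print(custom)
```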

The aggr.fun specifies the function to be applied when performing the aggregation. The acceptable values are mean (default), median and mode. Using these methods, the aggregation will be performed by groups defined by the group.vars within each country.
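To make the three statistics concrete, here is a base R sketch on made-up data. It illustrates the statistics themselves, not how the package computes them. Base R has no built-in statistical mode (mode() returns an object's storage mode), so stat_mode below is a hypothetical helper. Its handling of ties (first value encountered wins) and of missing values (ignored) are assumptions of the sketch.

```r
# Hypothetical helper: the most frequent value in x.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) return(NA)
  ux <- unique(x)
  # Count occurrences of each unique value and pick the most frequent one.
  ux[which.max(tabulate(match(x, ux)))]
}

scores <- c(1, 4, 4, 9, 2, 6, 7, 7)       # made-up values
school <- rep(c("A", "B"), each = 4)      # grouping variable

by.mean   <- ave(scores, school, FUN = mean)       # A: 4.5, B: 5.5
by.median <- ave(scores, school, FUN = median)     # A: 4,   B: 6.5
by.mode   <- ave(scores, school, FUN = stat_mode)  # A: 4,   B: 7

print(data.frame(school, scores, by.mean, by.median, by.mode))
```

With skewed groups such as these, the three functions give different aggregated values, so the choice of aggr.fun matters for the resulting variable.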

If the full path to an .RData file is provided to out.file, the dataset will be written to that file. If not, the data with the added variables will remain in memory.

Examples

# Aggregate the PIRLS 2021 Students Like Reading and the Home Resources for Learning scales per
# school and save the resulting dataset to the file given in out.file. The names for the new
# variables are automatically generated.
if (FALSE) {
  lsa.aggregate.vars(data.file = "C:/Data/PIRLS_2021_Student_Miss_to_NA.RData",
                     src.variables = c("ASBGSLR", "ASBGHRL"),
                     group.vars = "IDSCHOOL",
                     out.file = "/tmp/test.RData")
}

# Same as the above, but assign custom variable names and their labels, and write the data to
# the memory instead of saving it on the disk (out.file is omitted).
if (FALSE) {
  lsa.aggregate.vars(data.file = "C:/Data/PIRLS_2021_Student_Miss_to_NA.RData",
                     src.variables = c("ASBGSLR", "ASBGHRL"),
                     new.variables = c("LIKEREAD", "LRNRES"),
                     new.var.labels = c("Aggregated like reading", "Aggregated learning resources"),
                     group.vars = "IDSCHOOL")
}