lsa.convert.data: Convert Large-Scale Assessments' Datasets to .RData Format

Description

lsa.convert.data converts datasets from large-scale assessments from their original formats (SPSS or ASCII text) into .RData files. print prints the properties of an lsa.data object on screen. lsa.select.countries.PISA selects the data from specific countries in a PISA dataset for analysis.

Usage

lsa.convert.data(
  inp.folder,
  PISApre15 = FALSE,
  ISO,
  missing.to.NA = FALSE,
  out.folder
)
# S3 method for lsa.data
print(x, col.nums, ...)
lsa.select.countries.PISA(data.file, data.object, cnt.names, output.file)

Value

lsa.convert.data

.RData data files, each containing an object of class lsa.data, an extension of the data.table class. The data.table object has the same name as the .RData file it is saved in. The object has additional attributes: study name (study), study cycle (cycle), and respondent file type (file.type). Each variable also carries its label as an attribute, if a label existed in the source SPSS file. If missing.to.NA was set to FALSE, each variable has an attribute missings, containing the user-defined missing values from the SPSS files.

The object in the .RData file is keyed on the country ID variable.
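
As a minimal sketch of the structure described above, the following base R code builds a mock object carrying the attributes named in this section, saves it under its own name, and reads it back. The file name bsgausm5, the variable names, the missing code, and the file.type value are illustrative assumptions; a real converted file holds a data.table-based lsa.data object, not a plain data.frame.

```r
# Mock stand-in for a converted file (NOT produced by lsa.convert.data)
mock <- data.frame(IDCNTRY = c(36, 36, 705), ITSEX = c(1, 2, 9))
attr(mock, "study") <- "TIMSS"
attr(mock, "cycle") <- "2011"
attr(mock, "file.type") <- "student"    # hypothetical value
attr(mock$ITSEX, "missings") <- 9       # hypothetical user-defined missing code

# The object is stored under the same name as the .RData file
obj.name <- "bsgausm5"                  # illustrative file/object name
assign(obj.name, mock)
out.file <- file.path(tempdir(), paste0(obj.name, ".RData"))
save(list = obj.name, file = out.file)
rm(list = obj.name)

# load() returns the names of the restored objects
loaded.name <- load(out.file)
x <- get(loaded.name)

# Inspect the object-level and variable-level attributes
attr(x, "study")
attr(x, "cycle")
attr(x, "file.type")
attr(x$ITSEX, "missings")
```

With a real converted file, only the load() and attr() calls would be needed.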

print

Prints the information of an lsa.data object (study, cycle, respondent type, number of countries, the key, i.e. the country ID, and whether the variables have user-defined missing values) and a preview of the data. The default preview (when col.nums is not specified) includes the first six columns.

lsa.select.countries.PISA

If the output.file argument is used, writes a file containing an lsa.data object with the data for the countries passed to the cnt.names argument. If the output.file argument is not used, the lsa.data object is written to memory under the same name as the file name in data.file.

Arguments

inp.folder: The folder containing the IEA-like SPSS data files or ASCII text files and .sps import files for OECD PISA data from cycles prior to 2015. See details. If blank, the working directory (getwd()) is used.
PISApre15: When converting PISA files, set to TRUE if the input files are from PISA cycles prior to 2015 (ASCII text format with .sps control files) or to FALSE (default) if they are in SPSS .sav format, as is the case for IEA studies and the like and for OECD PISA 2015 or later. Ignored if the input folder contains IEA-like studies.
ISO: Vector containing character ISO codes of the countries' data files to convert (e.g. ISO = c("aus", "svn")). If none of the files contain the specified ISO codes in their names, the codes are ignored and a warning is shown. Ignored when converting PISA files (both for cycles prior to 2015 and for 2015 and later). This argument is case-insensitive, i.e. the ISO codes can be passed in lower or upper case.
missing.to.NA: Should the user-defined missing values be recoded to NA? If TRUE, all user-defined missing values from the SPSS files (or specified in the OECD PISA import syntax files) are imported as NA. If FALSE (default), they are converted to valid values and the missing codes are assigned to an attribute missings for each variable. See details.
out.folder: Path to the folder where the converted files will be stored. If omitted, the inp.folder is used; if the inp.folder is missing as well, getwd() is used.
x: (print only) lsa.data object.
col.nums: (print only) Which columns to print, given as column positions (numbers).
...: (print only) Further arguments passed to or from other methods.
data.file: (lsa.select.countries.PISA only) Converted PISA data file to select countries' data from. Either this one or data.object must be provided, but not both. See details.
data.object: (lsa.select.countries.PISA only) PISA object in memory to filter. Either this one or data.file must be provided, but not both. See details.
cnt.names: (lsa.select.countries.PISA only) Character vector containing the names of the countries, as they exist in the data, which should stay in the PISA exported file or object in memory.
output.file: (lsa.select.countries.PISA only) Full path to the file with the filtered countries' data to be written on disk. If not provided, the PISA object will be written to memory.

Details

The lsa.convert.data function converts the originally provided data files into .RData sets. RALSA adds its own method for printing lsa.data objects on screen. The lsa.select.countries.PISA is a utility function that allows the user to select countries of interest from a converted PISA data file (or PISA object residing in memory) and remove the rest of the countries' data. This is useful when the user does not want to analyze all countries' data in a PISA file.

lsa.convert.data

IEA studies, as well as OECD TALIS and some studies conducted by other organizations, provide their data in SPSS .sav format with the same or a very similar structure: one file per country and type of respondent (e.g. school principal, student, teacher, etc.) per population. For IEA studies and OECD TALIS, use the ISO argument to specify the three-letter ISO codes of the countries whose data is to be converted. The three-letter ISO codes for each country can be found in the user guide for the study in scope. For example, the ISO codes of the countries participating in PIRLS 2016 can be found in its user guide on pages 52-54. To convert the files from all countries in the downloaded data from IEA studies and OECD TALIS, simply omit the ISO argument.

Cycles of OECD PISA prior to 2015, on the other hand, do not provide SPSS .sav or other binary files, but ASCII text files accompanied by SPSS syntax (.sps) files that are used to import the text files into SPSS. There is one file per type of respondent, containing the data from all countries. The lsa.convert.data function converts the data from either source, ensuring that the structure of the output .RData files is the same although the structure of the input files is different (SPSS binary files vs. ASCII text files plus import .sps files). The data from PISA 2015 and later, in contrast, is provided in SPSS format (all countries in one file per type of respondent).

Thus, the PISApre15 argument needs to be set to TRUE when converting datasets from PISA cycles prior to 2015. The default for the PISApre15 argument is FALSE, which means that the function expects to find in inp.folder either IEA-like SPSS binary files per country and type of respondent or OECD PISA 2015 (or later) SPSS .sav files. If PISApre15 = TRUE and country codes are provided to ISO, they will be ignored because PISA files contain data from all countries together.

The files to be converted must be in a folder on their own, from a single study, single cycle and single population. In addition, if there is more than one file type per study, cycle and population, these must also be in different folders. For example, in TIMSS 2019 the grade 8 data files are main (end with "m7", electronic version of the paper-administered items), bridge (end with "b7", paper administration with trend items for countries participating in previous TIMSS cycles) and Problem Solving and Inquiry (PSI) tasks (end with "z7", electronic administration only, optional for countries). These different types must be in separate folders. In the case of OECD PISA prior to 2015, the folder must contain both the ASCII text files and the SPSS .sps import syntax files. If the folder contains datasets from more than one study or cycle, the operation will fail with error messages.
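
The TIMSS 2019 grade 8 example above can be sketched as a small grouping step that runs before conversion. This is only an illustration in base R: the file names are hypothetical, and the code merely groups names by the two characters preceding the extension; it does not move any files.

```r
# Hypothetical TIMSS 2019 grade 8 file names (for illustration only)
files <- c("bsgausm7.sav", "bsgausb7.sav", "bsgausz7.sav", "bsgsvnm7.sav")

# Extract the two characters right before ".sav": "m7", "b7" or "z7"
suffix <- sub("^.*(.{2})\\.sav$", "\\1", tolower(files))

# Group the file names by type; each group belongs in its own folder
groups <- split(files, suffix)
groups
```

Each element of groups could then be copied into a separate folder (e.g. with file.copy()) and that folder passed as inp.folder.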

If the path for the inp.folder argument is not specified, the function will search for files in the working directory (i.e. as returned by getwd()). If the folder path for the out.folder argument is not specified, the path from inp.folder will be used and the files will be stored there. If both the inp.folder and out.folder arguments are missing, the directory from getwd() will be used to search for, convert and store the files.
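
The fallback rules above can be summarized in a short sketch. The helper resolve.folders is hypothetical and written only to illustrate the described behavior; it is not part of RALSA and does not claim to reproduce its internal code.

```r
# Hypothetical helper mirroring the documented folder defaults
resolve.folders <- function(inp.folder, out.folder) {
  # No input folder given: fall back to the working directory
  if (missing(inp.folder)) inp.folder <- getwd()
  # No output folder given: store the files next to the input files
  if (missing(out.folder)) out.folder <- inp.folder
  list(inp.folder = inp.folder, out.folder = out.folder)
}

both.missing <- resolve.folders()
only.input <- resolve.folders(inp.folder = "C:/TIMSS_2011_G8")
both.given <- resolve.folders(inp.folder = "C:/TIMSS_2011_G8", out.folder = "C:/Data")
```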

If missing.to.NA is set to TRUE, all user-defined missing values from the SPSS files will be imported as NA, which is R's only kind of missing value. This will most often be the case when analyzing these data, since the reason why a response is missing is irrelevant most of the time. However, if the reasons for missing responses need to be known, as when analyzing achievement items (i.e. not administered vs. omitted or not reached), the argument should be set to FALSE (the default for this argument), which will import all user-defined missing values as valid ones.
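
The difference between the two settings can be illustrated on a single variable. The function recode.missings below is a hypothetical stand-in, not RALSA code, and the missing codes 6 and 9 are assumed for the example only.

```r
# Hypothetical illustration of the two missing.to.NA behaviors
recode.missings <- function(x, missing.codes, missing.to.NA = FALSE) {
  if (missing.to.NA) {
    # All user-defined missing codes collapse into NA
    x[x %in% missing.codes] <- NA
  } else {
    # Values stay as they are; the codes are recorded in an attribute
    attr(x, "missings") <- missing.codes
  }
  x
}

item <- c(1, 0, 6, 1, 9)   # mock responses to an achievement item
codes <- c(6, 9)           # assumed user-defined missing codes

as.na <- recode.missings(item, codes, missing.to.NA = TRUE)
kept <- recode.missings(item, codes)
```

With the values kept, a specific reason for missingness can still be selected, for example with kept == 6, which is no longer possible once both codes have become NA.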

print

RALSA uses its own method for printing objects of class lsa.data on screen. Passing just the object name to the console will print summarized information about the study's data and the first six columns of the dataset (see the Value section). The col.nums argument specifies which columns from the dataset shall be included in the output (see examples).

lsa.select.countries.PISA

lsa.select.countries.PISA lets the user take a PISA dataset, either a converted file or an lsa.data object in memory, and reduce the number of countries in it by passing the names of the countries to keep as a character vector to the cnt.names argument. If the full path (including the file name) to the resulting file is specified in the output.file argument, the file will be written to disk. If not, the data will be written to an lsa.data object in memory with the same name as the input file. See the examples.
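
Conceptually, the selection keeps only the rows whose country name appears in cnt.names. The sketch below shows that idea on a mock data.frame; the helper select.countries, the column name CNT, and the data values are assumptions made for illustration and do not reproduce the RALSA implementation.

```r
# Mock PISA-like data; values are invented for illustration
pisa.mock <- data.frame(
  CNT = c("Albania", "Australia", "Slovenia", "Albania", "Austria"),
  STUDENT = 1:5,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the rows of the requested countries
select.countries <- function(data, cnt.names, cnt.var = "CNT") {
  data[data[[cnt.var]] %in% cnt.names, , drop = FALSE]
}

reduced <- select.countries(pisa.mock, cnt.names = c("Albania", "Slovenia"))
reduced
```

Note that the names must match the values in the data exactly, which is why cnt.names expects the country names as they exist in the data.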

References

Foy, P. (Ed.). (2018). PIRLS 2016 User Guide for the International Database. TIMSS & PIRLS International Study Center.

Examples

# Convert all IEA-like SPSS files in the working directory, setting all user-defined missing
# values to NA
if (FALSE) {
lsa.convert.data(missing.to.NA = TRUE)
}


# Convert IEA TIMSS 2011 grade 8 data from Australia and Slovenia, keeping all user-defined
# missing values as valid ones, specifying custom input and output directories
if (FALSE) {
lsa.convert.data(inp.folder = "C:/TIMSS_2011_G8", ISO = c("aus", "svn"), missing.to.NA = FALSE,
out.folder = "C:/Data")
}

# Convert OECD PISA 2009 files, converting all user-defined missing values to NA,
# using custom input and output directories
if (FALSE) {
lsa.convert.data(inp.folder = "/media/PISA_2009", PISApre15 = TRUE, missing.to.NA = TRUE,
out.folder = "/tmp")
}

# Print the 20th to 25th columns of the PISA 2018 student questionnaire dataset loaded into memory
if (FALSE) {
print(x = cy07_msu_stu_qqq, col.nums = 20:25)
}

# Select data from Albania and Slovenia from PISA 2018 student questionnaire dataset
# and save it under the same file name in a different folder
if (FALSE) {
lsa.select.countries.PISA(data.file = "C:/PISA/cy07_msu_stu_qqq.RData",
cnt.names = c("Albania", "Slovenia"),
output.file = "C:/PISA/Reduced/cy07_msu_stu_qqq.RData")
}
