lsa.convert.data converts datasets from large-scale assessments from their original formats (SPSS or ASCII text) into .RData files. print prints the properties of an lsa.data object on screen. lsa.select.countries.PISA lets the user select PISA data from specific countries for analysis.
lsa.convert.data(
  inp.folder,
  PISApre15 = FALSE,
  ISO,
  missing.to.NA = FALSE,
  out.folder
)

# S3 method for lsa.data
print(x, col.nums, ...)

lsa.select.countries.PISA(data.file, data.object, cnt.names, output.file)
lsa.convert.data
.RData data files, each containing an object of class lsa.data, an extension of the data.table class. The data.table object has the same name as the .RData file it is saved in. The object has additional attributes: study name (study), study cycle (cycle), and respondent file type (file.type). Each variable carries its label as an attribute, if one existed in the source SPSS file. If missing.to.NA was set to FALSE, each variable also has a missings attribute containing the user-defined missing values from the SPSS files.
The object in the .RData file is keyed on the country ID variable.
print
Prints the information of an lsa.data object (study, cycle, respondent type, number of countries, the key, i.e. the country ID, and whether the variables have user-defined missing values) and a preview of the data. The default preview (when no col.nums are specified) includes the first six columns.
lsa.select.countries.PISA
Writes a file containing an lsa.data object with the data for the countries passed to the cnt.names argument, if the output.file argument is used. If the output.file argument is not used, the lsa.data object is written to memory under the same name as the file name in data.file.
inp.folder
The folder containing the IEA-like SPSS data files, or the ASCII text files and .sps import files for OECD PISA data from cycles prior to 2015. See details. If blank, the working directory (getwd()) is used.
PISApre15
When converting PISA files, set to TRUE if the input files are from PISA cycles prior to 2015 (ASCII text format with .sps control files), or to FALSE (default) if they are in SPSS .sav format, as is the case for IEA studies and the like and for OECD PISA 2015 or later. Ignored if the input folder contains IEA-like studies.
ISO
Character vector containing the ISO codes of the countries whose data files are to be converted (e.g. ISO = c("aus", "svn")). If none of the files contain the specified ISO codes in their names, the codes are ignored and a warning is shown. Ignored when converting PISA files (both for cycles prior to 2015 and for 2015 and later). This argument is case-insensitive, i.e. the ISO codes can be passed in lower or upper case.
missing.to.NA
Should the user-defined missing values be recoded to NA? If TRUE, all user-defined missing values from the SPSS files (or specified in the OECD PISA import syntax files) are imported as NA. If FALSE (default), they are converted to valid values and the missing codes are assigned to an attribute missings for each variable. See details.
out.folder
Path to the folder where the converted files will be stored. If omitted, it is the same as inp.folder, and if inp.folder is missing as well, it is getwd().
x
(print only) An lsa.data object.
col.nums
(print only) Which columns to print, given as column positions.
...
(print only) Further arguments passed to or from other methods.
data.file
(lsa.select.countries.PISA only) Converted PISA data file to select countries' data from. Either this argument or data.object must be provided, but not both. See details.
data.object
(lsa.select.countries.PISA only) PISA object in memory to filter. Either this argument or data.file must be provided, but not both. See details.
cnt.names
(lsa.select.countries.PISA only) Character vector containing the names of the countries, as they exist in the data, which should stay in the exported PISA file or object in memory.
output.file
(lsa.select.countries.PISA only) Full path to the file in which the filtered countries' data will be written on disk. If not provided, the PISA object is written to memory.
The lsa.convert.data function converts the originally provided data files into .RData sets. RALSA adds its own method for printing lsa.data objects on screen. lsa.select.countries.PISA is a utility function that lets the user select countries of interest from a converted PISA data file (or a PISA object residing in memory) and remove the data of the remaining countries. This is useful when the user does not want to analyze all countries' data in a PISA file.
lsa.convert.data
IEA studies, as well as OECD TALIS and some studies conducted by other organizations, provide their data in SPSS .sav format with the same or a very similar structure: one file per country and type of respondent (e.g. school principal, student, teacher, etc.) per population. For IEA studies and OECD TALIS, use the ISO argument to specify the three-letter ISO codes of the countries whose data is to be converted. The three-letter ISO codes for each country can be found in the user guide for the study in scope. For example, the ISO codes of the countries participating in PIRLS 2016 can be found in its user guide on pages 52-54. To convert the files from all countries in the downloaded data from IEA studies and OECD TALIS, simply omit the ISO argument.
Cycles of OECD PISA prior to 2015, on the other hand, do not provide SPSS .sav or other binary files, but ASCII text files accompanied by SPSS syntax (.sps) files that are used to import the text files into SPSS. There is one file per type of respondent, containing all countries' data. The data from PISA 2015 and later is provided in SPSS format (all countries in one file per type of respondent). The lsa.convert.data function converts the data from either source, ensuring that the structure of the output .RData files is the same although the structure of the input files is different (SPSS binary files vs. ASCII text files plus import .sps files). Thus, the PISApre15 argument needs to be set to TRUE when converting datasets from PISA cycles prior to 2015. The default for the PISApre15 argument is FALSE, which means that the function expects to find in inp.folder either IEA-like SPSS binary files per country and type of respondent, or OECD PISA 2015 (or later) SPSS .sav files. If PISApre15 = TRUE and country codes are provided to ISO, they are ignored because PISA files contain data from all countries together.
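The case-insensitive matching of ISO codes against file names can be pictured with the base R sketch below. It is only an illustration and not RALSA's internal code: the helper match_iso_files and the file names are hypothetical, and the position of the country code (characters 4 to 6 of the file name) is an assumption made for the example.

```r
# Hypothetical helper: keep only the files whose country code matches ISO,
# ignoring case. Assumes the code sits in characters 4-6 of the file name.
match_iso_files <- function(file.names, ISO) {
  file.codes <- tolower(substr(basename(file.names), 4, 6))
  keep <- file.codes %in% tolower(ISO)
  if (!any(keep)) {
    warning("None of the files match the specified ISO codes.")
  }
  file.names[keep]
}

# Made-up file names for illustration
files <- c("bsgausm5.sav", "BSGSVNM5.sav", "bsgnorm5.sav")
selected <- match_iso_files(files, ISO = c("AUS", "svn"))
print(selected)
```

Lower-casing both sides before comparing is what makes ISO = c("AUS", "svn") and ISO = c("aus", "SVN") equivalent in this sketch.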
The files to be converted must be in a folder of their own, from a single study, single cycle and single population. In addition, if there is more than one file type per study, cycle and population, these must also be in different folders. For example, in TIMSS 2019 the grade 8 data files are main (end with "m7", electronic version of the paper-administered items), bridge (end with "b7", paper administration with trend items for countries participating in previous TIMSS cycles) and Problem Solving and Inquiry (PSI) tasks (end with "z7", electronic administration only, optional for countries). These different types must be in separate folders. In the case of OECD PISA prior to 2015, the folder must contain both the ASCII text files and the SPSS .sps import syntax files. If the folder contains datasets from more than one study or cycle, the operation will stop with error messages.
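One possible way to check which file types are present before conversion is to group the file names by their two-character suffix. The base R sketch below does this for a handful of made-up TIMSS 2019 grade 8 file names; the names are hypothetical and the approach is just a convenience, not something RALSA provides.

```r
# Made-up TIMSS 2019 grade 8 file names; the last two characters before the
# extension identify the file type ("m7", "b7" or "z7")
files <- c("bsgausm7.sav", "bsgausb7.sav", "bsgausz7.sav", "bsgsvnm7.sav")

# Extract the two-character suffix from each file name
stems <- tools::file_path_sans_ext(files)
suffix <- tolower(substr(stems, nchar(stems) - 1, nchar(stems)))

# Group the files by type; each group belongs in its own folder
by.type <- split(files, suffix)
print(by.type)
```

If by.type has more than one element, the files need to be moved into separate folders before calling lsa.convert.data.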
If the path for the inp.folder argument is not specified, the function will search for files in the working directory (i.e. as returned by getwd()). If the folder path for out.folder is not specified, it is taken from inp.folder and the files are stored there. If both the inp.folder and out.folder arguments are missing, the directory from getwd() is used to search, convert and store files.
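The fallback order for the two folder arguments can be summarized in a few lines of base R. The function resolve_folders below is a hypothetical sketch of the rules just described, written only to make the order explicit; it is not taken from the package.

```r
# Hypothetical sketch of the folder fallback rules described above
resolve_folders <- function(inp.folder, out.folder) {
  # No input folder given: fall back to the working directory
  if (missing(inp.folder)) {
    inp.folder <- getwd()
  }
  # No output folder given: store the files next to the input files
  if (missing(out.folder)) {
    out.folder <- inp.folder
  }
  list(inp.folder = inp.folder, out.folder = out.folder)
}

both.missing <- resolve_folders()
only.input <- resolve_folders(inp.folder = "C:/TIMSS_2011_G8")
print(only.input)
```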
If missing.to.NA is set to TRUE, all user-defined missing values from the SPSS files will be imported as NA, which is R's only kind of missing value. This is the most common case when analyzing these data, since the reason why a response is missing is usually irrelevant. However, if the reasons for missing responses matter, as when analyzing achievement items (i.e. not administered vs. omitted or not reached), the argument should be set to FALSE (the default), which imports all user-defined missing values as valid ones.
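The difference between the two settings can be illustrated on a plain numeric vector. The base R sketch below uses made-up response codes (9 for omitted and 6 for not reached are illustrative, not taken from any study) and only mimics the two outcomes; it does not call RALSA.

```r
# Made-up item responses: 1-4 are valid answers, 9 = omitted, 6 = not reached
raw <- c(1, 3, 9, 2, 6, 4)
user.missings <- c(6, 9)

# Analogue of missing.to.NA = TRUE: the missing codes are replaced by NA,
# so the reason for missingness is lost
as.na <- replace(raw, raw %in% user.missings, NA)

# Analogue of missing.to.NA = FALSE: the values are kept as they are and the
# missing codes are recorded in a "missings" attribute
kept <- raw
attr(kept, "missings") <- user.missings

print(as.na)
print(attributes(kept))
```

With the second form it is still possible to tell omitted from not reached responses, for example with kept == 9.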
print
RALSA uses its own method for printing objects of class lsa.data on screen. Passing just the object name to the console will print summarized information about the study's data and the first six columns of the dataset (see the Value section). The col.nums argument specifies which columns from the dataset are included in the output (see the examples).
lsa.select.countries.PISA
lsa.select.countries.PISA lets the user take a PISA dataset, either a converted file or an lsa.data object in memory, and reduce the number of countries in it by passing the names of the countries to be kept as a character vector to the cnt.names argument. If the full path (including the file name) to the resulting file is specified in the output.file argument, it is written on disk. If not, the data is written to an lsa.data object in memory with the same name as the input file. See the examples.
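Conceptually, the selection amounts to keeping the rows whose country name is in cnt.names. The base R sketch below shows the idea on a toy data frame; the data and the column names country and score are made up, and the real function works on lsa.data objects rather than plain data frames.

```r
# Toy stand-in for a converted PISA dataset; the column names are made up
pisa.toy <- data.frame(
  country = c("Albania", "Slovenia", "Norway", "Albania"),
  score = c(410, 505, 499, 420)
)

cnt.names <- c("Albania", "Slovenia")

# Keep only the rows belonging to the requested countries
pisa.reduced <- pisa.toy[pisa.toy$country %in% cnt.names, ]
print(pisa.reduced)
```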
Foy, P. (Ed.). (2018). PIRLS 2016 User Guide for the International Database. TIMSS & PIRLS International Study Center.
lsa.merge.data, lsa.vars.dict, lsa.recode.vars
# Convert all IEA-like SPSS files in the working directory, setting all user-defined missing
# values to NA
if (FALSE) {
lsa.convert.data(missing.to.NA = TRUE)
}
# Convert IEA TIMSS 2011 grade 8 data from Australia and Slovenia, keeping all user-defined
# missing values as valid ones, and specifying custom input and output directories
if (FALSE) {
lsa.convert.data(inp.folder = "C:/TIMSS_2011_G8", ISO = c("aus", "svn"), missing.to.NA = FALSE,
out.folder = "C:/Data")
}
# Convert OECD PISA 2009 files, converting all user-defined missing values to NA
# and using custom input and output directories
if (FALSE) {
lsa.convert.data(inp.folder = "/media/PISA_2009", PISApre15 = TRUE, missing.to.NA = TRUE,
out.folder = "/tmp")
}
# Print 20th to 25th column in PISA 2018 student questionnaire dataset loaded into memory
if (FALSE) {
print(x = cy07_msu_stu_qqq, col.nums = 20:25)
}
# Select data from Albania and Slovenia from PISA 2018 student questionnaire dataset
# and save it under the same file name in a different folder
if (FALSE) {
lsa.select.countries.PISA(data.file = "C:/PISA/cy07_msu_stu_qqq.RData",
cnt.names = c("Albania", "Slovenia"),
output.file = "C:/PISA/Reduced/cy07_msu_stu_qqq.RData")
}