This function generates a single data frame of school system data from a customisable selection of the available datasets. The user can aggregate any of the following datasets, when available:
Invalsi census survey
School buildings
Number of students and school classes
Number of teachers
Broadband connection availability
To save time, ready-made input data can be plugged in. Otherwise, the data are downloaded automatically but not saved in the global environment. When a new dataset is joined to the existing ones, some observations may be missing from it. In this case, by default, the user chooses whether to keep as many observational units as possible or to remove the units with missing variables (see the NA_autoRM argument).
Set_DB(
  Year = 2023,
  level = "LAU",
  conservative = TRUE,
  Invalsi = TRUE,
  SchoolBuildings = TRUE,
  nstud = TRUE,
  nteachers = TRUE,
  BroadBand = TRUE,
  verbose = TRUE,
  show_col_types = FALSE,
  Invalsi_subj = c("ELI", "ERE", "ITA", "MAT"),
  Invalsi_grade = c(2, 5, 8, 10, 13),
  Invalsi_WLE = FALSE,
  SchoolBuildings_certifications = FALSE,
  SchoolBuildings_include_numerics = TRUE,
  SchoolBuildings_include_qualitatives = FALSE,
  SchoolBuildings_row_cutout = FALSE,
  SchoolBuildings_col_cut_thresh = 20000,
  SchoolBuildings_flag_outliers = TRUE,
  SchoolBuildings_count_missing = FALSE,
  nstud_imputation_thresh = 19,
  nstud_missing_to_1 = FALSE,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  UB_nstud_byclass_grade = NULL,
  LB_nstud_byclass_grade = NULL,
  nstud_filter_by_grade = FALSE,
  InnerAreas = TRUE,
  ord_InnerAreas = FALSE,
  nstud_check = TRUE,
  nstud_check_registry = "Any",
  BroadBand_impute_missing = TRUE,
  Date = as.Date(paste0(substr(year.patternA(Year), 1, 4), "-09-01")),
  NA_autoRM = NULL,
  input_Invalsi_IS = NULL,
  input_Registry = NULL,
  input_SchoolBuildings = NULL,
  input_nstud = NULL,
  input_School2mun = NULL,
  input_AdmUnNames = NULL,
  input_InnerAreas = NULL,
  input_teachers4student = NULL,
  input_nteachers = NULL,
  input_BroadBand = NULL,
  autoAbort = FALSE
)
Set_DB returns an object of class tbl_df, tbl and data.frame.
Year: Numeric or Character. The relevant school year, in any of the formats 2023, "2022/2023", 202223 or 20222023.
Important: if input datasets are plugged in, select the same Year argument used to download them. 2023 by default.
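The four Year formats above all denote the same school year. As a minimal sketch, the hypothetical helper normalise_school_year() below maps each of them to a "YYYY/YYYY" label. It is not part of the package, which uses its own internal routines (such as the year.patternA() call visible in the default Date argument). The sketch assumes that a four-digit year such as 2023 refers to the school year ending in that year.

```r
# Hypothetical helper, not the package's own parser: map any of the accepted
# Year formats to a "YYYY/YYYY" school-year label.
normalise_school_year <- function(Year) {
  # Numeric input: print without scientific notation; character: drop the slash
  y <- if (is.numeric(Year)) sprintf("%.0f", Year) else gsub("/", "", Year)
  start <- switch(as.character(nchar(y)),
    "4" = as.integer(y) - 1L,            # 2023: the school year ending in 2023
    "6" = as.integer(substr(y, 1, 4)),   # 202223
    "8" = as.integer(substr(y, 1, 4)),   # 20222023 or "2022/2023"
    stop("Unrecognised Year format: ", Year)
  )
  paste0(start, "/", start + 1L)
}

formats <- list(2023, "2022/2023", 202223, 20222023)
vapply(formats, normalise_school_year, character(1))
```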
level: Character. The administrative level at which data are aggregated: either "LAU"/"Municipality" or "NUTS-3"/"Province". "LAU" by default.
conservative: Logical. If FALSE, only the schools included in all the datasets are taken as input. TRUE by default.
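The effect of conservative = FALSE on the inputs can be pictured as an inner join on school codes. The sketch below applies base R merge() to two toy tables with hypothetical column names. The documentation only specifies the FALSE case, so the TRUE branch (keeping every school and filling the gaps with NA) is an assumption made for contrast, not the package's confirmed logic.

```r
# Toy school-level tables keyed by a hypothetical school code column
invalsi   <- data.frame(school = c("A", "B", "C"), score = c(62, 58, 71))
buildings <- data.frame(school = c("B", "C", "D"), gym = c(TRUE, FALSE, TRUE))

# conservative = FALSE: keep only schools present in every dataset (inner join)
strict <- merge(invalsi, buildings, by = "school")

# conservative = TRUE (assumed): keep every school, missing fields become NA
loose <- merge(invalsi, buildings, by = "school", all = TRUE)

strict$school  # "B" "C"
loose$school   # "A" "B" "C" "D"
```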
Invalsi: Logical. Whether the Invalsi census data must be included (see Get_Invalsi_IS). TRUE by default.
SchoolBuildings: Logical. Whether the school buildings dataset must be included (see Get_DB_MIUR and Util_DB_MIUR_num). TRUE by default.
nstud: Logical. Whether the number of students per class must be included (see Get_nstud). TRUE by default.
nteachers: Logical. Whether the number of teachers by province must be included (see Get_nteachers_prov). TRUE by default.
BroadBand: Logical. Whether broadband availability in schools must be included (see Get_BroadBand). TRUE by default.
verbose: Logical. If TRUE, the user is informed of the main underlying operations. TRUE by default.
show_col_types: Logical. If TRUE, and verbose is also TRUE, the columns of the raw datasets are shown during the download. FALSE by default.
Invalsi_subj: Character. If Invalsi == TRUE, the school subject(s) to include, among "English_listening"/"ELI", "English_reading"/"ERE", "Italian"/"ITA" and "Mathematics"/"MAT". All four by default.
Invalsi_grade: Numeric. If Invalsi == TRUE, the educational grade(s) to include: 2 (2nd year of primary school), 5 (last year of primary school), 8 (last year of middle school), 10 (2nd year of high school) or 13 (last year of high school). All by default.
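Each subject can be given either by its full name or by its code, and only five grades are admissible. These two arguments therefore lend themselves to a simple validation step. The lookup table and check_invalsi_args() below are hypothetical. They only illustrate the equivalences listed above and are not the package's internal checks.

```r
# Hypothetical lookup table built from the documented aliases
subj_aliases <- c(
  English_listening = "ELI", ELI = "ELI",
  English_reading   = "ERE", ERE = "ERE",
  Italian           = "ITA", ITA = "ITA",
  Mathematics       = "MAT", MAT = "MAT"
)
valid_grades <- c(2, 5, 8, 10, 13)

# Hypothetical validator: map subjects to codes and check the grades
check_invalsi_args <- function(Invalsi_subj, Invalsi_grade) {
  unknown <- setdiff(Invalsi_subj, names(subj_aliases))
  if (length(unknown) > 0)
    stop("Unknown subject(s): ", paste(unknown, collapse = ", "))
  bad_grades <- setdiff(Invalsi_grade, valid_grades)
  if (length(bad_grades) > 0)
    stop("Invalid grade(s): ", paste(bad_grades, collapse = ", "))
  list(subj  = unique(unname(subj_aliases[Invalsi_subj])),
       grade = sort(unique(Invalsi_grade)))
}

str(check_invalsi_args(c("Italian", "MAT"), c(13, 5, 8)))
```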
Invalsi_WLE: Logical. Whether to express Invalsi scores as the average WLE score rather than as the percentage of sufficient tests, when Invalsi_grade is either 2 or 5. FALSE by default.
SchoolBuildings_certifications: Logical. If the school buildings database has to be downloaded, whether to include safety certifications. Only relevant from school year 2020/21 onwards (see Get_DB_MIUR). FALSE by default.
SchoolBuildings_include_numerics: Logical. Whether to include strictly numeric variables alongside Boolean ones in the school buildings database (see Util_DB_MIUR_num). TRUE by default.
SchoolBuildings_include_qualitatives: Logical. Whether to include qualitative variables alongside Boolean ones in the school buildings database (see Util_DB_MIUR_num). FALSE by default.
SchoolBuildings_row_cutout: Logical. Whether to filter out rows with missing fields in the school buildings database (see Util_DB_MIUR_num). FALSE by default.
SchoolBuildings_col_cut_thresh: Numeric. The maximum number of missing values allowed for each variable in the school buildings database (see Util_DB_MIUR_num). Variables with more missing observations than this are cut out. 20000 by default.
Warning: if SchoolBuildings_row_cutout is active, select a lower threshold (e.g. 1000).
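A toy table shows how the column threshold interacts with the row cut-out, and why the warning above applies. This sketch assumes that columns are cut before rows, which is not documented package behavior. The column names are hypothetical, and the threshold is scaled down to 2 for a four-row table.

```r
# Toy school-buildings table with hypothetical columns and missing values
sb <- data.frame(
  school   = c("A", "B", "C", "D"),
  gym      = c(TRUE, NA, FALSE, TRUE),
  canteen  = c(NA, NA, NA, TRUE),
  n_floors = c(2, 3, NA, 1)
)
col_cut_thresh <- 2  # plays the role of SchoolBuildings_col_cut_thresh

# Column cut-out: drop variables with more than col_cut_thresh missing values
keep_col <- colSums(is.na(sb)) <= col_cut_thresh
sb_cols  <- sb[, keep_col, drop = FALSE]

# Row cut-out (SchoolBuildings_row_cutout = TRUE): drop rows with any NA
sb_rows <- sb_cols[complete.cases(sb_cols), , drop = FALSE]
sb_rows$school  # "A" "D"

# Without the column cut-out, the sparse canteen column would leave only "D"
sb[complete.cases(sb), "school"]  # "D"
```

A high column threshold keeps sparse variables, and the row cut-out then discards almost every school. This is why a lower threshold is advised when both options are active.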
SchoolBuildings_flag_outliers: Logical. Whether to assign NA to outliers in numeric variables (see Util_DB_MIUR_num for details). TRUE by default.
SchoolBuildings_count_missing: Logical. Whether the function should return the percentage of NAs in the input school buildings database (see also Group_DB_MIUR). FALSE by default.
nstud_imputation_thresh: Numeric. If nstud_missing_to_1 == TRUE, the number of students below which a missing number of classes is imputed to 1 (see also Util_nstud_wide). 19 by default.
nstud_missing_to_1: Logical. If nstud == TRUE, whether the number of classes should be imputed to 1 when it is missing and the number of students is below a threshold (argument nstud_imputation_thresh; see Util_nstud_wide). FALSE by default.
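The imputation rule governed by nstud_missing_to_1 and nstud_imputation_thresh can be sketched as follows. The function impute_classes_to_1() and its column names are hypothetical. "Below the threshold" is read here as strictly fewer students than the threshold, which is an assumption.

```r
# Toy school-level counts (hypothetical columns)
counts <- data.frame(
  school   = c("A", "B", "C", "D"),
  nstud    = c(15, 40, 12, 18),
  nclasses = c(NA, NA, 1, NA)
)

# Hypothetical helper mirroring nstud_missing_to_1 = TRUE
impute_classes_to_1 <- function(df, thresh = 19) {
  # Assumption: "below the threshold" means strictly fewer than thresh students
  fill <- is.na(df$nclasses) & df$nstud < thresh
  df$nclasses[fill] <- 1
  df
}

impute_classes_to_1(counts)$nclasses  # 1 NA 1 1
```

School B keeps its missing value, since 40 students presumably cannot be assumed to form a single class.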
UB_nstud_byclass: Numeric. Either a single value for all school orders, or a vector of three order-specific values in the order: primary, middle, high.
If focus is on class size, the upper limit of the acceptable school-level (if nstud_filter_by_grade == FALSE) or grade-level (otherwise) average number of students by class.
If a whole school, or any grade in a school, respectively has a higher number of students by class, the record is considered an outlier and filtered out. 99 by default, i.e. no restriction is made.
Please notice that boundaries are included in the acceptance interval.
LB_nstud_byclass: Numeric. Either a single value for all school orders, or a vector of three order-specific values in the order: primary, middle, high.
If focus is on class size, the lower limit of the acceptable school-level (if nstud_filter_by_grade == FALSE) or grade-level (otherwise) average number of students by class.
If a whole school, or any grade in a school, respectively has a smaller number of students by class, the record is considered an outlier and filtered out. 1 by default.
Please notice that boundaries are included in the acceptance interval.
UB_nstud_byclass_grade: Numeric. If nstud_filter_by_grade == TRUE, the upper limit of the acceptable grade-level average class size. If NULL, it is set equal to UB_nstud_byclass. NULL by default.
LB_nstud_byclass_grade: Numeric. If nstud_filter_by_grade == TRUE, the lower limit of the acceptable grade-level average class size. If NULL, it is set equal to LB_nstud_byclass. NULL by default.
nstud_filter_by_grade: Logical. If focus is on class size, whether to remove all school grades with an average class size outside the acceptance boundaries. FALSE by default.
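The school-level filter described by UB_nstud_byclass and LB_nstud_byclass can be sketched on a hypothetical table with one row per school. The function names and columns are illustrative, not the package's code. The sketch reproduces two documented rules: a bound may be a single value or three order-specific values (primary, middle, high), and both bounds are inclusive.

```r
# Hypothetical school-level table: students and classes by school order
schools <- data.frame(
  school   = c("P1", "P2", "M1", "H1", "H2"),
  order    = c("primary", "primary", "middle", "high", "high"),
  nstud    = c(120, 30, 200, 260, 90),
  nclasses = c(6, 1, 8, 10, 2)
)

# A bound is either one value for all orders or one per order
expand_bounds <- function(b) {
  stopifnot(length(b) %in% c(1, 3))
  if (length(b) == 1) b <- rep(b, 3)
  setNames(b, c("primary", "middle", "high"))
}

filter_class_size <- function(df, UB = 99, LB = 1) {
  ub  <- expand_bounds(UB)[df$order]
  lb  <- expand_bounds(LB)[df$order]
  avg <- df$nstud / df$nclasses
  # Boundaries are included in the acceptance interval
  df[avg >= lb & avg <= ub, , drop = FALSE]
}

filter_class_size(schools, UB = c(26, 28, 30), LB = 15)$school  # "P1" "M1" "H1"
```

With nstud_filter_by_grade = TRUE, the same test would apply to each grade's average class size instead, using UB_nstud_byclass_grade and LB_nstud_byclass_grade. These fall back to the school-level bounds when NULL.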
InnerAreas: Logical. Whether the percentage of schools belonging to inner/internal areas must be included (see Get_InnerAreas). TRUE by default.
ord_InnerAreas: Logical. If check == TRUE and InnerAreas == TRUE, whether the inner areas classification should be treated as an ordinal rather than a categorical variable (see Get_InnerAreas for the classification). FALSE by default.
nstud_check: Logical. If nstud == TRUE, whether to check the availability of student numbers across all schools included in the school registries (see Util_Check_nstud_availability). TRUE by default.
nstud_check_registry: Character. If nstud == TRUE and nstud_check == TRUE, the school registries whose availability has to be checked: either "Registry_from_buildings" (buildings registry), "Registry_from_registry" (proper registry), "Any" or "Both". "Any" by default.
BroadBand_impute_missing: Logical. Whether the schools not included in the broadband dataset must be counted in the total number of schools (i.e. the denominator of the broadband availability indicator). TRUE by default.
Date: Character or Date. The threshold date for broadband activation, i.e. the date before which the activation works must be completed for a school to be considered as provided with broadband. By default, September 1st at the beginning of the school year.
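The sketch below illustrates the roles of BroadBand_impute_missing and Date in the broadband indicator. The tables, column names and broadband_share() are hypothetical. Schools missing from the broadband table are assumed to count as not connected, and "before" is read as strictly earlier than the threshold date. None of this is taken from the package's code. For Year = 2023, the default threshold would be 1 September 2022.

```r
# Hypothetical registry and broadband table (illustrative column names)
registry  <- data.frame(school = c("A", "B", "C", "D", "E"))
broadband <- data.frame(
  school    = c("A", "B", "C"),
  activated = as.Date(c("2022-05-10", "2022-10-01", NA))
)

broadband_share <- function(registry, broadband, Date, impute_missing = TRUE) {
  Date <- as.Date(Date)  # accept either a character string or a Date
  # Connected only if the works were completed before the threshold date
  connected <- !is.na(broadband$activated) & broadband$activated < Date
  # impute_missing = TRUE: schools absent from the broadband table stay in
  # the denominator; FALSE: only schools in the broadband table are counted
  denom <- if (impute_missing) nrow(registry) else nrow(broadband)
  sum(connected) / denom
}

broadband_share(registry, broadband, "2022-09-01")                          # 0.2
broadband_share(registry, broadband, "2022-09-01", impute_missing = FALSE)  # 0.3333333
```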
NA_autoRM: Logical. Either TRUE, FALSE or NULL. If TRUE, the observations missing in any single dataset are automatically removed from the final DB. If FALSE, the missing observations are automatically kept. If NULL, the choice is left to the user through an interactive menu. NULL by default.
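The three-way behavior of NA_autoRM can be sketched with complete.cases() and utils::menu(). handle_missing() and the toy table are hypothetical. The fallback to keeping rows outside an interactive session belongs to this sketch only, and the package's own menu and removal logic may differ.

```r
# Hypothetical final table after all joins
db <- data.frame(
  province = c("TO", "MI", "RM", "BA"),
  score    = c(61.2, NA, 57.8, 55.1),
  gym      = c(0.71, 0.64, NA, 0.58)
)

handle_missing <- function(db, NA_autoRM = NULL) {
  if (is.null(NA_autoRM)) {
    # NULL: ask the user; outside an interactive session this sketch
    # falls back to keeping the rows with missing values
    NA_autoRM <- interactive() &&
      utils::menu(c("Remove rows with missing values", "Keep them"),
                  title = "Missing observations found") == 1
  }
  if (NA_autoRM) db[complete.cases(db), , drop = FALSE] else db
}

nrow(handle_missing(db, NA_autoRM = TRUE))   # 2
nrow(handle_missing(db, NA_autoRM = FALSE))  # 4
```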
input_Invalsi_IS: Object of class tbl_df, tbl and data.frame.
If Invalsi == TRUE, the raw Invalsi survey data, obtained as output of the Get_Invalsi_IS function.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_Registry: Object of class tbl_df, tbl and data.frame.
The school registry for the year in scope, obtained as output of the function Get_Registry.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_SchoolBuildings: Object of class tbl_df, tbl and data.frame. If SchoolBuildings == TRUE, the raw school buildings dataset, obtained as output of the function Get_DB_MIUR.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_nstud: Object of class list, including two objects of class tbl_df, tbl and data.frame.
If nstud == TRUE, the student and class counts, obtained as output of the function Get_nstud with the default filename parameter.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_School2mun: Object of class list with elements of class tbl_df, tbl and data.frame.
If nstud == TRUE, the mapping from school codes to municipality (and province) codes, obtained as output of the function Get_School2mun. Needed only if check == TRUE.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_AdmUnNames: Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_AdmUnNames.
If necessary, the ISTAT file including all the codes and names of the administrative units for the year in scope. Required either if nstud == TRUE & nstud_check == TRUE, or if SchoolBuildings == TRUE, input_SchoolBuildings is not provided, and the school year is one of 2015/16, 2017/18 or 2018/19.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_InnerAreas: Object of class tbl_df, tbl and data.frame.
If InnerAreas == TRUE, the classification of peripheral municipalities, obtained as output of the function Get_InnerAreas.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_teachers4student: Object of class tbl_df, tbl and data.frame. If nteachers == TRUE and nstud == TRUE, the number of teachers per student by province, obtained as output of the function Group_teachers4stud.
Please notice that this object cannot substitute for the number of students by class, since it provides information only at the school order level, not for single educational grades.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_nteachers: Object of class tbl_df, tbl and data.frame. If nteachers == TRUE, the number of teachers by province, obtained as output of the function Get_nteachers_prov.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
input_BroadBand: Object of class tbl_df, tbl and data.frame. If BroadBand == TRUE, the raw broadband connection dataset, obtained as output of the function Get_BroadBand.
If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.
autoAbort: Logical. If any data must be retrieved, whether to automatically abort the operation and return NULL in case of a missing internet connection or server response errors. FALSE by default.
See Util_DB_MIUR_num, Group_DB_MIUR, Group_nstud, Util_Check_nstud_availability and Get_School2mun for similar arguments.
DB23_prov <- Set_DB(Year = 2023, level = "NUTS-3", Invalsi_grade = c(5, 8, 13),
  Invalsi_subj = "Italian", nteachers = FALSE, BroadBand = FALSE,
  SchoolBuildings_count_missing = FALSE, NA_autoRM = TRUE,
  input_SchoolBuildings = example_input_DB23_MIUR[, -c(11:18, 10:27)],
  input_Invalsi_IS = example_Invalsi23_prov,
  input_nstud = example_input_nstud23,
  input_InnerAreas = example_InnerAreas,
  input_School2mun = example_School2mun23,
  input_AdmUnNames = example_AdmUnNames20220630)
DB23_prov
summary(DB23_prov[, -c(22:62)])