SchoolDataIT (version 0.2.8)

Set_DB: Build up a comprehensive database regarding the school system

Description

This function builds a single data frame of school system data from a custom selection of the available datasets. The user can combine any of the following, when available:

  • Invalsi census survey

  • School buildings

  • Number of students and school classes

  • Number of teachers

  • Broadband connection availability

To save time, ready-made input data can be plugged in; otherwise the data are downloaded automatically, but not saved in the global environment. When a new dataset is joined to the existing ones, some observations in this dataset may be missing. In this case, by default, the user chooses whether to keep as many observational units as possible or to remove units with missing variables.
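
The trade-off between keeping as many units as possible and removing units with missing variables can be illustrated with a minimal base R sketch. The toy data frames and column names below are hypothetical, and merge() only stands in for whatever join Set_DB performs internally.

```r
# Toy data: two hypothetical datasets keyed by municipality code
invalsi <- data.frame(Municipality_code = c("001", "002", "003"),
                      score = c(61.2, 58.4, 64.9),
                      stringsAsFactors = FALSE)
broadband <- data.frame(Municipality_code = c("001", "003", "004"),
                        bb_share = c(0.80, 0.55, 0.90),
                        stringsAsFactors = FALSE)

# Keep every unit found in at least one dataset; gaps become NA
keep_all <- merge(invalsi, broadband, by = "Municipality_code", all = TRUE)

# Remove the units with any missing variable
complete_only <- keep_all[complete.cases(keep_all), ]

nrow(keep_all)
nrow(complete_only)
```

In this sketch the first table keeps all four municipalities, while the second retains only the two that appear in both datasets.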

Usage

Set_DB(
  Year = 2023,
  level = "LAU",
  conservative = TRUE,
  Invalsi = TRUE,
  SchoolBuildings = TRUE,
  nstud = TRUE,
  nteachers = TRUE,
  BroadBand = TRUE,
  verbose = TRUE,
  show_col_types = FALSE,
  Invalsi_subj = c("ELI", "ERE", "ITA", "MAT"),
  Invalsi_grade = c(2, 5, 8, 10, 13),
  Invalsi_WLE = FALSE,
  SchoolBuildings_certifications = FALSE,
  SchoolBuildings_include_numerics = TRUE,
  SchoolBuildings_include_qualitatives = FALSE,
  SchoolBuildings_row_cutout = FALSE,
  SchoolBuildings_unique_buildings = TRUE,
  SchoolBuildings_col_cut_thresh = 20000,
  SchoolBuildings_flag_outliers = TRUE,
  SchoolBuildings_count_missing = FALSE,
  nstud_imputation_thresh = 19,
  nstud_missing_to_1 = FALSE,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  UB_nstud_byclass_grade = NULL,
  LB_nstud_byclass_grade = NULL,
  nstud_filter_by_grade = FALSE,
  InnerAreas = TRUE,
  ord_InnerAreas = FALSE,
  nstud_check = TRUE,
  nstud_check_registry = "Any",
  BroadBand_impute_missing = TRUE,
  Date = as.Date(paste0(substr(year.patternA(Year), 1, 4), "-09-01")),
  NA_autoRM = NULL,
  input_Invalsi_IS = NULL,
  input_Registry = NULL,
  input_SchoolBuildings = NULL,
  input_nstud = NULL,
  input_School2mun = NULL,
  input_AdmUnNames = NULL,
  input_InnerAreas = NULL,
  input_teachers4student = NULL,
  input_nteachers = NULL,
  input_BroadBand = NULL,
  autoAbort = FALSE
)

Value

An object of class tbl_df, tbl and data.frame

Arguments

Year

Numeric or Character. The relevant school year. Available in the formats: 2023, "2022/2023", 202223, 20222023. Important: if input datasets are plugged in, please select the same Year argument used to download the input data. 2023 by default.
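
As an illustration of how the four formats can denote the same school year, here is a small sketch. The helper school_year_end is hypothetical and is not part of SchoolDataIT; it assumes that a four-digit Year such as 2023 refers to the school year ending in that calendar year (2022/2023).

```r
# Hypothetical helper (not part of SchoolDataIT): map the accepted Year
# formats to the calendar year in which the school year ends
school_year_end <- function(Year) {
  digits <- gsub("[^0-9]", "", as.character(Year))  # drop "/" if present
  n <- nchar(digits)
  if (n == 4) {
    as.integer(digits)                      # 2023
  } else if (n == 6) {
    # 202223: start year followed by a two-digit end year
    as.integer(substr(digits, 1, 4)) + 1L
  } else if (n == 8) {
    as.integer(substr(digits, 5, 8))        # 20222023 or "2022/2023"
  } else {
    stop("Unrecognised school year format: ", Year)
  }
}

years <- sapply(list(2023, "2022/2023", 202223, 20222023), school_year_end)
years
```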

level

Character. The administrative level of detail at which data must be aggregated. Either "LAU"/"Municipality" or "NUTS-3"/"Province". "LAU" by default.

conservative

Logical. If FALSE, only the schools included in all the datasets are taken as input. TRUE by default.

Invalsi

Logical. Whether the Invalsi census data must be included (see Get_Invalsi_IS). TRUE by default.

SchoolBuildings

Logical. Whether the school buildings dataset must be included (see Get_DB_MIUR, Util_DB_MIUR_num). TRUE by default.

nstud

Logical. Whether the number of students per class must be included (see Get_nstud). TRUE by default.

nteachers

Logical. Whether the number of teachers by province must be included (see Get_nteachers_prov). TRUE by default.

BroadBand

Logical. Whether the broadband availability in schools must be included (see Get_BroadBand). TRUE by default.

verbose

Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.

show_col_types

Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.

Invalsi_subj

Character. If Invalsi == TRUE, the school subject(s) to include, among "Englis_listening"/"ELI", "English_reading"/"ERE", "Italian"/"Ita" and "Mathematics"/"MAT". All four by default.

Invalsi_grade

Numeric. If Invalsi == TRUE, the educational grade(s) to include. Any of 2 (2nd year of primary school), 5 (last year of primary school), 8 (last year of middle school), 10 (2nd year of high school) or 13 (last year of school). All by default.

Invalsi_WLE

Logical. Whether to express Invalsi scores as the average WLE score rather than the percentage of sufficient tests, if both are available (this depends on whether Invalsi_grade is 2 or 5). FALSE by default.

SchoolBuildings_certifications

Logical. If the school buildings database has to be downloaded, whether to include safety certifications. Only relevant from school year 2020/21 onwards (see Get_DB_MIUR). FALSE by default.

SchoolBuildings_include_numerics

Logical. Whether to include strictly numeric variables alongside Boolean ones in the school buildings database (see Util_DB_MIUR_num). TRUE by default.

SchoolBuildings_include_qualitatives

Logical. Whether to include qualitative variables alongside Boolean ones in the school buildings database (see Util_DB_MIUR_num). FALSE by default.

SchoolBuildings_row_cutout

Logical. Whether to filter out rows including missing fields in the school buildings database (see Util_DB_MIUR_num). FALSE by default.

SchoolBuildings_unique_buildings

Logical. If the school buildings DB is included at the building level, whether to remove records in which the building code and all other fields are duplicated. Rows are combinations of building ID and school ID, so if a school is hosted by several buildings and every field other than School_code is duplicated, only one row is retained. TRUE by default. See Util_DB_MIUR_num.

SchoolBuildings_col_cut_thresh

Numeric. The threshold of missing values allowed for each variable in the school buildings database (see Util_DB_MIUR_num). If a variable has a higher number of missing observations, it is cut out. 20000 by default. Warning: if the option SchoolBuildings_row_cutout is active, please select a lower threshold (e.g. 1000).
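
The column cut can be pictured with the base R sketch below. It is only an illustration under stated assumptions: the toy table, its column names and the tiny threshold are hypothetical, and the code is not the implementation used by Util_DB_MIUR_num.

```r
# Toy buildings table with missing values (hypothetical columns)
buildings <- data.frame(
  School_code = c("A", "B", "C", "D"),
  heating     = c(1, NA, 1, 0),
  gym         = c(NA, NA, NA, 1),
  stringsAsFactors = FALSE
)

col_cut_thresh <- 2  # maximum number of NAs tolerated per variable

# Count NAs per column and keep the columns at or below the threshold
n_missing <- colSums(is.na(buildings))
kept <- buildings[, n_missing <= col_cut_thresh, drop = FALSE]
names(kept)

# With a row cutout, every row with an NA in a kept column is dropped too
rows_left <- nrow(kept[complete.cases(kept), , drop = FALSE])
rows_left
```

The last step suggests why a lower threshold is advisable together with a row cutout: each sparsely filled column that survives the cut removes every row in which it is missing.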

SchoolBuildings_flag_outliers

Logical. Whether to assign NA to outliers in numeric variables; see Util_DB_MIUR_num for more details. TRUE by default.

SchoolBuildings_count_missing

Logical. Whether the function should return the percentage of NAs in the input school buildings database (see also Group_DB_MIUR). FALSE by default.

nstud_imputation_thresh

Numeric. If nstud_missing_to_1 == TRUE, the minimum threshold below which the number of classes is imputed to 1 if missing; see also Util_nstud_wide. 19 by default.

nstud_missing_to_1

Logical. If nstud == TRUE, whether the number of classes should be imputed to 1 when it is missing and the number of students is below a threshold (argument nstud_imputation_thresh, see Util_nstud_wide). FALSE by default.
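
A minimal sketch of this imputation rule follows. The toy data and column names are hypothetical, and the strict comparison (students below the threshold, not equal to it) is an assumption rather than the documented behaviour of Util_nstud_wide.

```r
# Toy counts (hypothetical columns): students and classes per school
counts <- data.frame(
  School_code = c("A", "B", "C"),
  students    = c(12, 45, 19),
  classes     = c(NA, NA, 1),
  stringsAsFactors = FALSE
)

imputation_thresh <- 19

# Impute one class only where the class count is missing and the
# number of students is below the threshold (strict "<" is an assumption)
small_and_missing <- is.na(counts$classes) & counts$students < imputation_thresh
counts$classes[small_and_missing] <- 1

counts$classes
```

School B keeps its missing class count because 45 students exceed the threshold.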

UB_nstud_byclass

Numeric. Either a single value for all school orders, or a vector of three order-specific values in the order: primary, middle, high. If the focus is on class size, the upper limit of the acceptable school-level (if nstud_filter_by_grade == FALSE) or grade-level (otherwise) average number of students per class. If a whole school or any grade in a school, respectively, has a higher number of students per class, the record is considered an outlier and filtered out. 99 by default, i.e. no restriction is made. Note that boundaries are included in the acceptance interval.

LB_nstud_byclass

Numeric. Either a single value for all school orders, or a vector of three order-specific values in the order: primary, middle, high. If the focus is on class size, the lower limit of the acceptable school-level (if nstud_filter_by_grade == FALSE) or grade-level (otherwise) average number of students per class. If a whole school or any grade in a school, respectively, has a smaller number of students per class, the record is considered an outlier and filtered out. 1 by default. Note that boundaries are included in the acceptance interval.
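
The school-level filter with inclusive boundaries can be sketched as follows. The toy data, the column names and the bounds are hypothetical; the code only illustrates the acceptance interval and is not the package's internal filter.

```r
# Toy school-level data (hypothetical columns)
schools <- data.frame(
  School_code = c("A", "B", "C", "D"),
  students    = c(100, 30, 8, 250),
  classes     = c(5, 1, 4, 10),
  stringsAsFactors = FALSE
)

UB <- 30  # upper bound on the average number of students per class
LB <- 10  # lower bound

# Average class size per school
schools$avg_size <- schools$students / schools$classes

# Boundaries are included in the acceptance interval
accepted <- schools[schools$avg_size >= LB & schools$avg_size <= UB, ]

accepted$School_code
```

School B sits exactly on the upper bound and is retained, whereas school C, with an average of 2 students per class, is filtered out.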

UB_nstud_byclass_grade

Numeric. If nstud_filter_by_grade == TRUE, the upper limit of the acceptable grade-level average class size. If NULL it is set equal to UB_nstud_byclass. NULL by default.

LB_nstud_byclass_grade

Numeric. If nstud_filter_by_grade == TRUE, the lower limit of the acceptable grade-level average class size. If NULL it is set equal to LB_nstud_byclass. NULL by default.

nstud_filter_by_grade

Logical. If focus is on class size, whether to remove all school grades with average class size outside of the acceptance boundaries. FALSE by default.

InnerAreas

Logical. Whether the percentage of schools belonging to inner/internal areas must be included (see Get_InnerAreas). TRUE by default.

ord_InnerAreas

Logical. If nstud_check == TRUE and InnerAreas == TRUE, whether the inner areas classification should be treated as an ordinal variable rather than as a categorical one (see Get_InnerAreas for the classification). FALSE by default.

nstud_check

Logical. If nstud == TRUE, whether to check the availability of the number of students across all schools included in the school registries (see Util_Check_nstud_availability). TRUE by default.

nstud_check_registry

Character. If nstud == TRUE and nstud_check == TRUE, the school registries whose availability has to be checked. Either "Registry_from_buildings" (buildings registry), "Registry_from_registry" (proper registry), "Any" or "Both". "Any" by default.

BroadBand_impute_missing

Logical. Whether the schools not included in the Broadband dataset must be counted in the total number of schools (i.e. the denominator of the Broadband availability indicator). TRUE by default.

Date

Character or Date. The threshold date for broadband activation, i.e. the date before which the broadband activation works must be finished for a school to be considered as provided with broadband. By default, September 1st at the beginning of the school year.
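
To make the role of the threshold date and of the denominator concrete, here is a hedged base R sketch. The activation dates are invented toy values, the strict comparison with the threshold is an assumption, and 2022-09-01 is assumed to be the default for school year 2022/2023.

```r
# Toy data: hypothetical activation dates for the schools of one municipality;
# NA marks a school that is absent from the broadband dataset
activation <- as.Date(c("2022-05-10", "2022-11-02", NA, "2021-09-15"))

# Assumed default threshold for school year 2022/2023
threshold <- as.Date("2022-09-01")

# A school counts as provided with broadband if works ended before the threshold
activated <- !is.na(activation) & activation < threshold

# Share of schools with broadband, counting missing schools in the denominator
share_all <- sum(activated) / length(activation)

# Same share computed only over the schools present in the dataset
share_known <- sum(activated) / sum(!is.na(activation))

c(share_all = share_all, share_known = share_known)
```

Counting the missing school in the denominator lowers the indicator, which is the kind of difference the BroadBand_impute_missing argument controls.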

NA_autoRM

Logical. Either TRUE, FALSE or NULL. If TRUE, the values missing in a single dataset are automatically deleted from the final DB. If FALSE, the missing observations are kept automatically. If NULL, the choice is left to the user by an interactive menu. NULL by default.

input_Invalsi_IS

Object of class tbl_df, tbl and data.frame. If Invalsi == TRUE, the raw Invalsi survey data, obtained as output of the Get_Invalsi_IS function. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

input_Registry

Object of class tbl_df, tbl and data.frame. The school registry corresponding to the year in scope, obtained as output of the function Get_Registry. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

input_SchoolBuildings

Object of class tbl_df, tbl and data.frame. If SchoolBuildings == TRUE, the raw school buildings dataset obtained as output of the function Get_DB_MIUR. If NULL, it will be downloaded automatically but not saved in the global environment. NULL by default.

input_nstud

Object of class list, including two objects of class tbl_df, tbl and data.frame. If nstud == TRUE, the students and classes counts, obtained as output of the function Get_nstud with default filename parameter. If NULL, the function will download it automatically but it will not be saved in the global environment. NULL by default.

input_School2mun

Object of class list with elements of class tbl_df, tbl and data.frame. If nstud == TRUE, the mapping from school codes to municipality (and province) codes, obtained as output of the function Get_School2mun. Needed only if nstud_check == TRUE. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

input_AdmUnNames

Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_AdmUnNames. If necessary, the ISTAT file including all the codes and the names of the administrative units for the year in scope. Required either if nstud == TRUE & nstud_check == TRUE, or if SchoolBuildings == TRUE, input_SchoolBuildings is not provided, and the school year is one of 2015/16, 2017/18 or 2018/19. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

input_InnerAreas

Object of class tbl_df, tbl and data.frame. If InnerAreas == TRUE, the classification of peripheral municipalities, obtained as output of the function Get_InnerAreas. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

input_teachers4student

Object of class tbl_df, tbl and data.frame. If nteachers == TRUE and nstud == TRUE, the number of teachers per student by province. Note that this object cannot be considered a substitute for the number of students by class, since it provides no information on the number of schools in single educational grades but only at the school order level. Obtained as output of the function Group_teachers4stud. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

input_nteachers

Object of class tbl_df, tbl and data.frame. If nteachers == TRUE, the number of teachers by province, obtained as output of the function Get_nteachers_prov. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

input_BroadBand

Object of class tbl_df, tbl and data.frame. If BroadBand == TRUE, the raw Broadband connection dataset obtained as output of the function Get_BroadBand. If NULL, it will be downloaded automatically but not saved in the global environment. NULL by default.

autoAbort

Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.

See Also

Util_DB_MIUR_num, Group_DB_MIUR, Group_nstud, Util_Check_nstud_availability, Get_School2mun for similar arguments.

Examples

# Build a province-level database from the example input objects
DB23_prov <- Set_DB(
  Year = 2023, level = "NUTS-3",
  Invalsi_grade = c(5, 8, 13), Invalsi_subj = "Italian",
  nteachers = FALSE, BroadBand = FALSE,
  SchoolBuildings_count_missing = FALSE,
  NA_autoRM = TRUE,  # remove units with missing values without prompting
  # -c(11:18, 10:27) drops columns 10 to 27 (11:18 is already covered by 10:27)
  input_SchoolBuildings = example_input_DB23_MIUR[, -c(11:18, 10:27)],
  input_Invalsi_IS = example_Invalsi23_prov,
  input_nstud = example_input_nstud23,
  input_InnerAreas = example_InnerAreas,
  input_School2mun = example_School2mun23,
  input_AdmUnNames = example_AdmUnNames20220630
)


DB23_prov

summary(DB23_prov[, -c(22:62)])




