SchoolDataIT (version 0.2.8)

Util_DB_MIUR_num: Convert the raw school buildings data to numeric or Boolean variables

Description

This function converts the output variables of Get_DB_MIUR to Boolean or numeric type. It also removes the columns with an excessive number of missing observations (more than 20,000 by default) and, if required, deletes the rows that include missing fields. In that case, the deleted rows can be tracked.

Usage

Util_DB_MIUR_num(
  data = NULL,
  include_numerics = TRUE,
  include_qualitatives = FALSE,
  row_cutout = FALSE,
  track_deleted = TRUE,
  verbose = TRUE,
  col_cut_thresh = 20000,
  unique_buildings = TRUE,
  flag_outliers = TRUE,
  autoAbort = FALSE,
  ...
)

Value

If track_deleted == TRUE, an object of class list including two elements:

  • $data: object of class tbl_df, tbl and data.frame, the output dataframe.

  • $deleted: object of class tbl_df, tbl and data.frame. The school IDs of the deleted units.

If track_deleted == FALSE, the output is only the first element of the list.
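
As a minimal sketch of how the two return shapes described above might be handled, the snippet below calls the function on the example input bundled with the package and picks out the list elements. It assumes SchoolDataIT is installed and that example_input_DB23_MIUR is available once the package is attached, as in the Examples section; the object names res, converted and deleted are only illustrative.

```r
# Runs only if SchoolDataIT is installed; otherwise the block is skipped.
if (requireNamespace("SchoolDataIT", quietly = TRUE)) {
  library(SchoolDataIT)

  # track_deleted = TRUE (the default) should return a list, per the Value section
  res <- Util_DB_MIUR_num(
    data = example_input_DB23_MIUR,
    track_deleted = TRUE,
    verbose = FALSE
  )

  # Inspect the top-level structure of the returned list
  str(res, max.level = 1)

  converted <- res$data     # the converted data frame
  deleted   <- res$deleted  # school IDs of the deleted units
}
```

With track_deleted = FALSE the call returns the data frame directly, so no `$data` extraction is needed.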

Arguments

data

Object of class tbl_df, tbl and data.frame. Input data obtained through the function Get_DB_MIUR. If NULL, the data are downloaded automatically with the appropriate arguments, but not saved in the global environment. NULL by default.

include_numerics

Logical. Whether to include strictly numeric variables alongside Boolean ones. TRUE by default.

include_qualitatives

Logical. Whether to include qualitative variables alongside Boolean ones. FALSE by default.

row_cutout

Logical. Whether to filter out rows including missing fields. FALSE by default.

track_deleted

Logical. If TRUE, the function returns the names of the schools not included in the output dataframe. TRUE by default.

verbose

Logical. If TRUE, the function reports its main underlying operations so that the user can keep track of them. TRUE by default.

col_cut_thresh

Numeric. The threshold of missing values allowed for each variable. If a variable has a higher number of missing observations, it is cut out. 20,000 by default. Warning: if the option row_cutout is active, please select a lower threshold (e.g. 1000).

unique_buildings

Logical. Whether to remove records in which the building code and all other fields are duplicated. As rows are combinations of building ID and school ID, if a school is hosted by several buildings and every field other than School_code is duplicated, only one row is retained. TRUE by default.

flag_outliers

Logical. Whether to assign NA to outliers in numeric variables. TRUE by default.

autoAbort

Logical. Whether to automatically abort the operation and return NULL when data must be retrieved but the internet connection is missing or the server returns an error. FALSE by default.

...

Additional arguments to the function Get_DB_MIUR if data is not provided.
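
The cleaning options described above (col_cut_thresh, unique_buildings, row_cutout and track_deleted) can be illustrated on a toy data frame in base R. This is only a sketch of the documented behaviour, not the package's internal code: every column name except School_code is hypothetical, the threshold is shrunk to fit four rows, and the order in which the real function applies these steps is an assumption.

```r
# Toy data: one row per (school, building) pair. Column names other than
# School_code are hypothetical and chosen for illustration only.
toy <- data.frame(
  School_code   = c("S1", "S2", "S3", "S4"),
  Building_code = c("B1", "B1", "B2", "B3"),
  Floors        = c(2, 2, NA, 3),
  Heating       = c(TRUE, TRUE, FALSE, NA),
  Mostly_empty  = c(NA, NA, NA, 1),
  stringsAsFactors = FALSE
)

# 1. Column cut: drop variables with more missing values than the threshold
col_cut_thresh <- 2
n_missing <- colSums(is.na(toy))
kept <- toy[, n_missing <= col_cut_thresh, drop = FALSE]

# 2. Duplicate removal: ignoring School_code, keep the first of identical rows
other_fields <- setdiff(names(kept), "School_code")
kept <- kept[!duplicated(kept[, other_fields, drop = FALSE]), , drop = FALSE]

# 3. Row cutout with tracking: keep complete rows, record the IDs of the rest
complete <- complete.cases(kept)
result <- list(
  data    = kept[complete, , drop = FALSE],
  deleted = kept[!complete, "School_code", drop = FALSE]
)

result$data     # only school S1 survives all three steps
result$deleted  # schools S3 and S4 were dropped for missing fields
```

The toy run also shows why the documentation recommends a lower col_cut_thresh when row_cutout is active: any sparse column that survives the column cut causes most rows to be dropped in the last step.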

Details

The outliers to be set to NA if flag_outliers is active are defined as follows: school area or free area surface of less than 50 square meters, building volume of less than 150 cubic meters, or 0 floors in the building.
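
The thresholds above can be sketched in base R as follows. The column names (School_area, Free_area, Building_volume, Floors) and the helper na_if_below are hypothetical stand-ins, since the actual variable names in the Get_DB_MIUR output are not listed here; only the cut-off values come from the definition above.

```r
# Toy data with hypothetical column names
toy <- data.frame(
  School_area     = c(1200, 30, 800),   # square meters
  Free_area       = c(400, 600, 10),    # square meters
  Building_volume = c(9000, 100, 5000), # cubic meters
  Floors          = c(3, 0, 2)
)

# Hypothetical helper: set values strictly below a cut-off to NA.
# which() ignores values that are already missing.
na_if_below <- function(x, cutoff) {
  x[which(x < cutoff)] <- NA
  x
}

flagged <- toy
flagged$School_area     <- na_if_below(flagged$School_area, 50)
flagged$Free_area       <- na_if_below(flagged$Free_area, 50)
flagged$Building_volume <- na_if_below(flagged$Building_volume, 150)
flagged$Floors[which(flagged$Floors == 0)] <- NA  # 0 floors is an outlier

flagged
```

Only the offending cells become NA; the rest of each row is kept, so flagged rows are removed later only if row_cutout is also active.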

Examples

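library(SchoolDataIT)
library(magrittr)

# Convert the bundled example input and return only the output data frame
DB23_MIUR_num <- example_input_DB23_MIUR %>% Util_DB_MIUR_num(track_deleted = FALSE)

# Preview the result without columns 1, 4, 6, 8, 9 and 10
DB23_MIUR_num[, -c(1,4,6,8,9,10)]
summary(DB23_MIUR_num)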

