This function transforms the output variables of Get_DB_MIUR into Boolean or numeric.
Additionally, it removes the columns with an excessive number of missing observations (more than 20,000 by default) and, if required, it can also delete the rows that include missing fields.
In that case, it is possible to keep track of the deleted rows.
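The column and row filtering described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the toy data frame, its column names, and the small threshold of 2 are all invented for the example.

```r
# Toy data: column names and values are hypothetical, for illustration only.
toy <- data.frame(
  school_id = c("A1", "B2", "C3", "D4"),
  has_gym   = c(TRUE, NA, FALSE, TRUE),
  n_floors  = c(2, 3, NA, 1),
  notes     = c(NA, NA, NA, 5),
  stringsAsFactors = FALSE
)

# Hypothetical threshold; the documented default of the function is 20000.
col_cut_thresh <- 2

# Step 1: drop the columns whose number of missing values exceeds the threshold.
na_per_col <- colSums(is.na(toy))
kept <- toy[, na_per_col <= col_cut_thresh, drop = FALSE]

# Step 2 (optional, analogous to row_cutout = TRUE): drop the rows that still
# contain a missing field, remembering which school IDs were removed.
complete    <- stats::complete.cases(kept)
deleted_ids <- kept$school_id[!complete]
result      <- kept[complete, , drop = FALSE]
```

In this toy case the `notes` column is dropped first, and the row filter then removes two of the four schools. With a high column threshold, sparsely filled columns survive step 1 and can cause many rows to be discarded in step 2.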
Util_DB_MIUR_num(
data = NULL,
include_numerics = TRUE,
include_qualitatives = FALSE,
row_cutout = FALSE,
track_deleted = TRUE,
verbose = TRUE,
col_cut_thresh = 20000,
flag_outliers = TRUE,
autoAbort = FALSE,
...
)
If track_deleted == TRUE, an object of class list including two objects:
$data: object of class tbl_df, tbl and data.frame, the output dataframe.
$deleted: object of class tbl_df, tbl and data.frame. The school IDs of the deleted units.
If track_deleted == FALSE, the output is only the first element of the list.
data: Object of class tbl_df, tbl and data.frame. Input data obtained through the function Get_DB_MIUR. If NULL, it will be downloaded automatically with the appropriate arguments, but not saved in the global environment. NULL by default.
include_numerics: Logical. Whether to include strictly numeric variables alongside Boolean ones. TRUE by default.
include_qualitatives: Logical. Whether to include qualitative variables alongside Boolean ones. FALSE by default.
row_cutout: Logical. Whether to filter out rows including missing fields. FALSE by default.
track_deleted: Logical. If TRUE, the function also returns the names of the schools not included in the output dataframe. TRUE by default.
verbose: Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.
col_cut_thresh: Numeric. The threshold of missing values allowed for each variable. If a variable has a higher number of missing observations, it is cut out. 20,000 by default. Warning: if the option row_cutout is active, please select a lower threshold (e.g. 1000).
flag_outliers: Logical. Whether to assign NA to outliers in numeric variables. TRUE by default.
autoAbort: Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.
...: Additional arguments to the function Get_DB_MIUR if data is not provided.
The outliers set to NA when flag_outliers is active are defined as follows: school area or free area surface of less than 50 square meters, building volume of less than 150 cubic meters, 0 floors in the building.
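The outlier rules above can be sketched in base R as follows. The column names in this toy data frame are hypothetical, since the actual variable names are not listed here, and the code is only an illustration of the thresholds, not the function's own implementation.

```r
# Toy data with hypothetical column names and invented values.
toy <- data.frame(
  school_area     = c(1200, 30, 800),    # square meters
  free_area       = c(400, 600, 10),     # square meters
  building_volume = c(9000, 100, 5000),  # cubic meters
  n_floors        = c(2, 0, 3)
)

flagged <- toy

# Surfaces below 50 square meters are treated as outliers.
flagged$school_area[which(flagged$school_area < 50)] <- NA
flagged$free_area[which(flagged$free_area < 50)]     <- NA

# Building volumes below 150 cubic meters are treated as outliers.
flagged$building_volume[which(flagged$building_volume < 150)] <- NA

# A building with 0 floors is treated as an outlier.
flagged$n_floors[which(flagged$n_floors == 0)] <- NA
```

Using `which()` keeps the assignment safe when a column already contains missing values, because the comparison then yields NA for those entries.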
library(magrittr)
DB23_MIUR_num <- example_input_DB23_MIUR %>% Util_DB_MIUR_num(track_deleted = FALSE)
DB23_MIUR_num[, -c(1,4,6,8,9,10)]
summary(DB23_MIUR_num)