
SchoolDataIT (version 0.2.4)

Util_nstud_wide: Clean the raw dataframe of the number of students and arrange it in a wide format

Description

This function rearranges the output of the Get_nstud function so that each school is represented by a single observation. That observation contains the counts of students and, if required, either the number of students by class and the number of classes, or the counts of students by school timetable (running time). If the focus is on class size, the function first removes outliers in terms of the average number of students by class at the school level (or at the grade level, see filter_by_grade), and imputes the number of classes to 1 when it is missing (see missing_to_1).
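
As a rough illustration of the long-to-wide rearrangement described above, the base R sketch below reshapes grade-level counts into one row per school with stats::reshape(). The column names (school_ID, grade, nstud) and the values are hypothetical and do not necessarily match those returned by Get_nstud or Util_nstud_wide.

```r
# Hypothetical long-format counts: one row per school and grade
long <- data.frame(
  school_ID = c("A", "A", "B", "B"),
  grade     = c(1, 2, 1, 2),
  nstud     = c(20, 22, 18, 25)
)

# Spread the grades into columns so that each school is a single observation
wide <- reshape(long, idvar = "school_ID", timevar = "grade",
                direction = "wide")
wide
```

Each school now occupies one row, with one student-count column per grade (nstud.1, nstud.2).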

Usage

Util_nstud_wide(
  data = NULL,
  missing_to_1 = FALSE,
  nstud_imputation_thresh = 19,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  filter_by_grade = FALSE,
  UB_nstud_byclass_grade = NULL,
  LB_nstud_byclass_grade = NULL,
  verbose = TRUE,
  autoAbort = FALSE,
  ...
)

Value

An object of class tbl_df, tbl and data.frame

Arguments

data

Object of class list, including two objects of class tbl_df, tbl and data.frame, obtained as output of the Get_nstud function with the default filename parameter. If NULL, the function downloads the data automatically, but the downloaded data are not saved in the global environment. NULL by default.

missing_to_1

Logical. If the focus is on class size, whether the number of classes should be imputed to 1 when it is missing and the number of students does not exceed a threshold (argument nstud_imputation_thresh). FALSE by default.

nstud_imputation_thresh

Numeric. If the focus is on class size and missing_to_1 == TRUE, the threshold at or below which a missing number of classes is imputed to 1. E.g. if the threshold is 19, then for every school in which a given grade has 19 or fewer students but the number of classes for that grade is missing, the number of classes is imputed to 1. 19 by default.
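
The imputation rule can be sketched in base R as follows. impute_classes() is a hypothetical helper written only to illustrate the documented behavior (threshold included); it is not the package's internal implementation.

```r
# Hypothetical helper, not part of SchoolDataIT: imputes 1 class where the
# number of classes is missing and the number of students is <= thresh
impute_classes <- function(nstud, nclasses, thresh = 19) {
  to_impute <- which(is.na(nclasses) & nstud <= thresh)
  nclasses[to_impute] <- 1
  nclasses
}

# 12 and 19 students (at or below the threshold): imputed to 1 class;
# 20 students: left missing; 45 students with 2 classes: unchanged
impute_classes(nstud = c(12, 19, 20, 45), nclasses = c(NA, NA, NA, 2))
#> [1]  1  1 NA  2
```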

UB_nstud_byclass

Numeric. Either a single value for all school orders, or a vector of three order-specific values in the order: primary, middle, high. If the focus is on class size, the upper limit of the acceptable school-level (if filter_by_grade == FALSE) or grade-level (otherwise) average number of students by class. If a whole school, or any grade in a school, respectively, has a higher number of students by class, the record is considered an outlier and filtered out. 99 by default, i.e. effectively no restriction is made. Note that the boundaries are included in the acceptance interval.

LB_nstud_byclass

Numeric. Either a single value for all school orders, or a vector of three order-specific values in the order: primary, middle, high. If the focus is on class size, the lower limit of the acceptable school-level (if filter_by_grade == FALSE) or grade-level (otherwise) average number of students by class. If a whole school, or any grade in a school, respectively, has a smaller number of students by class, the record is considered an outlier and filtered out. 1 by default. Note that the boundaries are included in the acceptance interval.
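
The acceptance rule shared by UB_nstud_byclass and LB_nstud_byclass (a single value or one value per school order, boundaries included) can be sketched as below. within_bounds() and the order labels "primary", "middle" and "high" are illustrative assumptions, not the package's internal code.

```r
# Hypothetical helper: TRUE where the average class size lies in [LB, UB]
# for the record's school order
within_bounds <- function(avg_size, school_order, LB = 1, UB = 99) {
  orders <- c("primary", "middle", "high")
  # A single bound is recycled to all three school orders
  if (length(LB) == 1) LB <- rep(LB, 3)
  if (length(UB) == 1) UB <- rep(UB, 3)
  names(LB) <- orders
  names(UB) <- orders
  # Both boundaries are included in the acceptance interval
  unname(avg_size >= LB[school_order] & avg_size <= UB[school_order])
}

# Same bounds for every order: 5 and 35 are accepted, 4 and 36 are not
within_bounds(c(4, 5, 35, 36), c("primary", "middle", "high", "high"),
              LB = 5, UB = 35)
#> [1] FALSE  TRUE  TRUE FALSE

# Order-specific lower bounds (primary, middle, high)
within_bounds(c(11, 11, 14), c("primary", "middle", "high"),
              LB = c(10, 12, 15), UB = 35)
#> [1]  TRUE FALSE FALSE
```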

filter_by_grade

Logical. If the focus is on class size, whether to remove all school grades whose average class size lies outside the acceptance boundaries. FALSE by default.
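
To illustrate the difference between school-level and grade-level filtering, the sketch below works on made-up long-format data. It assumes that the school-level average is the total number of students divided by the total number of classes, and it simply drops rows; how Util_nstud_wide actually aggregates grades and represents removed grades in its wide output is not stated here, so treat both as assumptions.

```r
# Made-up grade-level data for two schools (one row per grade)
grades <- data.frame(
  school_ID = c("A", "A", "A", "B", "B"),
  nstud     = c(40, 44, 8, 50, 48),
  nclasses  = c(2, 2, 2, 2, 2)
)
LB <- 5
UB <- 35

# School-level average (assumed: total students / total classes)
school_avg <- tapply(grades$nstud, grades$school_ID, sum) /
  tapply(grades$nclasses, grades$school_ID, sum)
# A: 92 / 6 = 15.33, B: 98 / 4 = 24.5, so both schools are kept
keep_school <- names(school_avg)[school_avg >= LB & school_avg <= UB]

# Grade-level averages: school A's third grade (8 / 2 = 4) falls below LB
grades$avg <- grades$nstud / grades$nclasses
by_school <- grades[grades$school_ID %in% keep_school, ]
by_grade  <- grades[grades$avg >= LB & grades$avg <= UB, ]

nrow(by_school)  # no school removed
nrow(by_grade)   # only the outlying grade removed
```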

UB_nstud_byclass_grade

Numeric. If filter_by_grade == TRUE, the upper limit of the acceptable grade-level average class size. If NULL, it is set equal to UB_nstud_byclass. NULL by default.

LB_nstud_byclass_grade

Numeric. If filter_by_grade == TRUE, the lower limit of the acceptable grade-level average class size. If NULL, it is set equal to LB_nstud_byclass. NULL by default.
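
The NULL defaults of the two grade-level bounds amount to a simple fallback to the school-level bounds. resolve_grade_bounds() below is a hypothetical helper that mirrors this documented behavior; it is not a function exported by SchoolDataIT.

```r
# Hypothetical helper mirroring the documented defaults: grade-level
# bounds fall back to the school-level ones when left NULL
resolve_grade_bounds <- function(UB_nstud_byclass = 99, LB_nstud_byclass = 1,
                                 UB_nstud_byclass_grade = NULL,
                                 LB_nstud_byclass_grade = NULL) {
  if (is.null(UB_nstud_byclass_grade)) UB_nstud_byclass_grade <- UB_nstud_byclass
  if (is.null(LB_nstud_byclass_grade)) LB_nstud_byclass_grade <- LB_nstud_byclass
  list(UB = UB_nstud_byclass_grade, LB = LB_nstud_byclass_grade)
}

# Only the grade-level upper bound is given; the lower bound is inherited
str(resolve_grade_bounds(UB_nstud_byclass = 35, LB_nstud_byclass = 5,
                         UB_nstud_byclass_grade = 30))
```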

verbose

Logical. If TRUE, messages are printed so that the user can keep track of the main underlying operations. TRUE by default.

autoAbort

Logical. If any data must be retrieved, whether to automatically abort the operation and return NULL in case of a missing internet connection or server response errors. FALSE by default.

...

Arguments to Get_nstud, needed if data is not provided.

Details

In the example, we compare the dataframe obtained with the default settings with the one obtained by imposing narrower inclusion criteria.

Examples



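# Default settings: acceptance interval [1, 99] for the average class size
nstud.default <- Util_nstud_wide(example_input_nstud23)

# Narrower inclusion criteria: average number of students by class in [5, 35]
nstud.narrow <- Util_nstud_wide(example_input_nstud23,
  UB_nstud_byclass = 35, LB_nstud_byclass = 5)

# Compare the number of schools retained under the two settings
nrow(nstud.default)
nrow(nstud.narrow)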

nstud.default

summary(nstud.default)

