Evaluate Predictions for the Case Study Handed in By Students
evaluate_casestudy(prediction_files, solution_file)
The function returns a tibble with one row for each file given in prediction_files and the following columns:

- the rank of the prediction among all predictions in the tibble; the tibble is sorted by rank, with ranking determined first by balanced_accuracy and then by accuracy
- the name of the file that contained the prediction
- the number of valid predictions in the file
- balanced_accuracy, the mean of sensitivity and specificity
- accuracy of the prediction
- sensitivity, i.e., the rate of correct predictions for the "positive" class "<=50K"
- specificity, i.e., the rate of correct predictions for the "negative" class ">50K"
prediction_files is a character vector of file paths to csv files with model predictions; solution_file is the path to the parquet file containing the correct solutions.
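A minimal usage sketch, assuming the handed-in csv files sit in a predictions/ directory and the solution file is called solution.parquet (both names are made up for illustration):

    # collect all handed-in prediction files
    prediction_files <- list.files("predictions", pattern = "\\.csv$", full.names = TRUE)

    # score them against the solutions and inspect the resulting ranking
    results <- evaluate_casestudy(prediction_files, "solution.parquet")
    print(results)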
The prediction files must be csv files (comma separated) with two columns (see the example below):

- a five-digit integer giving the ID of the person
- the predicted income class, one of "<=50K" and ">50K"
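For illustration, a valid prediction file could be produced like this; the column names id and prediction are assumptions, since the expected header is not specified here:

    # hypothetical predictions: five-digit IDs and one of the two accepted classes
    predictions <- data.frame(
      id = c(10001L, 10002L, 10003L),
      prediction = c("<=50K", ">50K", "<=50K")
    )

    # write a comma-separated csv file, one row per person
    write.csv(predictions, "my_predictions.csv", row.names = FALSE)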
Missing IDs and any class that is not one of the accepted values count as failed predictions. The performance metrics are always computed on the full data set, not just on the available predictions.
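A rough sketch of how these rules can be read, not the package's actual implementation: every row of the solution is scored, anything missing or outside the two accepted labels counts as wrong, and the metrics are taken over all rows. The column names id, income, and prediction are assumptions:

    score_sketch <- function(prediction, solution) {
      # keep every row of the solution, even if no prediction was handed in
      merged <- merge(solution, prediction, by = "id", all.x = TRUE)

      # missing IDs or labels outside the accepted values are failed predictions
      valid <- !is.na(merged$prediction) & merged$prediction %in% c("<=50K", ">50K")
      correct <- valid & merged$prediction == merged$income

      # metrics are computed over the full data set, not only the valid predictions
      sensitivity <- mean(correct[merged$income == "<=50K"])
      specificity <- mean(correct[merged$income == ">50K"])

      c(
        n_valid           = sum(valid),
        accuracy          = mean(correct),
        sensitivity       = sensitivity,
        specificity       = specificity,
        balanced_accuracy = (sensitivity + specificity) / 2
      )
    }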