- pred
Vector of predicted values
- pred_class
A vector of class identifiers for the predicted values. This is used to group the predictions by class for Mondrian conformal prediction.
- calib
A numeric vector of predicted values in the calibration partition, or a two-column tibble or matrix whose first column holds the predicted values and whose second column holds the true values. If calib is a numeric vector, calib_truth must be provided.
- calib_truth
A numeric vector of true values in the calibration partition. Only required if calib is a numeric vector.
- calib_class
A vector of class identifiers for the calibration set.
- lower_bound
Optional minimum value for the prediction intervals. If not provided, the minimum (true) value of the calibration partition is used. Primarily useful when possible outcome values lie outside the range of values observed in the calibration set.
- upper_bound
Optional maximum value for the prediction intervals. If not provided, the maximum (true) value of the calibration partition is used. Primarily useful when possible outcome values lie outside the range of values observed in the calibration set.
- alpha
The confidence level for the prediction intervals. Must be a single numeric value between 0 and 1.
- ncs_type
A string specifying the type of nonconformity score to use. Available options are:
"absolute_error": \(|y - \hat{y}|\)
"relative_error": \(|y - \hat{y}| / \hat{y}\)
"zero_adjusted_relative_error": \(|y - \hat{y}| / (\hat{y} + 1)\)
"heterogeneous_error": \(|y - \hat{y}| / \sigma_{\hat{y}}\), the absolute error divided by a measure of heteroskedasticity \(\sigma_{\hat{y}}\), computed as the predicted value from a linear model of the absolute error on the predicted values
"raw_error": the signed error \(y - \hat{y}\)
The default is "absolute_error".
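The ncs_type options can be sketched in base R as follows. This is only an illustration of the formulas listed for ncs_type, not the package's internal code: the helper name ncs_sketch is hypothetical, and the use of lm() fitted values for \(\sigma_{\hat{y}}\) is an assumption based on the description of "heterogeneous_error".

```r
# Hypothetical helper illustrating the ncs_type formulas; not part of the package.
ncs_sketch <- function(y, y_hat, type = "absolute_error") {
  abs_err <- abs(y - y_hat)
  switch(type,
    absolute_error = abs_err,
    relative_error = abs_err / y_hat,
    zero_adjusted_relative_error = abs_err / (y_hat + 1),
    heterogeneous_error = {
      # Assumption: sigma is the fitted value from a linear model of
      # the absolute error on the predicted values.
      sigma_hat <- fitted(lm(abs_err ~ y_hat))
      abs_err / sigma_hat
    },
    raw_error = y - y_hat,
    stop("unknown type: ", type)
  )
}

# Toy calibration data (made up for illustration)
y     <- c(10, 12, 9, 15)
y_hat <- c(11, 10, 9, 14)

ncs_sketch(y, y_hat, "absolute_error")
ncs_sketch(y, y_hat, "zero_adjusted_relative_error")
ncs_sketch(y, y_hat, "raw_error")
```

Note that "relative_error" divides by the prediction itself, so it is undefined when a prediction is 0; "zero_adjusted_relative_error" adds 1 to the denominator to avoid this.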
- grid_size
The number of points to use in the grid search between the lower and upper bound. Default is 10,000. A larger grid size increases the resolution of the prediction intervals but also increases computation time.
- resolution
An alternative to grid_size: the minimum step size between grid points. Useful if a specific resolution is desired. Default is NULL.
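The difference between grid_size and resolution can be illustrated with base R's seq(). This is a sketch under the assumption that the candidate grid is evenly spaced between the bounds; the variable names and bounds are made up, and the package may construct its grid differently.

```r
# Illustrative bounds (assumed values, not package defaults)
lower <- 0
upper <- 50

# grid_size fixes the number of candidate values; the step follows from the range
grid_by_size <- seq(lower, upper, length.out = 10000)

# resolution fixes the step; the number of candidate values follows from the range
grid_by_res <- seq(lower, upper, by = 0.01)

length(grid_by_size)   # number of points when grid_size = 10000
diff(grid_by_size)[1]  # implied step size
length(grid_by_res)    # number of points when resolution = 0.01
```

With a fixed grid_size, widening the bounds makes the grid coarser; with a fixed resolution, it makes the grid longer and the search slower.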
- n_clusters
Number of clusters to use when combining Mondrian classes. Required if optimize_n_clusters = FALSE.
- cluster_method
Clustering method used to group Mondrian classes. Options are "kmeans" or "ks" (Kolmogorov-Smirnov). Default is "kmeans".
- cluster_train_fraction
Fraction of the calibration data used to estimate nonconformity scores and compute clustering. Default is 1 (use all).
- optimize_n_clusters
Logical. If TRUE, the number of clusters is chosen automatically based on internal clustering criteria.
- optimize_n_clusters_method
Method used for cluster optimization. One of "calinhara" (Calinski-Harabasz index) or "min_cluster_size". Default is "calinhara".
- min_cluster_size
Minimum number of calibration points per cluster. Used only when optimize_n_clusters_method = "min_cluster_size".
- min_n_clusters
Minimum number of clusters to consider when optimizing.
- max_n_clusters
Maximum number of clusters to consider. If NULL, the upper limit is set to the number of unique Mondrian classes minus 1.
- distance_weighted_cp
Logical. If TRUE, weighted conformal prediction is performed, with the nonconformity scores weighted by the distance between calibration and prediction points in feature space. Default is FALSE. See details for more information.
- distance_features_calib
A matrix, data frame, or numeric vector of features from which to compute distances when distance_weighted_cp = TRUE. This should contain the feature values for the calibration set. Must have the same number of rows as the calibration set. Can be the predicted values themselves, or any other features which give a meaningful distance measure.
- distance_features_pred
A matrix, data frame, or numeric vector of feature values for the prediction set. Must be the same features as specified in distance_features_calib. Required if distance_weighted_cp = TRUE.
- distance_type
The type of distance metric to use when computing distances between calibration and prediction points. Options are 'mahalanobis' (default) and 'euclidean'.
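The two distance_type options can be compared with a small base R sketch. The feature matrix and prediction point are simulated, and estimating the covariance from the calibration features is an assumption made for illustration; the package may estimate it differently.

```r
set.seed(1)

# Simulated calibration features on very different scales (illustrative only)
calib_feat <- cbind(x1 = rnorm(20), x2 = rnorm(20, sd = 5))
pred_point <- c(0.5, 2)

# Euclidean distance from each calibration row to the prediction point
d_euc <- sqrt(rowSums(sweep(calib_feat, 2, pred_point)^2))

# Mahalanobis distance; stats::mahalanobis() returns the squared distance.
# Assumption: covariance is estimated from the calibration features.
d_mah <- sqrt(mahalanobis(calib_feat, center = pred_point, cov = cov(calib_feat)))

head(cbind(euclidean = d_euc, mahalanobis = d_mah))
```

The Mahalanobis distance rescales by the feature covariance, so a feature with a large variance (x2 here) does not dominate the result the way it does under the Euclidean distance.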
- normalize_distance
Either 'minmax', 'sd', or 'none'. Indicates if and how to normalize the distances when distance_weighted_cp is TRUE. Normalization helps ensure that distances are on a comparable scale across features. Default is 'none'.
- weight_function
A character string specifying the weighting kernel to use for distance-weighted conformal prediction. Options are:
"gaussian_kernel": \( w(d) = e^{-d^2} \)
"caucy_kernel": \( w(d) = 1/(1 + d^2) \)
"logistic": \( w(d) = 1/(1 + e^{d}) \)
"reciprocal_linear": \( w(d) = 1/(1 + d) \)
The default is "gaussian_kernel". Here \(d\) is the distance between the calibration and prediction feature vectors, computed with the metric given by distance_type.
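The four kernels can be written out in base R to see how quickly each one discounts distant calibration points. The helper name weight_sketch is hypothetical and only mirrors the formulas listed for weight_function; it is not the package's implementation.

```r
# Hypothetical helper mirroring the weight_function formulas; not part of the package.
weight_sketch <- function(d, kernel = "gaussian_kernel") {
  switch(kernel,
    gaussian_kernel   = exp(-d^2),
    caucy_kernel      = 1 / (1 + d^2),
    logistic          = 1 / (1 + exp(d)),
    reciprocal_linear = 1 / (1 + d),
    stop("unknown kernel: ", kernel)
  )
}

# Example distances (made up for illustration)
d <- c(0, 0.5, 1, 2)

kernels <- c("gaussian_kernel", "caucy_kernel", "logistic", "reciprocal_linear")
sapply(kernels, function(k) weight_sketch(d, k))
```

All four weights decrease as d grows. Note that the "logistic" kernel gives a weight of 0.5 at d = 0, whereas the other three give a weight of 1.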