bcbcsf_fitpred trains models with Gibbs sampling for each number of retained features. The results are saved in files. This function also makes predictions for test cases if they are provided.
bcbcsf_pred uses the posterior samples saved by bcbcsf_fitpred to predict the class labels of test cases. The prediction results are an array of predictive probabilities, array_probs_pred, whose rows are for test cases, columns for classes, and the 3rd dimension for different numbers of retained features.
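To make the layout of array_probs_pred concrete, here is a small base R sketch. It does not call the package; the array is filled with made-up probabilities, and the sizes (4 test cases, 3 classes, two numbers of retained features) are assumptions chosen only for illustration.

```r
# Toy stand-in for array_probs_pred: 4 test cases x 3 classes x 2 feature-set sizes.
# The values are random and only illustrate the layout, not real predictions.
set.seed(1)
n_ts <- 4
n_class <- 3
nos_fsel <- c(10, 50)
raw <- array(runif(n_ts * n_class * length(nos_fsel)),
             dim = c(n_ts, n_class, length(nos_fsel)))

# Normalise so that the class probabilities of each test case sum to 1
# within every slice of the 3rd dimension.
array_probs_pred <- sweep(raw, c(1, 3), apply(raw, c(1, 3), sum), "/")

# Most probable class (column index) for each test case and each feature-set size.
pred_class <- apply(array_probs_pred, c(1, 3), which.max)
colnames(pred_class) <- paste0("nfsel_", nos_fsel)
pred_class
```

Each column of pred_class then holds the predicted labels obtained with one number of retained features, which makes it easy to compare error rates across feature-set sizes.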
cross_vld uses cross-validation to obtain predictive probabilities for all cases of a data set. This generic function can be used with bcbcsf_fitpred and other classifiers.
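The cross-validation idea can be sketched in base R as follows. This is not the package's implementation: make_folds, freq_fitpred and cv_sketch are hypothetical names, the folds are assumed to be stratified by class, and the toy classifier simply predicts the training class frequencies.

```r
# Hypothetical helper: stratified folds, each a vector of test-case indices.
make_folds <- function(y, nfold = 10) {
  # Cap nfold at the smallest class size so every class appears in every fold.
  nfold <- min(nfold, min(table(y)))
  fold_id <- integer(length(y))
  for (cl in unique(y)) {
    idx <- which(y == cl)
    idx <- idx[sample.int(length(idx))]          # shuffle within the class
    fold_id[idx] <- rep_len(seq_len(nfold), length(idx))
  }
  split(seq_along(y), fold_id)
}

# Toy fit-and-predict function: takes X_tr, y_tr, X_ts and returns a list
# containing array_probs_pred. It ignores the features entirely.
freq_fitpred <- function(X_tr, y_tr, X_ts) {
  classes <- sort(unique(y_tr))
  p <- as.numeric(table(factor(y_tr, levels = classes))) / length(y_tr)
  probs <- matrix(p, nrow = nrow(X_ts), ncol = length(classes), byrow = TRUE)
  list(array_probs_pred = array(probs, dim = c(nrow(X_ts), length(classes), 1)))
}

# Train on all folds but one, predict the held-out fold, and collect the results.
cv_sketch <- function(X, y, nfold = 10, fitpred_func = freq_fitpred, ...) {
  folds <- make_folds(y, nfold)
  out <- NULL
  for (test_idx in folds) {
    res <- fitpred_func(X_tr = X[-test_idx, , drop = FALSE], y_tr = y[-test_idx],
                        X_ts = X[test_idx, , drop = FALSE], ...)
    if (is.null(out))
      out <- array(NA_real_, dim = c(nrow(X), dim(res$array_probs_pred)[2:3]))
    out[test_idx, , ] <- res$array_probs_pred
  }
  out
}

set.seed(1)
X <- matrix(rnorm(30 * 5), nrow = 30)
y <- rep(1:3, times = c(12, 10, 8))
probs_cv <- cv_sketch(X, y, nfold = 10)   # nfold is capped at 8 here
```

Because every case is held out exactly once, probs_cv ends up with one row of predictive probabilities per case of the data set.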
bcbcsf_fitpred (
  ## arguments specifying info of data sets
  X_tr, y_tr, nos_fsel = ncol (X_tr), X_ts = NULL,
  standardize = FALSE, rankf = FALSE,
  ## arguments for prediction
  burn = NULL, thin = 1, offset_sdxj = 0.5,
  ## arguments for Markov chain sampling
  no_rmc = 1000, no_imc = 5, no_mhwmux = 10,
  fit_bcbcsf_filepre = ".fitbcbcsf_",
  ## arguments specifying priors for parameters and hyperparameters
  w0_mu = 0.05, alpha0_mu = 0.5, alpha1_mu = 3,
  w0_x = 1.00, alpha0_x = 0.5, alpha1_x = 10,
  w0_nu = 0.05, alpha0_nu = 0.5, prior_psi = NULL,
  ## arguments for Metropolis sampling for wmu, wx
  stepadj_mhwmux = 1, diag_mhwmux = FALSE,
  ## arguments for computing adjustment factor
  bcor = 1, cut_qf = exp (-10), cut_dpoi = exp (-10), nos_sim = 1000,
  ## whether to monitor progress
  monitor = TRUE)
bcbcsf_pred (X_ts, out_fit, burn = NULL, thin = 1, offset_sdxj = 0.5)
cross_vld (X, y, nfold = 10, folds = NULL, fitpred_func = bcbcsf_fitpred, ...)
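The burn and thin arguments in the usage above control which saved (super)iterations enter the prediction. A minimal sketch of that selection follows, assuming that burn counts iterations discarded from the start of the chain and that the default corresponds to 20% of the saved iterations; kept_iterations is a hypothetical helper, not a package function.

```r
# Hypothetical helper: indices of the saved (super)iterations that remain
# after burn-in and thinning.
kept_iterations <- function(no_rmc, burn = NULL, thin = 1) {
  # Assumed default: discard the first 20% of the saved iterations.
  if (is.null(burn)) burn <- floor(no_rmc * 0.2)
  seq(from = burn + 1, to = no_rmc, by = thin)
}

kept <- kept_iterations(no_rmc = 1000)                      # 201, 202, ..., 1000
kept_thin <- kept_iterations(1000, burn = 100, thin = 10)   # 101, 111, ..., 991
```

A larger thin reduces the number of posterior samples averaged over, which speeds up prediction at the cost of some Monte Carlo precision.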
X_tr is the training data, X_ts is the test data or future data for which predictions are needed, and X is a data set used for cross-validation.
burn
of Markov chain (super)iterations will be discarded for prediction, and only every thin-th one is used; by default, 20% of the (super)iterations are burned and thin = 1.
offset_sdxj
quantile (given as a fraction, so 0.5 is the median) of the samples of all standard deviations $\sqrt{w^x_j}$ is added to all the standard deviations; this is to remedy the non-normality in real gene expression data sets and, in particular, to offset some very small standard deviations; by default, the median is used.
no_rmc
super Markov chain transitions are run, with no_imc Markov chain iterations for each; only the last state of each super transition is saved.
fit_bcbcsf_filepre
is set to NULL, no fitting file will be created, and bcbcsf_fitpred returns only the fitting result corresponding to the last number of retained features in nos_fsel, which is always returned regardless of the value of fit_bcbcsf_filepre.
cut_qf
is $f_\ell$ in the reference, cut_dpoi is the threshold below which Poisson probabilities are omitted, and nos_sim is the number of random $\Lambda$.
folds
should be a list of test cases for different folds; if folds is NULL (the default), folds will be generated by the software, with nfold set to the smaller of the given value and the smallest number of cases in any class.
bcbcsf_fitpred
, which are used to make predictions for test cases.
The arguments of fitpred_func must include X_tr, y_tr, and X_ts, and the outputs of fitpred_func must include array_probs_pred.
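As an illustration of that interface, the sketch below wraps a simple nearest-centroid classifier so that it accepts X_tr, y_tr and X_ts and returns a list containing array_probs_pred. Everything here is hypothetical: centroid_fitpred is not part of the package, the extra nos_fsel argument merely mirrors bcbcsf_fitpred, and "retaining k features" is simplified to taking the first k columns.

```r
# Hypothetical classifier wrapper: nearest class centroid on the first k features,
# for each k in nos_fsel.
centroid_fitpred <- function(X_tr, y_tr, X_ts, nos_fsel = ncol(X_tr)) {
  classes <- sort(unique(y_tr))
  probs <- array(NA_real_, dim = c(nrow(X_ts), length(classes), length(nos_fsel)))
  for (i in seq_along(nos_fsel)) {
    feats <- seq_len(nos_fsel[i])
    # Squared Euclidean distance from every test case to every class centroid.
    d2 <- sapply(classes, function(cl) {
      centre <- colMeans(X_tr[y_tr == cl, feats, drop = FALSE])
      rowSums(sweep(X_ts[, feats, drop = FALSE], 2, centre)^2)
    })
    d2 <- matrix(d2, nrow = nrow(X_ts))   # stay a matrix even for one test case
    w <- exp(-(d2 - apply(d2, 1, min)))   # closer centroids get larger weights
    probs[, , i] <- w / rowSums(w)        # rows sum to 1
  }
  list(array_probs_pred = probs)
}

# Two well-separated classes; the test cases sit on the two class centres.
set.seed(2)
X_tr <- matrix(rnorm(20 * 4), nrow = 20)
y_tr <- rep(1:2, each = 10)
X_tr[y_tr == 2, ] <- X_tr[y_tr == 2, ] + 5
X_ts <- rbind(rep(0, 4), rep(5, 4))
out <- centroid_fitpred(X_tr, y_tr, X_ts, nos_fsel = c(2, 4))
```

A function of this shape could, under the stated requirements, be supplied as fitpred_func, with nos_fsel passed through the ... argument of cross_vld.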
fitpred_func
nos_fsel, each saving the file name of the Markov chain fitting result for one number of retained features in nos_fsel; the fitfiles returned by cross_vld are for the training in the last fold.
nos_fsel
. Note that the fitting results for the other numbers (including the last one) of retained features are saved in hard-drive files if fit_bcbcsf_filepre isn't empty, and can be retrieved using the function reload_fit_bcbcsf. In particular, fit_bcbcsf has a list component fsel, which saves the indices of the features selected by F-statistic.
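For readers unfamiliar with F-statistic feature selection, here is a hedged base R sketch. It assumes that selection means ranking features by a one-way ANOVA F-statistic across classes and keeping the top k; the package's exact computation may differ, and f_stat is a hypothetical helper.

```r
# One-way ANOVA F-statistic for each column of X, given class labels y.
f_stat <- function(X, y) {
  y <- factor(y)
  n <- nrow(X)
  G <- nlevels(y)
  n_g <- as.numeric(table(y))                  # class sizes, in level order
  grand <- colMeans(X)
  means <- rowsum(X, y) / n_g                  # G x p matrix of class means
  ss_between <- colSums(n_g * sweep(means, 2, grand)^2)
  ss_within <- colSums((X - means[as.integer(y), , drop = FALSE])^2)
  (ss_between / (G - 1)) / (ss_within / (n - G))
}

# Toy data: only feature 4 differs between classes.
set.seed(3)
X <- matrix(rnorm(30 * 6), nrow = 30)
y <- rep(1:3, each = 10)
X[y == 3, 4] <- X[y == 3, 4] + 4

fstats <- f_stat(X, y)
fsel <- order(fstats, decreasing = TRUE)[1:2]  # indices of the top 2 features
```

In this toy example the shifted feature receives by far the largest F-statistic, so it is ranked first in fsel.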