This function is used internally by the insilico.train function.
extract.prob(
train,
gs,
gstable,
thre = 0.95,
type = c("quantile", "fixed", "empirical")[1],
isNumeric = FALSE,
impute = TRUE
)
raw P(S|C) matrix
ranked P(S|C) matrix
list of ranks used
list of median numerical values for each rank
training data after removing symptoms whose missing rate is too high.
Training data. It should be in the same format as the testing data
and contain one additional column (see gs below) specifying the known
cause of death. The first column is assumed to be the death ID.
The name of the column in train that contains the cause of death.
The list of causes of death used in training data.
A numerical value between 0 and 1. It specifies the maximum missing rate allowed for a symptom to be considered in the model. The default value is 0.95, meaning that a symptom missing in more than 95% of the training data will be removed.
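The threshold rule can be illustrated with a minimal base R sketch. This is not the package's internal code; the toy data frame, its column names, and the use of the numeric coding (1 present, 0 absent, -1 missing) are assumptions made for illustration.

```r
# Toy training data in numeric coding: 1 = present, 0 = absent, -1 = missing.
# All object and column names here are hypothetical.
toy <- data.frame(
  ID    = paste0("d", 1:5),
  fever = c(1, 0, -1, 1, 0),
  cough = c(-1, -1, -1, -1, -1),
  rash  = c(0, -1, -1, 1, 0),
  stringsAsFactors = FALSE
)
thre <- 0.95

# Fraction of missing entries per symptom column (ID column excluded).
symptoms <- toy[, -1, drop = FALSE]
miss_rate <- colMeans(symptoms == -1)

# Keep only symptoms whose missing rate does not exceed the threshold.
keep <- names(miss_rate)[miss_rate <= thre]
toy_kept <- toy[, c("ID", keep), drop = FALSE]
```

In this toy example, cough is missing for every death, so it is the only symptom dropped.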
Three types of learning conditional probabilities are provided: ``quantile'', ``fixed'', and ``empirical''. Since InSilicoVA works with ranked conditional probabilities P(S|C), ``quantile'' means the rankings of P(S|C) are obtained by matching the quantiles of the distribution of the default InterVA P(S|C), and ``fixed'' means P(S|C) are matched to the closest values in the default InterVA P(S|C) table. Empirically, both types of rankings produce similar results. The third option, ``empirical'', means no rankings are calculated and only the raw P(S|C) values are returned.
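The difference between the ``quantile'' and ``fixed'' options can be sketched in base R as follows. This is one plausible reading of the description, not the package's implementation. The reference values in ref_table are made up and are not the actual InterVA levels, and all object names are hypothetical.

```r
# Hypothetical stand-in for the values in a default P(S|C) table;
# these are NOT the actual InterVA levels.
ref_table <- c(0.001, 0.001, 0.01, 0.01, 0.01, 0.1, 0.1, 0.5, 0.9, 0.9)
ref_levels <- sort(unique(ref_table))

# Raw conditional probabilities estimated from training data (made up).
raw <- c(0.02, 0.35, 0.75, 0.004, 0.12)

# "fixed": snap each raw value to the nearest reference level.
nearest <- function(x, levels) levels[which.min(abs(levels - x))]
fixed <- vapply(raw, nearest, numeric(1), levels = ref_levels)

# "quantile": locate each raw value within its own empirical distribution,
# then read off the same quantile of the reference values.
pos <- rank(raw, ties.method = "average") / length(raw)
quant <- quantile(ref_table, probs = pos, type = 1, names = FALSE)
```

Both vectors contain only values from the reference levels. The ``fixed'' result depends on the magnitude of each raw value, while the ``quantile'' result depends only on its relative position among the raw values.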
Indicator of whether the input is already in numeric form. If the input is coded numerically, with 1 for ``present'', 0 for ``absent'', and -1 for ``missing'', this indicator can be set to TRUE to avoid conversion to the standard InterVA format.
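As an illustration of the numeric coding, the sketch below recodes string-valued symptoms into 1, 0, and -1 in base R. The string codes used here ("Y" for present, an empty string for absent, "." for missing) are an assumption and are not stated on this page. The data frame and its column names are hypothetical.

```r
# Assumed string coding (an assumption, not stated in this page):
# "Y" = present, "" = absent, "." = missing.
raw_data <- data.frame(
  ID    = c("d1", "d2", "d3"),
  fever = c("Y", "", "."),
  cough = c("", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Lookup from string codes to the numeric codes 1 / 0 / -1.
keys  <- c("Y", "", ".")
codes <- c(1, 0, -1)

# Recode every symptom column, leaving the ID column untouched.
num_data <- raw_data
num_data[-1] <- lapply(raw_data[-1], function(col) codes[match(col, keys)])
```

Data prepared this way matches the coding described for isNumeric = TRUE.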
Indicator of whether to impute (1) P(S|C) with P(S) when the fraction of deaths due to C for which symptom S is missing exceeds the threshold, and (2) values of exactly 0 or 1.
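The first imputation rule can be sketched in base R under stated assumptions. This is not the package's internal code. It assumes the numeric coding (1 present, 0 absent, -1 missing), reuses thre as the within-cause missingness threshold, and uses hypothetical object names. The second rule is not sketched because the replacement values for exact 0 or 1 are not given in the description.

```r
# Toy numeric data for a single symptom S (names hypothetical):
# 1 = present, 0 = absent, -1 = missing.
cause <- c("A", "A", "A", "B", "B", "B", "C", "C")
symp  <- c(1, 1, 0, -1, -1, -1, 0, 0)
thre  <- 0.95

# Marginal P(S) over all deaths with an observed value.
obs <- symp != -1
p_s <- mean(symp[obs] == 1)

# P(S|C) per cause, falling back to P(S) when S is missing too often in C.
p_s_given_c <- vapply(split(symp, cause), function(x) {
  miss <- mean(x == -1)
  if (miss > thre) p_s else mean(x[x != -1] == 1)
}, numeric(1))
```

Here cause B has no observed values for S, so its P(S|C) is replaced by the marginal P(S). Cause C yields an estimate of exactly 0, which is the kind of value the second rule refers to.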