Preprocess data so they can be used as input for train_frm().
preprocess_data(
  data,
  degree_polynomial = 1,
  interaction_terms = FALSE,
  verbose = 1,
  nw = 1,
  rm_near_zero_var = TRUE,
  rm_na = TRUE,
  add_cds = TRUE,
  rm_ucs = TRUE,
  rt_terms = 1,
  mandatory = c("NAME", "RT", "SMILES")
)

Returns a dataframe with the preprocessed data.
data: Dataframe with the following columns:
  Mandatory: NAME, RT and SMILES.
  Recommended: INCHIKEY.
  Optional: Any of the chemical descriptors listed in CDFeatures. All other columns will be removed. See 'Details'.
degree_polynomial: Add predictors with polynomial terms up to the specified degree, e.g. 2 means "add squares", 3 means "add squares and cubes". Set to 1 to leave descriptors unchanged.
interaction_terms: Add interaction terms? Polynomial terms are not included in the generation of interaction terms.
verbose: Verbosity level. 0: no output, 1: show progress, 2: show progress and warnings.
nw: Number of workers to use for parallel processing.
rm_near_zero_var: Remove near zero variance predictors?
rm_na: Remove NA values?
add_cds: Add chemical descriptors using getCDs()? See 'Details'.
rm_ucs: Remove unsupported columns?
rt_terms: Which retention-time transformations to append as extra predictors. Supply a numeric vector referencing the predefined terms (1 = RT, 2 = I(RT^2), 3 = I(RT^3), 4 = log(RT), 5 = exp(RT), 6 = sqrt(RT)) or a character vector with explicit transformation terms. Character values are passed to model.frame(), so they must use valid formula syntax (e.g. "I(RT^2)" rather than "RT^2").
mandatory: Character vector of mandatory columns that must be present in data. If any of these columns are missing, an error is raised.
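The degree_polynomial and interaction_terms arguments can be illustrated with a minimal base R sketch. The helper expand_predictors() and its column names (e.g. "a^2", "a:b") are hypothetical and not part of the package; the sketch only assumes what the argument descriptions state: powers are added per predictor up to the given degree, and interaction terms are built from the original predictors only, never from polynomial terms.

```r
# Hypothetical helper, NOT the package's implementation: shows what
# degree_polynomial and interaction_terms mean for a numeric data frame.
expand_predictors <- function(X, degree = 1, interactions = FALSE) {
  stopifnot(is.data.frame(X), all(vapply(X, is.numeric, logical(1))))
  out <- X
  # Pairwise products use the original columns only,
  # so polynomial terms never enter an interaction.
  if (interactions && ncol(X) > 1) {
    pairs <- combn(names(X), 2, simplify = FALSE)
    for (p in pairs) {
      out[[paste(p, collapse = ":")]] <- X[[p[1]]] * X[[p[2]]]
    }
  }
  # degree = 2 adds squares, degree = 3 adds squares and cubes, and so on.
  if (degree >= 2) {
    for (d in 2:degree) {
      for (nm in names(X)) {
        out[[paste0(nm, "^", d)]] <- X[[nm]]^d
      }
    }
  }
  out
}

X <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
res <- expand_predictors(X, degree = 3, interactions = TRUE)
names(res)
# "a" "b" "a:b" "a^2" "b^2" "a^3" "b^3"
```

With degree = 1 and interactions = FALSE the helper returns its input unchanged, matching the defaults of preprocess_data().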
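Because character values of rt_terms are passed to model.frame(), the difference between "I(RT^2)" and "RT^2" matters. The sketch below assumes the numeric codes map onto terms exactly as listed in the rt_terms description; it builds a one-sided formula with stats::reformulate() and evaluates it on the data. The helper rt_predictors() is hypothetical. The last line shows why plain "RT^2" does not produce a squared column: in formula syntax ^ denotes crossing of terms, so ~ RT^2 reduces to RT.

```r
# Numeric rt_terms codes, in the order given in the rt_terms description.
rt_lookup <- c("RT", "I(RT^2)", "I(RT^3)", "log(RT)", "exp(RT)", "sqrt(RT)")

# Hypothetical helper (not part of the package): resolve rt_terms and
# evaluate the resulting terms with model.frame().
rt_predictors <- function(df, rt_terms = 1) {
  terms_chr <- if (is.numeric(rt_terms)) rt_lookup[rt_terms] else rt_terms
  f <- stats::reformulate(terms_chr)  # e.g. ~ RT + I(RT^2) + sqrt(RT)
  stats::model.frame(f, data = df)
}

df <- data.frame(RT = c(1, 4, 9))
names(rt_predictors(df, c(1, 2, 6)))
# "RT" "I(RT^2)" "sqrt(RT)"

# "RT^2" is formula syntax for term crossing, not squaring: it collapses to RT.
names(stats::model.frame(~ RT^2, data = df))
# "RT"
```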
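The check described for mandatory amounts to a set difference between the required and the present column names. A minimal sketch, assuming nothing beyond that description; check_mandatory() and its error message are hypothetical and need not match the package.

```r
# Hypothetical sketch of the `mandatory` check; the package's actual
# error message may differ.
check_mandatory <- function(data, mandatory = c("NAME", "RT", "SMILES")) {
  missing_cols <- setdiff(mandatory, colnames(data))
  if (length(missing_cols) > 0) {
    stop("Missing mandatory column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

ok <- data.frame(NAME = "a", RT = 1.2, SMILES = "CCO")
check_mandatory(ok)                          # passes silently
try(check_mandatory(ok[, c("NAME", "RT")]))  # raises an error: SMILES is missing
```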
If add_cds = TRUE, chemical descriptors are added using getCDs(). If
all chemical descriptors listed in CDFeatures are already present in
the input data object, getCDs() will leave them unchanged. If one or more
chemical descriptors are missing, all chemical descriptors will be
recalculated and existing ones will be overwritten.
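The all-or-nothing rule for chemical descriptors can be sketched in a few lines of base R. Here cd_names stands in for CDFeatures and compute_cds for getCDs(); both the helper ensure_cds() and the toy descriptor function are hypothetical and only illustrate the rule as stated, not the package's code.

```r
# Hypothetical sketch of the add_cds rule. `cd_names` stands in for
# CDFeatures and `compute_cds` for getCDs().
ensure_cds <- function(df, cd_names, compute_cds) {
  if (all(cd_names %in% colnames(df))) {
    return(df)                   # all descriptors present: leave data unchanged
  }
  cds <- compute_cds(df)         # at least one missing: recompute all of them
  df[cd_names] <- cds[cd_names]  # existing descriptor columns are overwritten
  df
}

# Toy descriptor function (hypothetical): SMILES length plus a constant column.
toy_cds <- function(df) data.frame(len = nchar(df$SMILES), one = 1)

df <- data.frame(SMILES = c("CCO", "CCCC"), len = c(99, 99))
ensure_cds(df, c("len", "one"), toy_cds)                  # "one" missing, so len is recomputed too
ensure_cds(cbind(df, one = 0), c("len", "one"), toy_cds)  # complete, returned as is
```

Note that the stale len values (99) survive only when every descriptor is already present.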
data <- head(RP, 3)
pre <- preprocess_data(data, verbose = 0)