Prepare training data by mitigating confounding factors and standardizing values.
numero.prepare(data, variables = NULL, confounders = NULL, batch = NULL,
               method = "standard", pipeline = NULL)
data: A matrix or a data frame.
variables: A character vector of column names; see Details.
confounders: Names of the columns that contain confounder data.
batch: The name of the column that contains batch labels.
method: Method used to standardize values; see nroPreprocess().
pipeline: Processing parameters from a previous call to the function.
Value: A matrix with two attributes: 'pipeline', which contains the processing parameters, and 'subsets', which contains the row names divided into batches if batch correction was applied.
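To illustrate what "mitigating confounding factors" can mean in practice, the sketch below removes the linear effect of confounder columns from each training variable by keeping least-squares residuals. This is a generic technique shown under stated assumptions, not necessarily the adjustment numero.prepare() performs internally; the helper adjust_for_confounders() and the toy data are hypothetical and exist only for this example.

```r
# Hypothetical helper (not part of Numero): remove the linear effect of the
# confounders from each training variable by keeping least-squares residuals.
adjust_for_confounders <- function(data, variables, confounders) {
  out <- data[, variables, drop = FALSE]
  # Design matrix: an intercept column plus the confounder columns.
  x <- cbind(1, as.matrix(data[, confounders, drop = FALSE]))
  for (v in variables) {
    y <- data[[v]]
    fit <- stats::lm.fit(x, y)
    # Residuals have zero mean, so add the original mean back to keep
    # the adjusted values on the original scale.
    out[[v]] <- fit$residuals + mean(y)
  }
  out
}

# Invented toy data: Y depends linearly on AGE plus noise.
set.seed(1)
toy <- data.frame(AGE = 20:69)
toy$Y <- 2 * toy$AGE + rnorm(50)
adj <- adjust_for_confounders(toy, variables = "Y", confounders = "AGE")

# After adjustment, Y is numerically uncorrelated with AGE.
print(cor(adj$Y, toy$AGE))
```

Residualization keeps each variable's mean but strips out the part that a linear model on the confounders can explain; nonlinear confounding would need a different model.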
We recommend first applying numero.clean() to the full dataset, then selecting a subset for training with the input argument variables. This preserves any attributes that may be used in other Numero functions.
If a pipeline from a previous call is supplied, it overrides all processing parameters regardless of the other input arguments.
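The idea behind reusing a pipeline is that parameters estimated once (for example, on training data) are applied unchanged to later data, so both end up on the same scale. The sketch below shows that pattern with plain z-score standardization stored in a 'pipeline' attribute. It is an assumption-laden illustration: standardize_sketch() is hypothetical, and z-scoring is not claimed to be what method = "standard" does in Numero.

```r
# Hypothetical helper (not part of Numero): standardize columns and keep
# the estimated parameters in a 'pipeline' attribute for later reuse.
standardize_sketch <- function(data, pipeline = NULL) {
  x <- as.matrix(data)
  if (is.null(pipeline)) {
    # No previous pipeline: estimate column means and standard deviations.
    pipeline <- list(center = colMeans(x), scale = apply(x, 2, sd))
  }
  # A supplied pipeline overrides fresh estimation entirely.
  out <- scale(x, center = pipeline$center, scale = pipeline$scale)
  attr(out, "scaled:center") <- NULL
  attr(out, "scaled:scale") <- NULL
  attr(out, "pipeline") <- pipeline
  out
}

train <- data.frame(A = c(1, 2, 3, 4), B = c(10, 20, 30, 40))
test <- data.frame(A = c(5, 6), B = c(50, 60))
trx <- standardize_sketch(train)

# Reuse the training parameters so the new rows land on the same scale.
tex <- standardize_sketch(test, pipeline = attr(trx, "pipeline"))
print(tex)
```

With the real function, the analogous step would pass attr(trdata, "pipeline") as the pipeline argument of a later numero.prepare() call.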
Due to safeguards against numerical instability, the standardized values may deviate slightly from the expected range (an error below 0.1 percent is typical).
# NOT RUN {
# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)
# Set identities and manage missing data.
dataset <- numero.clean(dataset, identity = "INDEX")
# Prepare training variables using default standardization.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars)
print(summary(trdata))
# Prepare training values adjusted for age (confounder) and sex (batch),
# and standardized by a rank-based method.
trdata <- numero.prepare(data = dataset, variables = trvars,
                         batch = "MALE", confounders = "AGE",
                         method = "tapered")
print(summary(trdata))
# }