numero.prepare: Prepare datasets for analysis

Description

Prepare training data by mitigating confounding factors and standardizing values.

Usage

numero.prepare(data, variables = NULL, confounders = NULL, batch = NULL,
               method = "standard", pipeline = NULL)

Arguments

data

A matrix or a data frame.

variables

A character vector of column names; see Details.

confounders

Names of columns that contain confounder data.

batch

The name of the column that contains batch labels.

method

Method used to standardize values; see nroPreprocess().

pipeline

Processing parameters from a previous use of the function.

Value

A matrix with two attributes: 'pipeline', which contains the processing parameters, and 'subsets', which contains the row names divided into batches if batch correction was applied.
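
As a minimal sketch of how the returned attributes can be inspected, the following reuses the example data and column names from the Examples section. The internal structure of each attribute is not described on this page, so str() is used to display whatever is stored rather than assuming a particular layout.

```r
library(Numero)

# Import the example data and set row identities.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- numero.clean(read.delim(file = fname), identity = "INDEX")

# Prepare training data with batch correction so that 'subsets' is set.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars,
                         batch = "MALE", confounders = "AGE")

# Processing parameters that were applied to this dataset.
str(attr(trdata, "pipeline"))

# Row names divided into batches.
str(attr(trdata, "subsets"))
```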

Details

We recommend first applying numero.clean() to the full dataset, then selecting a subset for training using the input argument variables. This preserves any attributes that may be used in Numero functions.

If a previous pipeline is available, it overrides all processing parameters irrespective of other input arguments.
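
The following sketch illustrates reusing a stored pipeline. It assumes that supplying only data and pipeline is sufficient, which follows from the description above but is not spelled out on this page. For illustration the same dataset is processed twice; in practice the second call would typically receive a different dataset with the same columns.

```r
library(Numero)

# Import the example data and set row identities.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- numero.clean(read.delim(file = fname), identity = "INDEX")

# First pass: processing parameters are stored in the 'pipeline' attribute.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars,
                         method = "tapered")
pipeline <- attr(trdata, "pipeline")

# Second pass: the stored pipeline determines the processing, so the
# other processing arguments are left at their defaults.
newdata <- numero.prepare(data = dataset, pipeline = pipeline)
print(summary(newdata))
```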

Due to safeguards against numerical instability, the standardized values may deviate slightly from the expected range (<0.1 percent error is typical).

Examples

# NOT RUN {
# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Set identities and manage missing data.
dataset <- numero.clean(dataset, identity = "INDEX")

# Prepare training variables using default standardization.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars)
print(summary(trdata))

# Prepare training values adjusted for age and sex and
# standardized by a rank-based method.
trdata <- numero.prepare(data = dataset, variables = trvars,
                         batch = "MALE", confounders = "AGE",
                         method = "tapered")
print(summary(trdata))
# }