# preProcess

##### Pre-Processing of Predictors

Pre-processing transformations (centering, scaling, etc.) can be estimated from the training data and applied to any data set with the same variables.

- Keywords
- utilities

##### Usage

    preProcess(x, ...)

    ## S3 method for class 'default':
    preProcess(x,
               method = c("center", "scale"),
               thresh = 0.95,
               na.remove = TRUE,
               k = 5,
               knnSummary = mean,
               ...)

    ## S3 method for class 'preProcess':
    predict(object, newdata, ...)

##### Arguments

- `x`: a matrix or data frame. All variables must be numeric.
- `method`: a character vector specifying the type of processing. Possible values are "center", "scale", "knnImpute", "pca", "ica" and "spatialSign" (see Details below).
- `thresh`: a cutoff for the cumulative percent of variance to be retained by PCA.
- `na.remove`: a logical; should missing values be removed from the calculations?
- `object`: an object of class `preProcess`.
- `newdata`: a matrix or data frame of new data to be pre-processed.
- `k`: the number of nearest neighbors from the training set to use for imputation.
- `knnSummary`: function to average the neighbor values per column during imputation.
- `...`: additional arguments to pass to `fastICA`, such as `n.comp`.

##### Details

The operations are applied in this order: imputation, centering, scaling, PCA, ICA then spatial sign.
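As a rough illustration of that ordering, the following base R sketch estimates centering and scaling statistics on a training matrix, applies them to new data, and then takes the spatial sign. It is not the package's implementation: the helper names `apply_center_scale` and `spatial_sign` are hypothetical, and the spatial sign is assumed here to mean dividing each sample by its Euclidean norm.

```r
# Toy data: rows are samples, columns are numeric predictors.
set.seed(1)
train <- matrix(rnorm(40, mean = 10, sd = 3), nrow = 10,
                dimnames = list(NULL, paste0("x", 1:4)))
new   <- matrix(rnorm(20, mean = 10, sd = 3), nrow = 5,
                dimnames = list(NULL, paste0("x", 1:4)))

# Estimate the statistics on the training data only.
mu    <- colMeans(train)
sigma <- apply(train, 2, sd)

# Hypothetical helper: center first, then scale, using the training statistics.
apply_center_scale <- function(x, mu, sigma) {
  x <- sweep(x, 2, mu, "-")
  sweep(x, 2, sigma, "/")
}

# Hypothetical helper (assumed definition): divide each row by its Euclidean norm.
spatial_sign <- function(x) x / sqrt(rowSums(x^2))

train_pp <- spatial_sign(apply_center_scale(train, mu, sigma))
new_pp   <- spatial_sign(apply_center_scale(new, mu, sigma))
```

The new data are transformed with `mu` and `sigma` from the training set rather than with their own statistics, which is the point of estimating the transformation once and reusing it.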

If PCA is requested but scaling is not, the values will still be scaled. Similarly, when ICA is requested, the data are automatically centered.
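The role of `thresh` in PCA can be sketched with `prcomp` from base R. This is only an approximation of the idea: the rule used below (keep the smallest number of components whose cumulative variance reaches the cutoff) is an assumption, and the package may apply a slightly different rule.

```r
set.seed(2)
x <- matrix(rnorm(200), nrow = 50)
x[, 2] <- x[, 1] + rnorm(50, sd = 0.1)  # make two columns highly correlated

thresh <- 0.95

# Center and scale inside prcomp(), mirroring the automatic scaling noted above.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the leading components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Assumed rule: smallest number of components whose cumulative variance reaches thresh.
num_comp <- which(cum_var >= thresh)[1]

# Scores for the retained components only.
scores <- pca$x[, seq_len(num_comp), drop = FALSE]
```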

$k$-nearest neighbor imputation is carried out by finding the $k$ closest samples (Euclidean distance) in the training set.
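A minimal base R sketch of this kind of imputation for a single sample follows. The function `knn_impute_row` is hypothetical, and it assumes the training matrix is complete and that distances are computed only over the columns observed in the incomplete sample; the package's own handling of these details may differ.

```r
# Hypothetical helper: impute the missing entries of one sample from its
# k nearest training samples.
knn_impute_row <- function(row, train, k = 5, knn_summary = mean) {
  miss <- is.na(row)
  if (!any(miss)) return(row)
  # Euclidean distance to each training sample, using only observed columns.
  diffs <- sweep(train[, !miss, drop = FALSE], 2, row[!miss], "-")
  d <- sqrt(rowSums(diffs^2))
  # Indices of the k closest training samples.
  nn <- order(d)[seq_len(min(k, nrow(train)))]
  # Summarise the neighbors' values column by column.
  row[miss] <- apply(train[nn, miss, drop = FALSE], 2, knn_summary)
  row
}

train <- matrix(c(1, 1, 1,
                  2, 2, 2,
                  3, 3, 3,
                  10, 10, 10,
                  11, 11, 11), ncol = 3, byrow = TRUE)
imputed <- knn_impute_row(c(2, NA, 2), train, k = 3)
```

Here `k` and `knn_summary` play the roles of the `k` and `knnSummary` arguments; passing `median` instead of `mean` would give a more outlier-resistant fill value.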

A warning is thrown if both PCA and ICA are requested. ICA, as implemented by the `fastICA` package, automatically does a PCA decomposition prior to finding the ICA scores.

The function will throw an error if any variable in `x` has fewer than two unique values.
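One way to avoid that error is to screen the predictors beforehand. The sketch below uses a hypothetical helper, `constant_columns`, and assumes missing values should not count as a distinct value; it is not part of the package.

```r
# Hypothetical helper: names of columns with fewer than two unique
# non-missing values.
constant_columns <- function(x) {
  x <- as.data.frame(x)
  n_unique <- vapply(x, function(col) length(unique(col[!is.na(col)])),
                     integer(1))
  names(x)[n_unique < 2]
}

df <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c(0, 1, 0))

# Identify the offending columns and drop them before pre-processing.
bad   <- constant_columns(df)
df_ok <- df[, setdiff(names(df), bad), drop = FALSE]
```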

##### Value

`preProcess` results in a list with elements

- `call`: the function call
- `dim`: the dimensions of `x`
- `mean`: a vector of means (if centering was requested)
- `std`: a vector of standard deviations (if scaling or PCA was requested)
- `rotation`: a matrix of eigenvectors if PCA was requested
- `method`: the value of `method`
- `thresh`: the value of `thresh`
- `numComp`: the number of principal components required to capture the specified amount of variance
- `ica`: contains values for the `W` and `K` matrix of the decomposition

##### References

Kuhn (2008), "Building Predictive Models in R Using the caret"

##### Examples

```
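library(caret)
data(BloodBrain)
# one variable has one unique value, so this call is expected to fail
try(preProcess(bbbDescr[1:100, ]))
# drop that variable (column 3) and estimate centering and scaling
preProc <- preProcess(bbbDescr[1:100, -3])
# apply the training-set transformation to both subsets
training <- predict(preProc, bbbDescr[1:100, -3])
test <- predict(preProc, bbbDescr[101:208, -3])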
```

*Documentation reproduced from package caret, version 4.65, License: GPL-2*