# darch

##### Fit a deep neural network

Fit a deep neural network with optional pre-training and one of various fine-tuning algorithms.

##### Usage

```
darch(x, ...)

# S3 method for default
darch(x, y, layers = 10, ..., autosave = F,
  autosave.epochs = round(darch.numEpochs/20),
  autosave.dir = "./darch.autosave", autosave.trim = F, bp.learnRate = 1,
  bp.learnRateScale = 1, bootstrap = F, bootstrap.unique = T,
  bootstrap.num = 0, cg.length = 2, cg.switchLayers = 1, darch = NULL,
  darch.batchSize = 1, darch.dither = F, darch.dropout = 0,
  darch.dropout.dropConnect = F, darch.dropout.momentMatching = 0,
  darch.dropout.oneMaskPerEpoch = F, darch.elu.alpha = 1,
  darch.errorFunction = if (darch.isClass) crossEntropyError else mseError,
  darch.finalMomentum = 0.9, darch.fineTuneFunction = backpropagation,
  darch.initialMomentum = 0.5, darch.isClass = T,
  darch.maxout.poolSize = 2, darch.maxout.unitFunction = linearUnit,
  darch.momentumRampLength = 1, darch.nesterovMomentum = T,
  darch.numEpochs = 100, darch.returnBestModel = T,
  darch.returnBestModel.validationErrorFactor = 1 - exp(-1),
  darch.stopClassErr = -Inf, darch.stopErr = -Inf,
  darch.stopValidClassErr = -Inf, darch.stopValidErr = -Inf,
  darch.trainLayers = T, darch.unitFunction = sigmoidUnit,
  darch.weightDecay = 0,
  darch.weightUpdateFunction = weightDecayWeightUpdate, dataSet = NULL,
  dataSetValid = NULL,
  generateWeightsFunction = generateWeightsGlorotUniform, gputools = F,
  gputools.deviceId = 0, logLevel = NULL, normalizeWeights = F,
  normalizeWeightsBound = 15, paramsList = list(),
  preProc.factorToNumeric = F, preProc.factorToNumeric.targets = F,
  preProc.fullRank = T, preProc.fullRank.targets = F,
  preProc.orderedToFactor.targets = T, preProc.params = F,
  preProc.targets = F, rbm.allData = F, rbm.batchSize = 1,
  rbm.consecutive = T, rbm.errorFunction = mseError,
  rbm.finalMomentum = 0.9, rbm.initialMomentum = 0.5, rbm.lastLayer = 0,
  rbm.learnRate = 1, rbm.learnRateScale = 1, rbm.momentumRampLength = 1,
  rbm.numCD = 1, rbm.numEpochs = 0, rbm.unitFunction = sigmoidUnitRbm,
  rbm.updateFunction = rbmUpdate, rbm.weightDecay = 2e-04, retainData = F,
  rprop.decFact = 0.5, rprop.incFact = 1.2, rprop.initDelta = 1/80,
  rprop.maxDelta = 50, rprop.method = "iRprop+", rprop.minDelta = 1e-06,
  seed = NULL, shuffleTrainData = T, weights.max = 0.1,
  weights.mean = 0, weights.min = -0.1, weights.sd = 0.01,
  xValid = NULL, yValid = NULL)

# S3 method for formula
darch(x, data, layers, ..., xValid = NULL, dataSet = NULL,
  dataSetValid = NULL, logLevel = NULL, paramsList = list(),
  darch = NULL)

# S3 method for DataSet
darch(x, ...)
```

##### Arguments

- `x`: Input data matrix or `data.frame` (`darch.default`), `formula` (`darch.formula`), or `DataSet` (`darch.DataSet`).
- `...`: Additional parameters.
- `y`: Target data matrix or `data.frame`, if `x` is an input data matrix or `data.frame`.
- `layers`: Vector containing one integer for the number of neurons in each layer. Defaults to `c(a, 10, b)`, where `a` is the number of columns in the training data and `b` is the number of columns in the targets. If this has length 1, it is used as the number of neurons in the hidden layer, not as the number of layers! See the sketch after this list for an example.
- `autosave`: Logical indicating whether to automatically save the `DArch` instance to a file during fine-tuning.
- `autosave.epochs`: After how many epochs auto-saving should happen, by default after every 5% of `darch.numEpochs`. If this value is greater than or equal to `darch.numEpochs`, the network is only saved once when the fine-tuning is done.
- `autosave.dir`: Directory for the autosave files; the file names will be e.g. `autosave_010.net` for the `DArch` instance after 10 epochs.
- `autosave.trim`: Whether to trim the network before saving it. This will remove the dataset and the layer weights, resulting in a network that is no longer usable for predictions or training. Useful when only statistics and settings need to be stored.
- `bp.learnRate`: Learning rates for backpropagation; the length is either one or the same as the number of weight matrices, when using different learning rates for each layer.
- `bp.learnRateScale`: The learning rate is multiplied by this value after each epoch.
- `bootstrap`: Logical indicating whether to use bootstrapping to create a training and validation data set from the given training data.
- `bootstrap.unique`: Logical indicating whether to take only unique samples for the training (`TRUE`, default) or take all drawn samples (`FALSE`), which results in a bigger training set with duplicates. **Note:** This is ignored if `bootstrap.num` is greater than 0.
- `bootstrap.num`: If this is greater than 0, bootstrapping will draw this number of training samples without replacement.
- `cg.length`: Number of line searches during conjugate gradient fine-tuning.
- `cg.switchLayers`: Indicates when to train the full network instead of only the upper two layers.
- `darch`: Existing `DArch` instance for which training is to be resumed. **Note:** When enabling pre-training, previous training results will be lost; see the explanation of parameter `rbm.numEpochs`.
- `darch.batchSize`: Batch size for fine-tuning, i.e. the number of training samples that are presented to the network before weight updates are performed.
- `darch.dither`: Whether to apply dither to numeric columns in the training input data.
- `darch.dropout`: Dropout rates. If this is a vector, it is treated as the dropout rates for each individual layer. If one element is missing, the input dropout is set to 0. When enabling `darch.dropout.dropConnect`, this vector needs an additional element (one element per weight matrix between two layers, as opposed to one element per layer excluding the last layer).
- `darch.dropout.dropConnect`: Whether to use DropConnect instead of dropout for the hidden layers. Will use `darch.dropout` as the dropout rates.
- `darch.dropout.momentMatching`: How many iterations to perform during moment matching for dropout inference; 0 disables moment matching.
- `darch.dropout.oneMaskPerEpoch`: Whether to generate a new mask for each batch (`FALSE`, default) or for each epoch (`TRUE`).
- `darch.elu.alpha`: Alpha parameter for the exponential linear unit function. See `exponentialLinearUnit`.
- `darch.errorFunction`: Error function during fine-tuning. Possible error functions include `mseError`, `rmseError`, and `crossEntropyError`.
- `darch.finalMomentum`: Final momentum during fine-tuning.
- `darch.fineTuneFunction`: Fine-tuning function. Possible values include `backpropagation` (default), `rpropagation`, `minimizeClassifier`, and `minimizeAutoencoder` (unsupervised).
- `darch.initialMomentum`: Initial momentum during fine-tuning.
- `darch.isClass`: Whether output should be treated as class labels during fine-tuning, with classification rates printed.
- `darch.maxout.poolSize`: Pool size for maxout units, when using the maxout activation function. See `maxoutUnit`.
- `darch.maxout.unitFunction`: Inner unit function used by maxout. See `darch.unitFunction` for possible unit functions.
- `darch.momentumRampLength`: After how many epochs, relative to the **overall** number of epochs trained, should the momentum reach `darch.finalMomentum`? A value of 1 indicates that `darch.finalMomentum` should be reached in the final epoch; a value of 0.5 indicates that it should be reached after half of the training is complete. Note that this will lead to bumps in the momentum ramp if training is resumed with the same parameters for `darch.initialMomentum` and `darch.finalMomentum`. Set `darch.momentumRampLength` to 0 to avoid this problem when resuming training.
- `darch.nesterovMomentum`: Whether to use Nesterov Accelerated Gradient (NAG) momentum for gradient descent based fine-tuning algorithms.
- `darch.numEpochs`: Number of epochs of fine-tuning.
- `darch.returnBestModel`: Logical indicating whether to return the best model at the end of training, instead of the last.
- `darch.returnBestModel.validationErrorFactor`: When evaluating models with validation data, how heavily should the validation error be weighted compared to the training error? This is a value between 0 and 1; by default, it is `1 - exp(-1)`. The training error factor and the validation error factor always add up to 1, so passing 1 here means the training error is ignored, while passing 0 means the validation error is ignored.
- `darch.stopClassErr`: When the classification error is lower than or equal to this value, training is stopped (0..100).
- `darch.stopErr`: When the value of the error function is lower than or equal to this value, training is stopped.
- `darch.stopValidClassErr`: When the classification error on the validation data is lower than or equal to this value, training is stopped (0..100).
- `darch.stopValidErr`: When the value of the error function on the validation data is lower than or equal to this value, training is stopped.
- `darch.trainLayers`: Either `TRUE` to train all layers, or a mask containing `TRUE` for all layers that should be trained and `FALSE` for all layers that should not be trained (no entry for the input layer).
- `darch.unitFunction`: Layer function or vector of layer functions of length `number of layers` - 1. Note that the first entry signifies the layer function between layers 1 and 2, i.e. the output of layer 2. Layer 1 does not have a layer function, since the input values are used directly. Possible unit functions include `linearUnit`, `sigmoidUnit`, `tanhUnit`, `rectifiedLinearUnit`, `softplusUnit`, `softmaxUnit`, and `maxoutUnit`.
- `darch.weightDecay`: Weight decay factor, defaults to `0`. All weights will be multiplied by (1 - `darch.weightDecay`) prior to each weight update.
- `darch.weightUpdateFunction`: Weight update function or vector of weight update functions, very similar to `darch.unitFunction`. Possible weight update functions include `weightDecayWeightUpdate` and `maxoutWeightUpdate`. Note that `maxoutWeightUpdate` must be used on the layer **after** the maxout activation function!
- `dataSet`: `DataSet` instance, passed from `darch.DataSet()`; may be specified manually.
- `dataSetValid`: `DataSet` instance containing validation data; may be specified manually.
- `generateWeightsFunction`: Weight generation function or vector of weight generation functions of length `number of layers` - 1. Possible weight generation functions include `generateWeightsUniform`, `generateWeightsNormal`, `generateWeightsGlorotNormal`, `generateWeightsGlorotUniform` (default), `generateWeightsHeNormal`, and `generateWeightsHeUniform`.
- `gputools`: Logical indicating whether to use gputools for matrix multiplication, if available.
- `gputools.deviceId`: Integer specifying the device to use for GPU matrix multiplication. See `chooseGpu`.
- `logLevel`: `futile.logger` log level. Uses the currently set log level by default, which is `futile.logger::flog.info` if it was not changed. Other available levels include, from least to most verbose: `FATAL`, `ERROR`, `WARN`, `DEBUG`, and `TRACE`.
- `normalizeWeights`: Logical indicating whether to normalize weights (L2 norm = `normalizeWeightsBound`).
- `normalizeWeightsBound`: Upper bound on the L2 norm of incoming weight vectors. Used only if `normalizeWeights` is `TRUE`.
- `paramsList`: List of parameters; parameters given here overwrite those specified directly. Primarily for convenience or for use in scripts; see the sketch after this list.
- `preProc.factorToNumeric`: Whether all factors should be converted to numeric.
- `preProc.factorToNumeric.targets`: Whether all factors should be converted to numeric in the target data.
- `preProc.fullRank`: Whether to use full rank encoding. See `preProcess` for details.
- `preProc.fullRank.targets`: Whether to use full rank encoding for target data. See `preProcess` for details.
- `preProc.orderedToFactor.targets`: Whether ordered factors in the target data should be converted to unordered factors. **Note:** Ordered factors are converted to numeric by `dummyVars` and no longer usable for classification tasks.
- `preProc.params`: List of parameters to pass to the `preProcess` function for the input data, or `FALSE` to disable input data pre-processing.
- `preProc.targets`: Whether target data is to be centered and scaled. Unlike `preProc.params`, this is just a logical turning pre-processing for target data on or off, since this pre-processing has to be reverted when predicting new data. Most useful for regression tasks. **Note:** This will skew the raw network error.
- `rbm.allData`: Logical indicating whether to use training and validation data for pre-training. **Note:** This also applies when using bootstrapping.
- `rbm.batchSize`: Pre-training batch size.
- `rbm.consecutive`: Logical indicating whether to train the RBMs one at a time for `rbm.numEpochs` epochs (`TRUE`, default) or to alternate, training each RBM for one epoch at a time (`FALSE`).
- `rbm.errorFunction`: Error function during pre-training. This is only used to estimate the RBM error and does not affect the training itself. Possible error functions include `mseError` and `rmseError`.
- `rbm.finalMomentum`: Final momentum during pre-training.
- `rbm.initialMomentum`: Initial momentum during pre-training.
- `rbm.lastLayer`: `Numeric` indicating at which layer to stop the pre-training. Possible values include `0`, meaning that all layers are trained; positive integers, meaning to stop training after the RBM where `rbm.lastLayer` forms the visible layer; and negative integers, meaning to stop the training at `rbm.lastLayer` RBMs from the top RBM.
- `rbm.learnRate`: Learning rate during pre-training.
- `rbm.learnRateScale`: The learning rates are multiplied by this value after each epoch.
- `rbm.momentumRampLength`: After how many epochs, relative to `rbm.numEpochs`, should the momentum reach `rbm.finalMomentum`? A value of 1 indicates that `rbm.finalMomentum` should be reached in the final epoch; a value of 0.5 indicates that it should be reached after half of the training is complete.
- `rbm.numCD`: Number of full steps for which contrastive divergence is performed. Increasing this will slow training down considerably.
- `rbm.numEpochs`: Number of pre-training epochs. **Note:** When passing a value other than `0` here and also passing an existing `DArch` instance via the `darch` parameter, the weights of the network will be completely reset! Pre-training is essentially a form of advanced weight initialization, and it makes no sense to perform pre-training on a previously trained network.
- `rbm.unitFunction`: Unit function during pre-training. Possible functions include `sigmoidUnitRbm` (default), `tanhUnitRbm`, and `linearUnitRbm`.
- `rbm.updateFunction`: Update function during pre-training. Currently, `darch` only provides `rbmUpdate`.
- `rbm.weightDecay`: Pre-training weight decay. Weights will be multiplied by (1 - `rbm.weightDecay`) prior to each weight update.
- `retainData`: Logical indicating whether to store the training data in the `DArch` instance after training or when saving it to disk.
- `rprop.decFact`: Decreasing factor for the training. Default is `0.5`.
- `rprop.incFact`: Increasing factor for the training. Default is `1.2`.
- `rprop.initDelta`: Initialisation value for the update. Default is `0.0125`.
- `rprop.maxDelta`: Upper bound for step size. Default is `50`.
- `rprop.method`: The method for the training. Default is `"iRprop+"`.
- `rprop.minDelta`: Lower bound for step size. Default is `0.000001`.
- `seed`: Allows the specification of a seed which will be set via `set.seed`. Used in the context of `darchBench`.
- `shuffleTrainData`: Logical indicating whether to shuffle training data before each epoch.
- `weights.max`: `max` parameter to the `runif` function.
- `weights.mean`: `mean` parameter to the `rnorm` function.
- `weights.min`: `min` parameter to the `runif` function.
- `weights.sd`: `sd` parameter to the `rnorm` function.
- `xValid`: Validation input data matrix or `data.frame`.
- `yValid`: Validation target data matrix or `data.frame`, if `x` is a data matrix or `data.frame`.
- `data`: `data.frame` containing the dataset, if `x` is a `formula`.
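
To make the interplay of `layers`, `darch.unitFunction`, and `darch.dropout` concrete, here is a minimal sketch for the iris data (4 inputs, 3 classes). The hidden layer size, unit functions, dropout rates, and epoch count are illustrative choices, not recommendations from this documentation:

```
# Sketch: a 4-20-3 network with one unit function per weight matrix
# (length(layers) - 1) and dropout rates for the input and hidden layer.
data(iris)
model <- darch(Species ~ ., iris,
  layers = c(4, 20, 3),
  darch.unitFunction = c(tanhUnit, softmaxUnit),
  darch.dropout = c(0.1, 0.3),
  darch.numEpochs = 20)

# Equivalently, parameters can be collected in paramsList, which
# overwrites directly specified parameters:
params <- list(darch.dropout = c(0.1, 0.3), darch.numEpochs = 20)
model <- darch(Species ~ ., iris, layers = c(4, 20, 3), paramsList = params)
```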

##### Details

The darch package implements Deep Architecture Networks and restricted Boltzmann machines.

The creation of this package was motivated by the 2006 papers of G. Hinton et al. (see references for details) and by the MATLAB source code developed in that context. The package provides the means to generate deep architecture networks (DArch) like the deep belief networks from Hinton et al. These deep architectures can be pre-trained with the contrastive divergence method and afterwards fine-tuned with several learning methods like backpropagation, resilient backpropagation, and conjugate gradients, as well as more recent techniques like dropout and maxout.
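
The two-stage procedure maps directly onto two parameter groups: `rbm.*` controls the contrastive divergence pre-training, `darch.*` the fine-tuning. A minimal sketch (illustrative epoch counts and learning rate, reusing the iris data from the Examples below):

```
# Pre-train each RBM for 5 epochs via contrastive divergence, then
# fine-tune the resulting network with backpropagation for 20 epochs.
data(iris)
model <- darch(Species ~ ., iris,
  layers = c(4, 20, 3),
  rbm.numEpochs = 5,
  rbm.learnRate = 0.1,
  darch.fineTuneFunction = backpropagation,
  darch.numEpochs = 20)
```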

See https://github.com/maddin79/darch for further information, documentation, and releases.

- Package: darch
- Type: Package
- Version: 0.10.0
- Date: 2015-11-12
- License: GPL-2 or later
- LazyLoad: yes

##### Value

Fitted `DArch` instance.

##### References

Hinton, G. E., S. Osindero, Y. W. Teh (2006). "A fast learning algorithm for deep belief nets". In: Neural Computation 18(7), pp. 1527-1554. DOI: 10.1162/neco.2006.18.7.1527.

Hinton, G. E., R. R. Salakhutdinov (2006). "Reducing the dimensionality of data with neural networks". In: Science 313(5786), pp. 504-507. DOI: 10.1126/science.1127647.

Hinton, Geoffrey E. et al. (2012). "Improving neural networks by preventing co-adaptation of feature detectors". In: CoRR abs/1207.0580. URL: http://arxiv.org/abs/1207.0580.

Goodfellow, Ian J. et al. (2013). "Maxout Networks". In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pp. 1319-1327. URL: http://jmlr.org/proceedings/papers/v28/goodfellow13.html.

Drees, Martin (2013). "Implementierung und Analyse von tiefen Architekturen in R". German. Master's thesis. Fachhochschule Dortmund.

Rueckert, Johannes (2015). "Extending the Darch library for deep architectures". Project thesis. Fachhochschule Dortmund. URL: http://static.saviola.de/publications/rueckert_2015.pdf.

##### See Also

Other darch interface functions: `darchBench`, `darchTest`, `plot.DArch`, `predict.DArch`, `print.DArch`

##### Examples

```
## Not run:
# Formula interface: classify iris species.
data(iris)
model <- darch(Species ~ ., iris)
print(model)
predictions <- predict(model, newdata = iris, type = "class")
cat(paste("Incorrect classifications:", sum(predictions != iris[, 5])))

# Default interface: learn XOR as a classification task.
trainData <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2, byrow = TRUE)
trainTargets <- matrix(c(0, 1, 1, 0), nrow = 4)
model2 <- darch(trainData, trainTargets, layers = c(2, 10, 1),
  darch.numEpochs = 500, darch.stopClassErr = 0, retainData = T)
e <- darchTest(model2)
cat(paste0("Incorrect classifications on all examples: ", e[3], " (",
  e[2], "%)\n"))
plot(model2)

# More examples can be found at
# https://github.com/maddin79/darch/tree/v0.12.0/examples
## End(Not run)
```
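
When separate validation data is available, `xValid` and `yValid` combine with the `darch.stopValid*` stopping criteria described above. The following sketch is illustrative only; the random split and the 5% threshold are arbitrary choices:

```
## Not run:
# Hold out a third of iris for validation and stop fine-tuning early
# once the validation classification error reaches 5% or below.
data(iris)
idx <- sample(nrow(iris), 100)
model3 <- darch(iris[idx, 1:4], iris[idx, 5, drop = FALSE],
  xValid = iris[-idx, 1:4], yValid = iris[-idx, 5, drop = FALSE],
  darch.numEpochs = 100,
  darch.stopValidClassErr = 5)
## End(Not run)
```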

*Documentation reproduced from package darch, version 0.12.0, License: GPL (>= 2) | file LICENSE*