
RTextTools (version 1.3.8)

cross_validate: used for cross-validation of various algorithms.

Description

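Performs n-fold cross-validation of the specified algorithm: the data are split into nfold folds, and each fold in turn is held out for evaluation while the model is trained on the remaining folds.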
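The general n-fold procedure can be sketched in base R as below. This is an illustration of the technique only, not RTextTools' internal code: assign_folds(), cv_accuracy(), fit_majority() and predict_majority() are hypothetical helpers, the toy "classifier" simply predicts the most frequent training label, and the balanced fold sizes produced by rep_len() are a design choice that may differ from how cross_validate assigns documents to folds.

```r
# Generic n-fold cross-validation sketch in base R (not RTextTools internals).

# Hypothetical helper: assign each of n observations to one of nfold folds,
# with fold sizes differing by at most one. A non-NA seed makes it reproducible.
assign_folds <- function(n, nfold, seed = NA) {
  if (!is.na(seed)) set.seed(seed)
  sample(rep_len(seq_len(nfold), n))
}

# Hypothetical helper: for each fold k, train on the other folds, predict the
# held-out fold, and record the share of correct predictions.
cv_accuracy <- function(x, y, nfold, fit_fun, predict_fun, seed = NA) {
  folds <- assign_folds(nrow(x), nfold, seed)
  acc <- numeric(nfold)
  for (k in seq_len(nfold)) {
    test <- folds == k
    model <- fit_fun(x[!test, , drop = FALSE], y[!test])
    pred <- predict_fun(model, x[test, , drop = FALSE])
    acc[k] <- mean(pred == y[test])
  }
  acc
}

# Toy classifier for illustration: always predict the most frequent training label.
fit_majority <- function(x, y) names(which.max(table(y)))
predict_majority <- function(model, x) rep(model, nrow(x))

acc <- cv_accuracy(as.matrix(iris[, 1:4]), as.character(iris$Species),
                   nfold = 5, fit_fun = fit_majority,
                   predict_fun = predict_majority, seed = 42)
print(acc)        # one accuracy value per fold
print(mean(acc))  # average out-of-fold accuracy
```

Passing the same seed twice yields the same fold assignment and therefore the same per-fold accuracies, which is the kind of reproducibility the seed argument is meant to provide.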

Usage

cross_validate(corpus, nfold, algorithm = c("SVM", "SLDA", "BOOSTING", 
"BAGGING", "RF", "GLMNET", "TREE", "NNET", "MAXENT"), seed = NA, 
method = "C-classification", cross = 0, cost = 100, kernel = "radial", 
maxitboost = 100, maxitglm = 500, size = 1, maxitnnet = 1000, MaxNWts = 10000, 
rang = 0.1, decay = 5e-04, ntree = 200, l1_regularizer = 0, l2_regularizer = 0, 
use_sgd = FALSE, set_heldout = 0, verbose = FALSE)
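Most arguments after seed only affect one algorithm. The lookup below restates the Arguments section as an R list so it is easy to see which tuning arguments matter for a given algorithm value; algorithm_args and relevant_args() are hypothetical names and are not part of RTextTools. SLDA, BAGGING and TREE have no algorithm-specific arguments in this signature, and corpus, nfold and seed apply to every algorithm.

```r
# Hypothetical lookup (not part of RTextTools): tuning arguments in the
# cross_validate() signature grouped by algorithm, per the Arguments section.
algorithm_args <- list(
  SVM      = c("method", "cross", "cost", "kernel"),
  SLDA     = character(0),
  BOOSTING = "maxitboost",
  BAGGING  = character(0),
  RF       = "ntree",
  GLMNET   = "maxitglm",
  TREE     = character(0),
  NNET     = c("size", "maxitnnet", "MaxNWts", "rang", "decay"),
  MAXENT   = c("l1_regularizer", "l2_regularizer", "use_sgd",
               "set_heldout", "verbose")
)

# Return the tuning arguments relevant to one algorithm name;
# match.arg() stops with an error for names not in the list.
relevant_args <- function(algorithm) {
  algorithm <- match.arg(algorithm, names(algorithm_args))
  algorithm_args[[algorithm]]
}

print(relevant_args("NNET"))
```

For example, tuning cost or kernel has no effect when algorithm="MAXENT"; only the arguments listed for that algorithm are relevant.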

Arguments

corpus
An object of class matrix_container-class, as generated by the create_corpus function.
nfold
Number of folds to perform for cross-validation.
algorithm
A string specifying which algorithm to use. Use print_algorithms to see a list of options.
seed
Random seed used to make cross-validation results reproducible.
method
Method parameter for the SVM implementation. See e1071 documentation for more details.
cross
Cross parameter for the SVM implementation. See e1071 documentation for more details.
cost
Cost parameter for the SVM implementation. See e1071 documentation for more details.
kernel
Kernel parameter for the SVM implementation. See e1071 documentation for more details.
maxitboost
Maximum iterations parameter for the boosting implementation. See caTools documentation for more details.
maxitglm
Maximum iterations parameter for the glmnet implementation. See glmnet documentation for more details.
size
Size parameter (number of hidden units) for the neural network implementation. See nnet documentation for more details.
maxitnnet
Maximum iterations for the neural network implementation. See nnet documentation for more details.
MaxNWts
Maximum number of weights parameter for the neural network implementation. See nnet documentation for more details.
rang
Range parameter for the initial random weights of the neural network implementation. See nnet documentation for more details.
decay
Weight decay parameter for the neural network implementation. See nnet documentation for more details.
ntree
Number of trees parameter for the random forest implementation. See randomForest documentation for more details.
l1_regularizer
A numeric value that turns on L1 regularization and sets the regularization parameter. A value of 0 disables L1 regularization. See maxent documentation for more details.
l2_regularizer
A numeric value that turns on L2 regularization and sets the regularization parameter. A value of 0 disables L2 regularization. See maxent documentation for more details.
use_sgd
A logical indicating whether SGD (stochastic gradient descent) parameter estimation should be used. Defaults to FALSE. See maxent documentation for more details.
set_heldout
An integer specifying the number of documents to hold out. These documents form a held-out subset to test against, which helps prevent overfitting. See maxent documentation for more details.
verbose
A logical specifying whether to provide descriptive output about the training process. Defaults to FALSE (no output). See maxent documentation for more details.
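For the MAXENT options l1_regularizer and l2_regularizer, the standard L1/L2-penalized maximum-entropy (multinomial logistic) objective has the generic form below, with \lambda_1 and \lambda_2 standing in for the two arguments, \mathbf{w} for the feature weights, and (\mathbf{x}_i, y_i) for the N training documents and their labels. This is a textbook sketch; the exact scaling or normalization of the penalty terms inside the maxent package may differ.

```latex
\hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \;
  \sum_{i=1}^{N} \log p(y_i \mid \mathbf{x}_i; \mathbf{w})
  \;-\; \lambda_1 \sum_{j} \lvert w_j \rvert
  \;-\; \lambda_2 \sum_{j} w_j^{2}
```

Setting \lambda_1 = 0 or \lambda_2 = 0 removes the corresponding penalty term, which matches the statement that a value of 0 disables that form of regularization.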

Examples

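library(RTextTools)
# Load the bundled NYTimes headline data and draw a random sample of 100 rows
data <- read_data(system.file("data/NYTimes.csv.gz",package="RTextTools"),type="csv")
data <- data[sample(1:3100,size=100,replace=FALSE),]
# Build a TF-IDF weighted document-term matrix from the Title and Subject columns
matrix <- create_matrix(cbind(data$Title,data$Subject), language="english",
    removeNumbers=TRUE, stemWords=FALSE, weighting=tm::weightTfIdf)
# Create the corpus: documents 1-75 for training, 76-100 for testing
corpus <- create_corpus(matrix,data$Topic.Code,trainSize=1:75, testSize=76:100,
    virgin=FALSE)
# Run 2-fold cross-validation for the SVM and maximum entropy classifiers
svm <- cross_validate(corpus,2,algorithm="SVM")
maxent <- cross_validate(corpus,2,algorithm="MAXENT")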
