RTextTools (version 1.3.8)

train_model: makes a model object using the specified algorithm.

Description

Creates a trained model using the specified algorithm.

Usage

train_model(corpus, algorithm=c("SVM","SLDA","BOOSTING","BAGGING",
"RF","GLMNET","TREE","NNET","MAXENT"), method = "C-classification", 
cross = 0, cost = 100, kernel = "radial", maxitboost = 100, 
maxitglm = 10^5, size = 1, maxitnnet = 1000, MaxNWts = 10000, 
rang = 0.1, decay = 5e-04, trace=FALSE, ntree = 200, 
l1_regularizer = 0, l2_regularizer = 0, use_sgd = FALSE, 
set_heldout = 0, verbose = FALSE,
...)

Arguments

corpus
An object of class matrix_container, as generated by the create_corpus function.
algorithm
Character vector (i.e. a string) specifying which algorithm to use. Use print_algorithms to see a list of options.
method
Method parameter for the SVM implementation. See e1071 documentation for more details.
cross
Cross parameter for the SVM implementation. See e1071 documentation for more details.
cost
Cost parameter for the SVM implementation. See e1071 documentation for more details.
kernel
Kernel parameter for the SVM implementation. See e1071 documentation for more details.
maxitboost
Maximum iterations parameter for the boosting implementation. See caTools documentation for more details.
maxitglm
Maximum iterations parameter for the glmnet implementation. See glmnet documentation for more details.
size
Size parameter for the neural network implementation. See nnet documentation for more details.
maxitnnet
Maximum iterations parameter for the neural network implementation. See nnet documentation for more details.
MaxNWts
Maximum number of weights parameter for the neural network implementation. See nnet documentation for more details.
rang
Range parameter for the neural network implementation. See nnet documentation for more details.
decay
Decay parameter for the neural network implementation. See nnet documentation for more details.
trace
Trace parameter for the neural network implementation. See nnet documentation for more details.
ntree
Number of trees parameter for the random forest implementation. See randomForest documentation for more details.
l1_regularizer
A numeric value that turns on L1 regularization and sets the regularization parameter. A value of 0 disables L1 regularization. See maxent documentation for more details.
l2_regularizer
A numeric value that turns on L2 regularization and sets the regularization parameter. A value of 0 disables L2 regularization. See maxent documentation for more details.
use_sgd
A logical indicating that SGD parameter estimation should be used. Defaults to FALSE. See maxent documentation for more details.
set_heldout
An integer specifying the number of documents to hold out. Sets a held-out subset of your data to test against and prevent overfitting. See maxent documentation for more details.
verbose
A logical specifying whether to print descriptive output about the training process. Defaults to FALSE (no output). See maxent documentation for more details.
...
Additional arguments to be passed on to algorithm function calls.
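
Each algorithm reads only the arguments documented for it above, so a call normally overrides just the few that matter for the chosen algorithm. The sketch below illustrates this with the same data setup as the Examples section. The specific values (kernel = "linear", cost = 1, l1_regularizer = 1) and the set.seed call are illustrative assumptions, not recommended settings.

```r
library(RTextTools)

# Assumption: fix the random seed so the 100-row sample is reproducible.
set.seed(1)

# Setup mirrors the package example: bundled NYTimes sample, 75 training
# documents and 25 test documents.
data <- read_data(system.file("data/NYTimes.csv.gz", package="RTextTools"), type="csv")
data <- data[sample(1:3100, size=100, replace=FALSE),]
matrix <- create_matrix(cbind(data$Title, data$Subject), language="english",
                        removeNumbers=TRUE, stemWords=FALSE, weighting=weightTfIdf)
corpus <- create_corpus(matrix, data$Topic.Code, trainSize=1:75, testSize=76:100,
                        virgin=FALSE)

# SVM: override the kernel and cost defaults ("radial" and 100).
svm_linear <- train_model(corpus, "SVM", kernel="linear", cost=1)

# MAXENT: turn on L1 regularization (0, the default, leaves it off).
maxent_l1 <- train_model(corpus, "MAXENT", l1_regularizer=1)
```

Arguments that belong to a different algorithm than the one selected, such as ntree in an "SVM" call, should have no effect on the trained model.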

Value

  • Returns a trained model that can be subsequently used in classify_model to classify new data.

Details

Only one algorithm may be selected for training. See train_models and classify_models to train and classify using multiple algorithms.
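
As a minimal sketch of the two workflows described above, the code below first trains and applies a single model with train_model and classify_model, then does the same for several algorithms at once with train_models and classify_models. The setup repeats the Examples section. The set.seed call and the choice of "MAXENT" and "SVM" are assumptions made for illustration.

```r
library(RTextTools)

# Assumption: fix the random seed so the 100-row sample is reproducible.
set.seed(1)

# Setup mirrors the package example: 75 training and 25 test documents.
data <- read_data(system.file("data/NYTimes.csv.gz", package="RTextTools"), type="csv")
data <- data[sample(1:3100, size=100, replace=FALSE),]
matrix <- create_matrix(cbind(data$Title, data$Subject), language="english",
                        removeNumbers=TRUE, stemWords=FALSE, weighting=weightTfIdf)
corpus <- create_corpus(matrix, data$Topic.Code, trainSize=1:75, testSize=76:100,
                        virgin=FALSE)

# Single algorithm: train one model, then classify the test documents with it.
svm_model <- train_model(corpus, "SVM")
svm_results <- classify_model(corpus, svm_model)

# Several algorithms: train_models returns one trained model per algorithm,
# and classify_models applies all of them to the test documents.
models <- train_models(corpus, algorithms=c("MAXENT", "SVM"))
results <- classify_models(corpus, models)

# Inspect the first few predictions.
head(results)
```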

Examples

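library(RTextTools)

# Load the bundled NYTimes sample and draw 100 random rows.
data <- read_data(system.file("data/NYTimes.csv.gz", package="RTextTools"), type="csv")
data <- data[sample(1:3100, size=100, replace=FALSE),]

# Build a tf-idf weighted document-term matrix from the title and subject columns.
matrix <- create_matrix(cbind(data$Title, data$Subject), language="english",
                        removeNumbers=TRUE, stemWords=FALSE, weighting=weightTfIdf)

# Use rows 1-75 for training and rows 76-100 for testing.
corpus <- create_corpus(matrix, data$Topic.Code, trainSize=1:75, testSize=76:100,
                        virgin=FALSE)

# Train one model per call.
maxent_model <- train_model(corpus, "MAXENT")
svm_model <- train_model(corpus, "SVM")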
