train_model: makes a model object using the specified algorithm.

Description

Creates a trained model using the specified algorithm.

Usage

train_model(corpus, algorithm=c("SVM","SLDA","BOOSTING","BAGGING",
"RF","GLMNET","TREE","NNET","MAXENT"), method = "C-classification", 
cross = 0, cost = 100, kernel = "radial", maxitboost = 100, 
maxitglm = 10^5, size = 1, maxitnnet = 1000, MaxNWts = 10000, 
rang = 0.1, decay = 5e-04, trace=FALSE, ntree = 200, 
l1_regularizer = 0, l2_regularizer = 0, use_sgd = FALSE, 
set_heldout = 0, verbose = FALSE,
...)

Arguments

corpus

An object of class matrix_container-class, generated by the create_corpus function.

algorithm

Character string specifying which algorithm to use. Use print_algorithms to see a list of options.

method

Method parameter for SVM implementation. See e1071 documentation for more details.

cross

Cross parameter for SVM implementation. See e1071 documentation for more details.

cost

Cost parameter for SVM implementation. See e1071 documentation for more details.

kernel

Kernel parameter for SVM implementation. See e1071 documentation for more details.

maxitboost

Maximum iterations parameter for boosting implementation. See caTools documentation for more details.

maxitglm

Maximum iterations parameter for glmnet implementation. See glmnet documentation for more details.

size

Size parameter for neural networks implementation. See nnet documentation for more details.

maxitnnet

Maximum iterations parameter for neural networks implementation. See nnet documentation for more details.

MaxNWts

Maximum number of weights parameter for neural networks implementation. See nnet documentation for more details.

rang

Range parameter for neural networks implementation. See nnet documentation for more details.

decay

Decay parameter for neural networks implementation. See nnet documentation for more details.

trace

Trace parameter for neural networks implementation. See nnet documentation for more details.

ntree

Number of trees parameter for random forest implementation. See randomForest documentation for more details.

l1_regularizer

A numeric value that turns on L1 regularization and sets the regularization parameter. A value of 0 disables L1 regularization. See maxent documentation for more details.

l2_regularizer

A numeric value that turns on L2 regularization and sets the regularization parameter. A value of 0 disables L2 regularization. See maxent documentation for more details.

use_sgd

A logical indicating that SGD parameter estimation should be used. Defaults to FALSE. See maxent documentation for more details.

set_heldout

An integer specifying the number of documents to hold out. Sets a held-out subset of your data to test against and prevent overfitting. See maxent documentation for more details.

verbose

A logical specifying whether to provide descriptive output about the training process. Defaults to FALSE (no output). See maxent documentation for more details.

...

Additional arguments to be passed on to algorithm function calls.
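
As an illustration of how the algorithm-specific arguments are passed, the sketch below trains an SVM with a linear kernel and a maximum entropy model with L2 regularization. The data preparation mirrors the Examples section. The variable names and the particular values (cost = 1, l2_regularizer = 0.5) are arbitrary choices made for illustration, not recommended settings.

```r
library(RTextTools)

# Prepare a small corpus from the bundled NYTimes sample (same steps as in Examples).
nyt <- read_data(system.file("data/NYTimes.csv.gz", package = "RTextTools"), type = "csv")
nyt <- nyt[sample(1:3100, size = 100, replace = FALSE), ]
doc_matrix <- create_matrix(cbind(nyt$Title, nyt$Subject), language = "english",
                            removeNumbers = TRUE, stemWords = FALSE,
                            weighting = weightTfIdf)
corpus <- create_corpus(doc_matrix, nyt$Topic.Code, trainSize = 1:75,
                        testSize = 76:100, virgin = FALSE)

# SVM: kernel and cost are the SVM arguments documented above (illustrative values).
svm_linear <- train_model(corpus, "SVM", kernel = "linear", cost = 1)

# MAXENT: a non-zero l2_regularizer turns on L2 regularization (illustrative value).
maxent_l2 <- train_model(corpus, "MAXENT", l2_regularizer = 0.5)
```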

Value

Returns a trained model that can be subsequently used in classify_model to classify new data.

Details

Only one algorithm may be selected for training. See train_models and classify_models to train and classify using multiple algorithms.
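
The single-algorithm and multi-algorithm workflows can be sketched side by side as follows. This is a minimal example, assuming that classify_model takes the corpus followed by the trained model, that train_models accepts a character vector of algorithm names through an algorithms argument, and that classify_models takes the corpus followed by the list of trained models. Check the help pages for those functions before relying on these call forms. The variable names are illustrative.

```r
library(RTextTools)

# Prepare a small corpus from the bundled NYTimes sample (same steps as in Examples).
nyt <- read_data(system.file("data/NYTimes.csv.gz", package = "RTextTools"), type = "csv")
nyt <- nyt[sample(1:3100, size = 100, replace = FALSE), ]
doc_matrix <- create_matrix(cbind(nyt$Title, nyt$Subject), language = "english",
                            removeNumbers = TRUE, stemWords = FALSE,
                            weighting = weightTfIdf)
corpus <- create_corpus(doc_matrix, nyt$Topic.Code, trainSize = 1:75,
                        testSize = 76:100, virgin = FALSE)

# One algorithm: train_model() pairs with classify_model().
svm_model <- train_model(corpus, "SVM")
svm_results <- classify_model(corpus, svm_model)

# Several algorithms: train_models() pairs with classify_models().
# Argument names and order here are assumptions, as noted above.
models <- train_models(corpus, algorithms = c("SVM", "MAXENT"))
results <- classify_models(corpus, models)
```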

Examples


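library(RTextTools)

# Load the bundled NYTimes sample and draw 100 random rows.
data <- read_data(system.file("data/NYTimes.csv.gz",package="RTextTools"),type="csv")
data <- data[sample(1:3100,size=100,replace=FALSE),]

# Build a tf-idf weighted document-term matrix from the title and subject columns.
matrix <- create_matrix(cbind(data$Title,data$Subject), language="english", 
removeNumbers=TRUE, stemWords=FALSE, weighting=weightTfIdf)

# Use the first 75 documents for training and the remaining 25 for testing.
corpus <- create_corpus(matrix,data$Topic.Code,trainSize=1:75, testSize=76:100, 
virgin=FALSE)

# Train one model per call; each call selects a single algorithm.
maxent_model <- train_model(corpus,"MAXENT")
svm_model <- train_model(corpus,"SVM")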
