RTextTools (version 1.4.3)

create_container: creates a container for training, classifying, and analyzing documents.

Description

Given a DocumentTermMatrix from the tm package and corresponding document labels, creates a container of class matrix_container-class that can be used for training and classification (e.g. by train_model, train_models, classify_model, and classify_models).

Usage

create_container(matrix, labels, trainSize=NULL, testSize=NULL, virgin)

Arguments

matrix

A document-term matrix of class DocumentTermMatrix or TermDocumentMatrix from the tm package, or one generated by create_matrix.

labels

A factor or vector of labels corresponding to each document in the matrix.

trainSize

A range of row indices (e.g. 1:1000) specifying which documents in the matrix to use for training the models. Can be left blank when classifying corpora with saved models that do not need to be trained.

testSize

A range of row indices (e.g. 1:1000) specifying which documents in the matrix to classify. Can be left blank when training on all data in the matrix.
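Because trainSize and testSize are row-index ranges rather than counts, they select documents by position. The minimal base-R sketch below illustrates this, assuming the rows have already been shuffled (as the sample() call in the example does) so that a contiguous range amounts to a random split. The labels are made-up placeholders and no RTextTools function is called; the non-overlap check is general good practice, not something create_container is documented to enforce.

```r
# Placeholder data: 100 documents with made-up labels (hypothetical)
set.seed(42)
n_docs <- 100
labels <- factor(sample(c("A", "B"), n_docs, replace = TRUE))

trainSize <- 1:75    # rows 1..75 are used for training
testSize  <- 76:100  # rows 76..100 are used for classification

# Ranges must stay inside the matrix; overlapping ranges would mean
# classifying documents the model was trained on
stopifnot(max(c(trainSize, testSize)) <= n_docs)
stopifnot(length(intersect(trainSize, testSize)) == 0)

# The same indices select the matching labels for each split
train_labels <- labels[trainSize]
test_labels  <- labels[testSize]
cat(length(train_labels), "training docs,", length(test_labels), "test docs\n")
```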

virgin

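A logical (TRUE or FALSE) specifying whether to treat the classification data as virgin data, i.e. documents whose true labels are unknown (TRUE), rather than labelled documents whose codes can be used to evaluate the classifier (FALSE).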
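The practical difference is whether predictions on the classification rows can be checked against ground truth. The sketch below uses a hypothetical helper, score_predictions(), that is not part of RTextTools; it only illustrates the idea that accuracy is computable for non-virgin data and undefined for virgin data.

```r
# Hypothetical helper (not an RTextTools function): accuracy is only
# meaningful when the classification data carries known labels
score_predictions <- function(predicted, true_labels, virgin) {
  if (virgin) {
    return(NA_real_)  # virgin data: no ground truth to compare against
  }
  mean(as.character(predicted) == as.character(true_labels))
}

predicted   <- c("A", "B", "B", "A")
true_labels <- c("A", "B", "A", "A")
score_predictions(predicted, true_labels, virgin = FALSE)  # 0.75
score_predictions(predicted, NULL, virgin = TRUE)          # NA
```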

Value

A container of class matrix_container-class that can be passed into other functions such as train_model, train_models, classify_model, classify_models, and create_analytics.

Examples

# NOT RUN {
library(RTextTools)
data(NYTimes)
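# Draw a random sample of 100 articles so the example runs quickly
data <- NYTimes[sample(1:3100, size=100, replace=FALSE),]
# Build a TF-IDF weighted document-term matrix from the Title and Subject columns
matrix <- create_matrix(cbind(data["Title"], data["Subject"]), language="english",
removeNumbers=TRUE, stemWords=FALSE, weighting=tm::weightTfIdf)
# Train on rows 1-75 and classify rows 76-100; virgin=FALSE because
# the classification rows have known Topic.Code labels
container <- create_container(matrix, data$Topic.Code, trainSize=1:75, testSize=76:100,
virgin=FALSE)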
# }
