RTextTools (version 1.3.2)

create_matrix: creates a document-term matrix to be passed into create_corpus().

Description

Creates an object of class DocumentTermMatrix from tm that can be used in the create_corpus function.

Usage

create_matrix(textColumns, language = "en", minDocFreq = 1,
              minWordLength = 3, ngramLength = 0, removeNumbers = FALSE,
              removePunctuation = TRUE, removeSparseTerms = 0,
              removeStopwords = TRUE, selectFreqTerms = 0,
              stemWords = FALSE, stripWhitespace = TRUE, toLower = TRUE,
              weighting = weightTf)

Arguments

textColumns
Either a character vector (e.g. data$Title) or a cbind() of columns to use for training the algorithms (e.g. cbind(data$Title, data$Subject)).
language
The language to be used for stemming the text data.
minDocFreq
The minimum number of times a word should appear in a document for it to be included in the matrix. See package tm for more details.
minWordLength
The minimum number of letters a word should contain to be included in the matrix. See package tm for more details.
ngramLength
The number of words to include per n-gram for the document-term matrix.
removeNumbers
A logical parameter to specify whether to remove numbers.
removePunctuation
A logical parameter to specify whether to remove punctuation.
removeSparseTerms
The maximum allowed sparsity for a term, as used by removeSparseTerms() in package tm. See package tm for more details.
removeStopwords
A logical parameter to specify whether to remove stopwords using the language specified in language.
selectFreqTerms
Select the N most frequent terms in each document to use for training.
stemWords
A logical parameter to specify whether to stem words using the language specified in language.
stripWhitespace
A logical parameter to specify whether to strip whitespace.
toLower
A logical parameter to specify whether to make all text lowercase.
weighting
Either weightTf or weightTfIdf. See package tm for more details.
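
To make the effect of arguments such as toLower, removePunctuation and minWordLength more concrete, the following is a minimal base-R sketch of how a term-frequency document-term matrix can be built by hand. It is an illustration only and is not how create_matrix or tm implement these steps; the names docs, min_word_length, tokens, vocab and dtm are hypothetical and do not appear in RTextTools.

```r
# Illustrative only: base R, not the tm/RTextTools implementation.
docs <- c("The cat sat.", "A dog sat down, the end.")
min_word_length <- 3  # hypothetical stand-in for minWordLength

# Lowercase, then split on anything that is not a letter (drops punctuation).
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Keep only words with at least min_word_length letters.
tokens <- lapply(tokens, function(w) w[nchar(w) >= min_word_length])

# One column per distinct term, one row per document, raw counts as cells.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens,
                function(w) as.integer(table(factor(w, levels = vocab))),
                integer(length(vocab))))
colnames(dtm) <- vocab
print(dtm)
```

Each row of dtm corresponds to one document and each column to one term, which is the same layout as the DocumentTermMatrix that create_matrix returns. The one-letter word "A" is dropped by the length filter, in the same spirit as minWordLength = 3.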

Examples

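library(RTextTools)

# Load the bundled NYTimes sample data and draw a random subset of 100 rows.
data <- read_data(system.file("data/NYTimes.csv.gz", package="RTextTools"), type="csv")
data <- data[sample(1:3100, size=100, replace=FALSE),]

# Build a TF-IDF weighted document-term matrix from the Title and Subject columns.
matrix <- create_matrix(cbind(data$Title, data$Subject), language="english",
                        removeNumbers=TRUE, stemWords=FALSE, weighting=weightTfIdf)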
