regress.text: Fit regularized regressions to text data given a corpus and outputs.

Description

Fits regularized regressions to text data, given a corpus of documents and a numeric output for each document.

Usage

regress.text(text, y, n.splits = 10, size = 0.8, standardizeCase = TRUE, stripSpace = TRUE, removeStopwords = TRUE)

Arguments

text

A character vector containing the documents for analysis.

y

A numeric vector of outputs associated with the documents.

n.splits

How many resampling steps should be used to set lambda? Defaults to 10.

size

What fraction of the data should be used for model fitting during resampling? Defaults to 0.8.

standardizeCase

Should all of the text be converted to lowercase? Defaults to TRUE.

stripSpace

Should all whitespace be stripped from the text? Defaults to TRUE.

removeStopwords

Should tm's list of English stopwords be removed from the text? Defaults to TRUE.
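
To make the roles of n.splits and size more concrete, here is a rough base R sketch of repeated random splitting into fitting and held-out sets. It illustrates the general technique only and is not taken from the package's source; the names n.docs, n.train, and splits are hypothetical, and rounding the fitting-set size is an assumption.

```r
# Illustrative sketch only: not the internal code of regress.text.
set.seed(1)                       # make the random splits reproducible

n.docs   <- 8                     # hypothetical number of documents
n.splits <- 10                    # number of resampling steps
size     <- 0.8                   # fraction of documents used for fitting

# Assumption: the fitting-set size is the rounded fraction of the corpus.
n.train <- round(size * n.docs)

splits <- lapply(seq_len(n.splits), function(i) {
  train <- sort(sample(n.docs, n.train))      # indices used to fit the model
  test  <- setdiff(seq_len(n.docs), train)    # held-out indices for assessment
  list(train = train, test = test)
})

str(splits[[1]])                  # inspect the first split
```

In a scheme like this, each candidate lambda would be scored on the held-out documents of every split, and the scores averaged across splits.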

Value

A list containing regression coefficients, the terms used with those coefficients, the value of lambda used for model assessment, and an estimate of the RMSE associated with that model.
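
For reference, RMSE (root mean squared error) is the square root of the mean squared difference between observed and predicted outputs. A minimal sketch in base R follows; the helper name rmse and the prediction values are made up for illustration and are not part of TextRegression.

```r
# Hypothetical helper; rmse is not a function exported by TextRegression.
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

observed  <- c(1, 2, 3, 1)
predicted <- c(1, 2, 2, 2)   # made-up predictions, for illustration only

rmse(observed, predicted)
```

Lower values indicate predictions closer to the observed outputs, in the same units as y.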

Examples

library('TextRegression')

text <- c('saying text is good',
'saying text once and saying text twice is better',
'saying text text text is best',
'saying text once is still ok',
'not saying it at all is bad',
'because text is a good thing',
'we all like text',
'even though sometimes it is missing')

y <- c(1, 2, 3, 1, 0, 1, 1, 0)

results <- regress.text(text, y)

print(results)