
Trains a maximum entropy model, returning an object of class maxent-class, given a matrix or matrix.csr with training data and a vector or factor with the corresponding labels. The additional parameters l1_regularizer, l2_regularizer, and set_heldout help prevent model overfitting.
maxent(feature_matrix, code_vector, l1_regularizer=0.0, l2_regularizer=0.0, use_sgd=FALSE, set_heldout=0, verbose=FALSE)
feature_matrix: a matrix or matrix.csr of training data, with one row per document.
code_vector: a factor or vector of labels corresponding to each document in the feature_matrix.
l1_regularizer: numeric; enables L1 regularization and sets its regularization parameter. A value of 0 disables L1 regularization.
l2_regularizer: numeric; enables L2 regularization and sets its regularization parameter. A value of 0 disables L2 regularization.
use_sgd: logical; indicates that stochastic gradient descent (SGD) should be used for parameter estimation. Defaults to FALSE.
set_heldout: integer; the number of documents to hold out. The held-out subset is set aside to test against, which helps prevent overfitting.
verbose: logical; whether to print descriptive output about the training process. Defaults to FALSE (no output).
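To make the effect of l1_regularizer and l2_regularizer concrete, here is a minimal base-R sketch of the penalized negative log-likelihood that a regularized maximum entropy trainer minimizes under the textbook formulation. The function name penalized_nll and the weight-matrix layout (one column per label) are hypothetical, and the package's internal scaling of the two penalties is an assumption that may differ.

```r
# Hypothetical sketch, not part of the maxent package.
# Penalized negative log-likelihood of a maximum entropy model:
#   -sum_i log P(y_i | x_i) + l1 * sum(|W|) + l2 * sum(W^2)
# X: n x p numeric feature matrix (one row per document)
# y: integer class indices in 1..K
# W: p x K weight matrix (one column per label)
penalized_nll <- function(X, y, W, l1 = 0, l2 = 0) {
  scores <- X %*% W                          # n x K linear scores
  # log-sum-exp per row, shifted by the row maximum for numerical stability
  row_max <- apply(scores, 1, max)
  log_norm <- row_max + log(rowSums(exp(scores - row_max)))
  # log-probability of each document's true label
  log_lik <- scores[cbind(seq_len(nrow(X)), y)] - log_norm
  # a penalty of 0 switches that kind of regularization off
  -sum(log_lik) + l1 * sum(abs(W)) + l2 * sum(W^2)
}
```

With all weights at zero, every label is equally likely, so two documents and two labels give a loss of 2 * log(2); raising l1 or l2 adds a cost that grows with the size of the weights, which is what pushes the fitted model toward smaller weights.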
An object of class maxent-class with two slots: a character vector containing the trained maximum entropy model, and a data.frame listing all the weights in three columns: Weight, Label, and Feature.

To reduce overfitting, try one of the following:
1. Set the l1_regularizer parameter to 1.0, leaving l2_regularizer and set_heldout at their defaults.
2. Set the l2_regularizer parameter to 1.0, leaving l1_regularizer and set_heldout at their defaults.
3. Set the set_heldout parameter to hold out a portion of your data, leaving l1_regularizer and l2_regularizer at their defaults.
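Choosing among the settings listed above requires scoring each candidate model on documents it was not trained on. The base-R sketch below splits document indices into a training part and a held-out part and computes held-out accuracy. Both helper names are hypothetical, and reserving the last n_heldout rows is an assumption; this page does not say which documents set_heldout itself uses.

```r
# Hypothetical helpers, not part of the maxent package.
# Reserve the LAST n_heldout documents for testing (an assumption).
heldout_indices <- function(n_docs, n_heldout) {
  stopifnot(n_heldout >= 0, n_heldout < n_docs)
  all_docs <- seq_len(n_docs)
  test <- all_docs[all_docs > n_docs - n_heldout]
  list(train = setdiff(all_docs, test), test = test)
}

# Fraction of held-out documents whose predicted label matches the true one;
# labels are compared as strings so factors and character vectors mix safely.
heldout_accuracy <- function(predicted, actual) {
  mean(as.character(predicted) == as.character(actual))
}
```

One way to use this with the package would be to train maxent() on the idx$train rows under each of the three settings, call predict() on the idx$test rows, and compare heldout_accuracy() on the predicted labels (the examples on this page read predictions from the first column of the predict() result).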
If you are using a large number of training samples, try setting the use_sgd parameter to TRUE.
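The reason SGD helps with large training sets is that each update looks at a single document rather than the whole corpus. The base-R sketch below shows plain SGD for a multinomial logistic (maximum entropy) model with an optional L2 penalty. It illustrates the general technique only; sgd_maxent() is hypothetical and is not the code path the package runs when use_sgd = TRUE.

```r
# Hypothetical sketch of stochastic gradient descent for a maximum entropy
# (multinomial logistic) model; not the maxent package's implementation.
# X: n x p numeric matrix, y: integer labels in 1..n_labels
sgd_maxent <- function(X, y, n_labels, epochs = 20, rate = 0.1, l2 = 0, seed = 1) {
  set.seed(seed)                                    # reproducible visiting order
  n <- nrow(X)
  W <- matrix(0, nrow = ncol(X), ncol = n_labels)   # p x K weights
  for (epoch in seq_len(epochs)) {
    for (i in sample(n)) {                          # shuffle documents each epoch
      x <- X[i, ]
      s <- drop(x %*% W)                            # one score per label
      p <- exp(s - max(s))
      p <- p / sum(p)                               # softmax probabilities
      target <- as.numeric(seq_len(n_labels) == y[i])
      # gradient of this document's negative log-likelihood, plus its
      # 1/n share of the gradient of the L2 penalty l2 * sum(W^2)
      grad <- outer(x, p - target) + 2 * l2 * W / n
      W <- W - rate * grad                          # one cheap update per document
    }
  }
  W
}
```

The cost of each update depends only on the number of features and labels, not on the number of documents, which is why this style of estimation scales to large training sets; the trade-off is a noisier path toward the optimum than full-batch methods.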
# LOAD LIBRARIES
library(maxent)
library(tm)  # provides Corpus(), VectorSource() and DocumentTermMatrix()

# READ THE DATA, PREPARE THE CORPUS, AND CREATE THE MATRIX
data <- read.csv(system.file("data/NYTimes.csv.gz", package = "maxent"))
corpus <- Corpus(VectorSource(data$Title[1:150]))
matrix <- DocumentTermMatrix(corpus)

# TRAIN USING SPARSEM REPRESENTATION
sparse <- as.compressed.matrix(matrix)
model <- maxent(sparse[1:100, ], as.factor(data$Topic.Code)[1:100])

# A DIFFERENT EXAMPLE (adapted from package e1071)
# CREATE DATA
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)

# ESTIMATE MODEL AND PREDICT INPUT VALUES
m <- maxent(x, y)
new <- predict(m, x)

# VISUALIZE: data (black), true curve (red), predictions (blue)
plot(x, y)
points(x, log(x), col = 2)
points(x, new[, 1], col = 4)
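The weights data.frame stored in a trained model (columns Weight, Label, Feature) can also be read by hand. The base-R sketch below rebuilds label probabilities for one document from such a table. It assumes binary features (a feature either occurs in the document or not) and no separate bias term; score_document() is hypothetical and not part of the maxent package, whose own predict() should be preferred in practice.

```r
# Hypothetical sketch: label probabilities for one document from a weights
# table with columns Weight, Label and Feature (binary features assumed).
score_document <- function(weights, features) {
  labels <- unique(as.character(weights$Label))
  # keep only the rows whose feature occurs in this document
  active <- weights[as.character(weights$Feature) %in% features, , drop = FALSE]
  # per-label score = sum of the active features' weights (0 if none)
  scores <- vapply(labels, function(l) {
    sum(active$Weight[as.character(active$Label) == l])
  }, numeric(1))
  probs <- exp(scores - max(scores))   # softmax, shifted for stability
  probs / sum(probs)                   # named vector: one probability per label
}
```

A document with none of the listed features gets a score of 0 for every label, so all labels come out equally likely.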