Usage
find.threshold.C(corpus, labeling, banned = NULL, R = 0, objective.function = 2, a = 1, verbosity = 0, step.verbosity = verbosity, positive.only = FALSE, binary.features = FALSE, no.regularization = FALSE, positive.weight = 1, Lq = 2, min.support = 1, min.pattern = 1, max.pattern = 100, gap = 0, token.type = "word", convergence.threshold = 1e-04)
Arguments
corpus
A list of strings or a corpus from the tm package.
labeling
A vector of +1/-1 or TRUE/FALSE indicating which documents are considered relevant and
which are baseline. A +1/-1 vector may also contain 0, which means drop the document.
banned
List of words that should be dropped from consideration.
R
Number of times to scramble the labeling. 0 means use the given labeling and find a single C value.
objective.function
Which loss function to use. 2 is hinge loss. 0 and 1 select other loss functions that are not documented here.
a
Fraction of the regularization that should be L1 loss (a = 1) versus L2 loss (a = 0).
verbosity
Level of output. 0 is no printed output.
step.verbosity
Level of output for line searches. 0 is no printed output.
positive.only
If TRUE, disallow negative features.
binary.features
If TRUE, code only the presence or absence of a feature in a document rather than the count of the feature in the document.
no.regularization
Do not renormalize the features at all. (Lq will be ignored.)
positive.weight
Scale the weight of all positively marked documents by this value. The default is 1 (no scaling). NOT FULLY IMPLEMENTED.
Lq
Norm used to rescale the features (2 is standard). Can be any value from 1 up. Values above 10 invoke the infinity norm.
min.support
Only consider phrases that appear this many times or more.
min.pattern
Only consider phrases this long or longer.
max.pattern
Only consider phrases this short or shorter.
gap
Allow phrases that contain wildcard words. The value is how many wildcards are allowed in a row.
token.type
Unit to use as tokens: "word" or "character".
convergence.threshold
Threshold used to decide whether the descent has converged. (The search will go for three steps at this threshold to check for flatness.)
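Examples
The following is a minimal sketch, not taken from the package itself, of how the corpus and labeling arguments described above might be prepared. The toy documents and the is_relevant vector are hypothetical. The call to find.threshold.C is guarded so that it runs only if the function is available in the current session, and the argument values passed are illustrative assumptions rather than recommended settings.

```r
# Toy corpus: a plain list of strings, one of the two forms 'corpus' accepts.
docs <- list(
  "the service was quick and friendly",
  "slow service and cold food",
  "friendly staff and quick checkout",
  "unrelated note to be dropped",
  "cold room and slow response"
)

# Hypothetical raw annotations; NA marks a document we want to exclude.
is_relevant <- c(TRUE, FALSE, TRUE, NA, FALSE)

# Map TRUE -> +1, FALSE -> -1, and NA -> 0 (0 means "drop the document").
labeling <- ifelse(is.na(is_relevant), 0, ifelse(is_relevant, 1, -1))

# Words to exclude from consideration (illustrative choice).
banned <- c("the", "and")

# Run only if find.threshold.C is loaded; R = 0 uses the labeling as given
# and finds a single C value.
if (exists("find.threshold.C", mode = "function")) {
  C <- find.threshold.C(docs, labeling,
                        banned = banned,
                        R = 0,
                        min.support = 1,
                        max.pattern = 3,
                        token.type = "word")
  print(C)
}
```

Setting R to a positive number would instead scramble the labeling that many times, as described under the R argument.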