trainSupv(rpairs, method, use.pred = FALSE, omit.possible = TRUE,
convert.na = TRUE, include.data = FALSE, ...)
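rpairs: Object of class RecLinkData. Training data.
convert.na: Logical. Whether to convert NAs to 0 in the comparison patterns.
The function returns an object of class RecLinkClassif with the following components:
train: If include.data is TRUE, a copy of rpairs, otherwise an empty data frame with the same column names.
model: The model returned by the underlying training function.
method: A copy of the argument method.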
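The given data set is used as training data for a supervised classification. Either the true matching status has to be known for a sufficient number of record pairs, or the data must have been classified beforehand, for example with emClassify or classifyUnsup. In the latter case, argument use.pred has to be set to TRUE.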
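The second route, training on labels from an earlier unsupervised run, can be sketched as follows. This is a minimal illustration under assumptions: the choice of k-means through classifyUnsup and the variable names are made up for the example, and the quality of the resulting model depends entirely on how good the unsupervised labels are.

```r
library(RecordLinkage)

data(RLdata500)
pairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                       blockfld = list(1, 3, 5, 6, 7))

# Unsupervised step (assumed choice for this sketch): label the pairs with k-means.
unsup <- classifyUnsup(pairs, method = "kmeans")

# Train on the predicted labels instead of the true matching status.
model <- trainSupv(unsup, method = "rpart", use.pred = TRUE)
```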
A classifying method has to be provided as a character string (factors are converted to character) through the argument method.
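The supported classifiers are:
"svm": support vector machine, see svm.
"rpart": recursive partitioning tree, see rpart.
"ada": stochastic boosting model, see ada.
"bagging": bagging with classification trees, see bagging.
"nnet": single-hidden-layer neural network, see nnet.
"bumping": a bootstrap-based method using classification trees, described below.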
Arguments in ... are passed to the corresponding function.
Most classifiers cannot handle NAs in the data, so by default these are converted to 0 before training.
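What this conversion amounts to can be shown on a small, made-up table of comparison patterns. The data frame and its column names below are hypothetical and the code uses base R only; it illustrates the idea and is not the package's internal implementation.

```r
# Hypothetical comparison patterns: 1 = agreement, 0 = disagreement,
# NA = comparison not possible because a value was missing.
patterns <- data.frame(
  fname    = c(1, 0, NA, 1),
  lname    = c(1, NA, 0, 1),
  by       = c(NA, 0, 0, 1),
  is_match = c(1, 0, 0, 1)
)

# Replace NA by 0 in the comparison columns only; the status column is left alone.
feature_cols <- setdiff(names(patterns), "is_match")
converted <- patterns
converted[feature_cols][is.na(converted[feature_cols])] <- 0

print(converted)
```

Treating a missing comparison as a disagreement is a modelling choice; it keeps every pair usable for training at the cost of losing the distinction between "different" and "unknown".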
With omit.possible = TRUE, possible links and pairs with unknown status are excluded from the training set. Setting this argument to FALSE allows three-class classification (links, non-links and possible links), but the results tend to be poor.
Leaving include.data = FALSE saves memory; setting it to TRUE can be useful for saving the classifier while keeping track of the underlying training data.
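Bumping (an acronym for "bootstrap umbrella of model parameters") is a bootstrap-based ensemble method: several classification trees are trained on bootstrap samples of the training set, and the single tree that performs best on the whole training set is kept. The number of bootstrap samples is controlled by the argument n.bootstrap, which defaults to 25.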
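The general bumping idea can be sketched with rpart on toy data. The helper bump_rpart and the synthetic data frame are hypothetical names invented for this illustration; the sketch is not the code that trainSupv runs internally and only mirrors the n.bootstrap default of 25.

```r
library(rpart)

# Hypothetical helper, not part of RecordLinkage: bumping with rpart trees.
bump_rpart <- function(formula, data, n.bootstrap = 25, ...) {
  truth <- model.response(model.frame(formula, data))
  best_model <- NULL
  best_error <- Inf
  for (b in seq_len(n.bootstrap)) {
    # Draw a bootstrap sample (rows sampled with replacement).
    idx <- sample(nrow(data), replace = TRUE)
    fit <- rpart(formula, data = data[idx, , drop = FALSE], method = "class", ...)
    # Score the candidate tree on the complete, original training set.
    pred <- predict(fit, newdata = data, type = "class")
    err <- mean(as.character(pred) != as.character(truth))
    if (err < best_error) {
      best_error <- err
      best_model <- fit
    }
  }
  list(model = best_model, error = best_error)
}

# Toy comparison patterns: a pair counts as a link when at least two attributes agree.
set.seed(1)
n <- 200
toy <- data.frame(
  fname = rbinom(n, 1, 0.5),
  lname = rbinom(n, 1, 0.5),
  by    = rbinom(n, 1, 0.5)
)
toy$is_match <- factor(ifelse(rowSums(toy) >= 2, "L", "N"), levels = c("N", "L"))

result <- bump_rpart(is_match ~ ., data = toy, n.bootstrap = 25)
print(result$error)
```

Because only one tree is returned, the result stays as easy to read as an ordinary decision tree, which is the main difference from bagging, where the predictions of all trees are aggregated.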
See also classifySupv for classifying with the trained model and classifyUnsup for unsupervised classification.
# Train an rpart decision tree with the additional parameter minsplit
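library(RecordLinkage)
data(RLdata500)
# Build comparison patterns for deduplication, blocking on columns 1, 3, 5, 6 and 7 in turn
pairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                       blockfld = list(1, 3, 5, 6, 7))
# minsplit is passed through ... to rpart
model <- trainSupv(pairs, method = "rpart", minsplit = 5)
summary(model)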
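A trained model is normally applied to pairs it has not seen. The following sketch splits the same example data into a training and a validation half with splitData and classifies the validation half with classifySupv; the 50/50 split and the variable names are assumptions chosen for the illustration.

```r
library(RecordLinkage)

data(RLdata500)
pairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                       blockfld = list(1, 3, 5, 6, 7))

# Split into training and validation sets, keeping the proportion of matches.
parts <- splitData(pairs, prop = 0.5, keep.mprop = TRUE)

# Train on one half, then classify the other half with the trained model.
model <- trainSupv(parts$train, method = "rpart", minsplit = 5)
result <- classifySupv(model = model, newdata = parts$valid)
summary(result)
```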