
splitData(dataset, prop, keep.mprop = FALSE, num.non = 0, des.mprop = 0,
use.pred = FALSE)
dataset: A RecLinkData object. Data pairs to split.

The result holds two RecLinkData objects, train and valid. The split record pairs are stored in the pairs attributes of train and valid. If present, the attributes prediction and Wdata are split and the corresponding values saved. All other attributes are copied to both data sets.
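
As a minimal sketch of the behavior described above, the following checks where the split pairs end up and whether an unsplit attribute is shared by both sets. It assumes the RecordLinkage package is installed and that the pairs component is a data frame, as it is for objects created by compare.dedup. The variable names (n_train, n_valid, n_total) are illustrative only.

```r
library(RecordLinkage)

data(RLdata500)
pairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                       blockfld = list(1, 3, 5, 6, 7))

# Split into two halves; set.seed only makes the random split repeatable
set.seed(1)
l <- splitData(pairs, prop = 0.5)

# The record pairs of each set are stored in the pairs component
n_train <- nrow(l$train$pairs)
n_valid <- nrow(l$valid$pairs)
n_total <- nrow(pairs$pairs)
c(train = n_train, valid = n_valid, total = n_total)

# Attributes that are not split, such as the original records in data,
# are expected to be the same in both sets
identical(l$train$data, l$valid$data)
```

Comparing the row counts is a quick way to confirm that no pair was dropped or duplicated by the split.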
If the number of desired matches or non-matches is higher than the number
actually present in the data, the maximum possible number is chosen and a
warning is issued.

See also: genSamples, for generating training data based on unsupervised classification.

data(RLdata500)
pairs=compare.dedup(RLdata500, identity=identity.RLdata500,
blockfld=list(1,3,5,6,7))
# split into halves, do not enforce match ratio
l=splitData(pairs, prop=0.5)
summary(l$train)
summary(l$valid)
# split into 1/3 and 2/3, retain match ratio
l=splitData(pairs, prop=1/3, keep.mprop=TRUE)
summary(l$train)
summary(l$valid)
# generate a training set with 100 non-matches and 10 matches
l=splitData(pairs, num.non=100, des.mprop=0.1, keep.mprop=TRUE)
summary(l$train)
summary(l$valid)
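
The capping rule described earlier (a request for more matches or non-matches than exist is reduced to the maximum available, with a warning) can be pictured with a small base R sketch. This is not the package's implementation: clamp_request is a hypothetical helper, the matching status vector is made-up toy data, and des_mprop is read as matches per non-match, which is what the comment on the last example above implies.

```r
# Hypothetical helper, not part of RecordLinkage: cap a requested sample
# size at the number of items actually available and warn when capping.
clamp_request <- function(desired, available, what) {
  if (desired > available) {
    warning(sprintf("Only %d %s available; using all of them instead of %d.",
                    available, what, desired))
    return(available)
  }
  desired
}

# Toy matching status: 5 matches among 200 pairs (made-up data)
is_match <- c(rep(TRUE, 5), rep(FALSE, 195))

num_non   <- 100   # desired number of non-matches, as in num.non
des_mprop <- 0.1   # desired matches per non-match, as in des.mprop

# 100 non-matches are available, so this request is kept unchanged
n_non <- clamp_request(num_non, sum(!is_match), "non-matches")

# 10 matches are requested but only 5 exist, so this is capped with a warning
n_match <- clamp_request(round(des_mprop * num_non), sum(is_match), "matches")

# Draw the training indices from each group separately
set.seed(42)
train_idx <- c(sample(which(is_match), n_match),
               sample(which(!is_match), n_non))
length(train_idx)
```

Sampling matches and non-matches separately is what lets the two counts be controlled independently; the remaining indices would form the validation set.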