epiWeights(rpairs, e = 0.01, f, ...)
  ## S3 method for class 'RecLinkData':
epiWeights(rpairs, e = 0.01, f = rpairs$frequencies)
  ## S3 method for class 'RLBigData':
epiWeights(rpairs, e = 0.01, f = getFrequencies(rpairs),
    withProgressBar = (sink.number()==0))

  A copy of rpairs with the weights attached. See the class documentation
  ("RecLinkData", "RLBigDataDedup" and
  "RLBigDataLinkage") on how weights are stored.
  
  For the "RLBigData" method, the returned object is only a shallow
  copy in the sense that it links to the same database file as rpairs.

  The "RLBigData" method writes a table with weights in the database
  file of rpairs, which means that changes apply to the provided object
  (similar to pass-by-reference style). If the existing state of rpairs
  is to be preserved, a copy should be made using clone before
  applying this function.

  rpairs can be an S3 object of class "RecLinkData"
  as well as S4 objects of classes "RLBigDataDedup" and
  "RLBigDataLinkage".

  The weight for a record pair $(x^{1},x^{2})$ is computed by
  the formula 
  $$\frac{\sum_{i}w_{i}s(x^{1}_{i},x^{2}_{i})}{\sum_{i}w_{i}}$$
  where $s(x^{1}_{i},x^{2}_{i})$ is the value of a string comparison of
  records $x^{1}$ and $x^{2}$ in the i-th field and 
  $w_{i}$ is a weighting factor computed by 
  $$w_{i}=\log_{2}\frac{1-e_{i}}{f_{i}}$$
   where $f_{i}$ denotes the
  average frequency of values and $e_{i}$ the estimated error rate
  for field $i$. 
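
  The computation above can be traced by hand in base R. The following is a
  minimal sketch, not the package's internal code: the comparison values,
  frequencies and error rate are made-up numbers for three hypothetical
  fields, and it assumes that the ratio $(1-e_{i})/f_{i}$ is taken inside
  the logarithm.

```r
# Hypothetical inputs for three compared fields (not taken from real data)
s <- c(1.0, 0.8, 0.0)        # string comparison values s(x1_i, x2_i) in [0, 1]
f <- c(0.001, 0.002, 0.05)   # average value frequencies f_i
e <- 0.01                    # single error rate, recycled over all fields

# Field weights, assuming w_i = log2((1 - e_i) / f_i)
w <- log2((1 - e) / f)

# Pair weight: weighted mean of the comparison values
pair_weight <- sum(w * s) / sum(w)
pair_weight
```

  Fields with rare values (small $f_{i}$) receive larger weights, so
  agreement on them pulls the pair weight towards 1 more strongly.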
  
  String comparison values are taken from the record pairs as they were
  generated with compare.dedup or compare.linkage.
  The use of binary patterns is possible, but in general yields poor results.
  
  The average frequency of values is by default taken from the object
  rpairs. Both frequency and error rate e can be set to a single 
  value, which will be recycled, or to a vector with distinct error rates for 
  every field. 
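
  As an illustration of per-field error rates, the sketch below builds a
  vector with one entry per compared field and passes it as e. It assumes
  the RecordLinkage package and its RLdata500 example data are available;
  the value 0.05 for the first field is an arbitrary choice for
  demonstration, not a recommendation.

```r
library(RecordLinkage)

data(RLdata500)
p <- compare.dedup(RLdata500, strcmp = TRUE, strcmpfun = levenshteinSim,
                   identity = identity.RLdata500)

# One error rate per compared field; the length follows the stored frequencies
e_vec <- rep(0.01, length(p$frequencies))
e_vec[1] <- 0.05   # illustrative: assume the first field is less reliable

# Frequencies are left at their default (taken from p)
p_vec <- epiWeights(p, e = e_vec)
```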
  
  The error rate(s) and frequency(ies) must satisfy 
  $e_{i}\leq{}1-f_{i}$ for all $i$, otherwise
  the function fails. Also, some other rare combinations can result in weights
  with illegal values (NaN, less than 0 or greater than 1). In this case a
  warning is issued.
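
  The constraint can be checked before calling the function. The helper
  below is hypothetical (check_epi_params is not part of the package) and
  only mirrors the stated condition $e_{i}\leq{}1-f_{i}$ in base R; the
  package's own checks and messages may differ.

```r
# Hypothetical helper: verify e_i <= 1 - f_i for every field
check_epi_params <- function(e, f) {
  e <- rep_len(e, length(f))   # recycle a single error rate over all fields
  ok <- e <= 1 - f
  if (!all(ok)) {
    stop("e > 1 - f for field(s): ", paste(which(!ok), collapse = ", "))
  }
  invisible(TRUE)
}

# Passes silently: 0.01 <= 1 - 0.1 and 0.01 <= 1 - 0.2
check_epi_params(0.01, c(0.1, 0.2))
```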
  
  By default, the "RLBigDataDedup" method displays a
  progress bar unless output is diverted by sink, e.g. when processing
  a Sweave file.

  epiClassify for classification based on EpiLink weights.
  emWeights for a different approach for weight calculation.

# generate record pairs
library(RecordLinkage)
data(RLdata500)  # also loads identity.RLdata500
p <- compare.dedup(RLdata500, strcmp = TRUE, strcmpfun = levenshteinSim,
  identity = identity.RLdata500)
# calculate weights
p <- epiWeights(p)
# classify and show results
summary(epiClassify(p, 0.6))