DisProbDup

Get disjoint probable duplicate sets

DisProbDup finds and joins intersecting sets in an object of class ProbDup to get disjoint probable duplicate sets.

Usage
DisProbDup(pdup, combine = c("F", "P", "S"))
Arguments
pdup
An object of class ProbDup.
combine
A character vector indicating the types of sets to be considered together for retrieving disjoint sets: "F" (fuzzy), "P" (phonetic) and/or "S" (semantic). If NULL, then disjoint sets are retrieved separately within each type (see Details).
Details

This function uses the accession primary keys/IDs to find intersecting sets and then joins them to retrieve disjoint sets. These operations are implemented using functions from the igraph package.
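
The joining of intersecting sets can be pictured as finding connected components in a graph whose vertices are accession IDs. The following is a minimal sketch of that idea using igraph, not the package's actual code; the IDs and the object names (sets, edges, disjoint) are made up for illustration.

```r
library(igraph)

# Toy probable duplicate sets (hypothetical accession IDs, not from GN1000)
sets <- list(c("A1", "A2"), c("A2", "A3"), c("B1", "B2"), c("C1", "C2", "A3"))

# Link consecutive members of each set so that every set forms a connected path
edges <- do.call(rbind, lapply(sets, function(s) cbind(head(s, -1), tail(s, -1))))

# Undirected graph with one vertex per ID
g <- graph_from_edgelist(edges, directed = FALSE)

# Each connected component corresponds to one disjoint set
memb <- components(g)$membership
disjoint <- lapply(unname(split(V(g)$name, memb)), sort)
print(disjoint)
```

Sets that share at least one ID end up in the same component, so they are reported as a single joined set.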

Disjoint sets are retrieved either individually for each type of probable duplicate set or by considering all types of sets simultaneously. In the latter case, only the disjoint sets obtained from all the types together are returned in the output, as an additional data frame DisjointDuplicates in an object of class ProbDup.
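
The difference between the two modes can be illustrated with toy data in base R. This is only a sketch under assumptions: join_sets() is a hypothetical helper written for this illustration (it is not part of PGRdup and does not use igraph), and the IDs are invented.

```r
# Merge any sets that share an ID until all remaining sets are disjoint.
# join_sets() is a hypothetical helper, not a PGRdup function.
join_sets <- function(sets) {
  merged <- list()
  for (s in sets) {
    # Flag the already-merged sets that overlap the incoming set
    hit <- vapply(merged, function(m) any(m %in% s), logical(1))
    # Fuse the incoming set with every overlapping set
    s <- sort(unique(c(s, unlist(merged[hit]))))
    merged <- c(merged[!hit], list(s))
  }
  merged
}

# Toy sets by type (hypothetical IDs)
fuzzy    <- list(c("A1", "A2"), c("B1", "B2"))
phonetic <- list(c("A2", "A3"))
semantic <- list(c("B2", "C1"))

# Analogue of combine = NULL: each type is handled separately
per_type <- lapply(list(F = fuzzy, P = phonetic, S = semantic), join_sets)

# Analogue of combine = c("F", "P", "S"): all types are pooled first
combined <- join_sets(c(fuzzy, phonetic, semantic))

print(lengths(per_type))
print(combined)
```

In this toy case the per-type sets are already disjoint within each type, whereas pooling the types joins sets that share an ID across types.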

Value

Returns an object of class ProbDup containing either the disjoint sets within each type (FuzzyDuplicates, PhoneticDuplicates and SemanticDuplicates) when combine = NULL, or the combined disjoint duplicate sets as an additional element DisjointDupicates, according to the types specified in the argument combine.

See Also

ProbDup

Aliases
  • DisProbDup
Examples
## Not run:

# Load PGR passport database
GN <- GN1000

# Specify as a vector the database fields to be used
GNfields <- c("NationalID", "CollNo", "DonorID", "OtherID1", "OtherID2")

# Clean the data
GN[GNfields] <- lapply(GN[GNfields], function(x) DataClean(x))
y1 <- list(c("Gujarat", "Dwarf"), c("Castle", "Cary"), c("Small", "Japan"),
           c("Big", "Japan"), c("Mani", "Blanco"), c("Uganda", "Erect"),
           c("Mota", "Company"))
y2 <- c("Dark", "Light", "Small", "Improved", "Punjab", "SAM")
y3 <- c("Local", "Bold", "Cary", "Mutant", "Runner", "Giant", "No.",
        "Bunch", "Peanut")
GN[GNfields] <- lapply(GN[GNfields], function(x) MergeKW(x, y1, delim = c("space", "dash")))
GN[GNfields] <- lapply(GN[GNfields], function(x) MergePrefix(x, y2, delim = c("space", "dash")))
GN[GNfields] <- lapply(GN[GNfields], function(x) MergeSuffix(x, y3, delim = c("space", "dash")))

# Generate KWIC index
GNKWIC <- KWIC(GN, GNfields)

# Specify the exceptions as a vector
exep <- c("A", "B", "BIG", "BOLD", "BUNCH", "C", "COMPANY", "CULTURE",
          "DARK", "E", "EARLY", "EC", "ERECT", "EXOTIC", "FLESH", "GROUNDNUT",
          "GUTHUKAI", "IMPROVED", "K", "KUTHUKADAL", "KUTHUKAI", "LARGE",
          "LIGHT", "LOCAL", "OF", "OVERO", "P", "PEANUT", "PURPLE", "R",
          "RED", "RUNNER", "S1", "SAM", "SMALL", "SPANISH", "TAN", "TYPE",
          "U", "VALENCIA", "VIRGINIA", "WHITE")

# Specify the synsets as a list
syn <- list(c("CHANDRA", "AH114"), c("TG1", "VIKRAM"))

# Fetch probable duplicate sets
GNdup <- ProbDup(kwic1 = GNKWIC, method = "a", excep = exep, fuzzy = TRUE,
                 phonetic = TRUE, encoding = "primary",
                 semantic = TRUE, syn = syn)
lapply(GNdup, dim)

# Get disjoint probable duplicate sets of each kind
disGNdup1 <- DisProbDup(GNdup, combine = NULL)
lapply(disGNdup1, nrow)

# Get disjoint probable duplicate sets combining all the kinds of sets
disGNdup2 <- DisProbDup(GNdup, combine = c("F", "P", "S"))
lapply(disGNdup2, nrow)

## End(Not run)
Documentation reproduced from package PGRdup, version 0.2.2.1, License: GPL-2 | GPL-3
