DisProbDup
From PGRdup v0.2.2.1
by J Aravind
Get disjoint probable duplicate sets
DisProbDup finds and joins intersecting sets in an object of class ProbDup to get disjoint probable duplicate sets.
Usage
DisProbDup(pdup, combine = c("F", "P", "S"))
Arguments
- pdup
- An object of class ProbDup.
- combine
- A character vector indicating the type of sets to be considered together for retrieving disjoint sets. If NULL, then disjoint sets within each type are retrieved (see Details).
Details
This function uses the accession primary keys/IDs to find intersecting sets and then joins them to retrieve disjoint sets. These operations are implemented using functions from the igraph package.
Disjoint sets are retrieved either individually for each type of probable duplicate set or by considering all types of sets simultaneously. In the latter case, only the combined disjoint sets are returned in the output, as an additional data frame DisjointDuplicates in an object of class ProbDup.
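The joining step described above can be pictured as finding connected components in a graph whose vertices are accession IDs. The following is a minimal sketch of that idea, not the package's actual implementation: the helper disjoint_sets() and the accession IDs are hypothetical, and the particular igraph calls used here (graph_from_edgelist(), components()) are an assumption about how such a step could be written.

```r
library(igraph)

# Hypothetical helper (not part of PGRdup): merge intersecting sets of IDs
# into disjoint sets by finding connected components.
# Assumes every set has at least two members.
disjoint_sets <- function(sets) {
  # Link each member of a set to the first member of that set
  edges <- do.call(rbind, lapply(sets, function(s) cbind(s[1], s[-1])))
  g <- graph_from_edgelist(edges, directed = FALSE)
  # Sets that share an ID fall into the same component
  memb <- components(g)$membership
  unname(split(V(g)$name, memb))
}

# Made-up accession IDs, grouped by the type of matching that produced them
fuzzy    <- list(c("A1", "A2"), c("A2", "A3"), c("B1", "B2"))
phonetic <- list(c("A3", "A4"), c("C1", "C2"))

# Disjoint sets within each type (the idea behind combine = NULL)
within_type <- lapply(list(F = fuzzy, P = phonetic), disjoint_sets)

# Disjoint sets across types (the idea behind pooling types with combine)
combined <- disjoint_sets(c(fuzzy, phonetic))
```

In this sketch the fuzzy sets A1/A2 and A2/A3 merge because they share A2. When the two types are pooled, the phonetic set A3/A4 joins the same group, so the combined result has fewer and larger sets than the per-type results.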
Value
Returns an object of class ProbDup with either the disjoint sets within each type (FuzzyDuplicates, PhoneticDuplicates and SemanticDuplicates) when combine = NULL, or the combined disjoint duplicate sets as an additional element DisjointDupicates, according to the choice specified in the argument combine.
See Also
Examples
## Not run:
#
# # Load the package and the PGR passport database
# library(PGRdup)
# GN <- GN1000
#
# # Specify as a vector the database fields to be used
# GNfields <- c("NationalID", "CollNo", "DonorID", "OtherID1", "OtherID2")
#
# # Clean the data
# GN[GNfields] <- lapply(GN[GNfields], function(x) DataClean(x))
# y1 <- list(c("Gujarat", "Dwarf"), c("Castle", "Cary"), c("Small", "Japan"),
#            c("Big", "Japan"), c("Mani", "Blanco"), c("Uganda", "Erect"),
#            c("Mota", "Company"))
# y2 <- c("Dark", "Light", "Small", "Improved", "Punjab", "SAM")
# y3 <- c("Local", "Bold", "Cary", "Mutant", "Runner", "Giant", "No.",
#         "Bunch", "Peanut")
# GN[GNfields] <- lapply(GN[GNfields], function(x) MergeKW(x, y1, delim = c("space", "dash")))
# GN[GNfields] <- lapply(GN[GNfields], function(x) MergePrefix(x, y2, delim = c("space", "dash")))
# GN[GNfields] <- lapply(GN[GNfields], function(x) MergeSuffix(x, y3, delim = c("space", "dash")))
#
# # Generate KWIC index
# GNKWIC <- KWIC(GN, GNfields)
#
# # Specify the exceptions as a vector
# exep <- c("A", "B", "BIG", "BOLD", "BUNCH", "C", "COMPANY", "CULTURE",
#           "DARK", "E", "EARLY", "EC", "ERECT", "EXOTIC", "FLESH", "GROUNDNUT",
#           "GUTHUKAI", "IMPROVED", "K", "KUTHUKADAL", "KUTHUKAI", "LARGE",
#           "LIGHT", "LOCAL", "OF", "OVERO", "P", "PEANUT", "PURPLE", "R",
#           "RED", "RUNNER", "S1", "SAM", "SMALL", "SPANISH", "TAN", "TYPE",
#           "U", "VALENCIA", "VIRGINIA", "WHITE")
#
# # Specify the synsets as a list
# syn <- list(c("CHANDRA", "AH114"), c("TG1", "VIKRAM"))
#
# # Fetch probable duplicate sets
# GNdup <- ProbDup(kwic1 = GNKWIC, method = "a", excep = exep, fuzzy = TRUE,
#                  phonetic = TRUE, encoding = "primary",
#                  semantic = TRUE, syn = syn)
# lapply(GNdup, dim)
#
# # Get disjoint probable duplicate sets of each kind
# disGNdup1 <- DisProbDup(GNdup, combine = NULL)
# lapply(disGNdup1, nrow)
#
# # Get disjoint probable duplicate sets combining all the kinds of sets
# disGNdup2 <- DisProbDup(GNdup, combine = c("F", "P", "S"))
# lapply(disGNdup2, nrow)
#
# ## End(Not run)