KWIC

Create a KWIC Index

KWIC creates a Keyword in Context index from PGR passport database fields.

Usage
KWIC(x, fields, min.freq = 10)
Arguments
x
A data frame from which KWIC index is to be generated.
fields
A character vector with the names of the fields (columns) of the data frame from which the KWIC index is to be generated. The first field is considered the primary key or identifier (see Details).
min.freq
Keyword frequencies below min.freq are not computed. Default is 10.
Details

The function generates a Keyword in Context index from a data frame of a PGR passport database, based on the fields (columns) stated in the arguments, using the data.table package.
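To make the idea in the paragraph above concrete, the following base R sketch builds a toy keyword-in-context table: each field value is split into keywords, and every keyword is recorded together with the primary key and the field it came from. The helper `toy_kwic` and the data values are hypothetical and are not part of PGRdup; the actual function uses data.table, and the layout of its output may differ.

```r
# Toy passport data; column names mirror the Examples, values are invented.
toy <- data.frame(
  NationalID = c("EC100", "EC101", "EC102"),
  CollNo     = c("SHAN 12", "NRCG 45", "SHAN 12"),
  DonorID    = c("ICG 221", "ICG 300", "NRCG 45"),
  stringsAsFactors = FALSE
)
fields <- c("NationalID", "CollNo", "DonorID")

# Hypothetical helper (an assumption, not the PGRdup implementation):
# returns one row per keyword occurrence.
toy_kwic <- function(x, fields) {
  key <- x[[fields[1]]]                      # first field is the primary key
  rows <- lapply(fields, function(f) {
    words <- strsplit(x[[f]], "[[:space:]]+")  # split each value on whitespace
    n <- lengths(words)                        # keywords per record
    data.frame(
      PrimaryKey = rep(key, n),
      Field      = rep(f, sum(n)),
      Keyword    = unlist(words),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out[order(out$Keyword, out$PrimaryKey), ]  # sort so equal keywords are adjacent
}

idx <- toy_kwic(toy, fields)

# Keyword frequencies across all fields
freq <- as.data.frame(table(Keyword = idx$Keyword), stringsAsFactors = FALSE)
```

Sorting by keyword places records that share a keyword next to each other, which is what makes such an index useful for spotting candidate duplicates by eye.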

The first element of the vector fields is considered the primary key or identifier, which must uniquely identify every row in the data frame.
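Because the primary key has to be unique and non-empty, it can help to check it before building the index. The sketch below uses base R only; the helper name `check_primary_key` is hypothetical and not part of PGRdup, and treating blank strings as missing is an assumption of this example.

```r
# Hypothetical helper (not part of PGRdup): report missing and duplicated
# values in the column intended as the primary key.
check_primary_key <- function(x, key) {
  id <- x[[key]]
  # Assumption: NA and blank (empty or whitespace-only) values count as missing
  missing_id <- is.na(id) | !nzchar(trimws(as.character(id)))
  present <- id[!missing_id]
  list(
    n_missing    = sum(missing_id),
    n_duplicated = sum(duplicated(present)),
    ok           = !any(missing_id) && anyDuplicated(present) == 0
  )
}

# Invented data with one blank and one repeated identifier
toy <- data.frame(
  NationalID = c("EC1", "EC2", "EC1", ""),
  CollNo     = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
res <- check_primary_key(toy, "NationalID")
```

If `res$ok` is `FALSE`, the offending records should be corrected or removed before the data frame is used as input.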

Cleaning the data in the input fields (columns) with the DataClean function, using appropriate arguments, is suggested before running this function.

Value

A list of class KWIC containing the following components:
KWIC
The KWIC index in the form of a data frame.
KeywordFreq
A data frame of the keywords detected with frequency greater than min.freq.
Fields
A character vector with the names of the PGR database fields from which the keywords were extracted.
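Since the returned list has named components, they can be retrieved by name instead of by position. The sketch below uses a stand-in object so that it runs without the package: only the three component names (KWIC, KeywordFreq, Fields) come from this page, while the contents and column names of `mock` are invented for illustration.

```r
# Stand-in for a KWIC result; component names follow the Value section,
# but the contents and column names are invented.
mock <- structure(
  list(
    KWIC        = data.frame(Keyword = c("ICG", "SHAN"),
                             stringsAsFactors = FALSE),
    KeywordFreq = data.frame(Keyword = c("ICG", "SHAN"),
                             Freq = c(12L, 15L),
                             stringsAsFactors = FALSE),
    Fields      = c("NationalID", "CollNo")
  ),
  class = "KWIC"
)

# Access by name rather than by position
index_df <- mock$KWIC
freq_df  <- mock$KeywordFreq
used     <- mock$Fields

# Named and positional access return the same component
same <- identical(index_df, mock[[1]])
```

Access by name keeps code readable and does not depend on the order of the components in the list.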

References

Knupffer, H. (1988). The European Barley Database of the ECP/GR: an introduction. Die Kulturpflanze, 36(1), 135-162.

Knupffer, H., Frese, L., & Jongen, M. W. M. (1997). Using central crop databases: searching for duplicates and gaps. In: E. Lipman, M. W. M. Jongen, T. J. L. van Hintum, T. Gass, and L. Maggioni (Eds.), Central Crop Databases: Tools for Plant Genetic Resources Management (pp. 67-77). Rome, Italy: International Plant Genetic Resources Institute / Wageningen, The Netherlands: Centre for Genetic Resources.

See Also

data.table, print.KWIC

Aliases
  • KWIC
Examples
# Load PGR passport database
GN <- GN1000

# Specify as a vector the database fields to be used
GNfields <- c("NationalID", "CollNo", "DonorID", "OtherID1", "OtherID2")

# Clean the data
GN[GNfields] <- lapply(GN[GNfields], function(x) DataClean(x))

## Not run: 
# 
# # Generate KWIC index
# GNKWIC <- KWIC(GN, GNfields)
# GNKWIC
# 
# # Retrieve the KWIC index from the KWIC object
# KWIC <- GNKWIC[[1]]
# 
# # Retrieve the keyword frequencies from the KWIC object
# KeywordFreq <- GNKWIC[[2]]
# 
# # Show error in case of duplicates and NULL values 
# # in the primary key/ID field "NationalID"
# GN[1001:1005,] <- GN[1:5,]
# GN[1001,3] <- ""
# GNKWIC <- KWIC(GN, GNfields)
# ## End(Not run)
Documentation reproduced from package PGRdup, version 0.2.2.1, License: GPL-2 | GPL-3
