Get concordances for the matches for a query / perform keyword-in-context (kwic) analysis.
kwic(.Object, ...)

# S4 method for context
kwic(
  .Object,
  s_attributes = getOption("polmineR.meta"),
  cpos = TRUE,
  verbose = FALSE
)

# S4 method for slice
kwic(
  .Object,
  query,
  cqp = is.cqp,
  left = getOption("polmineR.left"),
  right = getOption("polmineR.right"),
  s_attributes = getOption("polmineR.meta"),
  region = NULL,
  p_attribute = "word",
  boundary = NULL,
  cpos = TRUE,
  stoplist = NULL,
  positivelist = NULL,
  regex = FALSE,
  verbose = TRUE,
  ...
)

# S4 method for partition
kwic(
  .Object,
  query,
  cqp = is.cqp,
  left = getOption("polmineR.left"),
  right = getOption("polmineR.right"),
  s_attributes = getOption("polmineR.meta"),
  p_attribute = "word",
  region = NULL,
  boundary = NULL,
  cpos = TRUE,
  stoplist = NULL,
  positivelist = NULL,
  regex = FALSE,
  verbose = TRUE,
  ...
)

# S4 method for subcorpus
kwic(
  .Object,
  query,
  cqp = is.cqp,
  left = getOption("polmineR.left"),
  right = getOption("polmineR.right"),
  s_attributes = getOption("polmineR.meta"),
  p_attribute = "word",
  region = NULL,
  boundary = NULL,
  cpos = TRUE,
  stoplist = NULL,
  positivelist = NULL,
  regex = FALSE,
  verbose = TRUE,
  ...
)

# S4 method for corpus
kwic(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  left = as.integer(getOption("polmineR.left")),
  right = as.integer(getOption("polmineR.right")),
  s_attributes = getOption("polmineR.meta"),
  p_attribute = "word",
  region = NULL,
  boundary = NULL,
  cpos = TRUE,
  stoplist = NULL,
  positivelist = NULL,
  regex = FALSE,
  verbose = TRUE,
  ...
)

# S4 method for character
kwic(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  left = as.integer(getOption("polmineR.left")),
  right = as.integer(getOption("polmineR.right")),
  s_attributes = getOption("polmineR.meta"),
  p_attribute = "word",
  region = NULL,
  boundary = NULL,
  cpos = TRUE,
  stoplist = NULL,
  positivelist = NULL,
  regex = FALSE,
  verbose = TRUE,
  ...
)

# S4 method for remote_corpus
kwic(.Object, ...)

# S4 method for remote_partition
kwic(.Object, ...)

# S4 method for remote_subcorpus
kwic(.Object, ...)

# S4 method for partition_bundle
kwic(.Object, ..., verbose = FALSE)

# S4 method for subcorpus_bundle
kwic(.Object, ...)
If there are no matches, or if all (initial) matches are dropped due to the application of a positivelist, NULL is returned.
.Object: A (length-one) character vector with the name of a CWB corpus, or a partition or context object.
...: Further arguments, used to ensure backwards compatibility. If .Object is a remote_corpus or remote_partition object, the three dots (...) are used to pass arguments. Hence, it is necessary to state the names of all arguments to be passed explicitly.
s_attributes: Structural attributes (s-attributes) to include in the output table as metadata.
cpos: Logical. If TRUE, a data.table with the corpus positions ("cpos") of the hits and their surrounding context is assigned to the slot "cpos" of the kwic object that is returned. Defaults to TRUE, as the availability of the cpos data.table will often be a prerequisite for further operations on the kwic object. Omitting the table may, however, be useful to minimize memory consumption.
verbose: A logical value, whether to print messages.
query: A query; CQP syntax can be used.
cqp: Either a logical value (TRUE if query is a CQP query), or a function to check whether query is a CQP query or not (defaults to the auxiliary function is.cqp).
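The following is a minimal sketch of the two ways the cqp argument might be supplied. It assumes that is.cqp() can be called directly on a query string and that it treats queries containing quotation marks as CQP syntax; that detection rule is an assumption about the helper, not something stated in this documentation. The variable names are illustrative only.

```r
library(polmineR)
use("polmineR")  # makes the GERMAPARLMINI demo corpus available

plain_query <- "Integration"
cqp_query <- '"Integration" [] "(Menschen|Migrant.*|Personen)"'

# Ask the default checker how it would classify each query
# (assumption: quoted queries are flagged as CQP syntax).
guessed <- c(plain = is.cqp(plain_query), cqp = is.cqp(cqp_query))

# State the query type explicitly instead of relying on the check.
k_plain <- kwic("GERMAPARLMINI", query = plain_query, cqp = FALSE)
k_cqp <- kwic("GERMAPARLMINI", query = cqp_query, cqp = TRUE)
```

Setting cqp explicitly is the more predictable choice when a plain search term happens to contain quotation marks.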
left: A single integer value defining the number of tokens to the left of the query match to include in the context. Advanced usage: (a) If left is a length-one character vector stating an s-attribute, the context will be expanded to the (left) boundary of the region where the match occurs. (b) If left is a named length-one integer vector, this value is the number of regions of the structural attribute referred to by the vector's name to the left of the query match that are included in the context.
right: A single integer value, a length-one character vector or a named length-one integer value, with equivalent effects to argument left.
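The three forms that left and right accept can be sketched as follows. This is an illustration under assumptions, not a tested recipe: it assumes the GERMAPARLMINI demo corpus carries a "speaker" s-attribute (as the examples for this method suggest) and that the character and named-integer forms behave as described above. The object names are hypothetical.

```r
library(polmineR)
use("polmineR")  # makes the GERMAPARLMINI demo corpus available

# Plain integer: a fixed window of 5 tokens on either side of the match.
k_tokens <- kwic("GERMAPARLMINI", query = "Integration", left = 5L, right = 5L)

# (a) Character: expand the context to the boundaries of the "speaker"
#     region in which the match occurs.
k_region <- kwic(
  "GERMAPARLMINI", query = "Integration",
  left = "speaker", right = "speaker"
)

# (b) Named integer: include one "speaker" region on either side of the match.
k_named <- kwic(
  "GERMAPARLMINI", query = "Integration",
  left = c(speaker = 1L), right = c(speaker = 1L)
)
```

Region-based contexts can be much longer than token windows, so the resulting tables may be considerably wider.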
region: An s-attribute, given by a length-one character vector. The context of query matches will be expanded to the left and right boundary of the region where the match is located. If arguments left and right are > 1, the left and right boundaries of the respective number of regions will be identified.
p_attribute: The p-attribute, defaults to 'word'.
boundary: If provided, a length-one character vector stating an s-attribute that will be used to check the boundaries of the text.
stoplist: Terms or ids to prevent a concordance from occurring in results.
positivelist: Terms or ids required for a concordance to occur in results.
regex: Logical, whether stoplist/positivelist is interpreted as a regular expression.
check: A logical value, whether to check the validity of the CQP query using check_cqp_query.
The method works with a whole CWB corpus defined by a character vector, and can be applied to a partition or a context object.
If query produces a lot of matches, the DT::datatable() function used to produce output in the Viewer pane of RStudio may issue a warning. Usually, this warning is harmless and can be ignored. Use options("polmineR.warn.size" = FALSE) to turn off this warning.
If a positivelist is supplied, only those concordances are kept in which one of the terms from the positivelist occurs in the context of the query match. Use argument regex if the positivelist should be interpreted as regular expressions. Tokens from the positivelist will be highlighted in the output table.
If a stoplist is supplied, concordances are removed if any of the tokens of the stoplist occurs in the context of the query match.
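The filtering described above might be used as in the sketch below. The positivelist call mirrors the example given for this method; the stoplist token "Menschen" is a hypothetical choice made only for illustration. Because NULL is returned when no concordances survive the filtering, the result is checked before further use.

```r
library(polmineR)
use("polmineR")  # makes the GERMAPARLMINI demo corpus available

# Keep only concordances whose context contains a term matching the regex.
k_pos <- kwic(
  "GERMAPARLMINI", query = "Integration",
  positivelist = "[Ee]urop.*", regex = TRUE
)

# Drop concordances whose context contains the (hypothetical) stoplist token.
k_stop <- kwic("GERMAPARLMINI", query = "Integration", stoplist = "Menschen")

# kwic() returns NULL if filtering removes every match, so guard for that.
if (is.null(k_pos)) message("positivelist removed all concordances")
if (is.null(k_stop)) message("stoplist removed all concordances")
```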
Applying the kwic method to a partition_bundle or subcorpus_bundle will return a single kwic object that includes a column 'subcorpus_name' with the name of the subcorpus (or partition) in the input object where the match for a concordance occurs.
Baker, Paul (2006): Using Corpora in Discourse Analysis. London: Continuum, pp. 71-93 (ch. 4).
Jockers, Matthew L. (2014): Text Analysis with R for Students of Literature. Cham et al.: Springer, pp. 73-87 (chs. 8 & 9).
The return value is a kwic-class object; the documentation for the class explains the standard generic methods applicable to kwic-class objects. It is possible to read the whole text where a query match occurs, see the read method. To highlight terms in the context of a query match, see the highlight method.
use("polmineR")
use(pkg = "RcppCWB", corpus = "REUTERS")

# basic usage
K <- kwic("GERMAPARLMINI", "Integration")
if (interactive()) show(K)

oil <- corpus("REUTERS") %>% kwic(query = "oil")
if (interactive()) show(oil)

oil <- corpus("REUTERS") %>%
  kwic(query = "oil") %>%
  highlight(yellow = "crude")
if (interactive()) show(oil)

# increase left and right context and display metadata
K <- kwic(
  "GERMAPARLMINI",
  "Integration", left = 20, right = 20,
  s_attributes = c("date", "speaker", "party")
)
if (interactive()) show(K)

# use CQP syntax for matching
K <- kwic(
  "GERMAPARLMINI",
  '"Integration" [] "(Menschen|Migrant.*|Personen)"', cqp = TRUE,
  left = 20, right = 20,
  s_attributes = c("date", "speaker", "party")
)
if (interactive()) show(K)

# check that boundary of region is not transgressed
K <- kwic(
  "GERMAPARLMINI",
  '"Sehr" "geehrte"', cqp = TRUE,
  left = 100, right = 100,
  boundary = "date"
)
if (interactive()) show(K)

# use positivelist and highlight matches in context
K <- kwic(
  "GERMAPARLMINI", query = "Integration",
  positivelist = "[Ee]urop.*", regex = TRUE
)
K <- highlight(K, yellow = "[Ee]urop.*", regex = TRUE)

# Apply kwic on partition_bundle/subcorpus_bundle
gparl_2009_11_10_speeches <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  as.speeches(
    s_attribute_name = "speaker", s_attribute_date = "date",
    progress = FALSE, verbose = FALSE
  )
k <- kwic(gparl_2009_11_10_speeches, query = "Integration")