For a text or a collection of texts (in a quanteda corpus object), return a list of the occurrences of a user-supplied keyword in its immediate context, identifying the source text and the word index number within that text. (The index is a word position, not a line number, since the text may or may not be segmented using end-of-line delimiters.)
kwic(x, keywords, window = 5, valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE, ...)

is.kwic(x)

# S3 method for kwic
as.tokens(x)
keywords: a keyword pattern or phrase consisting of multiple keyword patterns, possibly including punctuation. If a phrase, keywords will be tokenized using the ... options.
window: the number of context words to be displayed around the keyword.
valuetype: how to interpret keyword expressions: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.
case_insensitive: match without respect to case if TRUE.
...: additional arguments passed to tokens, for applicable object types.
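The three valuetype options can be illustrated on a small vector of tokens. The following is a minimal base-R sketch, not quanteda's own matching code: match_tokens and tokens_example are hypothetical names introduced here, and the glob case is assumed to behave like a wildcard pattern converted with utils::glob2rx.

```r
# Base-R illustration only; quanteda's internal matching may differ.
tokens_example <- c("secure", "Security", "insecure", "secured", "peace")

# Hypothetical helper: returns a logical vector marking matching tokens.
match_tokens <- function(tokens, pattern,
                         valuetype = c("glob", "regex", "fixed"),
                         case_insensitive = TRUE) {
  valuetype <- match.arg(valuetype)
  if (valuetype == "fixed") {
    # Exact comparison against the whole token
    if (case_insensitive) {
      return(tolower(tokens) == tolower(pattern))
    }
    return(tokens == pattern)
  }
  if (valuetype == "glob") {
    # Turn the wildcard pattern into an anchored regular expression
    pattern <- utils::glob2rx(pattern)
  }
  grepl(pattern, tokens, ignore.case = case_insensitive)
}

glob_hits  <- tokens_example[match_tokens(tokens_example, "secur*", "glob")]
regex_hits <- tokens_example[match_tokens(tokens_example, "secur", "regex")]
fixed_hits <- tokens_example[match_tokens(tokens_example, "security", "fixed")]
```

In this sketch the glob pattern is anchored to the start of the token, so "insecure" is matched only by the unanchored regular expression, while the fixed pattern matches only the complete token "Security".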
A kwic object classed data.frame, with the document name (docname), the token index position (position), the context before (contextPre), the keyword in its original format (keyword, preserving case and attached punctuation), and the context after (contextPost).
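To make the shape of the returned data.frame concrete, here is a rough base-R sketch of a keyword-in-context search. It is not quanteda's implementation: simple_kwic is a hypothetical helper that assumes naive whitespace tokenization and exact, case-insensitive matching of a single keyword, and it only mirrors the column names described above.

```r
# Hypothetical helper, not part of quanteda: a naive whitespace-tokenizing KWIC.
simple_kwic <- function(texts, keyword, window = 5) {
  rows <- list()
  for (doc in names(texts)) {
    toks <- strsplit(texts[[doc]], "\\s+")[[1]]
    # Word index positions of case-insensitive exact matches
    hits <- which(tolower(toks) == tolower(keyword))
    for (i in hits) {
      # Up to `window` tokens on each side of the match
      pre_idx <- utils::tail(seq_len(i - 1), window)
      all_idx <- seq_along(toks)
      post_idx <- utils::head(all_idx[all_idx > i], window)
      rows[[length(rows) + 1]] <- data.frame(
        docname = doc,
        position = i,
        contextPre = paste(toks[pre_idx], collapse = " "),
        keyword = toks[i],
        contextPost = paste(toks[post_idx], collapse = " "),
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(rows) == 0) {
    # No matches: return an empty data.frame with the same columns
    return(data.frame(docname = character(), position = integer(),
                      contextPre = character(), keyword = character(),
                      contextPost = character(), stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

texts <- c(doc1 = "We must secure peace and secure liberty for all",
           doc2 = "Peace is not merely the absence of war")
result <- simple_kwic(texts, "peace", window = 2)
```

Each row of result corresponds to one match, with position giving the word index within its document and keyword keeping the original casing of the matched token.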
# NOT RUN {
library(quanteda)

# the same stem searched with each valuetype
head(kwic(data_corpus_inaugural, "secure*", window = 3, valuetype = "glob"))
head(kwic(data_corpus_inaugural, "secur", window = 3, valuetype = "regex"))
head(kwic(data_corpus_inaugural, "security", window = 3, valuetype = "fixed"))

# tokenize the corpus (toks is not used by the calls below)
toks <- tokens(data_corpus_inaugural)

# multi-word phrase as the keyword
kwic(data_corpus_inaugural, "war against")
kwic(data_corpus_inaugural, "war against", valuetype = "regex")

# test whether an object is a kwic object
mykwic <- kwic(data_corpus_inaugural, "provident*")
is.kwic(mykwic)
is.kwic("Not a kwic")
# }