Tools to Create, Modify and Manage 'CWB' Corpora
The 'Corpus Workbench' ('CWB', <http://cwb.sourceforge.net/>) offers a classic and mature
approach for working with large, linguistically and structurally annotated corpora. The 'CWB'
is memory-efficient, and its design makes running queries fast (Evert and Hardie 2011,
<http://www.stefan-evert.de/PUB/EvertHardie2011.pdf>). The 'cwbtools' package offers
pure R tools to create indexed corpus files as well as high-level wrappers for the original C
implementation of CWB as exposed by the 'RcppCWB' package
<https://CRAN.R-project.org/package=RcppCWB>. Additional functionality to add and
modify annotations of corpora from within R makes working with CWB-indexed corpora
much more flexible and convenient. The 'cwbtools' package in combination with the R packages
'RcppCWB' (<https://CRAN.R-project.org/package=RcppCWB>) and 'polmineR'
(<https://CRAN.R-project.org/package=polmineR>) offers a lightweight infrastructure
to support the combination of quantitative and qualitative approaches for working
with textual data.
Details
Type | Package
Date | 2021-02-20
VignetteBuilder | knitr
LazyData | yes
License | GPL-3
Language | en-US
Encoding | UTF-8
URL | https://github.com/PolMine/cwbtools
BugReports | https://github.com/PolMine/cwbtools/issues
Collate | 'CorpusData.R' 'corpus.R' 'cwb.R' 'cwbtools.R' 'directories.R' 'encoding.R' 'ner.R' 'p_attribute.R' 'pkg.R' 'registry_file.R' 's_attribute.R'
RoxygenNote | 7.1.1
NeedsCompilation | no
Packaged | 2021-02-20 01:34:11 UTC; andreasblaette
Repository | CRAN
Date/Publication | 2021-02-23 12:20:37 UTC
Suggests | aws.s3, janeaustenr, knitr, NLP, openNLP, rmarkdown, SnowballC, testthat, tidytext, tm (>= 0.7.3), tokenizers (>= 0.2.1)
Imports | cli, curl, data.table, httr, jsonlite, methods, pbapply, R6, RcppCWB (>= 0.2.8), rstudioapi, stringi, tools, xml2, zen4R
Contributors | Christoph Leonhardt