cwbtools v0.3.3


Tools to Create, Modify and Manage 'CWB' Corpora

The 'Corpus Workbench' ('CWB') offers a classic and mature approach for working with large, linguistically and structurally annotated corpora. The 'CWB' is memory efficient, and its design makes running queries fast (Evert and Hardie 2011). The 'cwbtools' package offers pure R tools to create indexed corpus files as well as high-level wrappers for the original C implementation of CWB as exposed by the 'RcppCWB' package. Additional functionality to add and modify annotations of corpora from within R makes working with CWB-indexed corpora much more flexible and convenient. The 'cwbtools' package, in combination with the R packages 'RcppCWB' and 'polmineR', offers a lightweight infrastructure to support the combination of quantitative and qualitative approaches for working with textual data.

Functions in cwbtools

Name                 Description
p_attribute_encode   Encode Positional Attribute(s).
cwb_install          Utilities to install Corpus Workbench.
conll_get_regions    Extract regions from NER annotations (CoNLL format).
cwb_corpus_dir       Manage directories for indexed corpora.
corpus_install       Install and manage corpora.
get_encoding         Get Encoding of Character Vector.
CorpusData           Manage Corpus Data and Encode CWB Corpus.
cwbtools-package     cwbtools-package
pkg_utils            Create and manage packages with corpus data.
registry_file_parse  Parse and create registry files.
s_attribute_encode   Read, process and write data on structural attributes.


Type              Package
Date              2021-02-20
VignetteBuilder   knitr
LazyData          yes
License           GPL-3
Language          en-US
Encoding          UTF-8
Collate           'CorpusData.R' 'corpus.R' 'cwb.R' 'cwbtools.R' 'directories.R' 'encoding.R' 'ner.R' 'p_attribute.R' 'pkg.R' 'registry_file.R' 's_attribute.R'
RoxygenNote       7.1.1
NeedsCompilation  no
Packaged          2021-02-20 01:34:11 UTC; andreasblaette
Repository        CRAN
Date/Publication  2021-02-23 12:20:37 UTC
