cwbtools (version 0.3.3)

registry_file_parse: Parse and create registry files.

Description

A set of functions to parse, create and write registry files.

Usage

registry_file_parse(corpus, registry_dir = Sys.getenv("CORPUS_REGISTRY"))

registry_file_compose(x)

registry_data(
  name,
  id,
  home,
  info = file.path(home, ".info", fsep = "/"),
  properties = c(charset = "utf-8"),
  p_attributes,
  s_attributes = character()
)

registry_file_write(
  data,
  corpus,
  registry_dir = Sys.getenv("CORPUS_REGISTRY"),
  ...
)

Arguments

corpus

A CWB corpus indicated by a length-one character vector.

registry_dir

Directory with registry files.

x

An object of class registry_data.

name

Long descriptive name of corpus (character vector).

id

Short name of corpus (character vector).

home

Path to the data directory of the indexed corpus.

info

A character vector containing the path of the info file.

properties

Named character vector with corpus properties; it should at least include 'charset'.

p_attributes

A character vector with positional attributes to declare.

s_attributes

A character vector with structural attributes to declare.

data

A registry_data object.

...

Further parameters.

Details

registry_file_parse will return an object of class registry_data.

See the appendix to the 'Corpus Encoding Tutorial' (http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial.pdf), which includes an explanation of the registry file format.
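To make the registry file format more concrete, the following base R sketch shows a minimal registry file and pulls out a few of its fields. It follows the layout described in the tutorial, but all values are made up, and the helper field() is hypothetical: it is not part of cwbtools and does not show how registry_file_parse works internally.

```r
# Hypothetical registry file content (values invented for illustration).
registry_lines <- c(
  "NAME \"Demo corpus\"",
  "ID   demo",
  "HOME /tmp/cwb/indexed_corpora/demo",
  "INFO /tmp/cwb/indexed_corpora/demo/.info",
  "##:: charset = \"utf-8\"",
  "ATTRIBUTE word",
  "ATTRIBUTE pos",
  "STRUCTURE s"
)

# Hypothetical helper: return the values that follow a given keyword
# at the start of a line.
field <- function(lines, keyword) {
  pattern <- paste0("^", keyword, "\\s+")
  hits <- grep(pattern, lines, value = TRUE)
  sub(pattern, "", hits)
}

corpus_id <- field(registry_lines, "ID")        # short corpus name
p_attrs   <- field(registry_lines, "ATTRIBUTE") # positional attributes
s_attrs   <- field(registry_lines, "STRUCTURE") # structural attributes

print(corpus_id)
print(p_attrs)
print(s_attrs)
```

The keywords ID, ATTRIBUTE and STRUCTURE correspond to the id, p_attributes and s_attributes arguments of registry_data.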

registry_file_compose will turn a registry_data object into a character vector with a registry file that can be written to disk.
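A minimal sketch of composing a registry file from scratch, using only the arguments listed under Usage. The corpus name, ID and home directory are hypothetical, and the sketch assumes that registry_file_compose does not check whether the home directory exists. The code is skipped if cwbtools is not installed.

```r
# Sketch only: requires the cwbtools package; skipped if it is not installed.
if (requireNamespace("cwbtools", quietly = TRUE)) {
  # Hypothetical data directory; assumed not to be checked for existence.
  demo_home <- file.path(tempdir(), "indexed_corpora", "demo")

  # Build a registry_data object (all values are invented for illustration).
  regdata_demo <- cwbtools::registry_data(
    name = "Demo corpus",
    id = "demo",
    home = demo_home,
    properties = c(charset = "utf-8"),
    p_attributes = c("word", "pos"),
    s_attributes = "s"
  )

  # Turn the object into a character vector holding the registry file.
  regfile <- cwbtools::registry_file_compose(regdata_demo)
  print(regfile)
}
```

Printing the result before writing it is a convenient way to check the declared attributes and properties.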

registry_file_write will compose a registry file from data and write it to disk.
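The following sketch writes a registry file to a temporary registry directory rather than to the directory set in CORPUS_REGISTRY. The corpus values and the directory name registry_demo are hypothetical. The name of the file that is created is not stated in this documentation, so the sketch lists the directory contents instead of assuming a file name.

```r
# Sketch only: requires the cwbtools package; skipped if it is not installed.
if (requireNamespace("cwbtools", quietly = TRUE)) {
  # Temporary registry directory, so that no existing registry is touched.
  reg_dir <- file.path(tempdir(), "registry_demo")
  dir.create(reg_dir, showWarnings = FALSE, recursive = TRUE)

  # Hypothetical corpus description (values invented for illustration).
  regdata_demo <- cwbtools::registry_data(
    name = "Demo corpus",
    id = "demo",
    home = file.path(tempdir(), "indexed_corpora", "demo"),
    p_attributes = "word"
  )

  # Compose the registry file and write it to reg_dir.
  cwbtools::registry_file_write(
    data = regdata_demo,
    corpus = "demo",
    registry_dir = reg_dir
  )

  # Inspect what was written.
  written <- list.files(reg_dir)
  print(written)
}
```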

Examples

# NOT RUN {
regdata <- registry_file_parse(
  corpus = "REUTERS",
  registry_dir = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
  )
# }
