koRpus (version 0.13-8)

taggedText: Getter/setter methods for koRpus objects

Description

These methods should be used to get or set values of tagged text objects generated by koRpus functions like treetag or tokenize.

Usage

taggedText(obj, add.desc = FALSE, doc_id = FALSE)

# S4 method for kRp.text
taggedText(obj, add.desc = FALSE, doc_id = FALSE)

taggedText(obj) <- value

# S4 method for kRp.text
taggedText(obj) <- value

doc_id(obj, ...)

# S4 method for kRp.text
doc_id(obj, has_id = NULL)

hasFeature(obj, feature = NULL, ...)

# S4 method for kRp.text
hasFeature(obj, feature = NULL)

hasFeature(obj, feature) <- value

# S4 method for kRp.text
hasFeature(obj, feature) <- value

feature(obj, feature, ...)

# S4 method for kRp.text
feature(obj, feature, doc_id = NULL)

feature(obj, feature) <- value

# S4 method for kRp.text
feature(obj, feature) <- value

corpusReadability(obj, ...)

# S4 method for kRp.text
corpusReadability(obj, doc_id = NULL)

corpusReadability(obj) <- value

# S4 method for kRp.text
corpusReadability(obj) <- value

corpusHyphen(obj, ...)

# S4 method for kRp.text
corpusHyphen(obj, doc_id = NULL)

corpusHyphen(obj) <- value

# S4 method for kRp.text
corpusHyphen(obj) <- value

corpusLexDiv(obj, ...)

# S4 method for kRp.text
corpusLexDiv(obj, doc_id = NULL)

corpusLexDiv(obj) <- value

# S4 method for kRp.text
corpusLexDiv(obj) <- value

corpusFreq(obj, ...)

# S4 method for kRp.text
corpusFreq(obj)

corpusFreq(obj) <- value

# S4 method for kRp.text
corpusFreq(obj) <- value

corpusCorpFreq(obj, ...)

# S4 method for kRp.text
corpusCorpFreq(obj)

corpusCorpFreq(obj) <- value

# S4 method for kRp.text
corpusCorpFreq(obj) <- value

corpusStopwords(obj, ...)

# S4 method for kRp.text
corpusStopwords(obj)

corpusStopwords(obj) <- value

# S4 method for kRp.text
corpusStopwords(obj) <- value

# S4 method for kRp.text,ANY,ANY,ANY
[(x, i, j, ..., drop = TRUE)

# S4 method for kRp.text,ANY,ANY,ANY
[(x, i, j, ...) <- value

# S4 method for kRp.text
[[(x, i, doc_id = NULL, ...)

# S4 method for kRp.text
[[(x, i, doc_id = NULL, ...) <- value

# S4 method for kRp.text
describe(obj, doc_id = NULL, simplify = TRUE, ...)

# S4 method for kRp.text
describe(obj, doc_id = NULL, ...) <- value

# S4 method for kRp.text
language(obj)

# S4 method for kRp.text
language(obj) <- value

diffText(obj, doc_id = NULL)

# S4 method for kRp.text
diffText(obj, doc_id = NULL)

diffText(obj) <- value

# S4 method for kRp.text
diffText(obj) <- value

originalText(obj)

# S4 method for kRp.text
originalText(obj)

is.taggedText(obj)

is.kRp.text(obj)

fixObject(obj, doc_id = NA)

# S4 method for kRp.text
fixObject(obj, doc_id = NA)

tif_as_tokens_df(tokens)

# S4 method for kRp.text
tif_as_tokens_df(tokens)

# S4 method for kRp.tagged
fixObject(obj, doc_id = NA)

# S4 method for kRp.txt.freq
fixObject(obj, doc_id = NA)

# S4 method for kRp.txt.trans
fixObject(obj, doc_id = NA)

# S4 method for kRp.analysis
fixObject(obj, doc_id = NA)

Arguments

obj

An arbitrary R object.

add.desc

Logical, determines whether the desc column should be re-written with descriptions for all POS tags.

doc_id

Logical (except for fixObject, feature, and [[/[[<-). If TRUE, the doc_id column will be a factor with the respective value of the desc slot, i.e., the document ID will be preserved in the data.frame. If used with fixObject, can be a character string to set the document ID manually (the default NA will preserve existing values and not overwrite them). If used with feature or [[/[[<-, a character vector to limit the scope to one or more particular document IDs.

value

The new value to replace the current with.

...

Additional arguments for the generics.

has_id

A character vector with doc_ids to look for in the object. The return value is then a logical vector of the same length, indicating if the respective id was found or not.

feature

Character string naming the feature to look for. The return value is logical if a single feature name is given. If feature=NULL, a character vector is returned, naming all features found in the object.

x

An object of class kRp.text or kRp.hyphen.

i

Defines the row selector ([) or the name to match ([[).

j

Defines the column selector.

drop

Logical, whether the result should be coerced to the lowest possible dimension. See [ for more details.

simplify

Logical, if TRUE and the result is a list of length one (i.e., just a single doc_id), returns the contents of the single list entry.

tokens

An object of class kRp.text.
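
As a rough illustration of how has_id, feature, and doc_id change what the getters return, the sketch below tokenizes a short made-up sentence. The sample text, the document ID "sample", and all variable names are assumptions for illustration only; format and doc_id in the tokenize() call are arguments of tokenize itself, not of the methods documented here.

```r
# only runs if the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  # hypothetical sample text and document ID
  obj <- tokenize(
    txt="This is a short sample text.",
    format="obj",
    lang="en",
    doc_id="sample"
  )

  # has_id: one logical value per requested ID
  found <- doc_id(obj, has_id=c("sample", "missing"))

  # feature=NULL (the default): names of all features found in the object
  all_features <- hasFeature(obj)
  # a single feature name: logical result
  has_diff <- hasFeature(obj, feature="diff")

  # doc_id=TRUE: preserve the document ID in the returned data.frame
  tokens_df <- taggedText(obj, doc_id=TRUE)
}
```

Keeping the results in variables rather than printing them makes it easy to inspect their types, e.g. with str(found).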

Details

  • taggedText() returns the tokens slot.

  • doc_id() returns a character vector of all doc_id values in the object.

  • describe() returns the desc slot.

  • language() returns the lang slot.

  • [/[[ can be used as a shortcut to index the results of taggedText().

  • fixObject returns the same object upgraded to the object structure of this package version (e.g., new columns, changed names, etc.).

  • hasFeature() returns TRUE or FALSE, depending on whether the requested feature is present or not.

  • feature() returns the list entry of the feat_list slot for the requested feature.

  • corpusReadability() returns the list of kRp.readability objects, see readability.

  • corpusHyphen() returns the list of kRp.hyphen objects, see hyphen.

  • corpusLexDiv() returns the list of kRp.TTR objects, see lex.div.

  • corpusFreq() returns the frequency analysis data from the feat_list slot, see freq.analysis.

  • corpusCorpFreq() returns the kRp.corp.freq object of the feat_list slot, see for example read.corp.custom.

  • corpusStopwords() returns the number of stopwords found in each text (if analyzed) from the feat_list slot.

  • tif_as_tokens_df returns the tokens slot in a TIF[1] compliant format, i.e., doc_id is not a factor but a character vector.

  • originalText() is similar to taggedText(), but reverts any transformations back to the original text before returning the tokens slot. Only works if the object has the feature diff, see examples.

  • diffText() returns the diff slot, if present.
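
The getters and setters listed above can be combined into a simple read, modify, write-back cycle. The following is a minimal sketch under assumptions: the sample text and variable names are made up, and upper-casing the tokens merely stands in for any change you might want to make to the data.frame.

```r
# only runs if the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  # hypothetical sample text
  obj <- tokenize(
    txt="This is a short sample text.",
    format="obj",
    lang="en"
  )

  # get the tokens slot, change a column, and write it back
  tokens_df <- taggedText(obj)
  tokens_df[["token"]] <- toupper(tokens_df[["token"]])
  taggedText(obj) <- tokens_df

  # [[ is a shortcut to the same column
  upper_tokens <- obj[["token"]]

  # TIF compliant variant of the tokens slot
  tif_df <- tif_as_tokens_df(obj)

  # transform the text, then check for the diff feature
  # that originalText() relies on
  jumbled <- jumbleWords(obj)
  has_diff <- hasFeature(jumbled, "diff")
  diff_slot <- diffText(jumbled)
  restored <- originalText(jumbled)[["token"]]
}
```

Checking hasFeature() before calling originalText() is a defensive habit rather than a requirement; it avoids relying on a transformation having recorded its changes.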

References

[1] Text Interchange Formats (https://github.com/ropensci/tif)

Examples

# NOT RUN {
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )

  doc_id(tokenized.obj)

  describe(tokenized.obj)

  language(tokenized.obj)

  taggedText(tokenized.obj)
  tokenized.obj[["token"]]
  tokenized.obj[1:3, "token"]

  tif_as_tokens_df(tokenized.obj)

  # example for originalText()
  tokenized.obj <- jumbleWords(tokenized.obj)
  # now compare the jumbled words to the original
  tokenized.obj[["token"]]
  originalText(tokenized.obj)[["token"]]
} else {}
# }
