read.tagged

Either a matrix, a connection, or a character vector. If it is a character vector,
it must be a valid path to a file
containing the previously analyzed text. If it is a matrix,
it must contain three columns named "token", "tag", and "lemma";
only these three columns are used.

file
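A minimal sketch of the matrix form described above. The tokens and tags are made-up illustration data, and the guarded call assumes that koRpus and its English language support are installed; only the column names come from the description.

```r
# Hypothetical example data: one tagged sentence, one row per token.
# The column names "token", "tag", and "lemma" are the ones the argument requires.
tagged <- matrix(
  c("The",    "DT",   "the",
    "cat",    "NN",   "cat",
    "sleeps", "VBZ",  "sleep",
    ".",      "SENT", "."),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("token", "tag", "lemma"))
)

# Only attempt the import if koRpus is installed. try() keeps the script
# running if, for example, English language support is not available.
if (requireNamespace("koRpus", quietly = TRUE)) {
  tagged.obj <- try(koRpus::read.tagged(tagged, lang = "en"), silent = TRUE)
}
```

Any further columns in such a matrix would be ignored, as stated above.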

A character string naming the language of the analyzed corpus. See <code><a rd-options="koRpus:kRp.POS.tags" href="/link/kRp.POS.tags?package=koRpus&version=0.06-5&to=koRpus%3AkRp.POS.tags" data-mini-rdoc="koRpus:kRp.POS.tags::kRp.POS.tags">kRp.POS.tags</a></code>
for all supported languages.
If set to <code>"kRp.env"</code>, the language is taken from <code><a rd-options="koRpus:get.kRp.env" href="/link/get.kRp.env?package=koRpus&version=0.06-5&to=koRpus%3Aget.kRp.env" data-mini-rdoc="koRpus:get.kRp.env::get.kRp.env">get.kRp.env</a></code>.

lang

A character string defining the character encoding of the input file,
like <code>"Latin1"</code> or <code>"UTF-8"</code>.
If <code>NULL</code>,
the encoding will either be taken from a preset (if defined in <code>TT.options</code>) or fall back to <code>""</code>.
Hence you can override the preset encoding with this parameter.

encoding

The software that was used to tokenize and tag the text. Currently,
TreeTagger is the only supported tagger.

tagger

Logical, whether the tokens defined in <code>sentc.end</code> should be searched for and set to a sentence ending tag.
You could call this a compatibility mode to make sure you get the results you would get if you called
<code><a rd-options="koRpus:treetag" href="/link/treetag?package=koRpus&version=0.06-5&to=koRpus%3Atreetag" data-mini-rdoc="koRpus:treetag::treetag">treetag</a></code> on the original file.
If set to <code>FALSE</code>, the tags will be imported as they are.

apply.sentc.end

A character vector with tokens indicating a sentence ending. These are added to the sentence endings already present in the given results;
they do not replace them.

sentc.end
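The effect of the two sentence ending arguments can be sketched in base R. This is an illustration of the described behavior, not the package's internal code; the token list and the tag name <code>"SENT"</code> (the sentence ending tag in TreeTagger's English tagset) are assumptions chosen for the example.

```r
# Hypothetical tagger output in which ";" was not tagged as a sentence ending
tokens <- c("Wait", ";", "it", "works", ".")
tags   <- c("VB",   ":", "PP", "VBZ",   "SENT")

# Illustrative list of sentence ending tokens
sentc.end <- c(".", "!", "?", ";", ":")
# Assumption: "SENT" is the sentence ending tag of the tagset in use
sent.tag <- "SENT"

# Set the tag of every listed token to the sentence ending tag.
# Tags that already mark a sentence ending are left in place.
tags.fixed <- tags
tags.fixed[tokens %in% sentc.end] <- sent.tag
```

With <code>apply.sentc.end=FALSE</code>, the equivalent of <code>tags</code> would be kept unchanged.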

A character vector to be used for stopword detection. Comparison is done in lower case. You can also simply set
<code>stopwords=tm::stopwords("en")</code> to use the English stopwords provided by the <code>tm</code> package.

stopwords
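A lower-case comparison of this kind can be sketched in base R as follows. This is not the package's own code; the sketch assumes that the stopword list itself is supplied in lower case, and both vectors are made-up example data.

```r
# Hypothetical tokens and a small, lower-case stopword list
tokens <- c("The", "cat", "and", "THE", "dog")
stopword.list <- c("the", "and")

# Tokens are converted to lower case before matching,
# so "The" and "THE" both count as stopwords
is.stopword <- tolower(tokens) %in% stopword.list
```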

A function or method to perform stemming. For instance,
you can set <code>stemmer=Snowball::SnowballStemmer</code> if you
have the <code>Snowball</code> package installed (or <code>SnowballC::wordStem</code>). As of now,
you cannot provide further arguments to this function.

stemmer
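Since no further arguments can be passed on, one possible workaround is to wrap the stemmer in a function of your own that fixes them. The sketch below assumes the stemmer is called with a character vector of tokens as its only argument; the wrapper name <code>german.stemmer</code> is hypothetical.

```r
# Hypothetical wrapper: fixes the language so that the function
# only needs the vector of words as its single argument
german.stemmer <- function(words) {
  SnowballC::wordStem(words, language = "german")
}

# Try the wrapper only if SnowballC is installed
if (requireNamespace("SnowballC", quietly = TRUE)) {
  stems <- german.stemmer(c("laufen", "gelaufen"))
}
```

The wrapper could then be supplied as <code>stemmer=german.stemmer</code>.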

Logical, whether SGML tags should be ignored and removed from the output.

rm.sgml


This function can be used on text files or matrices containing already tagged text material,
e.g., the results of TreeTagger[1].
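For the file case, a minimal sketch: it writes a small file in the tab-separated token, tag, lemma layout that TreeTagger produces and then imports it. The file content is made-up example data, and the guarded call assumes that koRpus and its English language support are installed.

```r
# Hypothetical TreeTagger-style output: one token per line,
# with token, tag, and lemma separated by tabs
tagged.lines <- c(
  "The\tDT\tthe",
  "cat\tNN\tcat",
  "sleeps\tVBZ\tsleep",
  ".\tSENT\t."
)
tagged.file <- tempfile(fileext = ".txt")
writeLines(tagged.lines, tagged.file)

# Only attempt the import if koRpus is installed. try() keeps the script
# running if, for example, English language support is not available.
if (requireNamespace("koRpus", quietly = TRUE)) {
  tagged.obj <- try(
    koRpus::read.tagged(tagged.file, lang = "en", encoding = "UTF-8"),
    silent = TRUE
  )
}
```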


misc

A set of tools to analyze texts. Includes, amongst others,
functions for automatic language detection, hyphenation, several indices of
lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability
(e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language
corpora are also provided, to enable frequency analyses (supports Celex and
Leipzig Corpora Collection file formats) and measures like tf-idf. Support for
additional languages can be added on-the-fly or by plugin packages. Note: For
full functionality a local installation of TreeTagger is recommended. 'koRpus'
also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs
for its basic features. The respective R package 'rkward' cannot be installed
directly from a repository, as it is a part of RKWard. To make full use of this
feature, please install RKWard from https://rkward.kde.org (plugins are detected
automatically). Due to some restrictions on CRAN, the full package sources are
only available from the project homepage. To ask for help, report bugs, suggest
feature improvements, or discuss the global development of the package, please
subscribe to the koRpus-dev mailing list
(https://ml06.ispgateway.de/mailman/listinfo/korpus-dev_r.reaktanz.de).

Meik Michalke

koRpus

An R Package for Text Analysis

read.tagged function

read.tagged: Import already tagged texts

Description

Usage

Arguments

Value

Details

References

See Also

Examples