txt_nextgram

a character vector in which each element is a single term or word

an integer giving the size of the n-gram. A value of 1 returns x unchanged, a value of 2
appends the next term to the current term, and a value of 3 appends the next two
terms to the current term

a character string indicating how to <code>paste</code> the subsequent words together

If you have annotated your text using <code>udpipe_annotate</code>,
your text is tokenised into a sequence of words. Given this vector of words in sequence,
getting n-grams comes down to looking at the next word, the word after that, and so forth.
These words can be <code>pasted</code> together to form an n-gram containing
the current word, the next word, the word after that, and so on.
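
The idea described above can be sketched in base R. This is an illustration only, not the package's implementation: the helper name `next_ngram_sketch` and its argument names are hypothetical, and the sketch assumes that positions without enough following words should yield `NA`, which the text above does not state.

```r
# Hypothetical helper, NOT the udpipe implementation: builds n-grams by
# pasting each word to the n - 1 words that follow it in the sequence.
next_ngram_sketch <- function(words, n = 2L, sep = " ") {
  stopifnot(is.character(words), n >= 1L)
  len <- length(words)
  if (len == 0L) return(character(0))
  out <- words
  for (k in seq_len(n - 1L)) {
    # Shift the vector k positions to the left, padding the tail with NA.
    shifted <- c(words[-seq_len(k)], rep(NA_character_, k))[seq_len(len)]
    # Assumption: propagate NA rather than pasting the literal string "NA".
    out <- ifelse(is.na(out) | is.na(shifted),
                  NA_character_,
                  paste(out, shifted, sep = sep))
  }
  out
}

words <- c("the", "quick", "brown", "fox")
next_ngram_sketch(words, n = 1)             # the input, unchanged
next_ngram_sketch(words, n = 2)             # each word plus the next word
next_ngram_sketch(words, n = 3, sep = "_")  # each word plus the next two words
```

Because the result has the same length as the input, it can be stored next to the original tokens, for example as an extra column in a data frame of annotated words.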

This natural language processing toolkit provides language-agnostic
'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency
parsing' of raw text. Besides text parsing, the package also allows you to train
annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided
at <http://universaldependencies.org/format.html>. The techniques are explained
in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0
with UDPipe', available at <doi:10.18653/v1/K17-3009>.

Jan Wijffels

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and
Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

BNOSAC 

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic 

Milan Straka 

Jana Straková

txt_nextgram: Based on a vector with a word sequence, get n-grams

Description

Usage

Arguments

Value

See Also

Examples