BTM (version 0.3.7)

terms.data.frame: Get the set of Biterms from a tokenised data frame

Description

Extracts pairs of words (biterms) that occur near one another in the same context, within a given window. With the default window, type = 'biterms' returns the biterms used when fitting BTM with its default window parameter.

Usage

# S3 method for data.frame
terms(x, type = c("tokens", "biterms"), window = 15, ...)

Value

Depending on whether type is set to 'tokens' or 'biterms', the following is returned:

  • If type='tokens': a list containing 2 elements:

    • n which indicates the number of tokens

    • tokens which is a data.frame with columns id, token and freq, indicating for all tokens found in the data the frequency of occurrence

  • If type='biterms': a list containing 2 elements:

    • n which indicates the number of biterms used to train the model

    • biterms which is a data.frame with columns term1 and term2, indicating all biterms found in the data. The same biterm combination can occur several times.

    Note that a biterm is unordered: in the output of type='biterms', term1 is always smaller than or equal to term2.
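
The idea of window-based biterm extraction can be illustrated with a small base R sketch. This is not BTM's implementation: sketch_biterms is a hypothetical helper, the window convention (each token is paired with the next window - 1 tokens) is an assumption, and ordering the two terms alphabetically is only one way to obtain term1 <= term2.

```r
# Hypothetical helper, not part of BTM: list the unordered word pairs
# found within a sliding window over the tokens of one context.
sketch_biterms <- function(tokens, window = 15) {
  term1 <- character(0)
  term2 <- character(0)
  n <- length(tokens)
  for (i in seq_len(max(n - 1, 0))) {
    # Assumed convention: pair token i with the next (window - 1) tokens
    last <- min(i + window - 1, n)
    if (last <= i) next
    for (j in (i + 1):last) {
      # Store each pair in a fixed (C-locale alphabetical) order, so that
      # the pair is unordered and term1 <= term2
      pair <- sort(c(tokens[i], tokens[j]), method = "radix")
      term1 <- c(term1, pair[1])
      term2 <- c(term2, pair[2])
    }
  }
  data.frame(term1 = term1, term2 = term2, stringsAsFactors = FALSE)
}

# One context of four tokens, pairing only adjacent tokens
example_biterms <- sketch_biterms(c("hotel", "room", "view", "room"), window = 2)
print(example_biterms)
```

In this toy input the pair room/view is produced twice, once from "room view" and once from "view room", which illustrates why the same biterm combination can occur several times.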

Arguments

x

a tokenised data frame containing one row per token with 2 columns

  • the first column is a context identifier (e.g. a tweet id, a document id, a sentence id, an identifier of a survey answer, an identifier of a part of a text)

  • the second column is of type character and contains the sequence of words occurring within the context identifier

type

a character string, either 'tokens' or 'biterms'. Defaults to 'tokens'.

window

integer with the window size for biterm extraction. Defaults to 15.

...

not used
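
As an illustration of the input shape described for x, the following base R sketch turns a few raw texts into a data frame with one row per token. The example texts, the column names doc_id and token, and the simple lower-case split on non-letters are all assumptions made for this sketch; only the column positions and the character type of the second column come from the description above.

```r
# Made-up example documents (assumption for illustration only)
docs <- data.frame(
  doc_id = c("d1", "d2"),
  text   = c("Nice room, great view", "Great location"),
  stringsAsFactors = FALSE
)

# Naive tokenisation: lower-case, then split on anything that is not a-z
tokens_per_doc <- strsplit(tolower(docs$text), "[^a-z]+")

# One row per token: context identifier first, token second
x <- data.frame(
  doc_id = rep(docs$doc_id, lengths(tokens_per_doc)),
  token  = unlist(tokens_per_doc),
  stringsAsFactors = FALSE
)

# Drop empty strings that the split can produce at the start of a text
x <- x[nzchar(x$token), ]
str(x)
```

A data frame of this shape has the two-column layout expected for x; in practice a proper tokeniser or annotator would normally replace the naive split used here.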

See Also

BTM, predict.BTM, logLik.BTM

Examples

library(BTM)
if (require(udpipe)) {
  data("brussels_reviews_anno", package = "udpipe")
  # Keep the Dutch reviews and, within those, only the nouns
  x <- subset(brussels_reviews_anno, language == "nl")
  x <- subset(x, xpos %in% c("NN", "NNP", "NNS"))
  # First column: context identifier, second column: token
  x <- x[, c("doc_id", "lemma")]
  # Biterms found within a window of 15 tokens
  biterms <- terms(x, window = 15, type = "biterms")
  str(biterms)
  # Token frequencies
  tokens <- terms(x, type = "tokens")
  str(tokens)
} # End of main if statement running only if the required packages are installed
