sbo

sbo provides utilities for building and evaluating text predictors based on Stupid Back-off N-gram models in R. It includes functions such as:

  • kgram_freqs(): Extract k-gram frequency tables from a text corpus.
  • sbo_predictor(): Train a next-word predictor via Stupid Back-off.
  • eval_sbo_predictor(): Test text predictions against an independent corpus.
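To make the Stupid Back-off idea concrete, here is a minimal base-R sketch of the scoring rule on a toy corpus. It is an illustration only, not sbo's implementation: the corpus, the helper names (count_kgrams, get_count, sbo_score) and the choice lambda = 0.4 are assumptions made for this example, and the toy k-grams are allowed to run across sentence boundaries for simplicity.

```r
# Toy corpus (made up for illustration), already lower-cased and tokenized.
tokens <- strsplit("i love you . i love it . you love me .", " ")[[1]]
N <- 3  # highest k-gram order used in this sketch

# Count all k-grams of order k, keyed by the space-joined words.
count_kgrams <- function(tokens, k) {
  n <- max(length(tokens) - k + 1, 0)
  grams <- vapply(seq_len(n),
                  function(i) paste(tokens[i:(i + k - 1)], collapse = " "),
                  character(1))
  tab <- table(grams)
  setNames(as.integer(tab), names(tab))  # plain named integer vector
}

counts <- lapply(seq_len(N), function(k) count_kgrams(tokens, k))

# Look up the count of a k-gram, returning 0 if it was never observed.
get_count <- function(gram) {
  tab <- counts[[length(gram)]]
  key <- paste(gram, collapse = " ")
  if (key %in% names(tab)) as.numeric(tab[[key]]) else 0
}

# Stupid Back-off score of `word` following `context`:
# use the relative frequency at the highest order where the k-gram was seen,
# multiplying by `lambda` each time the context is shortened by one word.
sbo_score <- function(word, context, lambda = 0.4) {
  context <- tail(context, N - 1)          # keep at most N - 1 context words
  if (length(context) == 0) {
    return(get_count(word) / length(tokens))  # unigram relative frequency
  }
  num <- get_count(c(context, word))
  if (num > 0) {
    return(num / get_count(context))
  }
  lambda * sbo_score(word, context[-1], lambda)  # back off to shorter context
}

# Rank every word of the toy vocabulary as a continuation of "i love".
vocab <- names(counts[[1]])
scores <- vapply(vocab, sbo_score, numeric(1), context = c("i", "love"))
head(sort(scores, decreasing = TRUE), 3)
```

The scores are not normalized probabilities, which is what distinguishes Stupid Back-off from smoothing schemes that redistribute probability mass; they are only meant to rank candidate next words.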

Installation

Released version

You can install the latest release of sbo from CRAN:

install.packages("sbo")

Development version

You can install the development version of sbo from GitHub:

# install.packages("devtools")
devtools::install_github("vgherard/sbo")

Example

This example shows how to build a text predictor with sbo:

library(sbo)
p <- sbo_predictor(sbo::twitter_train, # 50k tweets, example dataset
                   N = 3, # Train a 3-gram model
                   dict = sbo::twitter_dict, # Top 1k words appearing in corpus
                   .preprocess = sbo::preprocess, # Preprocessing transformation
                   EOS = ".?!:;" # End-Of-Sentence characters
                   )

The object p can now be used to generate predictive text as follows:

predict(p, "i love") # a character vector
#> [1] "you" "it"  "my"
predict(p, "you love") # another character vector
#> [1] "<EOS>" "me"    "the"
predict(p, 
        c("i love", "you love", "she loves", "we love", "you love", "they love")
        ) # a character matrix
#>      [,1]    [,2]  [,3] 
#> [1,] "you"   "it"  "my" 
#> [2,] "<EOS>" "me"  "the"
#> [3,] "you"   "my"  "me" 
#> [4,] "you"   "our" "it" 
#> [5,] "<EOS>" "me"  "the"
#> [6,] "to"    "you" "and"
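The other two functions listed at the top can be combined with the predictor in a similar way. The following is a hedged sketch rather than an excerpt from the package documentation: it assumes that kgram_freqs() accepts the same corpus, N, dict, .preprocess and EOS arguments used above, that sbo_predictor() can be called on the resulting frequency tables, and that the evaluation result has a logical column named correct. Please check the help pages of your installed version before relying on these details.

```r
library(sbo)

# Step 1: extract k-gram frequency tables from the training corpus.
freqs <- kgram_freqs(corpus = sbo::twitter_train,
                     N = 3,
                     dict = sbo::twitter_dict,
                     .preprocess = sbo::preprocess,
                     EOS = ".?!:;"
                     )

# Step 2: build a predictor from the frequency tables
# (assumed equivalent to training directly on the corpus).
p2 <- sbo_predictor(freqs)

# Step 3: compare predictions against an independent test corpus.
set.seed(840) # the evaluation is assumed to sample from the test set
evaluation <- eval_sbo_predictor(p2, test = sbo::twitter_test)

# Fraction of test cases where one of the predictions was the true next word
# (only computed if the assumed `correct` column is present).
if ("correct" %in% names(evaluation)) {
  mean(evaluation$correct)
}
```

Keeping the frequency tables as a separate object is convenient when you want to build several predictors from the same counts without re-reading the corpus.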

Help

For help, see the sbo website.

  • Version: 0.5.0
  • License: GPL-3
  • Maintainer: Valerio Gherardi
  • Last Published: December 5th, 2020

Functions in sbo (0.5.0)

  • kgram_freqs: k-gram frequency tables
  • plot.word_coverage: Plot method for word_coverage objects
  • eval_sbo_predictor: Evaluate Stupid Back-off next-word predictions
  • as_sbo_dictionary: Coerce to dictionary
  • predict.sbo_kgram_freqs: Predict method for k-gram frequency tables
  • babble: Babble!
  • preprocess: Preprocess text corpus
  • prune: Prune k-gram objects
  • predict.sbo_predictor: Predict method for Stupid Back-off text predictor
  • sbo-package: sbo: Text Prediction via Stupid Back-Off N-Gram Models
  • sbo_dictionary: Dictionaries
  • twitter_predtable: Next-word prediction tables from 3-gram model trained on Twitter training set
  • twitter_freqs: k-gram frequencies from Twitter training set
  • word_coverage: Word coverage fraction
  • twitter_train: Twitter training set
  • twitter_test: Twitter test set
  • tokenize_sentences: Sentence tokenizer
  • twitter_dict: Top 1000 dictionary from Twitter training set
  • sbo_predictions: Stupid Back-off text predictions