Method new()
It initializes the current object. It sets the transition
probability options and the verbosity level.
Usage
TPGenerator$new(opts = list(), ve = 0)
Arguments
opts
The options for generating the transition probabilities.
save_tp. Whether the data should be saved.
n. The n-gram size.
dir. The directory containing the input and output files.
format. The output format. There are two options: plain text or an R object.
ve
The level of detail in the information messages.
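The following is a minimal sketch of how a partial opts list could be combined with defaults before being passed to the constructor. The default values shown, including the format value "obj", are assumptions made for illustration and are not taken from the package. Only base R is used.

```r
# Hypothetical defaults; the values actually used by TPGenerator may differ
default_opts <- list(save_tp = TRUE, n = 1, dir = "./data", format = "obj")
# Options supplied by the caller; only n and dir are overridden
user_opts <- list(n = 4, dir = tempdir())
# modifyList() replaces matching names and keeps the remaining defaults
opts <- utils::modifyList(default_opts, user_opts)
str(opts)
```

The resulting list could then be passed as the opts argument, with any option not named by the caller keeping its default.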
Method generate_tp()
It first generates the transition probabilities for each
n-gram size from 1 to the given size. The transition probabilities
are then combined into a single data frame and saved to the output
directory given in the options of the current object.
By combining the transition probabilities for all n-gram sizes from 1
to n, back-off can be used to calculate next word probabilities or
predict the next word.
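The back-off idea can be sketched on a toy table as follows. This is an illustration under assumptions, not the package's implementation: the column names pre, nw and prob, the space-separated prefixes, the made-up probabilities and the helper predict_next() are all hypothetical, and no discounting is applied when backing off.

```r
# Toy combined table with rows for n-gram sizes 1 to 3 (values are made up)
# An empty prefix marks the unigram rows
tp <- data.frame(
  pre  = c("", "", "the", "the", "on the"),
  nw   = c("the", "cat", "cat", "mat", "mat"),
  prob = c(0.6, 0.4, 0.7, 0.3, 1.0),
  stringsAsFactors = FALSE
)

# Return the most likely next word, shortening the context until a match
predict_next <- function(words, tp, n = 3) {
  # Keep at most n - 1 context words
  ctx <- tail(words, n - 1)
  repeat {
    key <- paste(ctx, collapse = " ")
    rows <- tp[tp$pre == key, ]
    if (nrow(rows) > 0) {
      return(rows$nw[which.max(rows$prob)])
    }
    # No match even for the empty prefix: nothing to predict
    if (length(ctx) == 0) return(NA_character_)
    # Back off: drop the oldest context word
    ctx <- ctx[-1]
  }
}

predict_next(c("sat", "on", "the"), tp)  # matches the trigram prefix
predict_next(c("saw", "the"), tp)        # backs off to the bigram prefix
predict_next(c("hello"), tp)             # backs off to the unigram rows
```

Because all n-gram sizes are in one table, a single lookup function can fall back from longer to shorter prefixes without loading separate files.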
Usage
TPGenerator$generate_tp()
Examples
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("n1.RDS", "n2.RDS", "n3.RDS", "n4.RDS")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# The list of output files
fns <- c("words", "model-4", "tp2", "tp3", "tp4")
# The TPGenerator object is created
tp <- TPGenerator$new(opts = list(n = 4, dir = ed), ve = ve)
# The combined transition probabilities are generated
tp$generate_tp()
# The test environment is removed. Comment out the line below to
# keep the generated files so that they can be viewed
em$td_env()
Method generate_tp_for_n()
It generates the transition probability table for the
given n-gram size. It first reads the n-gram token frequencies from an
input text file.
It then generates a data frame whose columns are the
n-gram prefix, the next word and the next word frequency. The data frame
may be saved to a file as plain text or as an R object. If n = 1, the
list of words is saved.
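The prefix and next word split can be sketched with base R as follows. The input data frame, its column names, the space separator between words and the helper split_ngram() are assumptions made for illustration. The real method reads its input from a file in the configured directory.

```r
# Toy bigram frequencies standing in for the input file contents
ngrams <- data.frame(
  ngram = c("the cat", "the mat", "a cat"),
  freq  = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each n-gram into prefix and next word
split_ngram <- function(df) {
  # Remove the last space-separated word to get the prefix
  pre <- sub(" ?[^ ]+$", "", df$ngram)
  # Remove everything up to the last space to get the next word
  nw <- sub("^.* ", "", df$ngram)
  data.frame(pre = pre, nw = nw, freq = df$freq, stringsAsFactors = FALSE)
}

tp2 <- split_ngram(ngrams)
# Relative frequency of each next word within its prefix
tp2$prob <- tp2$freq / ave(tp2$freq, tp2$pre, FUN = sum)
tp2
```

For a unigram the prefix is the empty string, so the same helper also covers the n = 1 case.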
Usage
TPGenerator$generate_tp_for_n(n)
Arguments
n
The n-gram size for which the transition probability data is generated.
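The two ways of saving the table, as plain text or as an R object, can be sketched with base R as follows. The helper save_tp(), the format values "obj" and "plain", the tab-separated layout and the file names are assumptions made for illustration, not the package's actual file layout.

```r
# Toy table standing in for the generated transition probabilities
tp <- data.frame(pre = c("the", "the"), nw = c("cat", "mat"),
                 freq = c(3, 1), stringsAsFactors = FALSE)

# Hypothetical helper: write the table as an R object or as plain text
save_tp <- function(df, path, format = c("obj", "plain")) {
  format <- match.arg(format)
  if (format == "obj") {
    # Serialized R object, restored with readRDS()
    saveRDS(df, path)
  } else {
    # Tab-separated text with a header row
    write.table(df, path, sep = "\t", row.names = FALSE, quote = FALSE)
  }
  invisible(path)
}

obj_file <- tempfile(fileext = ".RDS")
txt_file <- tempfile(fileext = ".txt")
save_tp(tp, obj_file, "obj")
save_tp(tp, txt_file, "plain")

# Both files can be read back into data frames
from_obj <- readRDS(obj_file)
from_txt <- read.delim(txt_file, stringsAsFactors = FALSE)
```

The R object preserves column types exactly, while the plain text file can be inspected with any text editor.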
Method clone()
The objects of this class are cloneable with this method.
Usage
TPGenerator$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.