
seqtrie (version 0.3.5)

RadixForest: RadixForest

Description

Radix Forest class implementation


Public fields

forest_pointer

Map of sequence length to RadixTree

char_counter_pointer

Character count data for the purpose of validating input

Methods


Method new()

Create a new RadixForest object

Usage - new

RadixForest$new(sequences = NULL)

Arguments - new

sequences

A character vector of sequences to insert into the forest


Method show()

Print the forest to the screen

Usage - show

RadixForest$show()


Method to_string()

Print the forest to a string

Usage - to_string

RadixForest$to_string()


Method graph()

Plot the forest using igraph

Usage - graph

RadixForest$graph(depth = -1, root_label = "root", plot = TRUE)

Arguments - graph

depth

The tree depth to plot for each tree in the forest.

root_label

The label of the root node(s) in the plot.

plot

Whether to create a plot or return the data used to generate the plot.

Returns - graph

Either a data frame of parent-child relationships used to generate the igraph plot, or a ggplot2 object, depending on the plot argument.


Method to_vector()

Output all sequences held by the forest as a character vector

Usage - to_vector

RadixForest$to_vector()

Returns - to_vector

A character vector of all sequences contained in the forest.


Method size()

Output the size of the forest (i.e. how many sequences are contained)

Usage - size

RadixForest$size()

Returns - size

The size of the forest, i.e. the number of sequences it contains.
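
As a rough sketch of how the constructor, to_vector() and size() fit together, the snippet below builds a forest from a small, made-up set of sequences. It assumes the seqtrie package is installed. The order in which to_vector() returns sequences is not stated on this page, so the result is sorted before use.

```r
library(seqtrie)

# Example sequences are made up for illustration
seqs <- c("ACGT", "AAAA", "ACG")

# Populate the forest at construction time via the `sequences` argument
forest <- RadixForest$new(sequences = seqs)

# Number of sequences currently stored
n <- forest$size()

# All stored sequences; sorted because the return order is not documented here
stored <- sort(forest$to_vector())
```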


Method insert()

Insert new sequences into the forest

Usage - insert

RadixForest$insert(sequences)

Arguments - insert

sequences

A character vector of sequences to insert into the forest

Returns - insert

A logical vector indicating, for each sequence, whether it was inserted (TRUE) or already existed in the forest (FALSE)


Method erase()

Erase sequences from the forest

Usage - erase

RadixForest$erase(sequences)

Arguments - erase

sequences

A character vector of sequences to erase from the forest

Returns - erase

A logical vector indicating, for each sequence, whether it was erased (TRUE) or not found in the forest (FALSE)


Method find()

Find sequences in the forest

Usage - find

RadixForest$find(query)

Arguments - find

query

A character vector of sequences to find in the forest

Returns - find

A logical vector indicating, for each query sequence, whether it was found (TRUE) or not found (FALSE) in the forest
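
The logical return values of insert(), find() and erase() can be sketched with a short session. This is a minimal illustration that assumes the seqtrie package is installed; the sequences are arbitrary, and the variable names are chosen here for readability only.

```r
library(seqtrie)

forest <- RadixForest$new()

# First insertion: both sequences are new to the forest
first <- forest$insert(c("ACGT", "AAAA"))

# Re-inserting a sequence that is already present is reported as FALSE
second <- forest$insert("ACGT")

# find() returns one logical per query element
found <- forest$find(c("ACGT", "TTTT"))

# erase() reports TRUE for removed sequences and FALSE for unknown ones
erased <- forest$erase(c("AAAA", "GGGG"))

# Only "ACGT" should be left at this point
remaining <- forest$size()
```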


Method prefix_search()

Search for sequences in the forest that start with a specified prefix. E.g.: a query of "CAR" will find "CART", "CARBON", "CARROT", etc. but not "CATS".
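
The usage of prefix_search() is not shown on this page, so the following sketch does not call it. Instead it uses base R's startsWith() to illustrate the matching rule described above on the same example words; it is not the package's implementation.

```r
# Base R illustration of prefix matching; this does not call seqtrie
sequences <- c("CART", "CARBON", "CARROT", "CATS")
prefix <- "CAR"

# TRUE for every sequence that begins with the prefix
is_match <- startsWith(sequences, prefix)

matches <- sequences[is_match]
# matches: "CART" "CARBON" "CARROT"
```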


Method search()

Search for sequences in the forest that are within a specified distance of a query, under a specified distance metric.

max_distance

How far to search, in units of absolute distance. Can be a single value or a vector. Mutually exclusive with max_fraction.

max_fraction

How far to search, in units of distance relative to the length of each query sequence. Can be a single value or a vector. Mutually exclusive with max_distance.

mode

The distance metric to use. One of hamming (hm), global (gb) or anchored (an). Note, however, that RadixForest does not support anchored searches (see Details).

nthreads

The number of threads to use for parallel computation.

show_progress

Whether to show a progress bar.
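
The two ways of setting the search radius can be sketched as follows. This assumes the seqtrie package is installed and uses made-up sequences. The result columns (query, target, distance) are taken from the example output at the end of this page. No output is shown for the max_fraction call, because how the fraction is rounded to a distance is not stated here.

```r
library(seqtrie)

forest <- RadixForest$new(sequences = c("ACGT", "ACGA", "TTTT"))

# Absolute threshold: at most 1 mismatch under Hamming distance
abs_hits <- forest$search("ACGT", max_distance = 1, mode = "hamming")

# Relative threshold: scaled by the length of each query sequence
rel_hits <- forest$search("ACGT", max_fraction = 0.25, mode = "hamming")

# Targets found within the absolute threshold
abs_targets <- as.character(abs_hits$target)
```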


Method validate()

Validate the forest

Usage - validate

RadixForest$validate()

Returns - validate

A logical indicating whether the forest is valid (TRUE) or not (FALSE). This is mostly an internal function for debugging purposes and should always return TRUE.

Details

The RadixForest class is a specialization of the RadixTree implementation. Instead of putting sequences into a single tree, the RadixForest class puts sequences into separate trees based on sequence length. This allows for faster searching of similar sequences based on Hamming or Levenshtein distance metrics. Unlike the RadixTree class, the RadixForest class does not support anchored searches or a custom cost matrix. See RadixTree for additional details.
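
The length-partitioning idea can be illustrated in base R, without seqtrie. This is a minimal sketch and not the package's implementation: plain character vectors stand in for the per-length trees, and the hamming helper is hypothetical. A Hamming search only needs the bucket whose length equals the query's. A Levenshtein search with threshold d only needs buckets whose length is within d of the query's, because the edit distance between two strings is at least the difference of their lengths.

```r
seqs <- c("ACGT", "AAAA", "ACG", "TTTTTT", "CAT")

# One bucket per sequence length (stand-in for one tree per length)
by_length <- split(seqs, nchar(seqs))

# Hypothetical helper: number of mismatching positions in equal-length strings
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

query <- "ACGA"

# Hamming: only the bucket with the query's length can contain matches
same_len <- by_length[[as.character(nchar(query))]]
hm <- vapply(same_len, hamming, integer(1), b = query)

# Levenshtein with threshold d: only lengths within d of the query matter
d <- 1
lens <- as.integer(names(by_length))
near <- unlist(by_length[abs(lens - nchar(query)) <= d], use.names = FALSE)

# utils::adist computes the generalized Levenshtein distance
lv <- drop(utils::adist(query, near))
names(lv) <- near
lv_hits <- near[lv <= d]
```

Here the length-6 bucket is never examined for the Levenshtein search, which is the kind of saving the forest layout is meant to provide.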

Examples

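library(seqtrie)

# Build a forest and insert two sequences of length 4
forest <- RadixForest$new()
forest$insert(c("ACGT", "AAAA"))

# Remove one of them; only "ACGT" remains
forest$erase("AAAA")

# Levenshtein: "ACG" is one insertion away from "ACGT"
forest$search("ACG", max_distance = 1, mode = "levenshtein")
 #   query target distance
 # 1   ACG   ACGT        1

# Hamming: "ACG" and "ACGT" differ in length, so nothing is returned
forest$search("ACG", max_distance = 1, mode = "hamming")
 # query    target   distance
 # <0 rows> (or 0-length row.names)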
