
Note: a newer version (1.6.3) of this package is available.

biogram package

This package contains tools for extracting and analyzing n-grams (sequences of n items) derived from biological sequences (proteins or nucleic acids). Because the number of possible n-grams grows exponentially with n (the curse of dimensionality), biogram uses the Quick Permutation Test (QuiPT) for fast feature filtering.
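To make the idea of n-grams concrete, here is a minimal base-R sketch that extracts the n-grams of one sequence and tabulates their counts across several sequences. The helpers extract_ngrams() and count_ngrams_sketch(), and the fixed gap argument, are hypothetical and written only for illustration; they are not part of biogram's API (the package's own tools are seq2ngrams and count_ngrams, listed below), and biogram's handling of gaps may differ.

```r
# Hypothetical helpers for illustration only; NOT biogram functions.

# Extract n-grams from a single string. `gap` (an assumption of this sketch)
# is the number of skipped positions between consecutive elements.
extract_ngrams <- function(s, n, gap = 0) {
  chars <- strsplit(s, "")[[1]]
  span <- (n - 1) * (gap + 1) + 1          # positions covered by one n-gram
  if (length(chars) < span) return(character(0))
  offsets <- (0:(n - 1)) * (gap + 1)       # relative positions of the n elements
  vapply(seq_len(length(chars) - span + 1),
         function(i) paste(chars[i + offsets], collapse = ""),
         character(1))
}

# Count every possible n-gram over `alphabet` in each sequence.
# Rows are sequences, columns are all length(alphabet)^n possible n-grams.
count_ngrams_sketch <- function(seqs, n, alphabet, gap = 0) {
  grid <- expand.grid(rep(list(alphabet), n), stringsAsFactors = FALSE)
  all_grams <- sort(apply(grid, 1, paste, collapse = ""))
  counts <- t(vapply(seqs, function(s) {
    as.integer(table(factor(extract_ngrams(s, n, gap), levels = all_grams)))
  }, integer(length(all_grams))))
  colnames(counts) <- all_grams
  counts
}

extract_ngrams("ACGTA", 2)
counts <- count_ngrams_sketch(c("ACGTA", "AAAA"), n = 2,
                              alphabet = c("A", "C", "G", "T"))
dim(counts)  # 2 sequences x 16 columns (4^2 possible 2-grams)
```

The number of columns is length(alphabet)^n: for the 20 standard amino acids there are already 20^3 = 8000 possible 3-grams, most of them rare, which is why a fast filtering step such as QuiPT is useful.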

Installation

biogram is available on CRAN, so installation is as simple as:

install.packages("biogram")

You can install the latest development version from GitHub using the devtools package:

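# Install devtools, if you haven't already.
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

# Install the development version of biogram from GitHub.
devtools::install_github("michbur/biogram")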

To cite biogram, run:

citation("biogram")

or cite it as: Michal Burdukiewicz, Piotr Sobczyk and Chris Lauber (2016). biogram: N-Gram Analysis of Biological Sequences. R package version 1.3. https://cran.r-project.org/package=biogram


Monthly downloads: 256
Version: 1.3
License: GPL-3


Maintainer: Michal Burdukiewicz
Last published: September 21st, 2016

Functions in biogram (1.3)

calc_kl: Calculate KL divergence of features
calc_ed: Calculate encoding distance
calc_cs: Calculate chi-squared-based measure
calc_si: Compute similarity index
add_1grams: Add 1-grams
as.data.frame.feature_test: Coerce feature_test object to a data frame
binarize: Binarize
biogram-package: biogram - analysis of biological sequences using n-grams
calc_criterion: Calculate value of criterion
aaprop: Normalized amino acid properties
cluster_reg_exp: Clustering of sequences based on regular expression
create_ngrams: Get all possible n-grams
count_multigrams: Detect and count multiple n-grams in sequences
count_specified: Count specified n-grams
construct_ngrams: Construct and filter n-grams
count_total: Count total number of n-grams
criterion_distribution: criterion_distribution class
create_feature_target: Create feature according to given contingency matrix
code_ngrams: Code n-grams
count_ngrams: Count n-grams in sequences
get_ngrams_ind: Get indices of n-grams
cut.feature_test: Categorize tested features
human_cleave: Human signal peptide cleavage sites
gap_ngrams: Gap n-grams
decode_ngrams: Decode n-grams
feature_test: feature_test class
degenerate: Degenerate protein sequence
distr_crit: Compute criterion distribution
encoding2df: Convert encoding to data frame
fast_crosstable: Very fast 2D cross-tabulation
list2matrix: Convert list of sequences to matrix
l2n: Convert letters to numbers
ngrams2df: n-grams to data frame
n2l: Convert numbers to letters
print.feature_test: Print tested features
position_ngrams: Position n-grams
plot.criterion_distribution: Plot criterion distribution
seq2ngrams: Extract n-grams from sequence
is_ngram: Validate n-gram
summary.feature_test: Summarize tested features
table_ngrams: Tabulate n-grams
validate_encoding: Validate encoding
test_features: Permutation test for feature selection
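The last function, test_features, performs permutation-based feature selection. As a rough illustration of what a permutation test for a single binary n-gram feature involves, here is a naive Monte Carlo version in base R that uses information gain as the criterion. This is not QuiPT, which the package uses to make such filtering fast, and QuiPT's internals are not reproduced here; the helpers entropy(), info_gain() and perm_test(), the choice of criterion, and the toy data are all assumptions for illustration.

```r
# Naive Monte Carlo permutation test for binary n-gram features.
# NOT biogram's QuiPT; all helpers below are hypothetical.

# Shannon entropy (in bits) of a probability vector
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain of the target given a feature:
# IG = H(target) - H(target | feature)
info_gain <- function(target, feature) {
  h_target <- entropy(table(target) / length(target))
  h_cond <- sum(vapply(split(target, feature), function(sub) {
    length(sub) / length(target) * entropy(table(sub) / length(sub))
  }, numeric(1)))
  h_target - h_cond
}

# Permutation p-value: how often does a shuffled target reach an IG at least
# as large as the observed one? The +1 terms keep the estimate above zero.
perm_test <- function(target, feature, times = 999) {
  observed <- info_gain(target, feature)
  permuted <- replicate(times, info_gain(sample(target), feature))
  (sum(permuted >= observed) + 1) / (times + 1)
}

set.seed(1)
target <- rep(c(0, 1), each = 20)
features <- cbind(
  informative = target,                        # identical to the target
  noise = sample(c(0, 1), 40, replace = TRUE)  # unrelated to the target
)
p_values <- apply(features, 2, function(f) perm_test(target, f))
p_values
p.adjust(p_values, method = "BH")  # Benjamini-Hochberg correction
```

A perfectly informative feature drives the p-value to its floor of 1/(times + 1), while the noise feature's p-value depends on the seed. Each test here needs hundreds of criterion evaluations per feature, which becomes expensive when thousands of n-grams are screened; that cost is the motivation for a quicker test. When many features are tested at once, a multiple-testing correction such as base R's p.adjust() is usually applied, as in the last line.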