Learn R Programming

⚠️There's a newer version (1.2.3) of this package.Take me there.

tidysq

Overview

tidysq contains tools for analysis and manipulation of biological sequences (including amino acid and nucleic acid – e.g. RNA, DNA – sequences). Two major features of this package are:

  • effective compression of sequence data, allowing to fit larger datasets in R,

  • compatibility with most of tidyverse universe, especially dplyr and vctrs packages, making analyses tidier.

Getting started

Try our quick start vignette or our exhaustive documentation.

Installation

The easiest way to install tidysq package is to download its latest version from CRAN repository:

install.packages("tidysq")

Alternatively, it is possible to download the development version directly from GitHub repository:

# install.packages("devtools")
devtools::install_github("BioGenies/tidysq")

Example usage

library(tidysq)
file <- system.file("examples", "example_aa.fasta", package = "tidysq")
sqibble <- read_fasta(file)
sqibble
#> # A tibble: 421 × 2
#>    sq             name                               
#>    <ami_bsc>      <chr>                              
#>  1 PGGGKVQIV <13> AMY1|K19|T-Protein (Tau)           
#>  2 NLKHQPGGG <43> AMY9|K19Gluc41|T-Protein (Tau)     
#>  3 NLKHQPGGG <19> AMY14|K19Gluc782|T-Protein (Tau)   
#>  4 GKVQIVYK   <8> AMY17|PHF8|T-Protein (Tau)         
#>  5 VQIVYK     <6> AMY18|PHF6|T-Protein (Tau)         
#>  6 DAEFRHDSG <40> AMY22|Whole|Amyloid beta A4 peptide
#>  7 VPHQKLVFF <15> AMY23|HABP1|Amyloid beta A4 peptide
#>  8 VHPQKLVFF <15> AMY24|HABP2|Amyloid beta A4 peptide
#>  9 VHHPKLVFF <15> AMY25|HABP3|Amyloid beta A4 peptide
#> 10 VHHQPLVFF <15> AMY26|HABP4|Amyloid beta A4 peptide
#> # … with 411 more rows

sq_ami <- sqibble$sq
sq_ami
#> basic amino acid sequences list:
#>  [1] PGGGKVQIVYKPV                                                          <13>
#>  [2] NLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVE                            <43>
#>  [3] NLKHQPGGGKVQIVYKEVD                                                    <19>
#>  [4] GKVQIVYK                                                                <8>
#>  [5] VQIVYK                                                                  <6>
#>  [6] DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV                               <40>
#>  [7] VPHQKLVFFAEDVGS                                                        <15>
#>  [8] VHPQKLVFFAEDVGS                                                        <15>
#>  [9] VHHPKLVFFAEDVGS                                                        <15>
#> [10] VHHQPLVFFAEDVGS                                                        <15>
#> printed 10 out of 421

# Subsequences can be extracted with bite()
bite(sq_ami, 5:10)
#> Warning in CPP_bite(x, indices, NA_letter, on_warning): some sequences are
#> subsetted with index bigger than length - NA introduced
#> basic amino acid sequences list:
#>  [1] KVQIVY                                                                  <6>
#>  [2] QPGGGK                                                                  <6>
#>  [3] QPGGGK                                                                  <6>
#>  [4] IVYK!!                                                                  <6>
#>  [5] YK!!!!                                                                  <6>
#>  [6] RHDSGY                                                                  <6>
#>  [7] KLVFFA                                                                  <6>
#>  [8] KLVFFA                                                                  <6>
#>  [9] KLVFFA                                                                  <6>
#> [10] PLVFFA                                                                  <6>
#> printed 10 out of 421

# There are also more traditional functions
reverse(sq_ami)
#> basic amino acid sequences list:
#>  [1] VPKYVIQVKGGGP                                                          <13>
#>  [2] EVQGGGPKHHINGLSGCKSTVKSLDVPKYVIQVKGGGPQHKLN                            <43>
#>  [3] DVEKYVIQVKGGGPQHKLN                                                    <19>
#>  [4] KYVIQVKG                                                                <8>
#>  [5] KYVIQV                                                                  <6>
#>  [6] VVGGVMLGIIAGKNSGVDEAFFVLKQHHVEYGSDHRFEAD                               <40>
#>  [7] SGVDEAFFVLKQHPV                                                        <15>
#>  [8] SGVDEAFFVLKQPHV                                                        <15>
#>  [9] SGVDEAFFVLKPHHV                                                        <15>
#> [10] SGVDEAFFVLPQHHV                                                        <15>
#> printed 10 out of 421

# find_motifs() returns a whole tibble of useful informations
find_motifs(sqibble$sq, sqibble$name, "^VHX")
#> # A tibble: 9 × 5
#>   names                                found     sought start   end
#>   <chr>                                <ami_bsc> <chr>  <int> <int>
#> 1 AMY24|HABP2|Amyloid beta A4 peptide  VHP   <3> ^VHX       1     3
#> 2 AMY25|HABP3|Amyloid beta A4 peptide  VHH   <3> ^VHX       1     3
#> 3 AMY26|HABP4|Amyloid beta A4 peptide  VHH   <3> ^VHX       1     3
#> 4 AMY34|HABP12|Amyloid beta A4 peptide VHH   <3> ^VHX       1     3
#> 5 AMY35|HABP13|Amyloid beta A4 peptide VHH   <3> ^VHX       1     3
#> 6 AMY36|HABP14|Amyloid beta A4 peptide VHH   <3> ^VHX       1     3
#> 7 AMY38|HABP16|Amyloid beta A4 peptide VHH   <3> ^VHX       1     3
#> 8 AMY43|AB5|Amyloid beta A4 peptide    VHH   <3> ^VHX       1     3
#> 9 AMY195|86-95|Prion protein (human)   VHD   <3> ^VHX       1     3

An example of dplyr integration:

library(dplyr)
# tidysq integrates well with dplyr verbs
sqibble %>%
  filter(sq %has% "VFF") %>%
  mutate(length = get_sq_lengths(sq))
#> # A tibble: 24 × 3
#>    sq             name                                 length
#>    <ami_bsc>      <chr>                                 <dbl>
#>  1 DAEFRHDSG <40> AMY22|Whole|Amyloid beta A4 peptide      40
#>  2 VPHQKLVFF <15> AMY23|HABP1|Amyloid beta A4 peptide      15
#>  3 VHPQKLVFF <15> AMY24|HABP2|Amyloid beta A4 peptide      15
#>  4 VHHPKLVFF <15> AMY25|HABP3|Amyloid beta A4 peptide      15
#>  5 VHHQPLVFF <15> AMY26|HABP4|Amyloid beta A4 peptide      15
#>  6 KKLVFFPED  <9> AMY32|HABP10|Amyloid beta A4 peptide      9
#>  7 VHHQEKLVF <16> AMY34|HABP12|Amyloid beta A4 peptide     16
#>  8 VHHQEKLVF <16> AMY35|HABP13|Amyloid beta A4 peptide     16
#>  9 VHHQEKLVF <16> AMY36|HABP14|Amyloid beta A4 peptide     16
#> 10 KKLVFFAED  <9> AMY37|HABP15|Amyloid beta A4 peptide      9
#> # … with 14 more rows

Citation

For citation type:

citation("tidysq")

or use:

Michal Burdukiewicz, Dominik Rafacz, Laura Bakala, Jadwiga Slowik, Weronika Puchala, Filip Pietluch, Katarzyna Sidorczuk, Stefan Roediger and Leon Eyrich Jessen (2021). tidysq: Tidy Processing and Analysis of Biological Sequences. R package version 1.1.3.

Copy Link

Version

Install

install.packages('tidysq')

Monthly Downloads

225

Version

1.1.3

License

GPL (>= 2)

Issues

Pull Requests

Stars

Forks

Maintainer

Dominik Rafacz

Last Published

January 31st, 2022

Functions in tidysq (1.1.3)

as.character.sq

Convert sq object into character vector
is.sq

Check if object has specified type
remove_na

Remove sequences that contain NA values
write_fasta

Save sq to fasta file
typify

Set type of an sq object
paste

Paste sequences in string-like fashion
reverse

Reverse sequence
random_sq

Generate random sequences
bite

Subset sequences from sq objects
sqconcatenate

Concatenate sq objects
sqapply

Apply function to each sequence
get_standard_alphabet

Get standard alphabet for given type.
get_tidysq_options

Obtain current state of tidysq options
complement

Create complement sequence from dnasq or rnasq object
sq_type

Get type of an sq object
translate

Convert DNA or RNA into proteins using genetic code
==.sq

Compare sq objects
tidysq-package

tidysq: tidy analysis of biological sequences
collapse

Collapse multiple sequences into one
is_empty_sq

Test if sequence is empty
sq-class

sq: class for keeping biological sequences tidy
sqextract

Extract parts of a sq object
sqprint

Print sq object
sq

Construct sq object from character vector
substitute_letters

Substitute letters in a sequence
import_sq

Import sq objects from other objects
%has%

Test sq object for presence of given motifs
remove_ambiguous

Remove sequences that contain ambiguous elements
read_fasta

Read a FASTA file
export_sq

Export sq objects into other formats
find_invalid_letters

Find elements which are not suitable for specified type.
find_motifs

Find given motifs
alphabet

Get alphabet of given sq object.
get_sq_lengths

Get lengths of sequences in sq object
as.sq

Convert an object to sq
as.matrix.sq

Convert sq object into matrix