peplib (version 1.03)

read.sequences: Read sequence file

Description

Reads a file of sequences and creates an object of class Sequences.

Usage

read.sequences(file, header = FALSE, sep = "",
quote = "\"", dec = ".", fill = FALSE,
comment.char = "", alphabet = aabet)

Arguments

file
the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, getwd().
header
a logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
sep
the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is white space, that is one or more spaces, tabs, newlines or carriage returns.
quote
the set of quoting characters. To disable quoting altogether, use quote = "". See scan for the behavior on quotes embedded in quotes. Quoting is only considered for columns read as character.
dec
the character used in the file for decimal points.
fill
logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added.
comment.char
character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
alphabet
The alphabet to use for the sequences. The default alphabet contains the canonical 20 amino acids, as well as B, Z, X, and -, where X is an unspecified residue and - is a gap.

Value

  • An object of class Sequences, a small extension of the matrix class in which each row corresponds to a single sequence. The sequences are always represented as integers, and the rownames of the matrix are the original character representations of the sequences.
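To illustrate the shape of this return value, the base R sketch below builds a similar integer matrix by hand. It is not peplib's code: the alphabet vector is a hypothetical stand-in for aabet, and its ordering (and therefore the integer codes) is an assumption, since this page does not document the actual coding.

```r
# Hypothetical stand-in for `aabet`: the 20 canonical amino acids plus
# B, Z, X and the gap "-". The order (hence the integer codes) is assumed.
alphabet <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
              "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y",
              "B", "Z", "X", "-")

# Encode equal-length sequence strings as an integer matrix:
# one row per sequence, one column per position, rownames = original strings.
encode_sequences <- function(seqs, alphabet) {
  residues <- strsplit(seqs, split = "")
  if (length(unique(lengths(residues))) != 1L) {
    stop("sequences must all have the same length; align them first")
  }
  codes <- match(unlist(residues), alphabet)   # integer codes, NA if unknown
  if (anyNA(codes)) {
    stop("sequences contain characters outside the alphabet")
  }
  m <- matrix(codes, nrow = length(seqs), byrow = TRUE)
  rownames(m) <- seqs
  m
}

m <- encode_sequences(c("GFF-", "RGD-", "GRFD", "-GFR", "FGRD"), alphabet)
print(m)
```

Because match() returns integers, the matrix is stored as integers, and each row can be traced back to its source string through rownames(m).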

Warning

As noted in the Details section, it is important to align your sequences before reading if they are of differing lengths. For example, use the online ClustalW2 tool available from the European Bioinformatics Institute's website.
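A quick way to catch unaligned input before reading is to compare sequence lengths. The helper below, check_aligned(), is hypothetical and not part of peplib; it only reports whether all strings share one length and does not perform any alignment.

```r
# Hypothetical helper (not part of peplib): TRUE if all sequences share
# one length, otherwise FALSE plus a message listing the lengths found.
check_aligned <- function(seqs) {
  lens <- nchar(seqs, type = "chars")
  if (length(unique(lens)) > 1L) {
    message("Sequences have differing lengths: ",
            paste(sort(unique(lens)), collapse = ", "),
            ". Align them (e.g. with ClustalW2) before reading.")
    return(FALSE)
  }
  TRUE
}

check_aligned(c("GFF-", "RGD-", "GRFD"))  # TRUE
check_aligned(c("GFF", "RGD", "GRFD"))    # FALSE, with a message
```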

Details

See the details for read.table for more information about reading the file. The file format should be one sequence per line with no spaces between the residues. Your file may also contain additional columns with experimental data, but the first column must contain the sequences. For example:

    ------------------------
    Sequences Response
    GFF-      1.0
    RGD-      0.85
    GRFD      10.1
    -GFR      0.1
    FGRD      5.2
    ------------------------

Each sequence should be the same length, so - should be used to pad the sequences, as in the example. Use an alignment algorithm, such as Clustal, to align your sequences before reading.
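The sketch below writes the example file above to a temporary location and reads it. It assumes peplib is installed and exports read.sequences with the signature shown under Usage; the call is skipped if the package is unavailable. Since read.sequences defers to read.table, the file is also read with read.table as a plain base R check of its layout.

```r
# Write the example file from the Details section to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c("Sequences Response",
             "GFF- 1.0",
             "RGD- 0.85",
             "GRFD 10.1",
             "-GFR 0.1",
             "FGRD 5.2"), path)

# The file starts with a header row, so header = TRUE;
# sep = "" (white space) is already the default.
if (requireNamespace("peplib", quietly = TRUE)) {
  seqs <- peplib::read.sequences(path, header = TRUE)
  print(seqs)
}

# Base R view of the same file: sequences as character, response as numeric.
tab <- read.table(path, header = TRUE,
                  colClasses = c("character", "numeric"))
print(tab)
```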

See Also

Sequences, read.fasta