bio3d (version 2.3-0)

read.pdb: Read PDB File

Description

Read a Protein Data Bank (PDB) coordinate file.

Usage

read.pdb(file, maxlines = -1, multi = FALSE, rm.insert = FALSE, rm.alt = TRUE, ATOM.only = FALSE, hex = FALSE, verbose = TRUE)
read.pdb2(file, maxlines = -1, multi = FALSE, rm.insert = FALSE, rm.alt = TRUE, ATOM.only = FALSE, verbose = TRUE)
## S3 method for class 'pdb'
print(x, printseq=TRUE, ...)
## S3 method for class 'pdb'
summary(object, printseq=FALSE, ...)

Arguments

file
a single element character vector containing the name of the PDB file to be read, or the four letter PDB identifier for online file access.
maxlines
the maximum number of lines to read before giving up with large files. By default it will read up to the end of input on the connection.
multi
logical, if TRUE multiple ATOM records are read for all models in multi-model files and their coordinates returned.
rm.insert
logical, if TRUE PDB insert records are ignored.
rm.alt
logical, if TRUE PDB alternate records are ignored.
ATOM.only
logical, if TRUE only ATOM/HETATM records are stored. Useful for speed enhancements with large files where secondary structure, biological unit and other remark records are not required.
hex
logical, if TRUE enable parsing of hexadecimal atom numbers (> 99999) and residue numbers (> 9999) (e.g. from VMD). Note that numbering is assumed to be consecutive (with no missing numbers), and the hexadecimal values should start at atom number 100000 and residue number 10000 and continue to the end of the file.
verbose
logical, if TRUE details of the reading process are printed.
x
a PDB structure object obtained from read.pdb.
object
a PDB structure object obtained from read.pdb.
printseq
logical, if TRUE the PDB ATOM sequence will be printed to the screen. See also pdbseq.
...
additional arguments to ‘print’.
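
To make the hex argument concrete, here is a minimal base R sketch of how hexadecimal serial numbers map back to integers. It is not bio3d's implementation; the input strings are hypothetical field values chosen to sit just past the decimal limits described above.

```r
# Hypothetical atom and residue number fields written in hexadecimal,
# starting at atom 100000 and residue 10000 (assumed, for illustration only)
atom_hex <- c("186a0", "186a1", "186a2")
resi_hex <- c("2710", "2711")

# strtoi() with base = 16L converts hexadecimal strings to integers
atom_num <- strtoi(atom_hex, base = 16L)
resi_num <- strtoi(resi_hex, base = 16L)

atom_num  # 100000 100001 100002
resi_num  # 10000 10001

# The documented assumption is consecutive numbering with no gaps
all(diff(atom_num) == 1L)
```

A gap in the numbering would make the final check return FALSE, which is the situation the hex description warns about.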

Value

Returns a list of class "pdb". Components accessed in the Examples section include atom, xyz, calpha, helix, sheet and seqres.

Details

read.pdb is a re-implementation (using Rcpp) of the slower but more tested R implementation of the same function (called read.pdb2 since bio3d-v2.3). maxlines may be set so as to restrict the reading to a portion of input files. Note that the preferred means of reading large multi-model files is via binary DCD or NetCDF format trajectory files (see the read.dcd and read.ncdf functions).
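
The effect of maxlines can be pictured with base R's readLines(), whose n argument has the same "read to the end of input" default of -1. The sketch below assumes the analogy holds and does not call bio3d; the file contents are made-up, PDB-like lines written to a temporary file.

```r
# Hypothetical PDB-like lines (coordinates are illustrative, not real data)
pdb_lines <- c(
  "HEADER    EXAMPLE",
  "ATOM      1  N   LYS A   1       1.000   2.000   3.000  1.00 10.00           N",
  "ATOM      2  CA  LYS A   1       2.000   3.000   4.000  1.00 10.00           C",
  "END"
)
tmp <- tempfile(fileext = ".pdb")
writeLines(pdb_lines, tmp)

# Negative n: read up to the end of input (analogous to maxlines = -1)
all_lines <- readLines(tmp, n = -1)

# Positive n: stop after that many lines (analogous to maxlines = 2)
first_lines <- readLines(tmp, n = 2)

length(all_lines)    # 4
length(first_lines)  # 2

unlink(tmp)
```

Restricting the line count in this way means any records after the cut-off, including later ATOM lines, are never seen by the parser.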

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696. For a description of the PDB format (version 3.3) see: http://www.wwpdb.org/documentation/format33/v3.3.html.

See Also

atom.select, write.pdb, trim.pdb, cat.pdb, read.prmtop, as.pdb, read.dcd, read.ncdf, read.fasta.pdb, read.fasta, biounit

Examples

## Read a PDB file from the RCSB online database
#pdb <- read.pdb("4q21")

## Read a PDB file from those included with the package
pdb <- read.pdb( system.file("examples/1hel.pdb", package="bio3d") )

## Print a brief composition summary
pdb

## Examine the storage format (or internal *str*ucture)
str(pdb)

## Print data for the first four atoms
pdb$atom[1:4,]

## Print some coordinate data
head(pdb$atom[, c("x","y","z")])

## Or coordinates as a numeric vector
#head(pdb$xyz)

## Print C-alpha coordinates (can also use 'atom.select' function)
head(pdb$atom[pdb$calpha, c("resid","elety","x","y","z")])
inds <- atom.select(pdb, elety="CA")
head( pdb$atom[inds$atom, ] )

## The atom.select() function returns 'indices' (row numbers)
## that can be used for accessing subsets of PDB objects, e.g.
inds <- atom.select(pdb,"ligand")
pdb$atom[inds$atom,]
pdb$xyz[inds$xyz]

## See the help page for atom.select() function for more details.


## Not run: 
# ## Print SSE data for helix and sheet,
# ##  see also dssp() and stride() functions
# print.sse(pdb)
# pdb$helix
# pdb$sheet$start
#   
# ## Print SEQRES data
# pdb$seqres
# 
# ## SEQRES as one letter code
# aa321(pdb$seqres)
# 
# ## Where is the P-loop motif in the ATOM sequence
# inds.seq <- motif.find("G....GKT", pdbseq(pdb))
# pdbseq(pdb)[inds.seq]
# 
# ## Where is it in the structure
# inds.pdb <- atom.select(pdb,resno=inds.seq, elety="CA")
# pdb$atom[inds.pdb$atom,]
# pdb$xyz[inds.pdb$xyz]
# 
# ## View in interactive 3D mode
# #view(pdb)
# ## End(Not run)