vanddraabe (version 1.1.1)

CleanProteinStructures: Clean Protein Structures

Description

Removes hydrogen and modeled atoms from an RCSB/PDB structure, along with waters beyond a user-defined distance from protein atoms.

Usage

CleanProteinStructures(prefix = "./alignTesting",
  CleanHydrogenAtoms = TRUE, CleanModeledAtoms = TRUE,
  cutoff.prot.h2o.dist = 6, min.num.h2o = 20,
  cleanDir = "ProteinSystem", filename = "ProteinSystem")

Arguments

prefix

The directory with the PDB files to be cleaned

CleanHydrogenAtoms

A logical indicating whether hydrogen atoms should be removed; default: TRUE

CleanModeledAtoms

A logical indicating whether modeled atoms should be removed; default: TRUE

cutoff.prot.h2o.dist

A numerical value setting the maximum distance between a protein atom (heteroatoms are ignored) and water oxygen atoms. Water oxygen atoms at or within this distance are retained; default: 6.0 Angstroms

min.num.h2o

Minimum number of water oxygen atoms within a protein structure for it to be included in the conserved water analysis; default: 20

cleanDir

A character string naming where the "cleaned" PDB structures are written. "_CLEANED" is appended to the provided character string; default: "ProteinSystem"

filename

The filename prefix for the returned results; default: "ProteinSystem"

Value

The following data is returned:

  • cleaning.summary: summary indicating

    • whether hydrogen atoms were removed (TRUE/FALSE)

    • number of atoms with out-of-range B-values and occupancy values

    • number of modeled atoms (and thus removed)

    • number of atoms NOT modeled (and thus retained)

    • number of water oxygen atoms beyond the user-defined cutoff

    • number of water oxygen atoms within the user-defined cutoff

  • Bvalue.counts: binned B-values with bin width = 5 (0 to 100)

  • normBvalue.counts: binned normalized B-values with bin width = 0.1 (-4 to 6)

  • occupancy.counts: binned occupancy values with bin width = 0.1 (0 to 1)

  • mobility.counts: binned mobility values with bin width = 0.1 (0 to 6)

  • Excel workbook: containing the cleaning.summary, Bvalue.counts, normBvalue.counts, occupancy.counts, and mobility.counts data as individual tabs

  • PDBids.retained: a vector of PDBids

  • call: parameters provided by the user
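
The binned counts listed above can be pictured with a small base R sketch. This is not the package's own code: bin_counts() is a hypothetical helper, the input values are made up, and the choice of left-closed bins is an assumption, since the page does not say which bin edge is inclusive.

```r
# Hypothetical helper (not part of vanddraabe): count values in fixed-width bins.
bin_counts <- function(values, lower, upper, width) {
  breaks <- seq(lower, upper, by = width)
  # Assumption: bins are closed on the left; the last bin also includes `upper`.
  bins <- cut(values, breaks = breaks, right = FALSE, include.lowest = TRUE)
  counts <- table(bins)  # values outside [lower, upper] become NA and are not counted
  setNames(as.vector(counts), names(counts))
}

# Made-up example values, for illustration only.
b_values  <- c(2, 7, 7.5, 99)
occupancy <- c(0.25, 0.55, 1.0)

# Same ranges and bin widths as Bvalue.counts and occupancy.counts.
Bvalue_counts    <- bin_counts(b_values, lower = 0, upper = 100, width = 5)
occupancy_counts <- bin_counts(occupancy, lower = 0, upper = 1, width = 0.1)
```

Each result is a named integer vector with one entry per bin, so empty bins appear as zeros rather than being dropped.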

Details

PDB files obtained from the PDB conform to a specific set of formatting standards, but this does not mean the data within the PDB files are always correct. This function cleans the PDB file and summarizes the atom evaluations.

This function does the following (in this order):

  • Reads in the PDB file

  • Adds/updates the element symbol (elesy) using the atom type (elety) via the bio3d::atom2ele() function

  • Removes hydrogen atoms via RemoveHydrogenAtoms() (user option)

  • Removes atoms with occupancy values determined to be out of range (OoR) via RemoveOoR.o()

  • Removes atoms with B-values determined to be out of range (OoR) via RemoveOoR.b()

  • Bins (counts) the occupancy values

  • Bins (counts) the B-values

  • Bins (counts) the normalized B-values

  • Bins (counts) the mobility values

  • Removes modeled atoms via RemoveModeledAtoms() (user option)

  • Removes water oxygen atoms farther from the protein than the user-defined cutoff.prot.h2o.dist via RetainWatersWithinX() (user option)

  • Writes cleaned protein structure to a PDB file
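
The water-retention step in the list above can be illustrated with a minimal base R sketch. It is not the implementation of RetainWatersWithinX(): waters_within_cutoff() is a hypothetical helper, the coordinates are made up, and the brute-force nearest-atom search is an assumption chosen for clarity rather than speed.

```r
# Hypothetical helper (not the package's RetainWatersWithinX()):
# flag water oxygens whose nearest protein atom is at or within `cutoff` Angstroms.
waters_within_cutoff <- function(protein_xyz, water_xyz, cutoff = 6) {
  nearest <- apply(water_xyz, 1, function(w) {
    diffs <- sweep(protein_xyz, 2, w)  # subtract the water position from every protein atom
    sqrt(min(rowSums(diffs^2)))        # distance to the closest protein atom
  })
  nearest <= cutoff
}

# Made-up coordinates (Angstroms): two protein atoms and four water oxygens.
protein_xyz <- rbind(c(0, 0, 0), c(10, 0, 0))
water_xyz   <- rbind(c(0, 0, 3), c(0, 0, 6), c(5, 0, 20), c(16.5, 0, 0))

keep <- waters_within_cutoff(protein_xyz, water_xyz, cutoff = 6)
retained_waters <- water_xyz[keep, , drop = FALSE]
```

The second water sits exactly 6 Angstroms from a protein atom and is kept, matching the "equal to or less than" rule described for cutoff.prot.h2o.dist.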

See Also

Other "Clean Protein Structure": RemoveHydrogenAtoms, RemoveModeledAtoms, RemoveOoR.b, RemoveOoR.o, RetainWatersWithinX