ClusterFind: ClusterFind

Description

ClusterFind is the main method of the iPAC package. It identifies clusters of mutated amino acids while taking into account the protein structure.

Usage

ClusterFind(mutation.data, position.data, method = "MDS", alpha = 0.05,
            MultComp = "Bonferroni", Include.Culled = "Y", Include.Full = "Y",
            create.map = "Y", Show.Graph = "Y", Graph.Output.Path = NULL,
            Graph.File.Name = "Map.pdf", Graph.Title = "Mapping",
            OriginX = min(position.data[, 4]), OriginY = min(position.data[, 5]),
            OriginZ = min(position.data[, 6]))

Arguments

mutation.data

A matrix of 0's (no mutation) and 1's (mutation) where each column represents an amino acid in the protein and each row represents an individual sample (test subject, cell line, etc.). Thus, a 1 in column i of row j means that the ith amino acid of sample j has a nonsynonymous mutation.
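As an illustration of this layout, the sketch below builds a small mutation matrix by hand. The object name toy.mutations, the column names and the values are all made up for this example; real input would come from a mutation database.

```r
# Hypothetical toy data: 4 samples (rows) by 6 amino acid positions (columns).
toy.mutations <- matrix(0, nrow = 4, ncol = 6)
colnames(toy.mutations) <- paste0("V", 1:6)  # column names are an assumption

# Samples 1 and 3 carry a mutation at amino acid 2; sample 4 at amino acid 5.
toy.mutations[1, 2] <- 1
toy.mutations[3, 2] <- 1
toy.mutations[4, 5] <- 1

# Number of mutated samples at each amino acid position.
mutation.counts <- colSums(toy.mutations)
```

Column sums such as mutation.counts give a quick check that the matrix is oriented correctly (samples in rows, amino acids in columns) before it is passed to ClusterFind.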

position.data

A data frame consisting of six columns: 1) Residue Name, 2) Amino Acid number in the protein, 3) Side Chain, 4) X-coordinate, 5) Y-coordinate and 6) Z-coordinate. Please see get.Positions and get.AlignedPositions for further information on how to construct this data frame.
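The column order described above can be sketched with a hand-built data frame. The column names and coordinate values here are hypothetical; only the order matters, since the Usage defaults read the coordinates from columns 4, 5 and 6.

```r
# Hypothetical positions for four residues; names and numbers are made up.
toy.positions <- data.frame(
  Residue   = c("MET", "THR", "GLU", "TYR"),  # 1) residue name
  Can.Count = 1:4,                            # 2) amino acid number
  SideChain = rep("A", 4),                    # 3) side chain
  XCoord    = c(62.9, 63.3, 61.5, 62.3),      # 4) x-coordinate
  YCoord    = c(97.5, 95.2, 92.8, 90.1),      # 5) y-coordinate
  ZCoord    = c(30.5, 27.6, 25.2, 22.7),      # 6) z-coordinate
  stringsAsFactors = FALSE
)

# The default fixed point from the Usage section: column-wise minima.
default.origin <- c(min(toy.positions[, 4]),
                    min(toy.positions[, 5]),
                    min(toy.positions[, 6]))
```

In practice this object would normally be produced by get.Positions or get.AlignedPositions rather than typed in.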

method

The approach used to map the protein into a 1D space: either "MDS" or "Linear".
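To give a feel for what a 3D-to-1D mapping by multidimensional scaling looks like, here is a minimal sketch using cmdscale from base R. It is not taken from the iPAC source and may differ from what ClusterFind does internally; the coordinates are made up.

```r
# Hypothetical 3D coordinates for four residues (rows), listed in sequence order.
coords <- rbind(c(0, 0, 0),
                c(1, 0, 0),
                c(5, 0, 0),
                c(2, 0, 0))

# Pairwise Euclidean distances between residues.
d <- dist(coords)

# Classical MDS down to a single dimension (an n x 1 matrix).
mds.1d <- cmdscale(d, k = 1)

# Residues reordered by their 1D coordinate.
new.order <- order(mds.1d[, 1])
```

The sign of an MDS axis is arbitrary, so the ordering may come out reversed. Either way, residue 4 lands between residues 2 and 3 even though it comes last in the sequence, which is the kind of reordering a structure-aware method relies on.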

alpha

The significance level used in the NMC calculation. Please see Ye et al. for more information.

MultComp

The multiple comparisons adjustment used in the NMC calculation. Possible options are "None", "Bonferroni" and "BH". Please see Ye et al. for more information.
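The effect of the three options can be illustrated with p.adjust from base R. This is a sketch on made-up p-values, not the NMC calculation itself; how ClusterFind applies the adjustment internally is not described here.

```r
# Hypothetical raw p-values for four candidate clusters.
raw.p <- c(0.001, 0.012, 0.030, 0.200)
alpha <- 0.05

# Bonferroni multiplies each p-value by the number of tests (capped at 1).
p.bonf <- p.adjust(raw.p, method = "bonferroni")

# BH (Benjamini-Hochberg) controls the false discovery rate instead.
p.bh <- p.adjust(raw.p, method = "BH")

# Number of clusters called significant under each option.
n.none <- sum(raw.p  < alpha)
n.bonf <- sum(p.bonf < alpha)
n.bh   <- sum(p.bh   < alpha)
```

On these values, Bonferroni is the most conservative of the three, and BH retains more clusters than Bonferroni at the same alpha.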

Include.Culled

If "Y", the standard NMC algorithm will be run on the protein after removing the amino acids for which there is no positional data.

Include.Full

If "Y", the standard NMC algorithm will be run on the full protein sequence.

create.map

If "Y", a graphical representation of the dimension reduction from 3D to 1D space will be created (though not necessarily displayed).

Show.Graph

If "Y", the graph representation will be displayed. Warning: You must be running R in a GUI environment; otherwise, an error will occur.

Graph.Output.Path

If you would like the picture saved automatically to the disk, specify the output directory here. The Graph.File.Name variable must be set as well.

Graph.File.Name

If you would like the picture saved automatically to the disk, specify the output file name. The Graph.Output.Path variable must be set as well.

Graph.Title

The title of the graph to be created.

OriginX

If the "Linear" method is chosen, this specifies the x-coordinate part of the fixed point.

OriginY

If the "Linear" method is chosen, this specifies the y-coordinate part of the fixed point.

OriginZ

If the "Linear" method is chosen, this specifies the z-coordinate part of the fixed point.

Value

Remapped: This shows the clusters found while taking the 3D structure into account.
OriginalCulled: This shows the clusters found if you run the NMC algorithm on the canonical linear protein, but with the amino acids for which we don't have 3D positional data removed.
Original: This shows the clusters found if you run the NMC algorithm on the canonical linear protein with all the amino acids.
MissingPositions: This shows which amino acids are present in the mutation matrix but for which we do not have positions. These amino acids are cut from the protein when calculating the Remapped and OriginalCulled results.

Details

The linear method fixes a point, defined by the parameters OriginX, OriginY and OriginZ, and then calculates the distance from each amino acid to that point. The graph produced by ClusterFind (if requested) shows these distances as dotted green lines. The lengths of the green lines are used to reorder the protein: the amino acid with the shortest green line is ordered first and the amino acid with the longest green line is ordered last. Additional methods will be available in future versions of this package.
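The reordering described above can be sketched in a few lines of base R. This is an illustration under assumed inputs, not the package's code: the coordinates are made up, and the fixed point is taken as the column-wise minima, mirroring the OriginX, OriginY and OriginZ defaults in the Usage section.

```r
# Hypothetical coordinates for three amino acids (rows), in sequence order.
xyz <- cbind(X = c(3, 0, 1),
             Y = c(4, 0, 0),
             Z = c(0, 2, 0))

# Fixed point: the minimum of each coordinate column, here (0, 0, 0).
origin <- apply(xyz, 2, min)

# Euclidean distance from each amino acid to the fixed point.
dists <- sqrt(rowSums(sweep(xyz, 2, origin)^2))

# Shortest distance first, longest distance last.
linear.order <- order(dists)
```

With these values the distances are 5, 2 and 1, so the third amino acid is ordered first and the first amino acid is ordered last.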

References

Ye et al., Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics. 2010. doi:10.1186/1471-2105-11-11.

Examples

#Load the iPAC package, which provides get.Positions, ClusterFind and KRAS.Mutations.
library(iPAC)

#Extract the data from a CIF file and match it up with the canonical protein sequence.
#Here we use the 3GFT structure from the PDB, which corresponds to the KRAS protein.
CIF <- "http://www.pdb.org/pdb/files/3GFT.cif"
Fasta <- "http://www.uniprot.org/uniprot/P01116-2.fasta"
KRAS.Positions <- get.Positions(CIF, Fasta, "A")

#Load the mutational data for KRAS. Here the mutational data was obtained from the
#COSMIC database (version 58). 
data(KRAS.Mutations)

#Identify and report the clusters using the default MDS method.
ClusterFind(mutation.data = KRAS.Mutations,
            position.data = KRAS.Positions$Positions,
            create.map = "Y", Show.Graph = "Y")

#Identify and report the clusters using the linear method.
ClusterFind(mutation.data = KRAS.Mutations,
            position.data = KRAS.Positions$Positions,
            create.map = "Y", Show.Graph = "Y", method = "Linear")
