get.AlignedPositions(CIF.File.Location, Fasta.File.Location, chain.required = "A",
RequiredModelNum = NULL, patternQuality = PhredQuality(22L),
subjectQuality = PhredQuality(22L), type = "global-local",
substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = -10,
gapExtension = -4, scoreOnly = FALSE)
This method reads the canonical amino acid sequence from the file at Fasta.File.Location and aligns the amino acids extracted from the CIF file to it, using the pairwiseAlignment function from the Bioconductor package Biostrings. After alignment, residues that are mismatched between the canonical and extracted sequences are removed automatically, so that the ClusterFind method, which requires positional data as input, runs only on correctly matched amino acids.
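The mismatch-filtering step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes the alignment has already been computed and is available as two gapped strings of equal length, with "-" marking a gap. The helper name keep_matched_positions and the toy sequences are hypothetical.

```r
# Hypothetical helper, not part of the package: given two gapped alignment
# strings of equal length, return the canonical positions whose residues
# are identical in both sequences.
keep_matched_positions <- function(aligned_canonical, aligned_extracted) {
  canon <- strsplit(aligned_canonical, "")[[1]]
  extr  <- strsplit(aligned_extracted, "")[[1]]
  stopifnot(length(canon) == length(extr))

  # Canonical position of each alignment column; gap columns get NA.
  canon_pos <- ifelse(canon == "-", NA_integer_, cumsum(canon != "-"))

  # Keep columns with a residue on both sides and identical letters.
  matched <- canon != "-" & extr != "-" & canon == extr
  data.frame(position = canon_pos[matched],
             residue  = canon[matched],
             stringsAsFactors = FALSE)
}

# Toy alignment: column 4 is a Q/H mismatch, so position 4 is dropped.
result <- keep_matched_positions("MTEQYKL", "MTEHYKL")
print(result$position)

# Toy alignment with an insertion in the extracted sequence (gap in canonical).
gapped <- keep_matched_positions("MT-EQ", "MTAEH")
print(gapped$position)
```

Positions are numbered along the canonical sequence, so a dropped residue leaves a hole in the numbering rather than shifting the residues that follow it.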
#Observe that position 61 is missing. It is automatically dropped because the PDB data
#specifies it as an "H" while the FASTA sequence specifies it as a "Q".
CIF<-"http://www.pdb.org/pdb/files/3GFT.cif"
Fasta<-"http://www.uniprot.org/uniprot/P01116-2.fasta"
get.AlignedPositions(CIF,Fasta, "A")