OriGen (version 1.3.1)

GenerateHeatMaps: Fit OriGen microsatellite allele frequency surfaces

Description

This function fits allele frequency surfaces to microsatellite data and then finds locations for unknown individuals.

Usage

GenerateHeatMaps(FitModelOutput,UnknownDataArray,NumberLoci,RestrictToLand=TRUE)

Arguments

FitModelOutput
This is the output from FitMultinomialModel.
UnknownDataArray
This is an array which gives the alleles for the individuals of unknown origin. The dimension of this array is [NumberUnknowns,2,NumberLoci], where 2 represents the 2 alleles each individual has at each locus. Note that these should not be allele lengths but rather the allele number matching the dimension in DataArray.
NumberLoci
This integer value gives the number of loci to include when generating the heat maps. This is useful when generating heat maps with different numbers of loci.
RestrictToLand
If TRUE, this logical parameter restricts the heat maps to land areas only.
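
As a minimal sketch of the allele-number convention for UnknownDataArray, the base R code below converts allele lengths into allele numbers. It assumes the raw genotypes are stored as lengths and that an allele's number is its position in a per-locus vector of known lengths. The names RawLengths and KnownLengths and all values are hypothetical; in practice the numbering must match whatever produced DataArray (for example ConvertMicrosatData).

```r
# Hypothetical raw genotypes as allele lengths: [NumberUnknowns, 2, NumberLoci]
NumberUnknowns <- 2
NumberLoci <- 3
RawLengths <- array(c(152, 160, 156, 160,   # locus 1
                      201, 205, 205, 205,   # locus 2
                       98, 104, 106, 100),  # locus 3
                    dim = c(NumberUnknowns, 2, NumberLoci))

# Hypothetical lookup: the allele lengths behind allele numbers 1, 2, ... at each locus
KnownLengths <- list(c(152, 156, 160),
                     c(201, 205),
                     c(98, 100, 104, 106))

# Replace each length by its allele number (its position in the lookup vector)
UnknownDataArray <- array(0L, dim = c(NumberUnknowns, 2, NumberLoci))
for (j in seq_len(NumberLoci)) {
  UnknownDataArray[, , j] <- match(RawLengths[, , j], KnownLengths[[j]])
}

# A length missing from the lookup would become NA, so check before use
stopifnot(!anyNA(UnknownDataArray))
```

The resulting array has the documented [NumberUnknowns,2,NumberLoci] shape and holds allele numbers rather than lengths.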

Value

List with the following components:
  • AlleleFrequencySurfaces: An array giving the allele frequency for each allele, each coordinate, and each locus. The dimension of this array is [MaxAlleles, NumberLoci, NumberLongitudeDivisions, NumberLatitudeDivisions], where either NumberLongitudeDivisions or NumberLatitudeDivisions is equal to MaxGridLength (an input to FitMultinomialModel) and the other is scaled so that the geodesic distance between points horizontally and vertically is equal.
  • UnknownGrids: An array giving the probability that an unknown individual comes from the given location. The dimension of this array is [NumberLongitudeDivisions, NumberLatitudeDivisions, NumberUnknowns], with the numbers of divisions determined as for AlleleFrequencySurfaces.
  • DataArray: An array giving the number of alleles grouped by sample site for each locus. The dimension of this array is [MaxAlleles,SampleSites,NumberLoci].
  • RhoParameter: A real value giving the RhoParameter supplied as input.
  • SampleSites: The integer number of sample sites found.
  • GridLength: An array giving the number of longitudinal and latitudinal divisions. The dimension of this array is [2], where the first number is longitude and the second is latitude.
  • MaxGridLength: An integer giving the maximum number of boxes used to fill the longer side of the region. Computation time increases quadratically with this number, but it should be high enough to separate different sample sites; otherwise they will be binned together as a single site. This number was part of the inputs.
  • MaxAlleles: The maximum of AllelesAtLocus, that is, the largest number of alleles found at any locus.
  • NumberLoci: The integer number of loci found.
  • SampleCoordinates: An array giving the longitude and latitude of each of the sample sites found. The dimension of this array is [SampleSites,2], where the second dimension represents longitude and latitude respectively.
  • GridCoordinates: An array giving the coordinates corresponding to each longitude and latitude division. The dimension of this array is [2,MaxGridLength], with longitude coordinates in the first row and latitude coordinates in the second. One of these rows may not be filled entirely, in which case it is padded with zeroes at the end. Use the associated output GridLength to find the filled length of each row.
  • AllelesAtLocus: An integer vector giving the number of alleles found at each locus.
  • NumberUnknowns: The integer number of unknown individuals found.
  • UnknownDataArray: An array giving the alleles for the individuals of unknown origin. The dimension of this array is [NumberUnknowns,2,NumberLoci], where 2 represents the 2 alleles each individual has at each locus. Note that these should not be allele lengths but rather the allele number matching the dimension in DataArray.
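
As an illustration of how the grid-related components fit together, the base R sketch below finds the most probable grid cell for one unknown individual. It does not call OriGen: MockOutput is a hypothetical stand-in list with made-up values that mimics only the documented shapes of GridLength, GridCoordinates, and UnknownGrids.

```r
# Hypothetical stand-in for the list returned by GenerateHeatMaps
MaxGridLength <- 5
GridLength <- c(5, 3)  # 5 longitude divisions, 3 latitude divisions

# Row 1 holds longitudes, row 2 holds latitudes; the shorter row is zero padded
GridCoordinates <- array(0, dim = c(2, MaxGridLength))
GridCoordinates[1, 1:5] <- c(-9, 0, 10, 20, 30)
GridCoordinates[2, 1:3] <- c(35, 45, 55)

# Made-up surface for a single unknown with a clear peak at cell (4, 2)
UnknownGrids <- array(0.01, dim = c(GridLength[1], GridLength[2], 1))
UnknownGrids[4, 2, 1] <- 0.86

MockOutput <- list(GridLength = GridLength,
                   GridCoordinates = GridCoordinates,
                   UnknownGrids = UnknownGrids)

# Use GridLength to drop the zero padding from each coordinate row
Longitudes <- MockOutput$GridCoordinates[1, seq_len(MockOutput$GridLength[1])]
Latitudes <- MockOutput$GridCoordinates[2, seq_len(MockOutput$GridLength[2])]

# Locate the highest-valued cell for the chosen unknown individual
UnknownNumber <- 1
Surface <- MockOutput$UnknownGrids[, , UnknownNumber]
Best <- arrayInd(which.max(Surface), dim(Surface))
BestLocation <- c(Longitude = Longitudes[Best[1, 1]],
                  Latitude = Latitudes[Best[1, 2]])
print(BestLocation)
```

With real output, the same indexing would be applied to the list returned by GenerateHeatMaps in place of MockOutput.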

References

Ranola J, Novembre J, Lange K (2014) Fast Spatial Ancestry via Flexible Allele Frequency Surfaces. Bioinformatics, in press.

See Also

ConvertMicrosatData for converting microsatellite data files into a format appropriate for analysis,

ConvertPEDData for converting Plink PED files into a format appropriate for analysis,

FitMultinomialModel for fitting allele surfaces to the converted microsatellite data,

FitMultinomialModelFindUnknowns for fitting allele surfaces to the converted microsatellite data,

PlotAlleleFrequencySurface for a quick way to plot the resulting allele frequency surfaces from FitOriGenModel or GenerateHeatMaps.

Examples

library(OriGen)

#Data generation
SampleSites=10
NumberLoci=4
MaxAlleles=4
NumberAllelesAtEachLocus=sample(2:MaxAlleles,NumberLoci,replace=TRUE)

TestData=array(0,dim=c(MaxAlleles,SampleSites,NumberLoci))
for(i in 1:NumberLoci){
	for(j in 1:NumberAllelesAtEachLocus[i]){
		TestData[j,,i]=sample(1:10,SampleSites,replace=TRUE)
	}
}

#Europe spans roughly -9 to 38 degrees longitude and 34 to 60 degrees latitude
TestCoordinates=array(0,dim=c(SampleSites,2))
TestCoordinates[,1]=runif(SampleSites,-9,38)
TestCoordinates[,2]=runif(SampleSites,34,60)

#Unknown individuals: two allele numbers (not allele lengths) per locus
NumberUnknowns=2
UnknownData=array(0,dim=c(NumberUnknowns,2,NumberLoci))
for(i in 1:NumberUnknowns){
	for(j in 1:NumberLoci){
		UnknownData[i,,j]=sample(1:NumberAllelesAtEachLocus[j],2)
	}
}

#Fitting the model
#MaxGridLength is the maximum number of boxes allowed to span the region in either direction
#RhoParameter is a tuning constant
trials=FitMultinomialModel(TestData,TestCoordinates,MaxGridLength=20,RhoParameter=1)
str(trials)

#Generating heat maps for the unknown individuals
trials2=GenerateHeatMaps(trials,UnknownData,NumberLoci=NumberLoci)

#Plotting the heat map for the first unknown individual
PlotUnknownHeatMap(trials2,UnknownNumber=1)