Demerelate: Demerelate --- Algorithms to estimate pairwise relatedness within populations based on allele sharing

Description

Main function of Demerelate. Call this function for any estimation of relatedness. Some F-statistics can also be calculated. Default parameters are set for convenient usage; only an input data frame containing allelic information is required. Geographic distances, reference populations and changes to the statistics can be set by adapting parameters.

Usage

Demerelate(inputdata, tab.dist = "NA", ref.pop = "NA",  object = FALSE, value = "Mxy", Fis = FALSE, file.output = FALSE, p.correct = FALSE, iteration = 1000, pairs = 1000,  dis.data = "relative", NA.rm = TRUE, genotype.ref = TRUE)

Arguments

inputdata

R object or external file to be read internally, in the standard Demerelate input format. The data frame is split by population information and calculations run separately for each population. If no reference population is specified (ref.pop = "NA"), all information on loci is used as reference, omitting population information.

tab.dist

R object or external file to be read internally, in the standard Demerelate input format. Geographic distances can be defined and are analysed together with the genetic data. Columns three and four of the standard input format are used for x and y coordinates.

ref.pop

R object or external file to be read internally, in the standard Demerelate input format. Custom reference populations are loaded for the analysis. Population information in the reference file is omitted, so that allele frequencies are calculated from the whole dataset. Optionally, allele frequencies can be loaded as reference; the object should then be a list of allele frequencies. For each locus, a vector of allele frequencies p, with allele names as vector names, is added to the list. The last list element is a vector of sample sizes for each locus.

object

logical. Whether the input data are R objects (TRUE) or should be read in as files (FALSE).

value

String defining the method used to calculate allele sharing or similarity estimates. Can be set as "Bxy", "Sxy", "Mxy", "Li", "lxy", "rxy", "loiselle", "wang.fin", "wang", "ritland", "morans.fin" or "morans"; see allele.sharing.

Fis

logical. Should $F_{is}$ values be calculated for each population?

iteration

Number of bootstrap iterations in $F_{is}$ calculations.

pairs

Number of pairs calculated from reference populations for randomized full siblings, half siblings and unrelated individuals.

file.output

logical. Should a cluster dendrogram, histograms and .txt files be written as standard output to your working directory? In some cases (many NA values) this value may have to be set as FALSE, because clusters cannot be calculated on pairwise NA values.

p.correct

logical. Should Yates' correction from prop.test(...) be used in $\chi^2$ statistics when calculating p-values on differences between empirical and randomized relatedness in populations?

dis.data

The kind of data to be used as distance measure. Can be "relative", in which case relative x and y coordinates should be given in tab.dist, or "decimal" for geographic decimal degrees.

NA.rm

logical. If set as TRUE, samples with NA in any position are removed from the calculation. If set as FALSE, you may get an error message telling you to remove some individuals before the procedure can run through. Be aware that if your calculations succeed although your populations contain NA values, your results may be biased by missing data.

genotype.ref

logical. If set as TRUE, random unrelated populations are generated from genotypes of the reference population. If set as FALSE, allele frequencies are used to generate the reference population. If ref.pop is given as a list of allele frequencies, genotype.ref = FALSE is forced.
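
The list form of ref.pop described above can be sketched in base R as follows. This is only an illustration of the described structure, not code from the package: the locus names, allele names and counts are hypothetical, and treating the sample size as the number of diploid individuals per locus is an assumption.

```r
# Hypothetical allele observations for two loci; names and values are
# illustrative only and do not come from the Demerelate documentation.
alleles_locus1 <- c("101", "101", "103", "105", "103", "101")
alleles_locus2 <- c("200", "204", "204", "204")

# Allele frequencies p as a named numeric vector (names = allele names).
allele_freq <- function(alleles) {
  tab <- table(alleles)
  p <- as.vector(tab) / sum(tab)
  names(p) <- names(tab)
  p
}

# One frequency vector per locus, followed by a last element holding the
# sample size for each locus (assumed: allele copies / 2 for diploids).
ref_freqs <- list(
  locus1 = allele_freq(alleles_locus1),
  locus2 = allele_freq(alleles_locus2),
  n = c(locus1 = length(alleles_locus1) / 2,
        locus2 = length(alleles_locus2) / 2)
)

str(ref_freqs)
```

Such a list would then be passed as ref.pop; as stated for genotype.ref, reference populations are generated from allele frequencies in that case.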

Value

If file.output is set as TRUE, the function returns files in a folder named with a bar-code and the date of analysis.

The function also returns, via return, the resulting objects as one list.

Details

Pairwise relatedness is calculated from inputdata. Be sure to match the input format exactly. Missing values are omitted when flagged as NA. Calculations are performed for each population in the inputdata. Each calculation has its unique bar-code and is named with the date and population name.

If no additional reference populations are defined, inputdata omitting population information are used to calculate references. If no good reference populations are available, you need to take care of bias in the calculations. In any case, consult for example Oliehoek et al. 2006 to get an idea of bias in relatedness calculations.

Geographic distances between individual pairs are calculated when tab.dist = ... is given. Distances are calculated from x-y coordinates with the Pythagorean theorem, which can be applied to any metric positions in the sampling. Geographic coordinates from e.g. GPS need to be transformed to decimal degrees. Be sure to have positions for each individual, or remove individuals with missing positions from inputdata.
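
As a minimal sketch of the distance calculation on relative x-y coordinates, the base R function dist() gives the same Pythagorean distances for every pair of individuals. The coordinates and individual names below are hypothetical, and this is not the package's internal code; the "decimal" case for geographic degrees is not covered here.

```r
# Hypothetical relative x/y coordinates for four individuals.
coords <- data.frame(
  id = c("ind1", "ind2", "ind3", "ind4"),
  x  = c(0, 3, 3, 0),
  y  = c(0, 0, 4, 4)
)

# Euclidean (Pythagorean) distance for every pair: sqrt(dx^2 + dy^2).
xy <- as.matrix(coords[, c("x", "y")])
rownames(xy) <- coords$id
pair_dist <- as.matrix(dist(xy, method = "euclidean"))

# Distance between ind1 (0, 0) and ind3 (3, 4): the hypotenuse of a
# 3-4-5 right triangle.
pair_dist["ind1", "ind3"]
```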

References

Armstrong, W. (2012) fts: R interface to tslib (a time series library in c++). R package version 0.7.7.

Blouin, M., Parsons, M., Lacaille, V. and Lotz, S. (1996) Use of microsatellite loci to classify individuals by relatedness. Molecular Ecology, 5, 393-401.

Hardy, O.J. and Vekemans, X. (1999) Isolation by distance in a continuous population: reconciliation between spatial autocorrelation analysis and population genetics models. Heredity, 83, 145-154.

Li, C.C., Weeks, D.E. and Chakravarti, A. (1993) Similarity of DNA fingerprints due to chance and relatedness. Human Heredity, 43, 45-52.

Li, C.C. and Horvitz, D.G. (1953) Some methods of estimating the inbreeding coefficient. American Journal of Human Genetics, 5, 107-17.

Loiselle, B.A., Sork, V.L., Nason, J. and Graham, C. (1995) Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). American Journal of Botany, 82, 1420-1425.

Lynch, M. (1988) Estimation of relatedness by DNA fingerprinting. Molecular Biology and Evolution, 5(5), 584-599.

Lynch, M. and Ritland, K. (1999) Estimation of pairwise relatedness with molecular markers. Genetics, 152, 1753-1766.

Oliehoek, P. A. et al. (2006) Estimating relatedness between individuals in general populations with a focus on their use in conservation programs. Genetics, 173, 483-496.

Queller, D.C. and Goodnight, K.F. (1989) Estimating relatedness using genetic markers. Evolution, 43, 258-275.

Ritland, K. (1996) Estimators for pairwise relatedness and individual inbreeding coefficients. Genetics Research, 67, 175-185.

Wang, J. (2002) An estimator for pairwise relatedness using molecular markers. Genetics, 160, 1203-1215.

Examples

     ## The data set is used to calculate Blouin's allele sharing index on
     ## population data. Pairs are set to 10 for convenience.
     ## For statistical reasons you may want to use more pairs to model
     ## relatedness in your final results (1000 pairs are recommended).

     data(demerelpop)
     
     dem.results <- Demerelate(demerelpop[,1:6], value="Mxy", 
                    file.output=FALSE, object=TRUE, pairs=10)


     ## Demerelate can be executed with several different values.
     ## Consult the references to decide which estimator may be
     ## useful in your case.
     ## Be careful: some estimators may be biased in situations with
     ## no reference populations or violation of Hardy-Weinberg
     ## equilibrium.
