Demerelate: Algorithms to estimate pairwise relatedness within populations based on allele sharing

Description

Main function of Demerelate. Call this function for any estimation of relatedness. It can additionally calculate some F statistics. Default parameters are set for convenient use; only an input data frame containing allelic information is required. Geographic distances, reference populations and changes to the statistics can be set through the parameters.

Usage

Demerelate(inputdata, tab.dist = "NA", ref.pop = "NA",
           object = FALSE, value = "Mxy", Fis = FALSE,
           file.output = FALSE, p.correct = FALSE,
           iteration = 1000, pairs = 1000,
           dis.data = "relative", NA.rm = TRUE)

Arguments

inputdata

R object or external file to be read internally with standard Demerelate inputformat. Dataframe will be split by population information and calculations will run separately. If no reference population information is

tab.dist

R object or external file to be read internally with standard Demerelate inputformat. Geographic distances can be defined and will be analysed combined with genetic data. Column three and four of standard inputforma

ref.pop

R object or external file to be read internally with standard Demerelate inputformat. Custom reference populations will be loaded for the analysis. Population information of reference file will be omitted so that al

object

logical. Are the input data R objects (TRUE) or should they be read in as files (FALSE)?

value

String defining method to calculate allele sharing or similarity estimates. Can be set as "rxy", "Bxy" or "Mxy".

Fis

logical. Should $F_{is}$ values be calculated for each population?

iteration

Number of bootstrap iterations in $F_{is}$ calculations.

pairs

Number of pairs calculated from reference populations for randomized full siblings, half siblings and non related individuals.

file.output

logical. Should a cluster dendrogram, histograms and .txt files be sent as standard output to your working directory? In some cases (inflating NA values) it may be necessary that this value has to be set as FALSE due to problems in calculating clu

p.correct

logical. Should Yates' continuity correction from prop.test(...) be used in the $\chi^2$ statistics when calculating p-values for differences between empirical and randomized relatedness in populations?

dis.data

The kind of data to be used as distance measure. Can be "relative", in which case relative x and y coordinates should be given in tab.dist, or "decimal" for geographic coordinates in decimal degrees.

NA.rm

logical. If set as TRUE samples with NA in any position are removed from the calculation. If set as FALSE you may get an error message telling you to remove some individuals to run through the procedure. Always be aware that if your calculations are succe
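The effect of p.correct can be illustrated with base R alone. The sketch below uses hypothetical counts of pairs classified as siblings (all numbers are invented for illustration) and compares prop.test() with and without Yates' continuity correction. The correction gives a smaller $\chi^2$ statistic and a larger p-value. How Demerelate assembles these counts internally is not shown here.

```r
# Hypothetical counts, invented for illustration only:
# pairs classified as siblings / total pairs, empirical vs. randomized.
sibs  <- c(empirical = 12, randomized = 180)
total <- c(empirical = 45, randomized = 1000)

# Without continuity correction (the behaviour p.correct = FALSE asks for).
uncorrected <- prop.test(sibs, total, correct = FALSE)

# With Yates' continuity correction (the behaviour p.correct = TRUE asks for).
corrected <- prop.test(sibs, total, correct = TRUE)

# Compare the chi-squared statistics and the p-values side by side.
print(c(uncorrected = unname(uncorrected$statistic),
        corrected   = unname(corrected$statistic)))
print(c(uncorrected = uncorrected$p.value,
        corrected   = corrected$p.value))
```

With small numbers of pairs the two p-values can differ noticeably, so the choice matters most for small populations.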

Value

If file.output is set as TRUE, the function writes the following files to a folder named with a bar-code and the date of analysis:

Empirical.relatedness.Population.txt

Matrix of relatedness values for each population.

Geographic.distances.Population.txt

Matrix of geographic distances for each population.

Relate.mean.Populationout.name.txt

Depends on selected estimators and mode of analysis. Either a summary of the correlation of relatedness with geographic distance for each population or a summary of tests for relatedness within populations compared to reference populations is written to the file.

Random.Halfsib.distances.overall.txt

Matrix of relatedness values calculated from randomized reference population for half siblings.

Random.NonRelated.distances.overall.txt

Matrix of relatedness values calculated from randomized reference population for non related individuals.

Random.Fullsib.distances.overall.txt

Matrix of relatedness values calculated from randomized reference population for full siblings.

Cluster.Populationout.name.pdf

Contains a UPGMA cluster dendrogram of relatedness values and a histogram of relatedness values per locus and for loci overall.

Total-Regression.Population.pdf

Contains a regression plot and linear fit for geographic distance and genetic relatedness.

Summary.Populationout.name.txt

Summary of analysis of F statistics and allele/genotype frequencies.

The function returns the following objects as one list:

dem.results[[1]]

Settings of the calculation.

dem.results[[2]]

Mean relatedness for empirical population over all loci.

dem.results[[3]]

Summarized relatedness statistics with thresholds and randomized populations from the dataset.

dem.results[[4]]

Statistical analysis of the number of siblings found for each population.

dem.results[[5]]

Thresholds for relatedness if "Bxy" or "Mxy" are selected as estimators.

dem.results[[6]]

$F_{is}$ values and statistics for each population if Fis==TRUE.

dem.results[[7]]

Summary of the linear regression on distance data.

Details

Pairwise relatedness is calculated from inputdata. Be sure to match the input format exactly. Missing values are omitted when flagged as NA. If no additional reference populations are defined, inputdata without population information are used to calculate references. If no good reference populations are available, you need to account for bias in the calculations. In any case, consult for example Oliehoek et al. (2006) to get an idea of bias in relatedness calculations.

Geographic distances between individual pairs are calculated when tab.dist = ... is set. Distances calculated from x-y coordinates by simple Pythagorean mathematics can be applied to any metric positions in the sampling. Geographic coordinates from e.g. GPS need to be transformed to decimal degrees. Be sure to have positions for each individual, or remove individuals with missing positions from inputdata.

Calculations are performed for each population in inputdata. Each calculation has its own unique bar-code and is named with the date and population name.
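The two kinds of distance data can be sketched in base R. For relative x-y coordinates, the Pythagorean distance is what dist() computes by default. For decimal degrees a great-circle formula is needed. The haversine helper below (haversine.km, a hypothetical name) is an assumption made for illustration and is not necessarily the formula Demerelate uses internally. Sample names and coordinates are invented.

```r
# Invented example coordinates for three individuals.
xy <- data.frame(x = c(0, 3, 6), y = c(0, 4, 8),
                 row.names = c("ind1", "ind2", "ind3"))

# Relative coordinates: Euclidean (Pythagorean) distance between all pairs.
d.relative <- as.matrix(dist(xy, method = "euclidean"))

# Decimal degrees: haversine great-circle distance in km.
# Illustrative helper; assumes a spherical earth with radius 6371 km.
haversine.km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * r * asin(pmin(1, sqrt(a)))
}

print(d.relative)
print(haversine.km(47.66, 9.17, 47.67, 9.18))
```

The relative mode works for any metric grid, for example positions measured in metres within a study plot, whereas decimal degrees must not be fed into a plain Euclidean distance.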

References

Blouin, M.S. et al. (1996) Use of microsatellite loci to classify individuals by relatedness. Molecular Ecology, 5, 393-401.

Li, C.C. and Horvitz, D.G. (1953) Some methods of estimating the inbreeding coefficient. American Journal of Human Genetics, 5, 107-117.

Oliehoek, P.A. et al. (2006) Estimating relatedness between individuals in general populations with a focus on their use in conservation programs. Genetics, 173, 483-496.

Queller, D.C. and Goodnight, K.F. (1989) Estimating relatedness using genetic markers. Evolution, 43, 258-275.

Examples


## Load the package that provides Demerelate() and the example data.
library(Demerelate)

## Data set is used to calculate Blouin's allele sharing index on
## population data. Pairs are set to 10 for convenience.
## For statistical reasons you may want to use more pairs to model
## relatedness in your final results (1000 pairs are recommended).

data(demerelpop)

dem.results <- Demerelate(demerelpop[,1:6], value="Mxy",
                          file.output=FALSE, object=TRUE, pairs=10)


## Demerelate can be executed with the values Bxy, rxy and Mxy. You
## should consult the references to decide which estimator may
## be useful in your case.
## Be careful with Bxy: this estimator may be biased and should be
## used with caution. You may want to use rxy instead.
