lfmm: Fitting Latent Factor Mixed Models

Description

lfmm is used to fit Latent Factor Mixed Models. The goal of lfmm is to identify genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures.
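
For orientation, the model from the reference cited below (Frichot et al., 2013) can be sketched as follows. The notation is an assumption made here for illustration and does not come from this help page: G_{il} is the genotype of individual i at locus l, X_i is the vector of environmental variables, mu_l is a locus-specific effect, beta_l holds the regression coefficients being tested, U_i and V_l are the K latent factors and their loadings, and epsilon_{il} is a residual term.

```latex
G_{i\ell} = \mu_\ell + \beta_\ell^{\top} X_i + U_i^{\top} V_\ell + \epsilon_{i\ell}
```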

Usage

lfmm(input.file, environment.file, K,
     project = "continue",
     d = 0, all = FALSE,
     missing.data = FALSE, CPU = 1,
     iterations = 10000, burnin = 5000,
     seed = -1, repetitions = 1,
     epsilon.noise = 1e-3, epsilon.b = 1000,
     random.init = TRUE)

Arguments

input.file

A character string containing a path to the input file, a genotypic matrix in the lfmm format.

environment.file

A character string containing a path to the environmental file, an environmental data matrix in the env format.
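
As a rough illustration of the two file layouts, the sketch below writes a toy genotype matrix and a toy environmental variable using base R only. It assumes that the lfmm format stores one row per individual with space-separated allele counts (0, 1 or 2), and that the env format stores one row per individual with space-separated variables. The object names, file names and values are hypothetical; in practice, write.lfmm and write.env (used in the Examples section) create these files.

```r
# Toy data: 5 individuals, 4 SNPs coded as allele counts 0/1/2 (made-up values).
set.seed(1)
toy.genotypes <- matrix(sample(0:2, 20, replace = TRUE), nrow = 5, ncol = 4)
# One made-up environmental variable, one value per individual.
toy.gradient <- matrix(rnorm(5), nrow = 5, ncol = 1)

geno.path <- file.path(tempdir(), "toy_genotypes.lfmm")
env.path <- file.path(tempdir(), "toy_gradient.env")

# Assumed layout: one row per individual, values separated by spaces,
# no row names and no header.
write.table(toy.genotypes, geno.path, row.names = FALSE, col.names = FALSE)
write.table(toy.gradient, env.path, row.names = FALSE, col.names = FALSE)

# Read the files back to check that both have one row per individual.
geno.back <- as.matrix(read.table(geno.path))
env.back <- as.matrix(read.table(env.path))
```

Both files must describe the same individuals in the same order, so the two matrices read back here have the same number of rows.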

K

An integer corresponding to the number of latent factors.

project

A character string among "continue", "new", and "force". If "continue", the results are stored in the current project. If "new", the current project is removed and a new one is created to store the result. If "force", the results are stored in the current project even if the input file has been modified since the creation of the project.

d

An integer. If set, lfmm is fit with the d-th variable of environment.file only. By default (if NULL and all is FALSE), lfmm is fit with each variable from environment.file sequentially and independently.

all

A boolean option. If true, fit lfmm with all variables from environment.file at the same time. This option is not compatible with the d option.

missing.data

A boolean option. If true, the input.file contains missing genotypes.

CPU

The number of CPUs used for the run. By default, the number of CPUs is 1.

iterations

The total number of iterations in the Gibbs Sampling algorithm.

burnin

The number of burn-in iterations in the Gibbs Sampling algorithm.

seed

A seed to initialize the random number generator. By default, the seed is randomly chosen. The seed is initialized at each repetition. If you want to set a seed, please provide a seed per repetition.

repetitions

The number of repetitions of each run.

epsilon.noise

Prior on the different variances.

epsilon.b

Prior on the variance of the correlation coefficients.

random.init

A boolean option. If true, the Gibbs Sampler is initialized randomly. Otherwise, it is initialized with zeros.

Value

lfmm returns an object of class lfmmProject. The following methods can be applied to an lfmmProject object:

show: Display information about the analyses.
summary: Summarize the analyses.
z.scores: Return the lfmm output vector of z-scores for the chosen runs with K latent factors, the d-th variable and the all option.
p.values: Return the lfmm output vector of p-values for the chosen runs with K latent factors, the d-th variable and the all option.
adjusted.pvalues: Return the output vector of adjusted p-values, using either the genomic control method or the provided lambda inflation factor, for the chosen runs with K latent factors, the d-th variable and the all option.
mlog10p.values: Return the lfmm output vector of -log10(p-values) for the chosen runs with K latent factors, the d-th variable and the all option.
load.lfmmProject(file = "character"): Load the file containing an lfmmProject object and return the lfmmProject object.
remove.lfmmProject(file = "character"): Erase an lfmmProject object. Caution: All the files associated with the object will be removed.
export.lfmmProject(file.lfmmProject): Create a zip file containing the full lfmmProject object. This allows the project to be moved to a new directory or a new computer (using import). To overwrite an existing export, use the option force = TRUE.
import.lfmmProject(file.lfmmProject): Import and load an lfmmProject object from a zip file (made with the export function) into the chosen directory. To overwrite an existing project, use the option force = TRUE.
combine.lfmmProject(file.lfmmProject, toCombine.lfmmProject): Combine toCombine.lfmmProject into file.lfmmProject. Caution: Only projects with runs coming from the same input file can be combined. If the same input file has different names in the two projects, use the option force = TRUE.
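
To make the genomic control adjustment mentioned for adjusted.pvalues more concrete, here is a minimal base R sketch. It assumes the usual recipe: combine the z-scores across runs by their median, estimate the inflation factor lambda as the median of the squared z-scores divided by the median of a chi-squared distribution with one degree of freedom, and recompute the p-values from the rescaled statistics. The simulated z-score matrix and all variable names are hypothetical, and the package's own implementation may differ in detail.

```r
# Simulated stand-in for z-scores: 400 loci (rows) x 5 runs (columns).
set.seed(42)
base.z <- rnorm(400, sd = 1.3)  # sd > 1 mimics inflated test statistics
zs <- sapply(1:5, function(r) base.z + rnorm(400, sd = 0.1))

# Combine the runs locus by locus with the median z-score.
zs.median <- apply(zs, MARGIN = 1, median)

# Genomic inflation factor: median of z^2 over the chi-squared(1) median.
lambda <- median(zs.median^2) / qchisq(0.5, df = 1)

# Adjusted p-values from the rescaled chi-squared statistics.
adj.p.values <- pchisq(zs.median^2 / lambda, df = 1, lower.tail = FALSE)

# Candidate loci at an expected FDR of alpha (Benjamini-Hochberg step-up).
alpha <- 0.1
candidates <- which(p.adjust(adj.p.values, method = "BH") < alpha)
```

After the rescaling, the adjusted p-values have a median close to 0.5, which is what a well-calibrated test is expected to show when most loci are not associated with the environment.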

References

Frichot E, Schoville SD, Bouchard G, Francois O. (2013). Testing for associations between loci and environmental gradients using latent factor mixed models. Molecular biology and evolution, 30(7), 1687-1699.

Examples


### Example of analyses using lfmm ###

# lfmm and the tutorial data are assumed to come from the LEA package.
library(LEA)
data("tutorial")
# Creation of the genotype file, genotypes.lfmm.
# It contains 400 SNPs for 50 individuals.
write.lfmm(tutorial.R, "genotypes.lfmm")
# Creation of the environment file, gradients.env.
# It contains 1 environmental variable for 50 individuals.
write.env(tutorial.C, "gradients.env")

################
# runs of lfmm #
################

# Main options: K (the number of latent factors),
#               CPU (the number of CPUs).

# Runs with K = 6 and 5 repetitions.
# Each run is composed of 6000 iterations, including 3000 burn-in
# iterations (around 30 seconds per run).
project = NULL
project = lfmm("genotypes.lfmm", "gradients.env", K = 6, repetitions = 5,
        iterations = 6000, burnin = 3000, project = "new")

# get the adjusted p-values using the genomic control method
res = adjusted.pvalues(project, K = 6)

for (alpha in c(.05,.1,.15,.2)) {
    # expected FDR
    print(paste("expected FDR:", alpha))
    L = length(res$p.values)
    # return a list of candidates with an expected FDR of alpha.
    w = which(sort(res$p.values) < alpha * (1:L) / L)
    candidates = order(res$p.values)[w]

    # estimated FDR and true positive rate
    # (loci 351 to 400 are counted as the true associations)
    estimated.FDR = length(which(candidates <= 350))/length(candidates)
    estimated.TP = length(which(candidates > 350))/50
    print(paste("FDR:", estimated.FDR, "True Positive:", estimated.TP))
}

###################
# Post-treatments #
###################

# show the project
show(project)

# summary of the project
summary(project)

# get the z-scores for the 2nd run for K = 6
z = z.scores(project, K = 6, run = 2)

# get the p-values for the 2nd run for K = 6
p = p.values(project, K = 6, run = 2)

# get the adjusted p-values for K = 6
res = adjusted.pvalues(project, K = 6)

# get the -log10(p-values) for the 2nd run for K = 6
mp = mlog10p.values(project, K = 6, run = 2)

##########################
# Manage an lfmm project #
##########################

# All the runs of lfmm for a given file are
# automatically saved into an lfmm project directory and a file.
# The name of the lfmmProject file is a combination of
# the name of the input file and the environment file
# with a .lfmmProject extension ("genotypes_gradients.lfmmProject").
# The name of the lfmmProject directory is the same name as
# the lfmmProject file with a .lfmm extension ("genotypes_gradients.lfmm/").
# There is only one lfmm project for each input file, including all the runs.

# An lfmmProject can be loaded in a different session.
project = load.lfmmProject("genotypes_gradients.lfmmProject")

# An lfmmProject can be exported to be imported into another directory
# or onto another computer.
export.lfmmProject("genotypes_gradients.lfmmProject")

# Import the exported project into a separate directory.
dir.create("test", showWarnings = TRUE)
newProject = import.lfmmProject("genotypes_gradients_lfmmProject.zip", "test")

# Combine the imported project with the original one.
combinedProject = combine.lfmmProject("genotypes_gradients.lfmmProject", "test/genotypes_gradients.lfmmProject")

# Remove the imported copy.
remove.lfmmProject("test/genotypes_gradients.lfmmProject")

# Alternatively, remove the original project and re-import it in place.
remove.lfmmProject("genotypes_gradients.lfmmProject")
newProject = import.lfmmProject("genotypes_gradients_lfmmProject.zip")


# An lfmmProject can be erased.
# Caution: All the files associated with the project will be removed.
remove.lfmmProject("genotypes_gradients.lfmmProject")
