gaston (version 1.4.9)

LD.thin: LD thinning

Description

Select SNPs in LD below a given threshold.

Usage

LD.thin(x, threshold, max.dist = 100e3, beg = 1, end = ncol(x), which.snps, dist.unit = c("bases", "indices"), extract = TRUE, keep = c("left", "right", "random"))

Arguments

x
A bed.matrix
threshold
The maximum LD (measured by $r^2$) allowed between selected SNPs
max.dist
The maximum distance for which the LD is computed
beg
The index of the first SNP to consider
end
The index of the last SNP to consider
which.snps
Logical vector giving which SNPs are considered. The default is to use all SNPs
dist.unit
The unit in which max.dist is expressed
extract
A logical indicating whether the function returns a bed.matrix (TRUE) or a logical vector indicating which SNPs are selected (FALSE)
keep
Which SNP of a pair is kept when their LD is above threshold

Value

If extract = TRUE, a bed.matrix extracted from x with SNPs in pairwise LD below the given threshold. If extract = FALSE, a logical vector of length end - beg + 1, where TRUE indicates that the corresponding SNP is selected.

Details

The SNPs to keep are selected by a greedy algorithm. The LD is computed only for SNP pairs whose distance is less than max.dist, expressed in number of bases if dist.unit = "bases", or in number of SNPs if dist.unit = "indices". The argument which.snps restricts the selection to a subset of SNPs.

The algorithm tries to keep as many SNPs as possible; it is not suited to selecting tag SNPs.
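
To make the idea of greedy thinning concrete, here is a minimal sketch in base R. It is not gaston's implementation: greedy_thin() is a hypothetical helper, it works on a plain numeric genotype matrix rather than a bed.matrix, it always keeps the left SNP of an offending pair, and it assumes polymorphic SNPs with no missing genotypes. The toy genotypes and positions are invented for illustration.

```r
# Hypothetical helper, NOT part of gaston: greedy LD thinning on a
# numeric matrix (rows = individuals, columns = SNPs).
# Assumes every SNP is polymorphic and there are no missing genotypes.
greedy_thin <- function(geno, pos, threshold, max.dist = 100e3) {
  p <- ncol(geno)
  keep <- rep(TRUE, p)
  for (i in seq_len(p)) {
    if (!keep[i]) next
    j <- i + 1
    # Compare SNP i only with later SNPs closer than max.dist
    while (j <= p && pos[j] - pos[i] < max.dist) {
      if (keep[j]) {
        r2 <- cor(geno[, i], geno[, j])^2
        # Drop the right-hand SNP, i.e. keep the left one of the pair
        if (r2 > threshold) keep[j] <- FALSE
      }
      j <- j + 1
    }
  }
  keep
}

# Toy data (invented): SNP 2 duplicates SNP 1, SNP 4 is its mirror image
snp1 <- c(0, 1, 2, 0, 1, 2)
geno <- cbind(snp1, snp1, c(0, 0, 1, 1, 2, 2), 2 - snp1)
pos  <- c(100, 200, 300, 400)

print(greedy_thin(geno, pos, threshold = 0.4))
# TRUE FALSE TRUE FALSE
print(greedy_thin(geno, pos, threshold = 0.4, max.dist = 150))
# TRUE FALSE TRUE TRUE
```

In the second call SNPs 1 and 4 are in complete LD but lie 300 bases apart, so with max.dist = 150 their LD is never computed and both are kept. This mirrors the role of max.dist described above.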

See Also

LD

Examples

# Load data
data(TTN)
x <- as.bed.matrix(TTN.gen, TTN.fam, TTN.bim)

# Select SNPs in LD r^2 < 0.4, max.dist = 500 kb
y <- LD.thin(x, threshold = 0.4, max.dist = 500e3)
y

# Verify that no SNP pair has LD r^2 > 0.4
# (the matrix ld.y has ones on the diagonal, so the count
# below equals the number of SNPs in y when the check passes)
ld.y <- LD( y, lim = c(1, ncol(y)) )
sum( ld.y > 0.4 )  
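
The extract argument described above can also be used to obtain the selection itself rather than the thinned bed.matrix. The following is a sketch that assumes the gaston package is installed and that the same TTN data set is used; the object names sel and y2 are illustrative.

```r
library(gaston)

# Load data, as in the example above
data(TTN)
x <- as.bed.matrix(TTN.gen, TTN.fam, TTN.bim)

# Ask for a logical vector instead of a bed.matrix
sel <- LD.thin(x, threshold = 0.4, max.dist = 500e3, extract = FALSE)

# One entry per SNP considered (here all SNPs of x)
length(sel) == ncol(x)

# Number of SNPs retained by the thinning
sum(sel)

# The selection can be applied by column subsetting
y2 <- x[, sel]
y2
```

Working with the logical vector is convenient when the same selection has to be applied to another object, or combined with other SNP filters, before extracting anything.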
