dmSQTLdata: Create dmSQTLdata object

Description

Constructor functions for a dmSQTLdata object. dmSQTLdata requires that SNPs are already matched to their corresponding genes. dmSQTLdataFromRanges performs this matching itself: it assigns to each gene all SNPs located within the gene or within a window of a given size around it.

Usage

dmSQTLdata(counts, gene_id, feature_id, genotypes, gene_id_genotypes, snp_id, sample_id, BPPARAM = BiocParallel::MulticoreParam(workers = 1))
dmSQTLdataFromRanges(counts, gene_id, feature_id, gene_ranges, genotypes, snp_id, snp_ranges, sample_id, window = 5000, BPPARAM = BiocParallel::MulticoreParam(workers = 1))

Arguments

counts

Numeric matrix or data frame of counts. Rows represent features, for example, exons, exonic bins or transcripts. Columns represent samples.

gene_id

Vector of gene IDs corresponding to counts.

feature_id

Vector of feature IDs corresponding to counts.

genotypes

Numeric matrix with genotypes. Rows represent SNPs, columns represent samples. The genotype of each sample is coded in the following way: 0 for ref/ref, 1 for ref/not ref, 2 for not ref/not ref, -1 or NA for missing value.

gene_id_genotypes

Vector of gene IDs corresponding to genotypes.

snp_id

Vector of SNP IDs corresponding to genotypes.

sample_id

Vector of unique sample IDs corresponding to the columns in counts.

BPPARAM

Parallelization method used by bplapply.

gene_ranges

GRanges object with gene locations. Calling names() on it must return the gene names.

snp_ranges

GRanges object with SNP locations. Calling names() on it must return the SNP names.

window

Size of the window, in base pairs, that extends upstream and downstream of a gene and defines its surrounding. Only SNPs located within a gene or its surrounding are considered in the sQTL analysis.
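
The effect of window can be illustrated with a small base R sketch. It is not the package's implementation: the coordinates and IDs are hypothetical, a single chromosome is assumed, and treating the window boundaries as inclusive is an assumption made for the illustration.

```r
# Hypothetical gene and SNP coordinates on one chromosome (not real data)
genes <- data.frame(gene_id = c("geneA", "geneB"),
                    start = c(10000, 50000),
                    end = c(12000, 53000),
                    stringsAsFactors = FALSE)
snps <- data.frame(snp_id = c("snp1", "snp2", "snp3", "snp4"),
                   pos = c(4000, 6000, 11000, 57500),
                   stringsAsFactors = FALSE)
window <- 5000

# For each gene, keep the SNPs whose position lies in
# [start - window, end + window] (inclusive bounds are an assumption)
matches <- do.call(rbind, lapply(seq_len(nrow(genes)), function(i) {
  keep <- snps$pos >= genes$start[i] - window &
    snps$pos <= genes$end[i] + window
  data.frame(gene_id = rep(genes$gene_id[i], sum(keep)),
             snp_id = snps$snp_id[keep],
             stringsAsFactors = FALSE)
}))

# One row per gene-SNP pair; snp1 lies outside every window and is dropped
matches
```

The resulting gene-SNP pairs have the shape that dmSQTLdata expects through gene_id_genotypes and snp_id: a SNP that falls in the windows of several genes would appear once per gene.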

Value

Returns a dmSQTLdata object.

Details

It is quite common for several SNPs to define exactly the same grouping of samples. To see this, compare dim(genotypes) and dim(unique(genotypes)). In our sQTL analysis, we do not repeat tests for SNPs that define the same grouping of samples: each grouping is tested only once. SNPs that define the same grouping are aggregated into a block. P-values and adjusted p-values are estimated at the block level, and the returned results are then extended to the SNP level by repeating the block statistics for each SNP that belongs to a given block.
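
As a rough illustration of the block idea, the following base R sketch groups SNPs with identical genotype rows. The matrix, the SNP and sample names, and the block statistics are made up for the example, and the grouping code is an assumption about how such blocks could be formed, not the package's internal code.

```r
# Toy genotype matrix: 4 hypothetical SNPs (rows) x 5 samples (columns)
genotypes <- matrix(
  c(0, 1, 2, 1, 0,
    0, 1, 2, 1, 0,   # same grouping as snp1
    2, 2, 0, 1, 1,
    0, 1, 2, 1, 0),  # same grouping as snp1
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), paste0("s", 1:5))
)

# Fewer unique rows than rows means some SNPs share a grouping
dim(genotypes)
dim(unique(genotypes))

# Assign each SNP to a block of SNPs with identical genotype rows
row_key <- apply(genotypes, 1, paste, collapse = "_")
block_id <- match(row_key, unique(row_key))
names(block_id) <- rownames(genotypes)
block_id

# Placeholder block-level statistics (not real results), repeated for
# every SNP in the block to obtain SNP-level output
block_stat <- c(0.01, 0.50)
snp_stat <- block_stat[block_id]
names(snp_stat) <- names(block_id)
snp_stat
```

Here three of the four SNPs fall into one block, so only two tests would be needed instead of four.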

Examples

#############################
### Create dmSQTLdata object
#############################

# Load DRIMSeq, which provides dmSQTLdataFromRanges() and plotData()
library(DRIMSeq)

# Use subsets of data defined in GeuvadisTranscriptExpr package
library(GeuvadisTranscriptExpr)

counts <- GeuvadisTranscriptExpr::counts
genotypes <- GeuvadisTranscriptExpr::genotypes
gene_ranges <- GeuvadisTranscriptExpr::gene_ranges
snp_ranges <- GeuvadisTranscriptExpr::snp_ranges

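# Make sure that samples in counts and genotypes are in the same order:
# take the sample names from counts (the first two columns hold IDs)
# and use them to select columns from both tables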
sample_id <- colnames(counts[, -(1:2)])

d <- dmSQTLdataFromRanges(counts = counts[, sample_id], 
   gene_id = counts$Gene_Symbol, feature_id = counts$TargetID, 
   gene_ranges = gene_ranges, genotypes = genotypes[, sample_id], 
   snp_id = genotypes$snpId, snp_ranges = snp_ranges, sample_id = sample_id, 
   window = 5e3, BPPARAM = BiocParallel::SerialParam())

plotData(d)
