Learn R Programming

MatrixEQTL (version 1.6.2)

Matrix_eQTL_main: Perform eQTL analysis.

Description

Matrix_eQTL_engine function tests associations between every row of the snps dataset and every row of the gene dataset using either linear additive or ANOVA model, as defined by useModel parameter. The testing procedure accounts for extra covariates in cvrt dataset. The constant is always included in the model and should not be explicitly specified in cvrt. The parameter errorCovariance can be set to the error variance-covariance matrix to account for heteroskedastic and/or correlated errors. Associations significant at pvOutputThreshold are saved to output_file_name, with corresponding test statistics, p-values, and estimated false discovery rate. Matrix eQTL can perform separate analysis for local (cis) and distant (trans) eQTLs. Extra parameters for such analysis are available in Matrix_eQTL_main. A gene-SNP pair is considered local if the distance between them is less than cisDist. The genomic location of genes and SNPs is defined by variables snpspos and {genepos}. The type of analysis is defined by p-value thresholds pvOutputThreshold and pvOutputThreshold.cis:
  1. SetpvOutputThreshold > 0andpvOutputThreshold.cis = 0, or useMatrix_eQTL_engineto perform eQTL analysis without regard to gene/SNP location. Associations significant atpvOutputThresholdlevel will be recorded inoutput_file_name.
  2. SetpvOutputThreshold = 0andpvOutputThreshold.cis > 0to perform eQTL analysis for local gene-SNP pairs only. Local associations significant atpvOutputThreshold.cislevel will be recorded inoutput_file_name.cis.
  3. SetpvOutputThreshold > 0andpvOutputThreshold.cis > 0to perform eQTL analysis with separate p-value thresholds for local and distant eQTLs. Distant and local associations significant at corresponding thresholds are recorded inoutput_file_nameandoutput_file_name.cisrespectively. In this case the false discovery rate is calculated separately for these two groups of eQTLs.
Note that Matrix_eQTL_engine is a wrapper for Matrix_eQTL_main provided for easier eQTL analysis without regard to gene/SNP location and for compatibility with the previous versions of this package. There are three linear regression models currently supported by Matrix eQTL as defined by the useModel parameter:
  1. SetuseModeltomodelLINEARto model the effect of the genotype as additive linear and test for its significance using t-statistic.
  2. UsemodelANOVAto treat genotype as a categorical variables and use ANOVA model and test for its significance using F-test. Note that no more than three distinct values per genotype variable is supported.
  3. The new special codemodelLINEAR_CROSSadds a new term to the model equal to the product of genotype and the last covariate; the significance of this term is then tested using t-statistic.

Usage

Matrix_eQTL_main(	
                   snps, 
                   gene, 
                   cvrt = SlicedData$new(), 
                   output_file_name = "", 
                   pvOutputThreshold = 1e-5,
                   useModel = modelLINEAR, 
                   errorCovariance = numeric(), 
                   verbose = TRUE, 
                   output_file_name.cis = "", 
                   pvOutputThreshold.cis = 0,
                   snpspos = NULL, 
                   genepos = NULL,
                   cisDist = 1e6,
                   pvalue.hist = FALSE)

Matrix_eQTL_engine(
                   snps, 
                   gene, 
                   cvrt = SlicedData$new(), 
                   output_file_name, 
                   pvOutputThreshold = 1e-5, 
                   useModel = modelLINEAR, 
                   errorCovariance = numeric(), 
                   verbose = TRUE,
                   pvalue.hist = FALSE)

Arguments

snps
SlicedData object with genotype information. Can be real-valued for linear model and should take up to 3 distinct values for ANOVA (see useModel parameter).
gene
SlicedData object with gene expression information. Should have columns matching those of snps.
cvrt
SlicedData object with additional covariates. Can be an empty SlicedData object in case of no covariates. The columns must match those in snps and gene.
output_file_name
character string with the name of the output file. Significant (all or distant) associations are saved to this file. If the file with this name exists, it will be overwritten.
output_file_name.cis
character string with the name of the output file. Significant local associations are saved to this file. If the file with this name exists, it will be overwritten.
pvOutputThreshold
numeric. Only gene-SNP pairs significant at this level will be saved in output_file_name.
pvOutputThreshold.cis
Same as pvOutputThreshold, but for local eQTLs. If both thresholds are positive, pvOutputThreshold determines cut-off for distant (trans) eQTLs.
useModel
numeric. Can be modelLINEAR, modelANOVA, or modelLINEAR_CROSS. See the section above for description.
errorCovariance
numeric. The error covariance matrix. Use numeric() for homoskedastic independent errors.
verbose
logical. Set to TRUE to display detailed report on the progress.
snpspos
data.frame object with information about SNP locations, must have 3 columns - SNP name, chromosome, and position.
genepos
data.frame with information about transcript locations, must have 4 columns - the name, chromosome, and positions of the left and right ends.
cisDist
numeric. SNP-gene pairs within this distance are considered local. The distance is measured from the nearest end of the gene.
pvalue.hist
This parameter defines how the distribution of (all/local/distant) p-values is recorded. If pvalue.hist is FALSE, the information is not recorded and thus the analysis is performed faster. Set pvalue.hist = "qqplot"

Value

  • The detected eQTLs are saved in output_file_name and/or output_file_name.cis. The method also returns a list with a summary of the performed analysis.
  • paramKeeps all input parameters.
  • time.in.secTime required for the performed analysis (in seconds).
  • allInformation about detected eQTLs for the analysis not using genomic locations.
  • cisInformation about detected local eQTLs.
  • transInformation about detected distant eQTLs.

Details

Note that the columns of gene, snps, and cvrt must match. If they do not match in the input files, use ColumnSubsample method to subset and/or reorder them.

References

For more information visit: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/

See Also

For more information on the class of the first three arguments see SlicedData.

Examples

Run this code
# Number of columns (samples)
n = 100;

# Genetate single genotype variable
snps.mat = rnorm(n);

# Generate single expression variable
gene.mat = 0.5*snps.mat + rnorm(n);

# Create 3 SlicedData objects for the analysis
snps1 = SlicedData$new( matrix( snps.mat, nrow = 1 ) );
gene1 = SlicedData$new( matrix( gene.mat, nrow = 1 ) );
cvrt1 = SlicedData$new();

# name of temporary output file
filename = tempfile();

# Call the main analysis function
me = Matrix_eQTL_main(
    snps = snps1, 
    gene = gene1, 
    cvrt = cvrt1, 
    filename, 
    pvOutputThreshold = 1, 
    useModel = modelLINEAR, 
    errorCovariance = numeric(), 
    verbose = TRUE,
    pvalue.hist = TRUE );
# remove the output file
unlink( filename );

# Pull Matrix eQTL results - t-statistic and p-value
tstat = me$all$eqtls[ 1, 3 ];
pvalue = me$all$eqtls[ 1, 4 ];
rez = c( tstat = tstat, pvalue = pvalue)
# And compare to those from linear regression
{
    cat("Matrix eQTL: 
"); 
    print(rez);
    cat("R summary(lm()) output: 
")
    lmout = summary(lm(gene.mat ~ snps.mat))$coefficients[2, 3:4];
    print(lmout)
}

Run the code above in your browser using DataLab