sleuth_results: Extract Wald or Likelihood Ratio test results from a sleuth object

Description

This function extracts Wald or Likelihood Ratio test results from a sleuth object.

Usage

sleuth_results(obj, test, test_type = "wt", which_model = "full",
  rename_cols = TRUE, show_all = TRUE,
  pval_aggregate = obj$pval_aggregate, ...)

Arguments

obj

a sleuth object

test

a character string denoting the test to extract. Possible tests can be found by using models(obj).

test_type

'wt' for Wald test or 'lrt' for Likelihood Ratio test.

which_model

a character string denoting the model. If extracting a Wald test, use the model name. Not used if extracting a likelihood ratio test.

rename_cols

if TRUE will rename some columns to be shorter and consistent with the vignette

show_all

if TRUE will show all transcripts (not only the ones passing filters). The transcripts that do not pass filters will have NA values in most columns.

pval_aggregate

if TRUE and both target_mapping and aggregation_column were provided to sleuth_prep, use Lancaster's method to aggregate p-values by the aggregation_column.

...

advanced options for sleuth_results. See details.

Value

If pval_aggregate is FALSE, returns a data.frame with the following columns:

target_id: transcript name, e.g. "ENST#####" (dependent on the transcriptome used in kallisto). If gene_mode is TRUE, this will instead be the IDs specified by the obj$gene_column from obj$target_mapping.
...: if there is a target mapping data frame, all of the annotations columns are added from obj$target_mapping before the other columns.
pval: p-value of the chosen model
qval: false discovery rate adjusted p-value, using Benjamini-Hochberg (see p.adjust)
test_stat (LRT only): Chi-squared test statistic (likelihood ratio test). Only seen with Likelihood Ratio test results.
rss (LRT only): the residual sum of squares under the "null model". Only seen with Likelihood Ratio test results.
degrees_free (LRT only): the degrees of freedom of the test (equal to the difference in degrees of freedom between the two models). Only seen with Likelihood Ratio test results.
b (Wald only): 'beta' value (effect size). Technically a biased estimator of the fold change. Only seen with Wald test results.
se_b (Wald only): standard error of the beta. Only seen with Wald test results.
mean_obs: mean of natural log counts of observations
var_obs: variance of observation
tech_var: technical variance of observation from the bootstraps (named 'sigma_q_sq' if rename_cols is FALSE)
sigma_sq: raw estimator of the variance once the technical variance has been removed
smooth_sigma_sq: smooth regression fit for the shrinkage estimation
final_sigma_sq: max(sigma_sq, smooth_sigma_sq); used for covariance estimation of beta (named 'smooth_sigma_sq_pmax' if rename_cols is FALSE)
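
As a rough illustration of two of the column definitions above, the following base R sketch applies the Benjamini-Hochberg adjustment referenced for qval and the element-wise maximum described for the last column. The data frame `toy` and all of its values are made up for illustration; this is not sleuth's internal code, and real columns come from sleuth_results().

```r
# Toy values only (hypothetical); not output from sleuth.
toy <- data.frame(
  target_id = c("tx1", "tx2", "tx3", "tx4"),
  pval = c(0.001, 0.02, 0.03, 0.5),
  sigma_sq = c(0.10, 0.03, 0.40, 0.02),
  smooth_sigma_sq = c(0.08, 0.06, 0.30, 0.05)
)

# Benjamini-Hochberg adjustment, the method referenced for qval (see p.adjust)
toy$qval <- p.adjust(toy$pval, method = "BH")

# Element-wise maximum of the raw and smoothed variance estimates
toy$sigma_sq_pmax <- pmax(toy$sigma_sq, toy$smooth_sigma_sq)

print(toy)
```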

If pval_aggregate is TRUE, returns a data.frame with the following columns:

target_id: gene ID specified by obj$gene_column, e.g. "ENSG#####" (dependent on the transcriptome used in kallisto).
...: all of the additional annotation columns (not 'target_id' or obj$gene_column) are added from obj$target_mapping before the other columns.
num_aggregated_transcripts: the number of transcripts aggregated for a given gene. These only include filtered transcripts.
sum_mean_obs_counts: this is the sum of the mean observations across all filtered transcripts within a gene. Note that the weighting function is applied before summing.
pval: the aggregated p-value calculated by the Lancaster method. See the aggregation package for details.
qval: adjusted p-values using the Benjamini-Hochberg method.

Details

The columns returned by this function will depend on a few factors: whether the test is a Wald test or Likelihood Ratio test, and whether pval_aggregate is TRUE.

The sleuth model is a measurement-error-in-the-response model. It attempts to separate the variation due to the inference procedure by kallisto from the variation due to the covariates -- the biological and technical factors of the experiment (represented by the columns in obj$sample_to_covariates). For the Wald test, the 'b' column represents the estimate of the selected coefficient. In the default setting, it is analogous to, but not equivalent to, the fold-change. The transformed values are on the natural-log scale, and so the estimated coefficient is also on the natural-log scale. This value takes into account the 'inferential variance' estimated from the kallisto bootstraps.
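
Because the coefficient is on the natural-log scale, a reader who prefers base 2 can rescale it by dividing by log(2). The sketch below uses a made-up data frame `toy` with hypothetical values; the column names `b_log2` and `se_b_log2` are illustrative and are not produced by sleuth. The rescaled value remains analogous to, not equivalent to, a log2 fold change.

```r
# Toy Wald-test columns (hypothetical values, natural-log scale)
toy <- data.frame(
  target_id = c("tx1", "tx2"),
  b = c(log(2), -log(4)),
  se_b = c(0.2, 0.3)
)

# Change of base: log2(x) = ln(x) / ln(2)
toy$b_log2 <- toy$b / log(2)

# A standard error scales by the same constant as its estimate
toy$se_b_log2 <- toy$se_b / log(2)

print(toy)
```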

If the user wishes to get gene-level results from this function, there are two ways of doing so:

p-value aggregation mode: if the pval_aggregate argument is TRUE, this function will aggregate the transcript-level p-values to the gene level using the Lancaster method. See below for advanced options related to this mode. This is the recommended way to do gene-level aggregation. See the paper
count aggregation mode: This is the gene-level aggregation method introduced in sleuth version 0.28.1. This mode is activated if obj$gene_mode is TRUE. In this mode, the modeling and testing were done using aggregated counts (or TPMs), and so the results are the same as for the transcript-level results, except the target IDs are now gene IDs instead of transcript IDs.

An important note if pval_aggregate or the old gene_mode is TRUE: when combining the gene annotations from obj$target_mapping, all of the columns except for the transcript ID, obj$target_mapping$target_id, will be included. If there are transcript-level entries for any of the other columns, this will result in duplicate rows in the results table (usually an undesirable result).
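
The duplicate-row effect described above can be reproduced with a small mapping table in base R. In this sketch the table `target_mapping` and its column names (`ens_gene`, `transcript_biotype`) are hypothetical, and the deduplication step only mimics the idea of dropping the transcript ID; it is not sleuth's own code.

```r
# Hypothetical mapping with one transcript-level column (transcript_biotype)
target_mapping <- data.frame(
  target_id = c("tx1", "tx2", "tx3"),
  ens_gene = c("geneA", "geneA", "geneB"),
  transcript_biotype = c("protein_coding", "retained_intron", "protein_coding"),
  stringsAsFactors = FALSE
)

# Drop only the transcript ID, then deduplicate
keep <- setdiff(names(target_mapping), "target_id")
gene_annot <- unique(target_mapping[, keep, drop = FALSE])

# geneA appears twice because transcript_biotype differs between its transcripts
n_rows_with_tx_column <- nrow(gene_annot)

# With gene-level columns only, each gene appears once
gene_only <- unique(target_mapping[, "ens_gene", drop = FALSE])
n_rows_gene_only <- nrow(gene_only)

print(gene_annot)
print(gene_only)
```

One way to avoid the duplicates, under these assumptions, is to keep only gene-level annotation columns in the mapping table.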

Here are advanced options for customizing the p-value aggregation procedure:

weight_func: if pval_aggregate is TRUE, then this is used to weight the p-values for Lancaster's method. This function must take the observed means of the transcripts as the only defined argument. The default is identity.
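
To make the role of weight_func more concrete, here is a minimal base R sketch of Lancaster's weighted p-value combination. The function `lancaster_sketch` and the toy vectors are hypothetical names written for this illustration; this is not the code sleuth or the aggregation package runs, and details such as filtering and edge-case handling are omitted.

```r
# Minimal sketch of Lancaster's method (hypothetical helper, base R only)
lancaster_sketch <- function(pvalues, weights) {
  stopifnot(length(pvalues) == length(weights), all(weights > 0))
  # Map each p-value to a chi-squared quantile whose df equals its weight
  stats <- qchisq(pvalues, df = weights, lower.tail = FALSE)
  # Refer the sum to a chi-squared distribution with df = sum of the weights
  pchisq(sum(stats), df = sum(weights), lower.tail = FALSE)
}

# Toy transcript-level values for a single gene (made up)
pvals <- c(0.01, 0.20, 0.65)
mean_obs <- c(8.2, 5.1, 3.4)

# Default weighting described above: the observed means, unchanged
p_identity <- lancaster_sketch(pvals, identity(mean_obs))

# An alternative weighting function taking the observed means as its only argument
sqrt_weights <- function(x) sqrt(x)
p_sqrt <- lancaster_sketch(pvals, sqrt_weights(mean_obs))

print(c(identity = p_identity, sqrt = p_sqrt))
```

With every weight equal to 2, this sketch reduces to Fisher's method, which is a convenient sanity check.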

Examples

# NOT RUN {
models(sleuth_obj) # for this example, assume the formula is ~condition,
                   # and a coefficient is IP
results_table <- sleuth_results(sleuth_obj, 'conditionIP')
# }
