dmDispersion: Estimate dispersions in Dirichlet-multinomial model

Description

Maximum likelihood estimates of dispersion parameters in the Dirichlet-multinomial model used in differential splicing or sQTL analysis.

Usage

dmDispersion(x, ...)
## S4 method for signature 'dmDSdata'
dmDispersion(x, mean_expression = TRUE, common_dispersion = TRUE, genewise_dispersion = TRUE, disp_adjust = TRUE, disp_mode = "grid", disp_interval = c(0, 1e+05), disp_tol = 1e-08, disp_init = 100, disp_init_weirMoM = TRUE, disp_grid_length = 21, disp_grid_range = c(-10, 10), disp_moderation = "common", disp_prior_df = 0.1, disp_span = 0.3, prop_mode = "constrOptimG", prop_tol = 1e-12, verbose = 0, BPPARAM = BiocParallel::MulticoreParam(workers = 1))
## S4 method for signature 'dmSQTLdata'
dmDispersion(x, mean_expression = TRUE, common_dispersion = TRUE, genewise_dispersion = TRUE, disp_adjust = TRUE, disp_mode = "grid", disp_interval = c(0, 10000), disp_tol = 1e-08, disp_init = 100, disp_init_weirMoM = TRUE, disp_grid_length = 21, disp_grid_range = c(-10, 10), disp_moderation = "none", disp_prior_df = 0.1, disp_span = 0.3, prop_mode = "constrOptimG", prop_tol = 1e-12, verbose = 0, speed = TRUE, BPPARAM = BiocParallel::MulticoreParam(workers = 1))

Arguments

x

A dmDSdata or dmSQTLdata object.

...

Other parameters that can be defined by methods using this generic.

mean_expression

Logical. Whether to estimate the mean expression of genes.

common_dispersion

Logical. Whether to estimate the common dispersion.

genewise_dispersion

Logical. Whether to estimate the gene-wise dispersion.

disp_adjust

Logical. Whether to use the Cox-Reid adjusted or non-adjusted profile likelihood.

disp_mode

Optimization method used to maximize the profile likelihood. Possible values are "optimize", "optim", "constrOptim", "grid". See Details.

disp_interval

Numeric vector of length 2 defining the interval of possible values for the dispersion.

disp_tol

The desired accuracy when estimating dispersion.

disp_init

Initial dispersion. If common_dispersion is TRUE, then disp_init is overwritten by the common dispersion estimate.

disp_init_weirMoM

Logical. Whether to use the Weir moment estimator as an initial value for dispersion. If TRUE, then disp_init is replaced by the Weir estimates.

disp_grid_length

Length of the search grid.

disp_grid_range

Numeric vector of length 2 giving the limits of the grid interval, on the log2 scale relative to disp_init (see Details).

disp_moderation

Dispersion moderation method. One can choose to shrink the dispersion estimates toward the common dispersion ("common") or toward the (dispersion versus mean expression) trend ("trended"), or to apply no moderation ("none").

disp_prior_df

Degree of moderation (shrinkage).

disp_span

Value from 0 to 1 defining the fraction of genes used in the sliding smoothing window when calculating the dispersion versus mean expression trend.

prop_mode

Optimization method used to estimate proportions. Possible values are "constrOptim" and "constrOptimG".

prop_tol

The desired accuracy when estimating proportions.

verbose

Numeric. Defines the level of progress messages displayed: 0 - no messages, 1 - main messages, 2 - messages for every gene fit.

BPPARAM

A BiocParallelParam object defining the parallelization method used by bplapply.

speed

Logical. If FALSE, dispersion is calculated for each gene-block pair. Such a calculation may take a long time, since there can be hundreds of SNPs/blocks per gene. If TRUE, only one dispersion is calculated per gene, and it is assigned to all the blocks matched with this gene.

Value

Returns a dmDSdispersion or dmSQTLdispersion object.

Details

Parameters that are used in the dispersion estimation start with prefix disp_, and those that are used for the proportion estimation start with prop_.

There are 4 optimization methods implemented within dmDispersion ("optimize", "optim", "constrOptim" and "grid") that can be used to estimate the gene-wise dispersion. Common dispersion is estimated with "optimize".

Arguments that are used by all the methods are:

disp_adjust
prop_mode: Both "constrOptim" and "constrOptimG" use the constrOptim function to maximize the likelihood of Dirichlet-multinomial proportions. The difference lies in how the likelihood and score are computed: "constrOptim" computes them using the identity x*Gamma(x) = Gamma(x+1), while "constrOptimG" computes them with the lgamma function. We recommend the second approach, since it is much faster than the first one.
prop_tol: The accuracy for proportions estimation defined as reltol in constrOptim.
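The two prop_mode approaches differ in how terms such as log Gamma(a + n) - log Gamma(a), which appear in the Dirichlet-multinomial likelihood, are evaluated. The R sketch below computes this quantity both ways; the helper names are hypothetical and not part of DRIMSeq. The product form needs n logarithms per count, which plausibly explains why the lgamma form is faster for large counts.

```r
# Hypothetical helpers (not DRIMSeq functions): two ways to compute
# log(Gamma(a + n)) - log(Gamma(a)) for a > 0 and integer n >= 0.

# Product form: repeatedly applying x * Gamma(x) = Gamma(x + 1) gives
# Gamma(a + n) / Gamma(a) = a * (a + 1) * ... * (a + n - 1),
# so the log-ratio is a sum of n logarithms (O(n) work per count).
lgamma_ratio_product <- function(a, n) {
  sum(log(a + seq_len(n) - 1))
}

# lgamma form: two calls to lgamma(), whatever the size of n.
lgamma_ratio_lgamma <- function(a, n) {
  lgamma(a + n) - lgamma(a)
}

a <- 2.5
n <- 40L
lgamma_ratio_product(a, n)
lgamma_ratio_lgamma(a, n)
```

Both calls return the same value up to floating-point rounding; only the cost differs.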

Only some of the remaining dispersion parameters of dmDispersion influence the output for a given disp_mode. The active parameters for each mode are:

"optimize", which uses optimize to maximize the profile likelihood.

disp_interval: Passed as interval.
disp_tol: The accuracy defined as tol.
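A minimal R sketch of the "optimize" mode, under stated assumptions: dm_loglik() is a hypothetical helper giving the Dirichlet-multinomial log-likelihood of one gene, with the dispersion gamma0 assumed to be the sum of the Dirichlet parameters (so larger values mean less overdispersion). For simplicity the proportions are held fixed at their pooled ratios instead of being re-estimated for each dispersion value, and no Cox-Reid adjustment is applied, so this is not DRIMSeq's internal code.

```r
# Hypothetical helper: Dirichlet-multinomial log-likelihood of one gene,
# up to the multinomial coefficient. y is a features x samples count
# matrix, prop the feature proportions and gamma0 the dispersion, so
# that the Dirichlet parameters are gamma0 * prop.
dm_loglik <- function(gamma0, y, prop) {
  alpha <- gamma0 * prop
  n <- colSums(y)
  sum(lgamma(gamma0) - lgamma(n + gamma0)) +
    sum(lgamma(y + alpha) - lgamma(alpha))
}

# Toy counts: 3 features (rows) x 4 samples (columns).
y <- matrix(c(10, 50, 40,
              60, 20, 20,
              30, 30, 40,
               5, 70, 25), nrow = 3)

# Simplification: proportions fixed at the pooled ratios.
prop <- rowSums(y) / sum(y)

# disp_interval and disp_tol passed as interval and tol.
fit <- optimize(dm_loglik, interval = c(0, 1e5), y = y, prop = prop,
                maximum = TRUE, tol = 1e-8)
fit$maximum
```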

"optim", which uses optim to maximize the profile likelihood.

disp_init and disp_init_weirMoM: The initial value par.
disp_tol: The accuracy defined as factr.
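A similar sketch for the "optim" mode. Because factr is a control parameter of optim's "L-BFGS-B" method, the sketch assumes that method, starts at disp_init = 100 and uses bounds mimicking disp_interval. The factr value shown is optim's default, not a claim about how DRIMSeq maps disp_tol, and dm_loglik() is the same hypothetical helper with fixed proportions.

```r
# Hypothetical helper: simplified Dirichlet-multinomial log-likelihood
# of one gene (proportions fixed, multinomial coefficient dropped).
dm_loglik <- function(gamma0, y, prop) {
  alpha <- gamma0 * prop
  n <- colSums(y)
  sum(lgamma(gamma0) - lgamma(n + gamma0)) +
    sum(lgamma(y + alpha) - lgamma(alpha))
}

y <- matrix(c(10, 50, 40,
              60, 20, 20,
              30, 30, 40,
               5, 70, 25), nrow = 3)
prop <- rowSums(y) / sum(y)

# fnscale = -1 turns optim into a maximizer; the lower bound is kept
# strictly positive because the log-likelihood is undefined at 0.
fit <- optim(par = 100, fn = dm_loglik, y = y, prop = prop,
             method = "L-BFGS-B", lower = 1e-8, upper = 1e5,
             control = list(fnscale = -1, factr = 1e7))
fit$par
```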

"constrOptim", which uses constrOptim to maximize the profile likelihood.

disp_init and disp_init_weirMoM: The initial value theta.
disp_tol: The accuracy defined as reltol.
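For the "constrOptim" mode, the positivity constraint on the dispersion can be written as ui %*% theta - ci >= 0 with ui = 1 and ci = 0. The R sketch below assumes that constraint and supplies a hypothetical score function dm_score(), the derivative of the same simplified log-likelihood, so that constrOptim uses BFGS; reltol is set to the documented disp_tol default. As before, proportions are fixed and no Cox-Reid adjustment is made.

```r
# Hypothetical helper: simplified Dirichlet-multinomial log-likelihood.
dm_loglik <- function(gamma0, y, prop) {
  alpha <- gamma0 * prop
  n <- colSums(y)
  sum(lgamma(gamma0) - lgamma(n + gamma0)) +
    sum(lgamma(y + alpha) - lgamma(alpha))
}

# Hypothetical helper: derivative of dm_loglik with respect to gamma0.
dm_score <- function(gamma0, y, prop) {
  alpha <- gamma0 * prop
  n <- colSums(y)
  sum(digamma(gamma0) - digamma(n + gamma0)) +
    sum(prop * (digamma(y + alpha) - digamma(alpha)))
}

y <- matrix(c(10, 50, 40,
              60, 20, 20,
              30, 30, 40,
               5, 70, 25), nrow = 3)
prop <- rowSums(y) / sum(y)

# Constraint gamma0 > 0; the start (disp_init = 100) is strictly feasible.
fit <- constrOptim(theta = 100, f = dm_loglik, grad = dm_score,
                   ui = matrix(1), ci = 0, y = y, prop = prop,
                   control = list(fnscale = -1, reltol = 1e-8))
fit$par
```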

"grid", which uses the grid approach from edgeR.

disp_init, disp_grid_length, disp_grid_range: Parameters used to construct the search grid disp_init * 2^seq(from = disp_grid_range[1], to = disp_grid_range[2], length = disp_grid_length).
disp_moderation: Dispersion shrinkage is available only with the "grid" method.
disp_prior_df: Used only when dispersion shrinkage is activated. The moderated likelihood is equal to loglik + disp_prior_df * moderation. The higher disp_prior_df, the more shrinkage toward the common or trended dispersion is applied.
disp_span: Used only when dispersion moderation toward trend is activated.
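A minimal R sketch of the grid search with moderation toward the common dispersion, under several assumptions: two toy genes, the same hypothetical dm_loglik() with proportions fixed at pooled ratios, no Cox-Reid adjustment, and the "moderation" term taken as the average log-likelihood curve over genes (an edgeR-style weighted likelihood). It is not DRIMSeq's internal code, and no interpolation between grid points is attempted.

```r
# Hypothetical helper: simplified Dirichlet-multinomial log-likelihood.
dm_loglik <- function(gamma0, y, prop) {
  alpha <- gamma0 * prop
  n <- colSums(y)
  sum(lgamma(gamma0) - lgamma(n + gamma0)) +
    sum(lgamma(y + alpha) - lgamma(alpha))
}

# Documented defaults
disp_init <- 100
disp_grid_range <- c(-10, 10)
disp_grid_length <- 21
disp_prior_df <- 0.1

# Search grid from the formula above (length.out is spelled out in full).
grid <- disp_init * 2^seq(from = disp_grid_range[1], to = disp_grid_range[2],
                          length.out = disp_grid_length)

# Two toy genes: 3 features x 4 samples each.
genes <- list(
  gene1 = matrix(c(10, 50, 40, 60, 20, 20, 30, 30, 40, 5, 70, 25), nrow = 3),
  gene2 = matrix(c(30, 40, 30, 28, 42, 30, 33, 37, 30, 29, 41, 30), nrow = 3)
)

# Log-likelihood of every gene at every grid point (genes x grid).
loglik <- t(vapply(genes, function(y) {
  prop <- rowSums(y) / sum(y)
  vapply(grid, dm_loglik, numeric(1), y = y, prop = prop)
}, numeric(length(grid))))

# Assumption: moderation toward "common" uses the mean curve over genes.
moderation <- colMeans(loglik)
moderated <- sweep(loglik, 2, disp_prior_df * moderation, "+")

# Best grid point per gene, without and with moderation.
disp_raw <- grid[apply(loglik, 1, which.max)]
disp_moderated <- grid[apply(moderated, 1, which.max)]
disp_raw
disp_moderated
```

Raising disp_prior_df gives the shared curve more weight, pulling each gene's maximum toward the common one.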

Examples


###################################
### Differential splicing analysis
###################################
# If possible, use BPPARAM = BiocParallel::MulticoreParam() with more workers

library(DRIMSeq)

d <- data_dmDSdata

### Filtering
# Check the minimal number of replicates per condition
table(samples(d)$group)
d <- dmFilter(d, min_samps_gene_expr = 7, min_samps_feature_expr = 3, 
 min_samps_feature_prop = 0)

### Calculate dispersion
d <- dmDispersion(d, BPPARAM = BiocParallel::SerialParam())
plotDispersion(d)

head(mean_expression(d))
common_dispersion(d)
head(genewise_dispersion(d))
