fitGAPLM: Fit a Generalized Additive Partially Linear Model on Gene Expression Data

Description

Given a plmDE object containing preprocessed/normalized expression measures for a set of genes under different conditions, together with the values of quantitatively measured covariates of interest, fitGAPLM tests each gene for differential expression under a model specified by the user. The test is based on the significance of a full model fitted to the expression data compared with the fit of a reduced model (F statistic). The variables of interest should be present in the full model and absent from the reduced model. This method is flexible: it can fit count data (e.g. expression measures from high-throughput sequencing) as well as microarray data. Using fitGAPLM, the user can model the gene expression measures by any mixture of additive functions of the numerical variables and linear terms for the available factor information. Each of these functions is approximated by a B-spline fit, with the intercept of the spline constrained to zero for identifiability. Although fitGAPLM takes a daunting number of inputs, many of them are already set to sensible defaults. Even so, models of this complexity must be well thought out, and each parameter requires careful consideration.
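
The full-versus-reduced comparison described above can be sketched for a single gene with base R. This is only an illustration of the idea, not the internals of fitGAPLM: the simulated data and the variable names (group, age, expression) are assumptions made for the example.

```r
library(splines)

set.seed(1)
n <- 40
group <- factor(rep(c("control", "treated"), each = n / 2))
age <- runif(n, 20, 60)
# Hypothetical expression values: a group shift plus a smooth age effect
expression <- 5 + 1.5 * (group == "treated") + sin(age / 10) + rnorm(n, sd = 0.3)

# Full model: a linear term for the factor plus a B-spline of the covariate.
# bs() omits the intercept column by default, so the model intercept stays identifiable.
full <- lm(expression ~ group + bs(age, degree = 3))
# Reduced model: drops the variable of interest (here, the group indicator)
reduced <- lm(expression ~ bs(age, degree = 3))

# F test comparing the two nested fits
comparison <- anova(reduced, full)
pValue <- comparison[["Pr(>F)"]][2]
```

A small pValue suggests that the term present only in the full model explains variation in this gene's expression.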

Usage

fitGAPLM(dataObject, generalizedLM = FALSE, family = poisson(link = log), 
NegativeBinomialUnknownDispersion = FALSE, test = "LRT", weights = NULL, 
offset = NULL, pValueAdjustment = "fdr", significanceLevel = 0.05, 
indicators.fullModel = as.character(unique(dataObject$sampleInfo[,2])[-1]), 
continuousCovariates.fullModel = NULL, 
groups.fullModel = as.character(unique(dataObject$sampleInfo[,2])[-1]), 
groupFunction.fullModel = rep("AdditiveSpline", length(groups.fullModel)), 
fitSplineFromData.fullModel = TRUE, 
splineDegrees.fullModel = rep(3, length(groups.fullModel)), 
splineKnots.fullModel = rep(0, length(groups.reducedModel)), 
compareToReducedModel = FALSE, 
indicators.reducedModel = as.character(unique(dataObject$sampleInfo[,2])[-1]), 
continuousCovariates.reducedModel = NULL, 
groups.reducedModel = as.character(unique(dataObject$sampleInfo[,2])[-1]), 
groupFunction.reducedModel = rep("AdditiveSpline", length(groups.reducedModel)), 
fitSplineFromData.reducedModel = TRUE, 
splineDegrees.reducedModel = rep(3, length(groups.reducedModel)), 
splineKnots.reducedModel = rep(0, length(groups.reducedModel)), 
splineKnotSpread = "quantile")

Arguments

dataObject

Object of type plmDE containing the gene expression and sample information.

generalizedLM

If TRUE, a link function is introduced to generalize the linear model. Use for gene-level count data.

family

One of the distribution families that may be used in the function glm. For gene-level count data, the negative binomial (see negative.binomial) is recommended to account for overdispersion.

NegativeBinomialUnknownDispersion

For a negative binomial fit, indicates whether the dispersion of the data is still unknown (TRUE) or has already been estimated (FALSE). If TRUE, glm.nb from the MASS package is called, which includes routines for fitting the GLM and estimating the dispersion parameter.
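
The two negative binomial situations can be illustrated directly with MASS. The sketch below uses simulated counts and an assumed dispersion value of 5; it shows the underlying MASS calls, not how fitGAPLM dispatches to them.

```r
library(MASS)

set.seed(2)
n <- 60
group <- factor(rep(c("control", "treated"), each = n / 2))
mu <- exp(3 + 0.8 * (group == "treated"))
# Simulated overdispersed counts (size = 5 is an assumed dispersion)
counts <- rnbinom(n, size = 5, mu = mu)

# Dispersion treated as known: pass a fixed theta through negative.binomial()
fitKnown <- glm(counts ~ group, family = negative.binomial(theta = 5))

# Dispersion unknown: glm.nb() estimates theta along with the coefficients
fitUnknown <- glm.nb(counts ~ group)
thetaHat <- fitUnknown$theta
```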

test

The test used to assess the significance of the model when a GLM is requested. See stat.anova for details.
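
As an illustration of what the "LRT" option means in stat.anova, the following sketch compares two nested Poisson GLMs with a likelihood ratio test. The data are simulated and the model is deliberately minimal; it is an assumption that fitGAPLM passes the option through in a similar way.

```r
set.seed(3)
n <- 50
group <- factor(rep(c("control", "treated"), each = n / 2))
counts <- rpois(n, lambda = exp(2 + 0.5 * (group == "treated")))

full <- glm(counts ~ group, family = poisson(link = log))
reduced <- glm(counts ~ 1, family = poisson(link = log))

# Likelihood ratio test between the nested GLMs
lrt <- anova(reduced, full, test = "LRT")
pValue <- lrt[["Pr(>Chi)"]][2]
```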

weights

An optional vector of prior weights to be used in the fitting of the (generalized) linear model. Should be NULL or a numeric vector.

offset

An optional a priori known component to be included in the fitting of the (generalized) linear model. One or more offset terms may be included in the model.
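
A common use of an offset with sequencing counts is to account for differing library sizes. The sketch below shows this with glm on simulated data; the librarySize variable and the use of its logarithm as the offset are assumptions for the example, not a requirement stated by fitGAPLM.

```r
set.seed(4)
n <- 30
group <- factor(rep(c("control", "treated"), each = n / 2))
# Hypothetical per-sample sequencing depth
librarySize <- round(runif(n, 1e6, 5e6))
# True expression rate doubles in the treated group
rate <- 1e-5 * ifelse(group == "treated", 2, 1)
counts <- rpois(n, lambda = librarySize * rate)

# With a log link, log(librarySize) enters the linear predictor with a fixed coefficient of 1
fit <- glm(counts ~ group, family = poisson(link = log), offset = log(librarySize))
foldChange <- exp(coef(fit)[["grouptreated"]])
```

With the offset in place, foldChange estimates the ratio of expression rates rather than the ratio of raw counts.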

pValueAdjustment

Choice of multiple testing correction method to be passed to p.adjust.

significanceLevel

The significance level at which genes should be identified as differentially expressed.
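
The interplay between pValueAdjustment and significanceLevel can be sketched with p.adjust on a handful of made-up p-values (one per gene):

```r
# Hypothetical unadjusted p-values for six genes
pValues <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# "fdr" is the Benjamini-Hochberg adjustment in p.adjust
adjusted <- p.adjust(pValues, method = "fdr")

significanceLevel <- 0.05
differentiallyExpressed <- adjusted < significanceLevel
# Only the first two genes pass; 0.039 and 0.041 no longer do after adjustment
```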

indicators.fullModel

The indicator terms which should go into the full model. These must match the groups in the second column of the sample information in dataObject. Under the default setting, the indicators will consist of all groups except for the first one (used as the baseline for comparison).
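
The default shown in Usage can be traced on a toy sample table. The data frame below is hypothetical; only the convention that the groups sit in the second column comes from the usage above.

```r
# Hypothetical sample information: groups are in the second column
sampleInfo <- data.frame(
  sample = paste0("s", 1:6),
  group = c("control", "control", "treatA", "treatA", "treatB", "treatB"),
  age = c(31, 45, 28, 52, 39, 47),
  stringsAsFactors = FALSE
)

# Default: every group except the first, which acts as the baseline
indicators <- as.character(unique(sampleInfo[, 2])[-1])
# "treatA" "treatB"

# One 0/1 column per indicator term
indicatorMatrix <- sapply(indicators, function(g) as.numeric(sampleInfo[, 2] == g))
```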

continuousCovariates.fullModel

The quantitative covariates that should go into the full model. These must match the column names of the sample information in dataObject.

groups.fullModel

The subgroups of our sample for which we wish to estimate a function relating their measurement of continuousCovariates to their expression levels in dataObject.

groupFunction.fullModel

A vector of the same length as groups.fullModel which consists of strings matching: "AdditiveSpline", "AdditiveLinear", "CommonSpline", or "CommonLinear". If "AdditiveSpline" is chosen, then a B-spline basis is fitted to the continuousCovariate values of the corresponding group in groups.fullModel to estimate a function that represents the effect of this group's continuousCovariate values on their measured expression levels. This function implicitly assumes an indicator term, so it evaluates to 0 for the measurements of continuousCovariate from other groups, and its overall effects are assumed to be additive with respect to the other parameters being estimated. If "AdditiveLinear" is selected, then this function is taken to be the identity function (no spline basis fit) times a parameter to be fit by the model. To estimate one function that accounts for the same effect across multiple groups, the groups must all be listed in groups.fullModel and their corresponding entries in groupFunction.fullModel must be set to "CommonSpline". Likewise, to assume a linear effect across multiple groups, the groups must all be listed in groups.fullModel and the corresponding entries of groupFunction.fullModel must read "CommonLinear".
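
To make the group-specific terms concrete, the sketch below builds design-matrix columns by hand for an "AdditiveSpline"-style and an "AdditiveLinear"-style term. It is one plausible reading of the description above, using assumed toy data; the actual columns constructed by fitGAPLM may differ.

```r
library(splines)

group <- rep(c("control", "treatA", "treatB"), each = 4)
age <- c(25, 33, 41, 58, 27, 36, 44, 60, 22, 38, 49, 55)

# "AdditiveSpline"-style term for treatA: a cubic B-spline basis fitted to
# that group's covariate values, and zero for samples from every other group
inA <- group == "treatA"
splineA <- matrix(0, nrow = length(age), ncol = 3)
splineA[inA, ] <- bs(age[inA], degree = 3)

# "AdditiveLinear"-style term for treatB: the covariate itself times the
# group indicator, so the model fits a single slope for this group
inB <- group == "treatB"
linearB <- age * as.numeric(inB)

design <- cbind(splineA, linearB)
```

Both terms vanish outside their own group, which is what lets their effects add to the other parameters of the model.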

fitSplineFromData.fullModel

Should the B-spline functions in the full model be automatically fitted based on the heuristic in fitBspline?

splineDegrees.fullModel

If fitSplineFromData.fullModel has not been selected, then the user may specify, in a vector format, the degree of each B-spline basis that is fitted to the groups.

splineKnots.fullModel

If fitSplineFromData.fullModel has not been selected, then the user may also specify, in a vector, the number of knots to include in each corresponding basis.
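
How a degree and a number of knots determine a B-spline basis can be sketched with splines::bs. Placing the interior knots at evenly spaced quantiles is an assumption about what splineKnotSpread = "quantile" means; the variable names are illustrative.

```r
library(splines)

set.seed(5)
x <- runif(100, 0, 10)

degree <- 3
numberOfKnots <- 2

# Interior knots at evenly spaced quantiles of the covariate (assumed placement)
probs <- seq_len(numberOfKnots) / (numberOfKnots + 1)
knots <- quantile(x, probs = probs, names = FALSE)

# Without an intercept column, the basis has degree + numberOfKnots columns
basis <- bs(x, degree = degree, knots = knots)
```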

compareToReducedModel

If TRUE, then the user must specify a model that the full model should be tested against. Otherwise, all terms (besides the intercept) of the full model are simultaneously tested for significance.
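
Testing all non-intercept terms at once amounts to comparing the full model with an intercept-only model. A minimal sketch with lm on simulated data (variable names are assumptions) shows that this comparison reproduces the overall F statistic of the full fit:

```r
set.seed(6)
n <- 30
group <- factor(rep(c("control", "treated"), each = n / 2))
age <- runif(n, 20, 60)
expression <- 4 + 0.5 * (group == "treated") + 0.02 * age + rnorm(n, sd = 0.4)

full <- lm(expression ~ group + age)
interceptOnly <- lm(expression ~ 1)

# Simultaneous test of every term except the intercept
overall <- anova(interceptOnly, full)
fStatistic <- overall[["F"]][2]

# The same statistic is reported by summary() for the full model
fFromSummary <- unname(summary(full)$fstatistic["value"])
```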

indicators.reducedModel