podkat (version 1.4.2)

nullModel: Create Null Model for Association Test

Description

Creates a null model that can be used for association testing with assocTest.

Usage

## S4 method for signature 'formula,data.frame'
nullModel(X, y, data,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          n.resampling=0,
          type.resampling=c("bootstrap", "permutation"),
          adj=c("automatic", "none", "force"), adjExact=FALSE,
          n.resampling.adj=10000, checkData=TRUE)
## S4 method for signature 'formula,missing'
nullModel(X, y, data,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          n.resampling=0,
          type.resampling=c("bootstrap", "permutation"),
          adj=c("automatic", "none", "force"), adjExact=FALSE,
          n.resampling.adj=10000, checkData=TRUE)
## S4 method for signature 'matrix,numeric'
nullModel(X, y,
          type=c("automatic", "logistic", "linear"), ...)
## S4 method for signature 'matrix,factor'
nullModel(X, y,
          type=c("automatic", "logistic", "linear"), ...)
## S4 method for signature 'missing,numeric'
nullModel(X, y,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          ...)
## S4 method for signature 'missing,factor'
nullModel(X, y,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          ...)

Arguments

X
a formula or matrix
y
if the formula interface is used, y can be used to pass a data frame containing both the covariates and the trait (alternatively, the data argument can be used for that purpose). The other methods (if X is not a formula) expect y to be the trait vector. Trait vectors can be either numeric vectors or factors with two levels (see details below).
data
for consistency with standard R methods from the stats package, the data frame can also be passed to nullModel via the data argument. In this case, y must be omitted. If y is specified, data is ignored.
type
type of model to train (see details below)
n.resampling
number of null model residuals to sample; set to zero (default) to turn resampling off; resampling is not supported for plain trait vectors without covariates
type.resampling
method used to sample null model residuals. The choice permutation refers to simple random permutations of the model's residuals. If bootstrap is chosen (default), the following strategy is applied for linear models (continuous trait): residuals are sampled as normally distributed values with mean 0 and the same standard deviation as the model's residuals. For logistic models (binary trait), the choice bootstrap selects the same bootstrapping method that is implemented in the SKAT package.
adj
whether or not to use small sample correction for logistic models (binary trait with covariates). The choice none turns off small sample correction. If force is chosen, small sample correction is turned on unconditionally. If automatic is chosen (default), small sample correction is turned on if the number of samples does not exceed 2,000. For any type of model other than logistic, this argument is ignored and small sample correction is switched off.
adjExact
in case small sample correction is switched on (see above), this argument indicates whether or not the exact square root of the matrix $P_0$ should be pre-computed (see Subsection 9.5 of the package vignette). The default is FALSE. This argument is ignored if small sample correction is not switched on.
n.resampling.adj
number of null model residuals to sample for the adjustment of higher moments; ignored if small sample correction is switched off.
checkData
if FALSE, only a very limited set of input checks is performed. The purpose of this option is to save computational effort for repeated input checks if the function is called from a function that has already performed input checks. The default is TRUE. Only change to FALSE if you know what you are doing!
...
all other parameters are passed on to the nullModel method with signature formula,data.frame.
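
The two choices for type.resampling can be illustrated for the linear case with a short base-R sketch. It uses synthetic data and lm rather than podkat itself; the data frame pheno.sim, its columns, and the use of sd for the residual standard deviation are assumptions made for illustration, not podkat's internal code.

```r
## Sketch of the two residual resampling strategies for a linear null model.
## Synthetic data; all names below are illustrative assumptions.
set.seed(1)
n <- 200
pheno.sim <- data.frame(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))
pheno.sim$y <- 0.03 * pheno.sim$age + 0.5 * pheno.sim$sex + rnorm(n)

## covariates-only fit (no genotype), analogous to a linear null model
fit <- lm(y ~ ., data = pheno.sim)
res <- residuals(fit)
n.resampling <- 5

## "permutation": every column is a random permutation of the residuals
res.perm <- replicate(n.resampling, sample(res))

## "bootstrap" (linear case): normally distributed values with mean 0
## and the standard deviation of the model's residuals
res.boot <- matrix(rnorm(n * n.resampling, mean = 0, sd = sd(res)),
                   nrow = n, ncol = n.resampling)

dim(res.perm)
dim(res.boot)
```

Permutation keeps exactly the observed residual values, whereas the bootstrap variant draws new values from a fitted normal distribution.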

Value

  • returns a NullModel object

Details

The podkat package assumes a mixed model in which the trait under investigation depends both on covariates (if any) and the genotype. The nullModel method models the relationship between the trait and the covariates (if any) without taking the genotype into account, which corresponds to the null assumption that the trait and the genotype are independent. Therefore, we speak of null models. The following types of models are presently available:
  • linear: linear model for a continuous trait
  • logistic: logistic model for a binary trait
  • bernoulli: Bernoulli-distributed binary trait; this type of model cannot consider covariates

The type argument can be used to select the type of model, where the following restrictions apply:
  • For linear models, the trait vector must be numerical. Factors/factor columns are not accepted.
  • For logistic models and Bernoulli-distributed traits, both numerical vectors and factors are acceptable. In any case, only 0's (controls) and 1's (cases) are accepted. Furthermore, nullModel quits with an error if the trait shows no variation. In other words, trait vectors that only contain 0's or only contain 1's are not accepted (as association testing makes little sense for such traits anyway).

The following interfaces are available for training null models:
  • Formula interface: the first argument X can be a formula that specifies the trait vector/column, the covariate matrix/columns (if any), and the intercept (if any). If neither the y argument nor the data argument is specified, nullModel searches the environment from which the function has been called. This interface is largely analogous to the functions lm and glm.
  • Trait vector without covariates: if the X argument is omitted and y is a numeric vector or factor, y is interpreted as trait vector, and a null model is created from y without covariates. Linear and logistic models are trained with an intercept. For type bernoulli, the trait vector is written to the output object as is.
  • Trait vector plus covariate matrix: if the X argument is a matrix and y is a numeric vector or factor, y is interpreted as trait vector and X is interpreted as covariate matrix. In this case, linear and logistic models are trained as (generalized) linear regressors that predict the trait from the covariates plus an intercept. The type bernoulli is not available for this variant, since this type of model cannot consider covariates.

If type is "automatic" (default), nullModel chooses the type of model itself. A binary trait is modeled with type "logistic", unless both of the following conditions are satisfied, in which case type "bernoulli" is chosen:
  1. The number of samples does not exceed 100.
  2. No intercept and no covariates have been specified. This condition can be met by supplying an empty model to the formula interface (e.g. y ~ 0) or by supplying the trait vector as argument y while omitting X.

A numeric trait that is not binary is modeled with type "linear".

For consistency with the SKAT package, the podkat package also offers resampling of null model residuals, which is controlled by the n.resampling and type.resampling arguments. For logistic models, the small sample correction of Lee et al. (2012) is available; it is controlled by the adj, adjExact, and n.resampling.adj arguments.
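
The two "automatic" settings (for type and for adj) can be summarised in a small sketch. It assumes that a binary trait is treated as bernoulli only when the number of samples does not exceed 100 and neither an intercept nor covariates are given, as logistic otherwise, and that a non-binary numeric trait is treated as linear. guessType and useSmallSampleCorrection are hypothetical helpers written for this illustration; they are not podkat functions and do not reproduce podkat's input checks.

```r
## Hypothetical helpers (NOT part of podkat) that mirror the documented rules.

## which model type would type="automatic" resolve to?
guessType <- function(y, hasIntercept, nCovariates) {
    ## binary trait: factor, or numeric vector containing only 0's and 1's
    isBinary <- is.factor(y) || all(y %in% c(0, 1))
    if (!isBinary)
        return("linear")
    ## bernoulli only for small samples without intercept and covariates
    if (length(y) <= 100 && !hasIntercept && nCovariates == 0)
        return("bernoulli")
    "logistic"
}

## is small sample correction switched on for a given model type?
useSmallSampleCorrection <- function(type, n,
                                     adj = c("automatic", "none", "force")) {
    adj <- match.arg(adj)
    ## ignored (and switched off) for anything but logistic models
    if (type != "logistic")
        return(FALSE)
    switch(adj,
           none = FALSE,
           force = TRUE,
           automatic = n <= 2000)
}

guessType(c(0, 1, 1, 0), hasIntercept = FALSE, nCovariates = 0)
guessType(rep(c(0, 1), 100), hasIntercept = FALSE, nCovariates = 0)
useSmallSampleCorrection("logistic", n = 1500)
```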

References

http://www.bioinf.jku.at/software/podkat

Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., NHLBI Exome Sequencing Project - ESP Lung Project Team, Christiani, D. C., Wurfel, M. M., and Lin, X. (2012) Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224-237. DOI: 10.1016/j.ajhg.2012.06.007.

See Also

NullModel, lm, glm

Examples

## read phenotype data from CSV file (continuous trait + covariates)
phenoFile <- system.file("examples/example1lin.csv", package="podkat")
pheno <- read.table(phenoFile, header=TRUE, sep=",")

## train null model with all covariates in data frame 'pheno'
model <- nullModel(y ~ ., pheno)
model
length(model)
residuals(model)

## read phenotype data from CSV file (binary trait + covariates)
phenoFile <- system.file("examples/example1log.csv", package="podkat")
pheno <- read.table(phenoFile, header=TRUE, sep=",")

## train null model with all covariates in data frame 'pheno'
model <- nullModel(y ~ ., pheno)
model
length(model)
residuals(model)

## "train" simple Bernoulli model on a subset of 100 samples
model <- nullModel(y ~ 0, pheno[1:100, ])
model
length(model)
residuals(model)

## alternatively, use the interface that only supplies the
## trait vector
model <- nullModel(y=pheno[1:100, ]$y)
model
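
Because the formula interface is described as largely analogous to lm and glm, a covariates-only fit in base R gives a rough picture of what a linear or logistic null model represents. The sketch below uses synthetic data; pheno.sim and its columns are made up for illustration, and the use of response residuals for the logistic fit is an assumption, since the kind of residuals stored by podkat is not stated here.

```r
## Base-R analogue of covariates-only (null) fits on synthetic data.
## All names are illustrative assumptions; this is not podkat code.
set.seed(42)
n <- 150
pheno.sim <- data.frame(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))
pheno.sim$y.cont <- 0.02 * pheno.sim$age + rnorm(n)
pheno.sim$y.bin <- rbinom(n, 1, plogis(-1 + 0.02 * pheno.sim$age))

## continuous trait: linear regression on covariates plus intercept
lin.fit <- lm(y.cont ~ age + sex, data = pheno.sim)

## binary trait: logistic regression on covariates plus intercept
log.fit <- glm(y.bin ~ age + sex, data = pheno.sim, family = binomial)

## residuals: the part of the trait not explained by the covariates
lin.res <- residuals(lin.fit)
log.res <- residuals(log.fit, type = "response")  # y minus fitted probability

length(lin.res)
length(log.res)
```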
