LPS.coeff: Linear Predictor Score coefficient computation

Description

Linear Predictor Score coefficients are plain t statistics, so this function provides an implementation that is faster on large datasets than calling t.test on each feature.
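
To illustrate why a dedicated implementation can be faster, here is a minimal base R sketch that computes unequal-variance (Welch) t statistics for all features with column-wise operations and checks them against per-feature t.test calls. It is not the package's actual code: the welch_t helper, the simulated data and the sign convention (first factor level minus second) are assumptions made for the example, and NA values are not handled.

```r
# Simulated data (hypothetical): 20 samples in rows, 5 features in columns
set.seed(1)
x <- matrix(rnorm(100), nrow = 20,
            dimnames = list(paste0("s", 1:20), paste0("g", 1:5)))
group <- factor(rep(c("A", "B"), each = 10))

# Hypothetical helper: Welch t statistic for every column at once
welch_t <- function(x, group) {
  a <- x[group == levels(group)[1], , drop = FALSE]
  b <- x[group == levels(group)[2], , drop = FALSE]
  # Per-feature standard error: sqrt(var_a / n_a + var_b / n_b)
  se <- sqrt(apply(a, 2, var) / nrow(a) + apply(b, 2, var) / nrow(b))
  (colMeans(a) - colMeans(b)) / se
}

t_fast <- welch_t(x, group)

# Reference: one t.test() call per feature (Welch test by default)
t_ref <- apply(x, 2, function(col) unname(t.test(col ~ group)$statistic))

# Both approaches should agree up to numerical tolerance
stopifnot(isTRUE(all.equal(t_fast, t_ref)))
```

The column-wise version avoids building one model frame and one test object per feature, which is where most of the time goes when there are thousands of genes.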

Usage

LPS.coeff(data, response, formula = ~1, type = c("t", "limma"),
    p.value = TRUE, log = FALSE, weighted = FALSE, ...)

Arguments

data

Continuous data used to retrieve classes, as a data.frame or matrix, with samples in rows and features (genes) in columns. Rows and columns should be named. NA values are silently ignored. Some precautions mu

response

Already known classes for the samples provided in data, preferably as a two-level factor. Can be missing if a formula with a response element is provided, but this argument takes precedence when both are given.

formula

A formula object, describing the features to consider in data. The formula response element (before the "~" sign) can replace the response argument if it is not provided. The features can be enumerated in the variabl

type

Single character value: "t" to compute genuine t statistics (unequal variances, unpaired samples), or "limma" to use the moderated t statistics from lmFit() and eBayes() in the microarray-oriented limma Bioconductor package.

p.value

Single logical value, whether to compute (two-sided) p-values or not.
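
As a rough illustration of what a two-sided p-value means for an unequal-variance t statistic, the base R sketch below derives it from the t distribution with Welch-Satterthwaite degrees of freedom, the approach t.test uses. Whether LPS.coeff computes its p-values in exactly this way is an assumption; the data and variable names are hypothetical.

```r
# Simulated data (hypothetical): 12 samples in rows, 5 features in columns
set.seed(2)
x <- matrix(rnorm(60), nrow = 12, dimnames = list(NULL, paste0("g", 1:5)))
group <- factor(rep(c("A", "B"), times = c(5, 7)))

a <- x[group == "A", , drop = FALSE]
b <- x[group == "B", , drop = FALSE]

# Squared standard error contributed by each class, per feature
va <- apply(a, 2, var) / nrow(a)
vb <- apply(b, 2, var) / nrow(b)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (colMeans(a) - colMeans(b)) / sqrt(va + vb)
df <- (va + vb)^2 / (va^2 / (nrow(a) - 1) + vb^2 / (nrow(b) - 1))

# Two-sided p-value: probability mass in both tails beyond |t|
p <- 2 * pt(-abs(t_stat), df)

# Reference: p-values from one t.test() call per feature
p_ref <- apply(x, 2, function(col) t.test(col ~ group)$p.value)
stopifnot(isTRUE(all.equal(p, p_ref)))
```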

log

Single logical value, whether to log-transform t or not (the sign is preserved). The original description of the LPS does not include log-transformation, but it may help avoid over-weighting discriminant genes in large series. Values between -1 and 1 are

weighted

Single logical value, whether to divide t (or log-transformed t) by the gene mean or not. We recommend normalizing data only by samples and using weighted = TRUE to include gene centering in the model, rather than centering and scaling genes by no

...

Further arguments are passed to model.frame if response is missing (thus defined via formula). subset and na.action may be particularly useful for cro

Value

Always returns a numeric matrix with named rows and a "t" column holding the computed statistics. If p.value is TRUE, a second "p.value" column is added.
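
The sketch below shows one way a matrix of this shape could be consumed: a linear predictor score is a weighted sum of feature values, with the "t" column supplying the weights. The matrix k is a hand-written stand-in for a real result, and both the values and the scoring step are illustrative assumptions rather than the package's own scoring code.

```r
# Hypothetical stand-in for an LPS.coeff() result with p.value = TRUE:
# a row-named numeric matrix with "t" and "p.value" columns
k <- cbind(
  t       = c(g1 = 2.5, g2 = -1.2, g3 = 0.4),
  p.value = c(0.02, 0.25, 0.70)
)

# Hypothetical expression matrix: samples in rows, features in columns
expr <- rbind(
  s1 = c(g1 = 1, g2 = 0, g3 = 2),
  s2 = c(g1 = 0, g2 = 3, g3 = 1)
)

# Align features by name, then sum coefficient * expression for each sample
score <- drop(expr[, rownames(k), drop = FALSE] %*% k[, "t"])
print(score)
```

Indexing the columns of expr by rownames(k) keeps coefficients and features aligned even if the two objects list the features in a different order.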

References

http://www.bioconductor.org/packages/release/bioc/html/limma.html

Examples

  # Load the package and its example data
  library(LPS)
  data(rosenwald)
  group <- rosenwald.cli$group
  # Transpose to get samples in rows and features in columns
  expr <- t(rosenwald.expr)
  
  # All features, all samples
  k <- LPS.coeff(data=expr, response=group)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr))
  ### LPS.coeff(formula=group~., data=as.data.frame(expr), na.action=na.pass)
  ### The last is correct but (really) slow on large datasets
  
  # Feature subset, all samples
  k <- LPS.coeff(data=expr[, c("27481","17013") ], response=group)
  k <- LPS.coeff(formula=group~`27481`+`17013`, data=as.data.frame(expr))
  ### Notice backticks in formula for syntactically invalid names
  
  # All features, sample subset
  training <- rosenwald.cli$set == "Training"
  ### training <- sample.int(nrow(expr), 10)
  ### training <- which(rosenwald.cli$set == "Training")
  ### training <- rownames(subset(rosenwald.cli, set == "Training"))
  k <- LPS.coeff(data=expr, response=group, subset=training)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), subset=training)

  # NA handling by model.frame()
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), na.action=na.omit)