Rfast (version 1.7.3)

Many univariate generalised linear models: Many univariate generalised linear regressions

Description

It fits a separate univariate generalised linear regression of y on each column of x.

Usage

univglms(y, x, oiko = NULL, logged = FALSE)

Arguments

y
The dependent variable. It can be a factor or a numerical variable with only two values (binary logistic regression), a discrete-valued vector of counts (Poisson regression) or a numerical vector with continuous values (normal regression). If it contains percentages or proportions (values between 0 and 1), they are mapped to the real line $\mathbb{R}$ using the logit transformation $\log\{y/(1-y)\}$ and linear regression is applied.
x
A matrix with the data, where the rows denote the samples (and the two groups) and the columns are the variables. Currently only continuous variables are allowed. You are advised to standardise the data beforehand to avoid numerical overflow or similar issues; NaN values in the output are a sign of such problems.
oiko
This can be either "normal", "poisson" or "binomial". If you are not sure, leave it NULL and the function will check internally. However, you might have discrete data (e.g. age in years) and want to perform many simple linear regressions; in that case you should specify the family explicitly.
logged
A boolean variable; if set to TRUE, the logarithm of the p-value is returned.
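
The two preprocessing steps mentioned in the arguments (standardising x and the logit transformation of proportions) can be sketched in base R. This is only an illustration of the transformations themselves, not the package's internal code; the object names xs, z and slopes are made up for the example.

```r
set.seed(1)

## Toy data: 100 samples, 5 continuous predictors on very different scales
x <- matrix(rnorm(100 * 5, mean = 50, sd = 1000), ncol = 5)

## Standardise each column to mean 0 and standard deviation 1
xs <- scale(x)

## Proportions strictly inside (0, 1)
y <- runif(100, 0.05, 0.95)

## Logit transformation log(y / (1 - y)); qlogis(y) gives the same values
z <- log(y / (1 - y))

## One simple linear regression of the transformed response per predictor
slopes <- unname(sapply(seq_len(ncol(xs)), function(i) coef(lm(z ~ xs[, i]))[2]))
```

Proportions exactly equal to 0 or 1 would give infinite values under this transformation, so such data need separate handling.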

Value

A matrix with the test statistic and the p-value for each predictor variable.

Details

If you specify no family of distributions, the function internally checks the type of your data and decides on the type of regression to perform. The function is written in C++, which is why it is very fast. It can accept thousands of predictor variables and is useful for univariate screening. We provide no p-value correction (such as FDR or q-values); this is up to the user.
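
Since the p-value correction is left to the user, a minimal sketch of one way to do it with p.adjust from base R follows. To keep the sketch independent of the package, the p-values are computed with a plain glm loop as a stand-in for the p-value column of the returned matrix; the chi-square reference distribution with 1 degree of freedom is an assumption of this sketch, not a statement about what univglms computes.

```r
set.seed(1)
x <- matrix(rnorm(100 * 50), ncol = 50)
y <- rbinom(100, 1, 0.6)

## Stand-in for the output of univglms: drop in deviance per predictor
null_dev <- glm(y ~ 1, family = binomial)$null.deviance
stat <- numeric(ncol(x))
for (i in seq_len(ncol(x))) {
  stat[i] <- null_dev - glm(y ~ x[, i], family = binomial)$deviance
}

## Assumed reference distribution: chi-square with 1 degree of freedom
pval <- pchisq(stat, df = 1, lower.tail = FALSE)

## Benjamini-Hochberg adjustment to control the false discovery rate
padj <- p.adjust(pval, method = "BH")

## Indices of predictors that survive at the 5% level (possibly none)
selected <- which(padj < 0.05)
```

If the p-values were requested on the log scale (logged = TRUE), they would need to be exponentiated before being passed to p.adjust.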

References

Draper, N.R. and Smith H. (1988). Applied regression analysis. New York, Wiley, 3rd edition.

McCullagh, P. and Nelder, J.A. (1989). Generalized linear models. CRC Press, USA, 2nd edition.

See Also

logistic_only, poisson_only, allbetas, correls, regression

Examples

library(Rfast)

## 200 variables, hence 200 univariate regressions are to be fitted
x = matrix( rnorm(100 * 200), ncol = 200 )

## 100 observations in total
y = rbinom(100, 1, 0.6)   ## binary logistic regression
system.time( univglms(y, x) )

a1 = univglms(y, x) 

a2 <- numeric(200)
system.time( for (i in 1:200) a2[i] = glm(y ~ x[, i], binomial)$deviance )

a2 = glm(y ~ 1, binomial)$null.deviance - a2   ## drop in deviance for each predictor

### poisson regression
y = rpois(100, 10)
system.time(  univglms(y, x) )

b1 = univglms(y, x) 
b2 <- numeric(200)
system.time( for (i in 1:200) b2[i] = glm(y ~ x[, i], poisson)$deviance )

b2 = glm(y ~ 1, poisson)$null.deviance - b2   ## drop in deviance for each predictor
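
The case described under the oiko argument, a discrete response for which simple linear regressions are wanted, could look like the sketch below. It uses only the signature shown in Usage and requires the Rfast package to be installed; the simulated "age in years" response is made up for illustration.

```r
library(Rfast)

set.seed(1)
x <- matrix(rnorm(100 * 50), ncol = 50)

## Discrete response (e.g. age in years) that would otherwise look like counts
y <- rpois(100, 40)

## Specify the family explicitly so that normal regressions are fitted
c1 <- univglms(y, x, oiko = "normal")

## Same call, but with the p-values returned on the log scale
c2 <- univglms(y, x, oiko = "normal", logged = TRUE)
```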
