alm: Advanced Linear Model

Description

The function estimates a model based on the selected distribution.

Usage

alm(formula, data, subset, na.action, distribution = c("dnorm", "dlogis",
  "dlaplace", "dalaplace", "ds", "dt", "dfnorm", "dlnorm", "dchisq",
  "dpois", "dnbinom", "dbeta", "plogis", "pnorm"), occurrence = c("none",
  "plogis", "pnorm"), B = NULL, vcovProduce = FALSE, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

data

a data frame or a matrix, containing the variables in the model.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh default is na.omit. Another possible value is NULL (no action). The value na.exclude can also be useful.

distribution

what density function to use in the process. The full name of the distribution should be provided here. Values with "d" in the beginning of the name refer to the density function, while "p" stands for "probability" (cumulative distribution function). The names align with the names of distribution functions in R. For example, see dnorm.
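As a small illustration of the naming convention, the base R functions below show the difference between a "d" function and a "p" function. This is only a sketch of what the names refer to; it does not call alm().

```r
# "d" functions return the density, "p" functions the cumulative probability.
x <- 0
densityValue <- dnorm(x, mean = 0, sd = 1)  # height of the standard normal density at x
cdfValue <- pnorm(x, mean = 0, sd = 1)      # P(X <= x) for the standard normal
logisticCdfValue <- plogis(x)               # P(X <= x) for the standard logistic
```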

occurrence

what distribution to use for the occurrence variable. Can be "none", in which case no occurrence model is estimated; "plogis", in which case a logistic regression is estimated via alm() for the occurrence part; or "pnorm", in which case a probit model is estimated via alm() for the occurrence part. In both of the latter cases, the formula used is the same as the formula for the sizes. Finally, an "alm" model can be provided, and its estimates will be used in the model construction.

If this is not "none", then the model is estimated in two steps: 1. Occurrence part of the model; 2. Sizes part of the model (excluding zeroes from the data).
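The two-step idea can be sketched in base R as follows. This is not the code used inside alm(): glm() and lm() serve as stand-ins for the two parts, and the simulated data, the variable names and the log-linear form of the sizes model are all assumptions made for illustration.

```r
set.seed(41)
n <- 200
x1 <- rnorm(n)
# Simulated intermittent response: zero when the event does not occur,
# strictly positive otherwise
occurs <- rbinom(n, 1, plogis(0.5 + x1))
y <- occurs * exp(1 + 0.3 * x1 + rnorm(n, 0, 0.2))
ourData <- data.frame(y = y, x1 = x1, occurs = occurs)

# Step 1: occurrence part, a logistic regression on the zero / non-zero indicator
occurrenceModel <- glm(occurs ~ x1, data = ourData, family = binomial())

# Step 2: sizes part, fitted on the non-zero observations only
nonZeroData <- ourData[ourData$y != 0, ]
sizesModel <- lm(log(y) ~ x1, data = nonZeroData)
```

Both steps use the same explanatory variable here, mirroring the statement above that the occurrence part reuses the formula of the sizes part.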

B

a vector of parameters of the linear model. When NULL, it is estimated.

vcovProduce

whether to produce the variance-covariance matrix of the coefficients. This is done via a hessian calculation, so it might be computationally costly.
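To show why a hessian-based covariance matrix costs extra, here is a minimal base R sketch for a normal linear model. It is not the implementation used by alm(); the data, the function negLogLik and the log-scale parametrisation are assumptions made for illustration.

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
X <- cbind(1, x)

# Negative log-likelihood of a normal linear model;
# par[3] is log(sigma), which keeps the scale positive
negLogLik <- function(par) {
  mu <- drop(X %*% par[1:2])
  sigma <- exp(par[3])
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# hessian = TRUE requests a numerical hessian at the optimum,
# which needs additional evaluations of the likelihood
fit <- optim(c(0, 0, 0), negLogLik, method = "BFGS", hessian = TRUE)

# The inverse of the hessian of the negative log-likelihood
# approximates the covariance matrix of the estimates
vcovML <- solve(fit$hessian)
```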

...

additional parameters to pass to distribution functions (e.g. alpha value for Asymmetric Laplace distribution).

Value

The function returns the final model of class "alm", which contains:

coefficients - estimated parameters of the model,
vcov - covariance matrix of parameters of the model (based on Fisher Information). Returned only when vcovProduce=TRUE.
actuals - actual values of the response variable,
fitted.values - fitted values,
residuals - residuals of the model,
mu - the estimated location parameter of the distribution,
scale - the estimated scale parameter of the distribution,
distribution - distribution used in the estimation,
logLik - log-likelihood of the model,
df.residual - number of degrees of freedom of the residuals of the model,
df - number of degrees of freedom of the model,
call - how the model was called,
rank - rank of the model,
data - data used for the model construction,
occurrence - the occurrence model used in the estimation,
other - the list of all the other parameters either passed to the function or estimated in the process, but not included in the standard output (e.g. alpha for Asymmetric Laplace).

Details

This function is similar to lm, but it supports a range of distributions, most of them non-normal. The supported distributions are:

Normal distribution, dnorm,
Logistic distribution, dlogis,
Laplace distribution, dlaplace,
Asymmetric Laplace distribution, dalaplace,
T-distribution, dt,
S-distribution, ds,
Folded normal distribution, dfnorm,
Log normal distribution, dlnorm,
Chi-squared distribution, dchisq,
Beta distribution, dbeta,
Poisson distribution, dpois,
Negative binomial distribution, dnbinom,
Cumulative logistic distribution, plogis,
Cumulative normal distribution, pnorm.

This function is slower than lm, because it relies on likelihood estimation of parameters, hessian calculation and matrix multiplication. So before using distribution="dnorm" here, consider whether lm would serve instead.

Other distributions may be added to this function in the future.

The estimation is done using the likelihood of the respective distribution.
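As a rough sketch of likelihood-based estimation for a non-normal case, the base R code below fits a linear model with Laplace errors. It is not taken from alm(): the function laplaceNegLogLik, the simulated data and the use of optim() with least squares starting values are assumptions made for illustration.

```r
set.seed(7)
x <- rnorm(200)
# Laplace noise generated as the difference of two exponential draws
# (base R has no Laplace random number generator)
y <- 10 + 0.5 * x + (rexp(200, rate = 1 / 3) - rexp(200, rate = 1 / 3))

# Negative log-likelihood of a linear model with Laplace errors
laplaceNegLogLik <- function(B) {
  mu <- B[1] + B[2] * x
  # For a given mu, the likelihood-maximising scale is the mean absolute error
  b <- mean(abs(y - mu))
  -sum(-log(2 * b) - abs(y - mu) / b)
}

# Start from the least squares estimates and minimise the negative log-likelihood
startValues <- coef(lm(y ~ x))
fit <- optim(startValues, laplaceNegLogLik)
fit$par
```

The iterative search is what makes this approach slower than the closed-form solution available to lm.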

Examples

# NOT RUN {
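# alm() and rlaplace() come from the greybox package
library(greybox)

# Two explanatory variables: one Laplace distributed, one normal
xreg <- cbind(rlaplace(100,10,3),rnorm(100,50,5))
# Response with Laplace noise, plus an unrelated "Noise" column
xreg <- cbind(100+0.5*xreg[,1]-0.75*xreg[,2]+rlaplace(100,0,3),xreg,rnorm(100,300,10))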
colnames(xreg) <- c("y","x1","x2","Noise")
inSample <- xreg[1:80,]
outSample <- xreg[-c(1:80),]

ourModel <- alm(y~x1+x2, inSample, distribution="dlaplace")
summary(ourModel)
plot(predict(ourModel,outSample))

# An example with binary response variable
xreg[,1] <- round(exp(xreg[,1]-70) / (1 + exp(xreg[,1]-70)),0)
colnames(xreg) <- c("y","x1","x2","Noise")
inSample <- xreg[1:80,]
outSample <- xreg[-c(1:80),]

# Logistic distribution (logit regression)
ourModel <- alm(y~x1+x2, inSample, distribution="plogis")
summary(ourModel)
plot(predict(ourModel,outSample,interval="c"))

# Normal distribution (probit regression)
ourModel <- alm(y~x1+x2, inSample, distribution="pnorm")
summary(ourModel)
plot(predict(ourModel,outSample,interval="p"))

# }