# ols

##### Linear Model Estimation Using Ordinary Least Squares

Fits the usual weighted or unweighted linear regression model using the
same fitting routines used by `lm`, but also storing the
variance-covariance matrix `var` and using traditional dummy-variable
coding for categorical factors.
Also fits unweighted models using penalized least squares, with the same
penalization options as in the `lrm` function. For penalized estimation,
there is a fitter function called `lm.pfit`.
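
As a quick illustration of the equivalence with `lm` (a minimal sketch, assuming the rms package is installed; the `mtcars` dataset and variable choices are illustrative, not part of this documentation):

```
library(rms)

# ols() and lm() use the same underlying fitting routines, so the point
# estimates agree; ols() additionally stores the variance-covariance
# matrix in the `var` component of the fit.
f.ols <- ols(mpg ~ wt + hp, data = mtcars)
f.lm  <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(unname(coef(f.ols)), unname(coef(f.lm)))  # TRUE
f.ols$var   # stored variance-covariance matrix
```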

- Keywords
- models, regression

##### Usage

```
ols(formula, data, weights, subset, na.action=na.delete,
method="qr", model=FALSE,
x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE,
penalty=0, penalty.matrix, tol=1e-7, sigma,
var.penalty=c('simple','sandwich'), ...)
```

##### Arguments

- formula
- an S formula object, e.g. `Y ~ rcs(x1,5)*lsp(x2,c(10,20))`
- data
- name of an S data frame containing all needed variables. Omit this to use a data frame already in the S "search list".
- weights
- an optional vector of weights to be used in the fitting process. If specified, weighted least squares is used with weights `weights` (that is, minimizing $\sum_i w_i e_i^2$); otherwise ordinary least squares is used. See the sketch after this list.
- subset
- an expression defining a subset of the observations to use in the fit. The default is to use all observations. Specify for example `age>50 & sex=="male"` or `c(1:100,200:300)` respectively to use the observations satisfying a logical expression or those numbered 1-100 and 200-300.
- na.action
- specifies an S function to handle missing data. The default is the function `na.delete`, which causes observations with any variable missing to be deleted. The main difference between `na.delete` and the S-supplied function `na.omit` is that `na.delete` makes a list of the number of observations that are missing on each variable in the model.
- method
- specifies a particular fitting method, or `"model.frame"` instead to return the model frame of the predictor and response variables satisfying any subset or missing value checks.
- model
- default is `FALSE`. Set to `TRUE` to return the model frame as element `model` of the fit object.
- x
- default is `FALSE`. Set to `TRUE` to return the expanded design matrix as element `x` (without intercept indicators) of the returned fit object. Set both `x=TRUE` and `y=TRUE` if you are going to use the `residuals` function.
- y
- default is `FALSE`. Set to `TRUE` to return the vector of response values as element `y` of the fit.
- se.fit
- default is `FALSE`. Set to `TRUE` to compute the estimated standard errors of the estimate of $X\beta$ and store them in element `se.fit` of the fit.
- linear.predictors
- set to `FALSE` to cause predicted values not to be stored
- penalty, penalty.matrix
- see `lrm`
- tol
- tolerance for information matrix singularity
- sigma
- if `sigma` is given, it is taken as the actual root mean squared error parameter for the model. Otherwise `sigma` is estimated from the data using the usual formulas (except for penalized models). It is often convenient to specify `sigma=1` for models with no error, when using `fastbw` to find an approximate model that predicts predicted values from the full model with a given accuracy.
- var.penalty
- the type of variance-covariance matrix to be stored in the `var` component of the fit when penalization is used. The default is the inverse of the penalized information matrix. Specify `var.penalty="sandwich"` to use the sandwich estimator (see Value below).
- ...
- arguments to pass to `lm.wfit` or `lm.fit`
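
The following is a minimal sketch of the `weights` argument (the data, the heteroscedastic error model, and the weight construction are illustrative assumptions, not part of this documentation):

```
library(rms)

set.seed(2)
x    <- runif(100)
sd.e <- 0.5 + x                           # assumed, known error standard deviations
y    <- 2 * x + rnorm(100, sd = sd.e)     # heteroscedastic errors
f.w  <- ols(y ~ x, weights = 1 / sd.e^2)  # minimizes sum(w * e^2)
f.w
```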

##### Details

For penalized estimation, the penalty factor on the log likelihood is
$-0.5 \beta' P \beta / \sigma^2$, where $P$ is the penalty matrix
described under `penalty.matrix` in `lrm`.
The penalized maximum likelihood estimate (penalized least squares
or ridge estimate) of $\beta$ is $(X'X + P)^{-1} X'Y$.
The maximum likelihood estimate of $\sigma^2$ is
$(\mathrm{sse} + \beta' P \beta) / n$, where $\mathrm{sse}$ is the sum of
squared errors (residuals).
The `effective.df.diagonal` vector is the diagonal of the matrix
$X'X/(\mathrm{sse}/n) \cdot \sigma^{2} (X'X + P)^{-1}$.
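
A base-R sketch of these formulas (standalone, not calling `ols` itself; the simulated data and the choice of $P$ are illustrative assumptions):

```
set.seed(3)
n <- 50
X <- cbind(1, runif(n))                  # design matrix with an intercept column
y <- X %*% c(1, 2) + rnorm(n)
P <- diag(c(0, 2))                       # penalty matrix: zero row/column for the intercept

# penalized least squares / ridge estimate: (X'X + P)^{-1} X'Y
beta.pen <- solve(crossprod(X) + P, crossprod(X, y))

# penalized maximum likelihood estimate of sigma^2: (sse + beta' P beta) / n
sse        <- sum((y - X %*% beta.pen)^2)
sigma2.mle <- drop((sse + t(beta.pen) %*% P %*% beta.pen) / n)
```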

##### Value

- the same objects returned from `lm` (unless `penalty` or `penalty.matrix` is given, in which case an abbreviated list is returned, since `lm.pfit` is used as a fitter), plus the design attributes (see `rms`). Predicted values are always returned, in the element `linear.predictors`. The vectors or matrix stored if `y=TRUE` or `x=TRUE` have rows deleted according to `subset` and to missing data, and have names or row names that come from the data frame used as input data. If `penalty` or `penalty.matrix` is given, the `var` matrix returned is an improved variance-covariance matrix for the penalized regression coefficient estimates. If `var.penalty="sandwich"` (not the default, as limited simulation studies have found it provides variance estimates that are too low), it is defined as $\sigma^{2} (X'X + P)^{-1} X'X (X'X + P)^{-1}$, where $P$ is `penalty factors * penalty.matrix`, with a column and row of zeros added for the intercept. When `var.penalty="simple"` (the default), `var` is $\sigma^{2} (X'X + P)^{-1}$. The returned list has a vector `stats` with named elements `n, Model L.R., d.f., R2, Sigma`. `Model L.R.` is the model likelihood ratio $\chi^2$ statistic and `R2` is $R^2$. For penalized estimation, `d.f.` is the effective degrees of freedom, which is the sum of the elements of another returned vector, `effective.df.diagonal`, minus one for the intercept. `Sigma` is the penalized maximum likelihood estimate of $\sigma$ (see Details above).
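
A short sketch of inspecting these components on a penalized fit (the simulated data and `penalty=1` are illustrative assumptions):

```
library(rms)

set.seed(4)
x <- runif(100)
y <- x + rnorm(100)
f <- ols(y ~ x, penalty = 1)

f$stats                            # n, Model L.R., d.f., R2, Sigma
sum(f$effective.df.diagonal) - 1   # the effective d.f. reported in f$stats
sqrt(diag(f$var))                  # standard errors from the stored var matrix
```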

##### See Also

`rms`, `rms.trans`, `anova.rms`, `summary.rms`, `predict.rms`, `fastbw`, `validate`, `calibrate`, `Predict`, `specs.rms`, `cph`, `lrm`, `which.influence`, `lm`, `summary.lm`, `print.ols`, `residuals.ols`, `latex.ols`, `na.delete`, `na.detail.response`, `datadist`, `pentrace`, `vif`, `abs.error.pred`

##### Examples

```
library(rms)

set.seed(1)
x1 <- runif(200)
x2 <- sample(0:3, 200, TRUE)
distance <- (x1 + x2/3 + rnorm(200))^2
d <- datadist(x1,x2)
options(datadist="d") # No d -> no summary, plot without giving all details
f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE)
# could use d <- datadist(f); options(datadist="d") at this point,
# but predictor summaries would not be stored in the fit object for
# use with Predict and summary.rms. In that case, the original
# dataset or d would need to be accessed later, or all variable values
# would have to be specified to summary, plot
anova(f)
which.influence(f)
summary(f)
summary.lm(f) # will only work if penalty and penalty.matrix not used
# Fit a complex model and approximate it with a simple one
x1 <- runif(200)
x2 <- runif(200)
x3 <- runif(200)
x4 <- runif(200)
y <- x1 + x2 + rnorm(200)
f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4)
pred <- fitted(f) # or predict(f) or f$linear.predictors
f2 <- ols(pred ~ rcs(x1,4) + x2 + x3 + x4, sigma=1)
# sigma=1 prevents numerical problems resulting from R2=1
fastbw(f2, aics=100000)
# This will find the best 1-variable model, best 2-variable model, etc.
# in predicting the predicted values from the original model
options(datadist=NULL)
```

*Documentation reproduced from package rms, version 2.0-2, License: GPL (>= 2)*