Fits the usual weighted or unweighted linear regression model using the
same fitting routines used by `lm`

, but also storing the variance-covariance
matrix `var`

and using traditional dummy-variable coding for categorical
factors.
Also fits unweighted models using penalized least squares, with the same
penalization options as in the `lrm`

function. For penalized estimation,
there is a fitter function call `lm.pfit`

.

```
ols(formula, data, weights, subset, na.action=na.delete,
method="qr", model=FALSE,
x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE,
penalty=0, penalty.matrix, tol=1e-7, sigma,
var.penalty=c('simple','sandwich'), …)
```

formula

an S formula object, e.g.

Y ~ rcs(x1,5)*lsp(x2,c(10,20))

data

name of an S data frame containing all needed variables. Omit this to use a data frame already in the S ``search list''.

weights

an optional vector of weights to be used in the fitting
process. If specified, weighted least squares is used with
weights `weights`

(that is, minimizing \(sum(w*e^2)\));
otherwise ordinary least squares is used.

subset

an expression defining a subset of the observations to use in the fit. The default
is to use all observations. Specify for example `age>50 & sex="male"`

or
`c(1:100,200:300)`

respectively to use the observations satisfying a logical expression or those having
row numbers in the given vector.

na.action

specifies an S function to handle missing data. The default is the function `na.delete`

,
which causes observations with any variable missing to be deleted. The main difference
between `na.delete`

and the S-supplied function `na.omit`

is that
`na.delete`

makes a list
of the number of observations that are missing on each variable in the model.
The `na.action`

is usally specified by e.g. `options(na.action="na.delete")`

.

method

specifies a particular fitting method, or `"model.frame"`

instead to return the model frame
of the predictor and response variables satisfying any subset or missing value
checks.

model

default is `FALSE`

. Set to `TRUE`

to return the model frame
as element `model`

of the fit object.

x

default is `FALSE`

. Set to `TRUE`

to return the expanded design matrix as element `x`

(without intercept indicators) of the
returned fit object. Set both `x=TRUE`

if you are going to use
the `residuals`

function later to return anything other than ordinary residuals.

y

default is `FALSE`

. Set to `TRUE`

to return the vector of response values
as element `y`

of the fit.

se.fit

default is `FALSE`

. Set to `TRUE`

to compute the estimated standard errors of
the estimate of \(X\beta\) and store them in element `se.fit`

of the fit.

linear.predictors

set to `FALSE`

to cause predicted values not to be stored

penalty

penalty.matrix

see `lrm`

tol

tolerance for information matrix singularity

sigma

If `sigma`

is given, it is taken as the actual root mean squared error parameter for the model. Otherwise `sigma`

is estimated from the data using the usual formulas (except for penalized models). It is often convenient to specify
`sigma=1`

for models with no error, when using `fastbw`

to find an
approximate model that predicts predicted values from the full model with
a given accuracy.

var.penalty

the type of variance-covariance matrix to be stored in the `var`

component of the fit when penalization is used. The default is the
inverse of the penalized information matrix. Specify
`var.penalty="sandwich"`

to use the sandwich estimator (see below
under `var`

), which limited simulation studies have shown yields
variances estimates that are too low.

the same objects returned from `lm`

(unless `penalty`

or `penalty.matrix`

are given - then an
abbreviated list is returned since `lm.pfit`

is used as a fitter)
plus the design attributes
(see `rms`

).
Predicted values are always returned, in the element `linear.predictors`

.
The vectors or matrix stored if `y=TRUE`

or `x=TRUE`

have rows deleted according to `subset`

and
to missing data, and have names or row names that come from the
data frame used as input data. If `penalty`

or `penalty.matrix`

is given,
the `var`

matrix
returned is an improved variance-covariance matrix
for the penalized regression coefficient estimates. If
`var.penalty="sandwich"`

(not the default, as limited simulation
studies have found it provides variance estimates that are too low) it
is defined as
\(\sigma^{2} (X'X + P)^{-1} X'X (X'X + P)^{-1}\), where \(P\) is
`penalty factors * penalty.matrix`

, with a column and row of zeros
added for the
intercept. When `var.penalty="simple"`

(the default), `var`

is
\(\sigma^{2} (X'X + P)^{-1}\).
The returned list has a vector `stats`

with named elements
`n, Model L.R., d.f., R2, g, Sigma`

. `Model L.R.`

is the model
likelihood ratio \(\chi^2\) statistic, and `R2`

is
\(R^2\). For penalized estimation, `d.f.`

is the
effective degrees of freedom, which is the sum of the elements of another
vector returned, `effective.df.diagonal`

, minus one for the
intercept.
`g`

is the \(g\)-index.
`Sigma`

is the penalized maximum likelihood estimate (see below).

For penalized estimation, the penalty factor on the log likelihood is
\(-0.5 \beta' P \beta / \sigma^2\), where \(P\) is defined above.
The penalized maximum likelihood estimate (penalized least squares
or ridge estimate) of \(\beta\) is \((X'X + P)^{-1} X'Y\).
The maximum likelihood estimate of \(\sigma^2\) is \((sse + \beta'
P \beta) / n\), where
`sse`

is the sum of squared errors (residuals).
The `effective.df.diagonal`

vector is the
diagonal of the matrix \(X'X/(sse/n) \sigma^{2} (X'X + P)^{-1}\).

`rms`

, `rms.trans`

, `anova.rms`

,
`summary.rms`

, `predict.rms`

,
`fastbw`

, `validate`

, `calibrate`

,
`Predict`

, `specs.rms`

, `cph`

,
`lrm`

, `which.influence`

, `lm`

,
`summary.lm`

, `print.ols`

,
`residuals.ols`

, `latex.ols`

,
`na.delete`

, `na.detail.response`

,
`datadist`

, `pentrace`

, `vif`

,
`abs.error.pred`

# NOT RUN { set.seed(1) x1 <- runif(200) x2 <- sample(0:3, 200, TRUE) distance <- (x1 + x2/3 + rnorm(200))^2 d <- datadist(x1,x2) options(datadist="d") # No d -> no summary, plot without giving all details f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE) # could use d <- datadist(f); options(datadist="d") at this point, # but predictor summaries would not be stored in the fit object for # use with Predict, summary.rms. In that case, the original # dataset or d would need to be accessed later, or all variable values # would have to be specified to summary, plot anova(f) which.influence(f) summary(f) summary.lm(f) # will only work if penalty and penalty.matrix not used # Fit a complex model and approximate it with a simple one x1 <- runif(200) x2 <- runif(200) x3 <- runif(200) x4 <- runif(200) y <- x1 + x2 + rnorm(200) f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4) pred <- fitted(f) # or predict(f) or f$linear.predictors f2 <- ols(pred ~ rcs(x1,4) + x2 + x3 + x4, sigma=1) # sigma=1 prevents numerical problems resulting from R2=1 fastbw(f2, aics=100000) # This will find the best 1-variable model, best 2-variable model, etc. # in predicting the predicted values from the original model options(datadist=NULL) # }