# AirPollution

##### Air Pollution and Mortality

Data relating air pollution to mortality, frequently used to illustrate ridge regression and related techniques.

- Keywords
- datasets

##### Usage

`data("AirPollution")`

##### Format

A data frame containing 60 observations on 16 variables.

- precipitation
Average annual precipitation in inches.

- temperature1
Average January temperature in degrees Fahrenheit.

- temperature7
Average July temperature in degrees Fahrenheit.

- age
Percentage of 1960 SMSA population aged 65 or older.

- household
Average household size.

- education
Median school years completed by those over 22.

- housing
Percentage of housing units which are sound and with all facilities.

- population
Population per square mile in urbanized areas, 1960.

- noncauc
Percentage of non-Caucasian population in urbanized areas, 1960.

- whitecollar
Percentage employed in white collar occupations.

- income
Percentage of families with income below USD 3,000.

- hydrocarbon
Relative hydrocarbon pollution potential.

- nox
Relative pollution potential of nitrogen oxides.

- so2
Relative pollution potential of sulphur dioxide.

- humidity
Annual average percentage of relative humidity at 13:00.

- mortality
Total age-adjusted mortality rate per 100,000.
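
As a quick check of the format described above, the following sketch loads the data and compares its shape and column names with the list in this section. It assumes the lmSubsets package is installed; the `expected` vector is not part of the package, it is simply the variable list above retyped for illustration.

```r
## Assumes the lmSubsets package (which ships this dataset) is installed.
data("AirPollution", package = "lmSubsets")

## Variable names as listed in the Format section (illustrative helper).
expected <- c("precipitation", "temperature1", "temperature7", "age",
              "household", "education", "housing", "population",
              "noncauc", "whitecollar", "income", "hydrocarbon",
              "nox", "so2", "humidity", "mortality")

dim(AirPollution)                       # number of rows and columns
setdiff(expected, names(AirPollution))  # listed names missing from the data
str(AirPollution)                       # type and first values of each column
```

If `setdiff()` returns a non-empty vector, the installed version of the data differs from this description and any code that selects columns by name should be adjusted accordingly.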

##### References

McDonald GC, Schwing RC (1973).
Instabilities of Regression Estimates Relating Air Pollution to Mortality.
*Technometrics*, **15**, 463-482.

Miller AJ (2002).
*Subset Selection in Regression*.
New York: Chapman and Hall.

##### Examples

```
## load package and data
library("lmSubsets")
data("AirPollution", package = "lmSubsets")
## take logs of the relative pollution potentials
## (columns 12 to 14: hydrocarbon, nox, so2)
for (i in 12:14) AirPollution[[i]] <- log(AirPollution[[i]])
## fit all-subsets regression
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)
## refit the best model of size 6
lm6 <- refit(lm_all, size = 6)
summary(lm6)
```
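
Since the description mentions ridge regression, here is a minimal sketch of a ridge fit on the same data. It is not part of the lmSubsets examples: it assumes the MASS package is available and uses its `lm.ridge()` function, and the penalty grid `lambdas` is an arbitrary choice made only for illustration.

```r
## Assumes lmSubsets (for the data) and MASS (for lm.ridge) are installed.
library("MASS")
data("AirPollution", package = "lmSubsets")

## Log-transform the three relative pollution potentials, selected by name.
pollutants <- c("hydrocarbon", "nox", "so2")
AirPollution[pollutants] <- lapply(AirPollution[pollutants], log)

## Fit ridge regressions over an illustrative grid of penalties.
lambdas <- seq(0, 20, by = 0.5)
ridge <- lm.ridge(mortality ~ ., data = AirPollution, lambda = lambdas)

## Ridge trace: coefficient paths as the penalty grows.
plot(ridge)

## Penalty with the smallest generalized cross-validation score on this grid.
lambdas[which.min(ridge$GCV)]
```

A wider or finer grid may be needed if the selected penalty falls at the edge of `lambdas`.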

*Documentation reproduced from package lmSubsets, version 0.5-1, License: GPL (>= 3)*