AirPollution: Air Pollution and Mortality

Description

Data relating air pollution and mortality, frequently used for illustrations in ridge regression and related tasks.

Usage

data("AirPollution")

Format

A data frame containing 60 observations on 16 variables.

precipitation: Average annual precipitation in inches.
temperature1: Average January temperature in degrees Fahrenheit.
temperature7: Average July temperature in degrees Fahrenheit.
age: Percentage of 1960 SMSA population aged 65 or older.
household: Average household size.
education: Median school years completed by those over 22.
housing: Percentage of housing units which are sound and with all facilities.
population: Population per square mile in urbanized areas, 1960.
noncauc: Percentage of non-Caucasian population in urbanized areas, 1960.
whitecollar: Percentage employed in white collar occupations.
income: Percentage of families with income < USD 3000.
hydrocarbon: Relative hydrocarbon pollution potential.
nox: Relative nitrogen oxides pollution potential.
so2: Relative sulphur dioxide pollution potential.
humidity: Annual average percentage of relative humidity at 13:00.
mortality: Total age-adjusted mortality rate per 100,000.

References

McDonald GC, Schwing RC (1973). Instabilities of Regression Estimates Relating Air Pollution to Mortality. Technometrics, 15, 463-482.

Miller AJ (2002). Subset Selection in Regression. New York: Chapman and Hall.

Examples

## load package and data (with logs for relative potentials)
library("lmSubsets")
data("AirPollution", package = "lmSubsets")
## columns 12:14 are hydrocarbon, nox, and so2
for (i in 12:14)  AirPollution[[i]] <- log(AirPollution[[i]])

## fit subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)

## refit best model of size 6
lm6 <- refit(lm_all, size = 6)
summary(lm6)
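
Since the description mentions ridge regression, the following is a minimal sketch of a ridge fit on the same data. It is not part of the package's own examples. It assumes the MASS package is installed and uses its lm.ridge() function; the penalty grid and the object names (pollutants, ridge) are arbitrary choices made for illustration.

```r
## sketch only: ridge regression via MASS (assumed to be installed)
library("MASS")
data("AirPollution", package = "lmSubsets")

## log-transform the relative pollution potentials, selected by name
pollutants <- c("hydrocarbon", "nox", "so2")
AirPollution[pollutants] <- lapply(AirPollution[pollutants], log)

## fit over a grid of ridge penalties (grid chosen arbitrarily)
ridge <- lm.ridge(mortality ~ ., data = AirPollution,
                  lambda = seq(0, 20, by = 0.5))

## coefficient paths and the penalty with the smallest GCV score
plot(ridge)
ridge$lambda[which.min(ridge$GCV)]
```

Selecting the pollutant columns by name rather than by position keeps the transformation correct even if the column order changes.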
