lmSubsets (version 0.4)

AirPollution: Air Pollution and Mortality

Description

Data relating air pollution and mortality, frequently used for illustrations in ridge regression and related tasks.

Usage

data("AirPollution")

Format

A data frame containing 60 observations on 16 variables.

precipitation

Average annual precipitation in inches.

temperature1

Average January temperature in degrees Fahrenheit.

temperature7

Average July temperature in degrees Fahrenheit.

age

Percentage of 1960 SMSA population aged 65 or older.

household

Average household size.

education

Median school years completed by those over 22.

housing

Percentage of housing units which are sound and with all facilities.

population

Population per square mile in urbanized areas, 1960.

noncauc

Percentage of non-Caucasian population in urbanized areas, 1960.

whitecollar

Percentage employed in white collar occupations.

income

Percentage of families with income < USD 3000.

hydrocarbon

Relative hydrocarbon pollution potential.

nox

Relative nitrogen oxides pollution potential.

so2

Relative sulphur dioxide pollution potential.

humidity

Annual average percentage of relative humidity at 13:00.

mortality

Total age-adjusted mortality rate per 100,000.
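
The layout described above can be checked directly after loading the data. The following is a minimal sketch, assuming the lmSubsets package is installed; the variable name pollutant_cols is introduced here for illustration only.

```r
## Load the data set from the lmSubsets package (assumed to be installed).
data("AirPollution", package = "lmSubsets")

## The Format section states 60 observations on 16 variables.
dim(AirPollution)
names(AirPollution)

## Locate the three relative pollution potentials by name instead of
## relying on hard-coded column positions.
pollutant_cols <- match(c("hydrocarbon", "nox", "so2"), names(AirPollution))
pollutant_cols

## Compact overview of all variables.
str(AirPollution)
```

Looking columns up by name keeps later transformations correct even if the column order were to change.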

References

McDonald GC, Schwing RC (1973). Instabilities of Regression Estimates Relating Air Pollution to Mortality. Technometrics, 15, 463-482.

Miller AJ (2002). Subset Selection in Regression. New York: Chapman and Hall.

Examples

## load data and log-transform the three relative pollution potentials
## (hydrocarbon, nox, so2; columns 12 to 14)
data("AirPollution", package = "lmSubsets")
for (v in c("hydrocarbon", "nox", "so2")) {
  AirPollution[[v]] <- log(AirPollution[[v]])
}

## fit the best subset models of each size
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)

## refit the best model with size = 6
lm6 <- refit(lm_all, size = 6)
summary(lm6)
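
Because the Description mentions ridge regression, a ridge fit on the same data may be a useful companion to the subset search. The following is a minimal sketch and not part of the lmSubsets examples: it assumes the MASS package (a recommended package shipped with R) is available, and the lambda grid is an arbitrary illustrative choice.

```r
## Ridge regression sketch using MASS::lm.ridge (not part of lmSubsets).
library(MASS)

## Reload the data so the log transform is applied exactly once.
data("AirPollution", package = "lmSubsets")
pollutants <- c("hydrocarbon", "nox", "so2")
AirPollution[pollutants] <- lapply(AirPollution[pollutants], log)

## Arbitrary grid of ridge penalties, chosen for illustration only.
lambdas <- seq(0, 20, by = 0.5)
ridge <- lm.ridge(mortality ~ ., data = AirPollution, lambda = lambdas)

## Ridge trace: coefficient paths as the penalty grows.
plot(ridge)

## Penalty suggestions (HKB, L-W, and GCV on the grid).
MASS::select(ridge)

## Penalty on the grid with the smallest generalized cross-validation score.
best_lambda <- lambdas[which.min(ridge$GCV)]
best_lambda
```

Coefficients that change sign or shrink sharply along the ridge trace point to the collinearity among the predictors that McDonald and Schwing (1973) discuss.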
