make_ames: Create a Processed Version of the Ames Housing Data

Description

Create a Processed Version of the Ames Housing Data

Usage

make_ames()
make_ames_new()
make_ordinal_ames()

Arguments

Value

A tibble with the data.

Details

For the processed version, the exact details can be found in the code of make_ames but a summary of the differences between these data sets and ames_raw is:

All factors are unordered.
PID and Order are removed.
Spaces and special characters in column names were changed to snake case. To be consistent, SalePrice was changed to Sale_Price.
Many factor levels were changed to be more understandable (e.g., Split_or_Multilevel instead of 080).
Many missing values were reset. For example, if the variable Bsmt_Qual was missing, this implies that there is no basement on the property. Instead of a missing value, the value of Bsmt_Qual was changed to No_Basement. Similarly, numeric data pertaining to basements were set to zero where appropriate such as variables Bsmt_Full_Bath and Total_Bsmt_SF.
Garage_Yr_Blt contained many missing values and was removed.
Approximate longitude and latitude are included for the properties. Also, note that there are 6 properties with identical geotags. These are units within the same building. For some properties, updated versions of the PID identifiers were found and are replaced with new values.
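
The missing-value handling described above can be illustrated on a small made-up data frame. This is only a sketch of the idea in base R, not the code used inside make_ames; the toy values are invented, and only the column names Bsmt_Qual and Total_Bsmt_SF and the level No_Basement come from the description above.

```r
# Toy data standing in for two basement columns (values are made up).
toy <- data.frame(
  Bsmt_Qual = c("Gd", NA, "TA", NA),
  Total_Bsmt_SF = c(856, NA, 920, NA),
  stringsAsFactors = FALSE
)

# A missing quality rating is read as "no basement on the property".
no_bsmt <- is.na(toy$Bsmt_Qual)
toy$Bsmt_Qual[no_bsmt] <- "No_Basement"

# Numeric basement measurements are set to zero for those same rows.
toy$Total_Bsmt_SF[no_bsmt & is.na(toy$Total_Bsmt_SF)] <- 0

# Store the result as an unordered factor, matching "All factors are unordered".
toy$Bsmt_Qual <- factor(toy$Bsmt_Qual)
toy
```

Recoding the missing entries as an explicit level keeps those rows usable in models that would otherwise drop incomplete cases.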

make_ordinal_ames is the same as make_ames but many factor variables were changed to class ordered (see below).
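
As a rough illustration of what class ordered adds, the base R sketch below builds the same values as an unordered and as an ordered factor. The level names are hypothetical and chosen only for the example; they are not necessarily the levels used by make_ordinal_ames.

```r
# Hypothetical quality levels, listed from worst to best.
lv <- c("Poor", "Fair", "Typical", "Good", "Excellent")
x <- c("Good", "Poor", "Typical")

qual_unordered <- factor(x, levels = lv)
qual_ordered <- factor(x, levels = lv, ordered = TRUE)

is.ordered(qual_unordered)
is.ordered(qual_ordered)

# Ordered factors support comparisons and range summaries.
at_least_typical <- qual_ordered >= "Typical"
at_least_typical
max(qual_ordered)
```

Modeling functions may also treat the two classes differently, for example by using polynomial contrasts for ordered factors by default.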

The documentation for ames_raw contains descriptions of the columns although, as noted above, the column names in ames_raw are slightly different from those in the processed versions.

make_ames_new() creates a data set of new properties. These were populated using fewer data sources than the original and lack a number of the condition and quality variables. Both properties were unsold at the time of this writing.
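
One way to see which columns the new-property data lacks is to compare column names. The sketch below defines a hypothetical helper, cols_missing_from, and runs it on toy data frames so that it works without any package installed. The final step assumes the functions are provided by a package named AmesHousing and is skipped if that package is not available.

```r
# Hypothetical helper: names present in `full` but absent from `new`.
cols_missing_from <- function(full, new) setdiff(names(full), names(new))

# Toy stand-ins (made-up values) so the sketch runs on its own.
full_toy <- data.frame(Lot_Area = 1, Overall_Qual = "Good", Sale_Price = 2)
new_toy <- data.frame(Lot_Area = 1)
cols_missing_from(full_toy, new_toy)

# With the package installed (assumed to be AmesHousing), apply the same check.
if (requireNamespace("AmesHousing", quietly = TRUE)) {
  print(cols_missing_from(AmesHousing::make_ames(), AmesHousing::make_ames_new()))
}
```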

Examples

# NOT RUN {
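# Processed data with unordered factors
ames <- make_ames()
nrow(ames)
summary(ames$Sale_Price)

# Same data, but many factors are of class ordered
ames_ord <- make_ordinal_ames()
# Names of the columns that are ordered factors
ord_vars <- vapply(ames_ord, is.ordered, logical(1))
names(ord_vars)[ord_vars]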
# }
