Uses functions from the dplyr package to clean up raw records obtained
from the Bath: Hacked datastore. The process is as follows:
Select only the columns containing useful information
Remove any records with NA entries
Remove records for "test car park"
Convert Name and Status to factors
Remove records with negative occupancies
Calculate Proportion column (Occupancy/Capacity)
Remove records with Proportion greater than max_prop
Remove duplicate records (see first_upload)
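
The steps above can be sketched with dplyr roughly as follows. This is a minimal illustration under stated assumptions, not the actual body of refine: the sample data frame raw, its Extra column, the exact Name value "test car park", and the choice of Name plus LastUpdate as the key that identifies a duplicate are all hypothetical.

```r
library(dplyr)

# Hypothetical raw records. Column names follow the Value section;
# "Extra" stands in for columns that carry no useful information.
t0 <- as.POSIXct("2017-01-01 10:00:00", tz = "UTC")
raw <- data.frame(
  Name         = c("A", "A", "test car park", "B", "B", "B", "B"),
  LastUpdate   = t0 + c(0, 0, 0, 0, 300, 600, 900),
  DateUploaded = t0 + c(300, 600, 300, 300, 600, 900, 1200),
  Occupancy    = c(50, 50, 10, -5, 130, NA, 105),
  Capacity     = rep(100, 7),
  Status       = c("Filling", "Filling", "Static", "Emptying",
                   "Filling", "Static", "Filling"),
  Extra        = 1:7,
  stringsAsFactors = FALSE
)

max_prop <- 1.1

clean <- raw %>%
  # Keep only the useful columns
  select(Name, LastUpdate, DateUploaded, Occupancy, Capacity, Status) %>%
  # Drop records containing any NA
  filter(complete.cases(.)) %>%
  # Drop the test car park (assumed Name value)
  filter(Name != "test car park") %>%
  # Convert Name and Status to factors
  mutate(Name = factor(Name), Status = factor(Status)) %>%
  # Drop negative occupancies
  filter(Occupancy >= 0) %>%
  # Add the Proportion column and drop overly full records
  mutate(Proportion = Occupancy / Capacity) %>%
  filter(Proportion <= max_prop) %>%
  # Drop duplicates (assumed key: same car park and same record time)
  distinct(Name, LastUpdate, .keep_all = TRUE)
```

In this sketch only two of the seven sample records survive: one copy of the duplicated record for car park A, and the record for car park B at 105% of capacity, which falls below the max_prop threshold of 1.1.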
Usage
refine(x, max_prop = 1.1, first_upload = FALSE)
Arguments
x
A data frame containing records to be cleaned up (e.g. the data
frame obtained by calling get_all_crude).
max_prop
The maximum Proportion (Occupancy/Capacity) a record may have before
it is discarded as overly full (default is 1.1, or 110% full, to allow for
circulating cars).
first_upload
If TRUE, ensures that when duplicate records are
removed, the copy that is kept is the first one uploaded after the record
was taken. This takes much longer to run, due to sorting (default is FALSE).
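
One way the first_upload behaviour could work is to sort by upload time before dropping duplicates, so that the earliest upload is the copy encountered first. The sketch below is only an assumption about the approach, not the code refine uses; the data frame dupes and the duplicate key (Name plus LastUpdate) are hypothetical.

```r
library(dplyr)

# Hypothetical records: car park A has two copies of the same record,
# and the later upload happens to appear first in the data.
t0 <- as.POSIXct("2017-01-01 10:00:00", tz = "UTC")
dupes <- data.frame(
  Name         = c("A", "A", "B"),
  LastUpdate   = t0 + c(0, 0, 0),
  DateUploaded = t0 + c(600, 300, 300),
  Occupancy    = c(50, 50, 20),
  stringsAsFactors = FALSE
)

# Without sorting, distinct() keeps whichever copy comes first in the data
any_copy <- dupes %>%
  distinct(Name, LastUpdate, .keep_all = TRUE)

# Sorting by upload time first means the earliest upload is the one kept
first_copy <- dupes %>%
  arrange(Name, LastUpdate, DateUploaded) %>%
  distinct(Name, LastUpdate, .keep_all = TRUE)
```

Here any_copy retains the copy of A uploaded after 600 seconds, while first_copy retains the one uploaded after 300 seconds. The extra arrange() step is the sorting cost described above.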
Value
A data frame of clean records, with 7 columns:
Name
The name of the car park where the record was taken.
LastUpdate
The time the record was taken (POSIXct date-time object).
DateUploaded
The time the record was uploaded to the Bath: Hacked
database (POSIXct date-time object).
Occupancy
The total number of cars in the car park.
Capacity
The number of parking spaces in the car park.
Status
Description of the change in occupancy since the previous
record from that car park.