BANEScarparkinglite (version 0.1.2)

refine: Clean up raw records

Description

Uses functions from the dplyr package to clean up raw records obtained from the Bath: Hacked datastore. The process is as follows:

  • Select columns containing useful information only

  • Remove any records with NA entries

  • Remove records for "test car park"

  • Convert Name and Status to factors

  • Remove records with negative occupancies

  • Calculate Proportion column (Occupancy/Capacity)

  • Remove records with Proportion greater than max_prop

  • Remove duplicate records (see first_upload)
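The steps above can be sketched with dplyr verbs roughly as follows. This is an illustrative approximation, not the package's actual source: the function name refine_sketch, the toy data, the exact spelling of the "test car park" name, and the assumption that a duplicate is a record sharing both Name and LastUpdate are all hypothetical.

```r
library(dplyr)

# Hypothetical raw records; the real datastore returns more columns.
t0 <- as.POSIXct("2017-01-01 09:00:00", tz = "UTC")
raw <- data.frame(
  Name         = c("A", "A", "test car park", "B", "B", "C", "D"),
  LastUpdate   = t0 + c(0, 0, 60, 120, 180, 240, 300),
  DateUploaded = t0 + c(300, 600, 360, 420, 480, 540, 600),
  Occupancy    = c(50, 50, 10, -5, 130, NA, 105),
  Capacity     = rep(100, 7),
  Status       = c("Filling", "Filling", "Static", "Emptying",
                   "Filling", "Static", "Filling"),
  Extra        = 1:7,  # stands in for the columns that get dropped
  stringsAsFactors = FALSE
)

refine_sketch <- function(x, max_prop = 1.1) {
  # Select columns containing useful information only
  x <- select(x, Name, LastUpdate, DateUploaded, Occupancy, Capacity, Status)
  # Remove any records with NA entries
  x <- filter(x, complete.cases(x))
  # Remove records for the test car park (name spelling is an assumption)
  x <- filter(x, Name != "test car park")
  # Convert Name and Status to factors
  x <- mutate(x, Name = factor(Name), Status = factor(Status))
  # Remove records with negative occupancies
  x <- filter(x, Occupancy >= 0)
  # Calculate the Proportion column
  x <- mutate(x, Proportion = Occupancy / Capacity)
  # Remove records with Proportion greater than max_prop
  x <- filter(x, Proportion <= max_prop)
  # Remove duplicates (assumed here: same Name and LastUpdate)
  distinct(x, Name, LastUpdate, .keep_all = TRUE)
}

clean <- refine_sketch(raw)
```

In this toy data only the first "A" record and the "D" record survive: the others are dropped as a duplicate, a test record, a negative occupancy, an over-full record, and a record containing NA.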

Usage

refine(x, max_prop = 1.1, first_upload = FALSE)

Arguments

x

A data frame containing records to be cleaned up (e.g. the data frame obtained by calling get_all_crude).

max_prop

The Proportion (Occupancy/Capacity) threshold above which records are discarded as implausibly full (default is 1.1, or 110% full, to allow for circulating cars).

first_upload

If TRUE, ensures that when duplicate records are removed, the copy which is kept is the first one uploaded after the record was taken. This takes much longer to run, due to sorting.
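One way to picture the difference is the sketch below, which sorts by upload time before dropping duplicates. It is a guess at the behaviour rather than the package's implementation: the toy data are hypothetical, and treating records with the same Name and LastUpdate as duplicates is an assumption.

```r
library(dplyr)

t0 <- as.POSIXct("2017-01-01 09:00:00", tz = "UTC")

# Hypothetical records: car park "A" has the same reading uploaded twice,
# and the later upload happens to appear first in the data frame.
records <- data.frame(
  Name         = c("A", "A", "B"),
  LastUpdate   = rep(t0, 3),
  DateUploaded = t0 + c(600, 300, 300),
  Occupancy    = c(50, 50, 20),
  stringsAsFactors = FALSE
)

# No sorting: whichever copy appears first in the data is kept.
unsorted <- distinct(records, Name, LastUpdate, .keep_all = TRUE)

# Sorting first: the earliest upload leads each group, so it is the copy kept.
sorted <- records %>%
  arrange(Name, LastUpdate, DateUploaded) %>%
  distinct(Name, LastUpdate, .keep_all = TRUE)
```

The extra arrange() step is the sorting cost the argument description refers to; on a large set of records it can dominate the run time.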

Value

A data frame of clean records, with 7 columns:

Name

The name of the car park where the record was taken.

LastUpdate

The time the record was taken (POSIXct date-time object).

DateUploaded

The time the record was uploaded to the Bath: Hacked database (POSIXct date-time object).

Occupancy

The total number of cars in the car park.

Capacity

The number of parking spaces in the car park.

Status

Description of the change in occupancy since the previous record from that car park.

Proportion

Calculated as (Occupancy/Capacity).

Examples

# NOT RUN {
raw_data <- get_all_crude()
some_records <- raw_data[1:1000, ]

dim(some_records)
## 1000   16

df <- refine(some_records)
dim(df)
## 813   7
# }
