A data set containing information on a subset of taxi trips in the city of Chicago in 2022.
data_taxi(...)
tibble
Arguments passed to pins::pin_read().
data_taxi()
#> # A tibble: 10,000 x 7
#> tip distance company local dow month hour
#> <fct> <dbl> <fct> <fct> <fct> <fct> <int>
#> 1 yes 17.2 Chicago Independents no Thu Feb 16
#> 2 yes 0.88 City Service yes Thu Mar 8
#> 3 yes 18.1 other no Mon Feb 18
#> 4 yes 20.7 Chicago Independents no Mon Apr 8
#> 5 yes 12.2 Chicago Independents no Sun Mar 21
#> 6 yes 0.94 Sun Taxi yes Sat Apr 23
#> 7 yes 17.5 Flash Cab no Fri Mar 12
#> 8 yes 17.7 other no Sun Jan 6
#> 9 yes 1.85 Taxicab Insurance Agency Llc no Fri Apr 12
#> 10 yes 1.47 City Service no Tue Mar 14
#> # i 9,990 more rows
tibble::glimpse(data_taxi())
#> Rows: 10,000
#> Columns: 7
#> $ tip <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, y~
#> $ distance <dbl> 17.19, 0.88, 18.11, 20.70, 12.23, 0.94, 17.47, 17.67, 1.85, 1~
#> $ company <fct> Chicago Independents, City Service, other, Chicago Independen~
#> $ local <fct> no, yes, no, no, no, yes, no, no, no, no, no, no, no, yes, no~
#> $ dow <fct> Thu, Thu, Mon, Mon, Sun, Sat, Fri, Sun, Fri, Tue, Tue, Sun, W~
#> $ month <fct> Feb, Mar, Feb, Apr, Mar, Apr, Mar, Jan, Apr, Mar, Mar, Apr, A~
#> $ hour <int> 16, 8, 18, 8, 21, 23, 12, 6, 12, 14, 18, 11, 12, 19, 17, 13, ~
The source data are originally described on the linked City of Chicago data portal. The data exported here are a pre-processed subset motivated by the modeling problem of predicting whether a rider will tip or not.
Whether the rider left a tip. A factor with levels "yes" and "no".
The trip distance, in odometer miles.
The taxi company, as a factor. Companies that occurred infrequently were binned as "other".
Whether the trip ended in the same community area as it began. See the source data for community area values.
The day of the week in which the trip began, as a factor.
The month in which the trip began, as a factor.
The hour of the day in which the trip began, as an integer.
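Two of the columns above are derived rather than copied from the source: company has rare levels binned as "other", and local compares the community areas where the trip began and ended. The following is a minimal base R sketch of how such columns could be produced, not the preprocessing code actually used. The helper lump_rare(), the threshold min_n, and the pickup_area and dropoff_area vectors are hypothetical names introduced for illustration, and the area codes are made up.

```r
# Hypothetical helper: replace levels seen fewer than `min_n` times with "other".
# The threshold used for the published data is not stated; `min_n` is an assumption.
lump_rare <- function(x, min_n) {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_n]
  x[x %in% rare] <- "other"
  factor(x)
}

# Company values from the first ten rows printed above.
company <- c(
  "Chicago Independents", "City Service", "other", "Chicago Independents",
  "Chicago Independents", "Sun Taxi", "Flash Cab", "other",
  "Taxicab Insurance Agency Llc", "City Service"
)
lumped <- lump_rare(company, min_n = 2)
table(lumped)

# Made-up community area codes, for illustration only.
pickup_area  <- c(8, 32, 76, 8)
dropoff_area <- c(8, 28, 76, 32)

# "yes" when the trip ended in the community area where it began.
local <- factor(
  ifelse(pickup_area == dropoff_area, "yes", "no"),
  levels = c("yes", "no")
)
local
```

With min_n = 2, the three companies that appear only once in these ten rows are folded into the existing "other" level.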
Previous releases of this data (with version = "20230630T214846Z-643d0") included additional columns:
A unique identifier for the trip, as a factor.
The trip duration, in seconds.
The cost of the trip fare, in USD.
The cost of tolls for the trip, in USD.
The cost of extra charges for the trip, in USD.
The total cost of the trip, in USD. This is the sum of the previous three columns plus tip.
Type of payment for the trip. A factor with levels "Credit Card", "Dispute", "Mobile", "No Charge", "Prcard", and "Unknown".
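Because the total cost in the earlier release is described as fare plus tolls plus extras plus tip, the tip amount can in principle be recovered by subtraction. The sketch below shows that arithmetic with a hypothetical helper, implied_tip(), and made-up amounts. The call that reads the earlier release is left as a comment because it needs network access; it assumes the package exporting data_taxi() is attached and relies on the version argument of pins::pin_read(), which receives the arguments passed through the dots.

```r
# Version string quoted in the text above.
previous_version <- "20230630T214846Z-643d0"

# Hypothetical helper: tip amount in USD implied by
# total = fare + tolls + extras + tip.
implied_tip <- function(fare, tolls, extras, total) {
  total - (fare + tolls + extras)
}

# Made-up amounts for illustration, not taken from the data.
implied_tip(fare = 10, tolls = 0, extras = 1.5, total = 14)
#> [1] 2.5

# Reading the earlier release requires network access, so it is not run here:
# taxi_previous <- data_taxi(version = previous_version)
```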