parse_date_time parses an input vector into a POSIXct date-time object. It differs from strptime in two respects. First, it allows specification of the order in which the formats occur without the need to include separators and the "%" prefix. Such a formatting argument is referred to as an "order". Second, it allows the user to specify several format-orders to handle heterogeneous date-time character representations.
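To make the idea of an "order" concrete: for single-letter formats, an order such as "ymd" is a strptime format with the "%" prefixes and the separators stripped. The helper below, order_to_format(), is hypothetical. It is not part of lubridate and not how lubridate parses orders; it is a base R sketch that rebuilds one candidate strptime format from an order and a separator you supply.

```r
# Hypothetical helper, not part of lubridate: rebuild a strptime format
# from an order made of single-letter formats and a given separator.
# Multi-letter formats such as "OS" are not handled by this sketch.
order_to_format <- function(order, sep = "-") {
  letters_in_order <- strsplit(order, "")[[1]]
  paste0("%", letters_in_order, collapse = sep)
}

fmt <- order_to_format("ymd")          # "%y-%m-%d"
parsed <- as.POSIXct(strptime("09-01-01", fmt, tz = "UTC"))
format(parsed, "%Y-%m-%d")             # "2009-01-01"
```

parse_date_time does this bookkeeping for you; as the select_formats example at the end shows, a single "y" in an order can match both %y and %Y.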
parse_date_time2 is a fast C parser of numeric orders.
fast_strptime is a fast C parser of numeric formats only that accepts explicit format arguments, just as strptime does.

parse_date_time(x, orders, tz = "UTC", truncated = 0, quiet = FALSE,
  locale = Sys.getlocale("LC_TIME"), select_formats = .select_formats,
  exact = FALSE)
parse_date_time2(x, orders, tz = "UTC", exact = FALSE)
fast_strptime(x, format, tz = "UTC")
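A minimal sketch of the two fast parsers on one purely numeric date-time, assuming lubridate is installed. parse_date_time2 takes an order, while fast_strptime takes a full strptime-style format. Newer lubridate releases may return POSIXlt from fast_strptime rather than POSIXct, so the sketch converts both results with as.POSIXct before comparing them with base R's own parse.

```r
library(lubridate)

x <- "2009-01-01 12:02:03"

# parse_date_time2 takes an order: no "%" prefixes, no separators
a <- parse_date_time2(x, "YmdHMS")

# fast_strptime takes an explicit strptime-style format
b <- fast_strptime(x, "%Y-%m-%d %H:%M:%S")

# Convert both to POSIXct in case fast_strptime returned POSIXlt
reference <- as.POSIXct(x, tz = "UTC")
as.numeric(as.POSIXct(a)) == as.numeric(reference)
as.numeric(as.POSIXct(b)) == as.numeric(reference)
```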
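x: a character or numeric vector of dates.
orders: a character vector of date-time formats. Each order string is a series of formatting characters as listed in strptime but might not include the "%" prefix; for example, "ymd" will match all the possible dates in year, month, day order.
tz: a character string that specifies the time zone with which to parse the dates.
truncated: integer, the number of formats that can be missing. If the truncated parameter is non-zero, parse_date_time also checks for truncated formats.
quiet: logical. When TRUE, progress messages are not printed and the "no formats found" error is suppressed, so the function simply returns a vector of NAs. This mirrors the behavior of strptime and as.POSIXct. Default is FALSE.
locale: the locale to be used. On Linux systems you can use system("locale -a") to list all the installed locales.
select_formats: a function to select the actual formats for parsing from the set of formats that matched a training subset of x. It receives a named integer vector and returns a character vector of selected formats. Names of the input vector are formats (not orders) that matched the training set; the values are the number of training dates that matched each format.
exact: logical. If TRUE, the orders parameter is interpreted as an exact strptime format and no training or guessing is performed.
format: a character string of formats. It should include all the separators, and each format must be prefixed with "%", just as in the format argument of strptime.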
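One way to picture truncated, as a sketch only: with truncated = 3, an order like "YmdHMS" also stands for the orders obtained by dropping up to three trailing formats, which is what lets the truncated = 3 example parse "2010-01-01 12:11", "2010-01-01 12" and "2010-01-01". truncated_orders() is a hypothetical helper, not lubridate's internal code, and it assumes every format in the order is a single letter.

```r
# Hypothetical helper, not lubridate's implementation: list the orders
# obtained by dropping up to `truncated` trailing single-letter formats.
truncated_orders <- function(order, truncated = 0) {
  n <- nchar(order)
  # Never drop every letter: keep at least one format
  keep <- n - seq(0, min(truncated, n - 1))
  substr(rep(order, length(keep)), 1, keep)
}

truncated_orders("YmdHMS", truncated = 3)
# "YmdHMS" "YmdHM" "YmdH" "Ymd"
```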
parse_date_time sorts the supplied format-orders based on a training set and then applies them recursively to the input vector.

parse_date_time, and hence all the derived functions such as ymd_hms and ymd, will drop into fast_strptime instead of strptime whenever the formats guessed from the input data are all numeric.
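The recursive application of sorted formats can be sketched in base R: try each format only on the elements that are still unparsed. parse_with_formats() is hypothetical and skips training entirely; the formats are assumed to be already sorted. Order matters because strptime ignores trailing input, so a shorter format listed first could claim strings meant for a longer one.

```r
# Hypothetical sketch, not lubridate's code: apply already-sorted
# strptime formats in turn, each one only to the still-unparsed elements.
parse_with_formats <- function(x, formats, tz = "UTC") {
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.POSIXct(strptime(x[todo], fmt, tz = tz))
  }
  out
}

x <- c("2009-01-01 12:02", "02/01/2010")
out <- parse_with_formats(x, c("%Y-%m-%d %H:%M", "%d/%m/%Y"))
format(out, "%Y-%m-%d %H:%M")
# "2009-01-01 12:02" "2010-01-02 00:00"
```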
The list below contains formats recognized by lubridate. For numeric formats, leading 0s are optional. In contrast to strptime, some of the formats have been extended for efficiency reasons; they are marked with "*". The fast parsers, parse_date_time2 and fast_strptime, currently accept only formats marked with "!".
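A quick check of the leading-zero rule, assuming lubridate is installed: both strings below should give the same date under the numeric order "Ymd".

```r
library(lubridate)

# Numeric formats: the leading 0s in month and day are optional
out <- parse_date_time(c("2009-01-01", "2009-1-1"), "Ymd")
format(out, "%Y-%m-%d")
```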
See also: strptime, ymd, ymd_hms
library(lubridate)

## ** orders are much easier to write **
x <- c("09-01-01", "09-01-02", "09-01-03")
parse_date_time(x, "ymd")
parse_date_time(x, "y m d")
parse_date_time(x, "%y%m%d")
# "2009-01-01 UTC" "2009-01-02 UTC" "2009-01-03 UTC"
## ** heterogeneous date-times **
x <- c("09-01-01", "090102", "09-01 03", "09-01-03 12:02")
parse_date_time(x, c("ymd", "ymd HM"))
## ** different ymd orders **
x <- c("2009-01-01", "02022010", "02-02-2010")
parse_date_time(x, c("dmY", "ymd"))
## "2009-01-01 UTC" "2010-02-02 UTC" "2010-02-02 UTC"
## ** truncated time-dates **
x <- c("2011-12-31 12:59:59", "2010-01-01 12:11", "2010-01-01 12", "2010-01-01")
parse_date_time(x, "Ymd HMS", truncated = 3)
parse_date_time(x, "ymd_hms", truncated = 3)
## [1] "2011-12-31 12:59:59 UTC" "2010-01-01 12:11:00 UTC"
## [3] "2010-01-01 12:00:00 UTC" "2010-01-01 00:00:00 UTC"
## ** specifying exact formats and avoiding training and guessing **
x <- c("09-01-01", "090102", "09-01 03", "09-01-03 12:02")
parse_date_time(x, c("%m-%d-%y", "%m%d%y", "%m-%d-%y %H:%M"), exact = TRUE)
## [1] "2001-09-01 00:00:00 UTC" "2002-09-01 00:00:00 UTC" NA "2003-09-01 12:02:00 UTC"
parse_date_time(c('12/17/1996 04:00:00','4/18/1950 0130'),
c('%m/%d/%Y %I:%M:%S','%m/%d/%Y %H%M'), exact = TRUE)
## [1] "1996-12-17 04:00:00 UTC" "1950-04-18 01:30:00 UTC"
## ** fast parsing **
options(digits.secs = 3)
## random times between 1400 and 3000
tt <- as.character(.POSIXct(runif(1000, -17987443200, 32503680000)))
tt <- rep.int(tt, 1000)
system.time(out <- as.POSIXct(tt, tz = "UTC"))
system.time(out1 <- ymd_hms(tt)) # constant overhead on long vectors
system.time(out2 <- parse_date_time2(tt, "YmdHMOS"))
system.time(out3 <- fast_strptime(tt, "%Y-%m-%d %H:%M:%OS"))
all.equal(out, out1)
all.equal(out, out2)
all.equal(out, out3)
## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC" "2013-09-27 UTC"
## To give priority to the %y format, define your own select_formats function:
my_select <- function(trained){
n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
names(trained[ which.max(n_fmts) ])
}
parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
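select_formats receives a named integer vector: the names are the formats that matched the training subset and the values are how many training dates each one matched. As an alternative to the token-counting my_select above, the hypothetical select_most_matched() below simply prefers the format that matched the most training dates. It is shown on a hand-made vector; passed as select_formats = select_most_matched it plugs in the same way as my_select, but its effect on real input depends on the training subset lubridate draws.

```r
# Hypothetical selector, not lubridate's default: keep the single format
# that matched the largest number of training dates (first one on ties).
select_most_matched <- function(trained) {
  names(trained)[which.max(trained)]
}

# A hand-made input of the shape select_formats receives
trained <- c("%d-%m-%Y" = 1L, "%d-%m-%y" = 3L)
select_most_matched(trained)
# "%d-%m-%y"
```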