SparkR (version 2.4.6)

column_datetime_functions: Date time functions for Column operations

Description

Date time functions defined for Column.

Usage

current_date(x = "missing")

current_timestamp(x = "missing")

date_trunc(format, x)

dayofmonth(x)

dayofweek(x)

dayofyear(x)

from_unixtime(x, ...)

hour(x)

last_day(x)

minute(x)

month(x)

quarter(x)

second(x)

to_date(x, format)

to_timestamp(x, format)

unix_timestamp(x, format)

weekofyear(x)

window(x, ...)

year(x)

# S4 method for Column dayofmonth(x)

# S4 method for Column dayofweek(x)

# S4 method for Column dayofyear(x)

# S4 method for Column hour(x)

# S4 method for Column last_day(x)

# S4 method for Column minute(x)

# S4 method for Column month(x)

# S4 method for Column quarter(x)

# S4 method for Column second(x)

# S4 method for Column,missing to_date(x, format)

# S4 method for Column,character to_date(x, format)

# S4 method for Column,missing to_timestamp(x, format)

# S4 method for Column,character to_timestamp(x, format)

# S4 method for Column weekofyear(x)

# S4 method for Column year(x)

# S4 method for Column from_unixtime(x, format = "yyyy-MM-dd HH:mm:ss")

# S4 method for Column window(x, windowDuration, slideDuration = NULL, startTime = NULL)

# S4 method for missing,missing unix_timestamp(x, format)

# S4 method for Column,missing unix_timestamp(x, format)

# S4 method for Column,character unix_timestamp(x, format = "yyyy-MM-dd HH:mm:ss")

# S4 method for Column trunc(x, format)

# S4 method for character,Column date_trunc(format, x)

# S4 method for missing current_date()

# S4 method for missing current_timestamp()

Arguments

x

Column to compute on. In window, it must be a time Column of TimestampType. This is not used with current_date and current_timestamp

format

The format for the given dates or timestamps in Column x. See the format used in the following methods:

  • to_date and to_timestamp: it is the string to use to parse Column x to DateType or TimestampType.

  • trunc: it is the string to use to specify the truncation method. For example, "year", "yyyy", "yy" for truncate by year, or "month", "mon", "mm" for truncate by month.

  • date_trunc: it is similar with trunc's but additionally supports "day", "dd", "second", "minute", "hour", "week" and "quarter".

...

additional argument(s).

windowDuration

a string specifying the width of the window, e.g. '1 second', '1 day 12 hours', '2 minutes'. Valid interval strings are 'week', 'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, '1 day' always means 86,400,000 milliseconds, not a calendar day.

slideDuration

a string specifying the sliding interval of the window. Same format as windowDuration. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. This duration is likewise absolute, and does not vary according to a calendar.

startTime

the offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as "15 minutes".

Details

dayofmonth: Extracts the day of the month as an integer from a given date/timestamp/string.

dayofweek: Extracts the day of the week as an integer from a given date/timestamp/string.

dayofyear: Extracts the day of the year as an integer from a given date/timestamp/string.

hour: Extracts the hour as an integer from a given date/timestamp/string.

last_day: Given a date column, returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.

minute: Extracts the minute as an integer from a given date/timestamp/string.

month: Extracts the month as an integer from a given date/timestamp/string.

quarter: Extracts the quarter as an integer from a given date/timestamp/string.

second: Extracts the second as an integer from a given date/timestamp/string.

to_date: Converts the column into a DateType. You may optionally specify a format according to the rules in: http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html. If the string cannot be parsed according to the specified format (or default), the value of the column will be null. By default, it follows casting rules to a DateType if the format is omitted (equivalent to cast(df$x, "date")).

to_timestamp: Converts the column into a TimestampType. You may optionally specify a format according to the rules in: http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html. If the string cannot be parsed according to the specified format (or default), the value of the column will be null. By default, it follows casting rules to a TimestampType if the format is omitted (equivalent to cast(df$x, "timestamp")).

weekofyear: Extracts the week number as an integer from a given date/timestamp/string.

year: Extracts the year as an integer from a given date/timestamp/string.

from_unixtime: Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the JVM in the given format. See Customizing Formats for available options.

window: Bucketizes rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. It returns an output column of struct called 'window' by default with the nested columns 'start' and 'end'

unix_timestamp: Gets current Unix timestamp in seconds.

trunc: Returns date truncated to the unit specified by the format.

date_trunc: Returns timestamp truncated to the unit specified by the format.

current_date: Returns the current date as a date column.

current_timestamp: Returns the current timestamp as a timestamp column.

See Also

Other data time functions: column_datetime_diff_functions

Examples

Run this code
# NOT RUN {
dts <- c("2005-01-02 18:47:22",
        "2005-12-24 16:30:58",
        "2005-10-28 07:30:05",
        "2005-12-28 07:01:05",
        "2006-01-24 00:01:10")
y <- c(2.0, 2.2, 3.4, 2.5, 1.8)
df <- createDataFrame(data.frame(time = as.POSIXct(dts), y = y))
# }
# NOT RUN {
# }
# NOT RUN {
head(select(df, df$time, year(df$time), quarter(df$time), month(df$time),
            dayofmonth(df$time), dayofweek(df$time), dayofyear(df$time), weekofyear(df$time)))
head(agg(groupBy(df, year(df$time)), count(df$y), avg(df$y)))
head(agg(groupBy(df, month(df$time)), avg(df$y)))
# }
# NOT RUN {
# }
# NOT RUN {
head(select(df, hour(df$time), minute(df$time), second(df$time)))
head(agg(groupBy(df, dayofmonth(df$time)), avg(df$y)))
head(agg(groupBy(df, hour(df$time)), avg(df$y)))
head(agg(groupBy(df, minute(df$time)), avg(df$y)))
# }
# NOT RUN {
# }
# NOT RUN {
head(select(df, df$time, last_day(df$time), month(df$time)))
# }
# NOT RUN {
# }
# NOT RUN {
tmp <- createDataFrame(data.frame(time_string = dts))
tmp2 <- mutate(tmp, date1 = to_date(tmp$time_string),
                   date2 = to_date(tmp$time_string, "yyyy-MM-dd"),
                   date3 = date_format(tmp$time_string, "MM/dd/yyy"),
                   time1 = to_timestamp(tmp$time_string),
                   time2 = to_timestamp(tmp$time_string, "yyyy-MM-dd"))
head(tmp2)
# }
# NOT RUN {
# }
# NOT RUN {
tmp <- mutate(df, to_unix = unix_timestamp(df$time),
                  to_unix2 = unix_timestamp(df$time, 'yyyy-MM-dd HH'),
                  from_unix = from_unixtime(unix_timestamp(df$time)),
                  from_unix2 = from_unixtime(unix_timestamp(df$time), 'yyyy-MM-dd HH:mm'))
head(tmp)
# }
# NOT RUN {
# }
# NOT RUN {
# One minute windows every 15 seconds 10 seconds after the minute, e.g. 09:00:10-09:01:10,
# 09:00:25-09:01:25, 09:00:40-09:01:40, ...
window(df$time, "1 minute", "15 seconds", "10 seconds")

# One minute tumbling windows 15 seconds after the minute, e.g. 09:00:15-09:01:15,
# 09:01:15-09:02:15...
window(df$time, "1 minute", startTime = "15 seconds")

# Thirty-second windows every 10 seconds, e.g. 09:00:00-09:00:30, 09:00:10-09:00:40, ...
window(df$time, "30 seconds", "10 seconds")
# }
# NOT RUN {
# }
# NOT RUN {
head(select(df, df$time, trunc(df$time, "year"), trunc(df$time, "yy"),
           trunc(df$time, "month"), trunc(df$time, "mon")))
# }
# NOT RUN {
# }
# NOT RUN {
head(select(df, df$time, date_trunc("hour", df$time), date_trunc("minute", df$time),
            date_trunc("week", df$time), date_trunc("quarter", df$time)))
# }
# NOT RUN {
head(select(df, current_date(), current_timestamp()))
# }

Run the code above in your browser using DataLab