rm_time: Remove/Replace/Extract Time

Description

rm_time - Remove/replace/extract time from a string.

rm_transcript_time - Remove/replace/extract transcript-specific time stamps from a string.

as_time - Convert a time stamp extracted by rm_time or rm_transcript_time to a standard time format (HH:MM:SS.OS) and optionally convert it to POSIXlt with as.POSIXlt.

as_time2 - A convenience function for as_time that unlists and returns a vector rather than a list.
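The description names HH:MM:SS.OS as the target layout but does not say how shorter stamps are read. The base-R sketch below uses a hypothetical helper, normalize_time, which is not the qdapRegex implementation. It assumes missing leading fields are zero, so "8:15" is read as 8 minutes 15 seconds, matching the expected 00:08:15.0 in the transcript examples.

```r
# Hypothetical helper (not part of qdapRegex): normalize one time stamp
# such as "8:15", "1:22:30" or ":22.34" to HH:MM:SS.OS.
# Assumption: fields are right-aligned, so missing hours/minutes are zero.
normalize_time <- function(stamp) {
  parts <- strsplit(stamp, ":", fixed = TRUE)[[1]]
  stopifnot(length(parts) >= 1, length(parts) <= 3)
  parts[parts == ""] <- "0"                       # ":22.34" -> "0", "22.34"
  parts <- c(rep("0", 3 - length(parts)), parts)  # pad to hours, minutes, seconds
  sec <- strsplit(parts[3], ".", fixed = TRUE)[[1]]
  frac <- if (length(sec) > 1) sec[2] else "0"    # keep fractional digits as written
  sprintf("%02d:%02d:%02d.%s",
          as.integer(parts[1]), as.integer(parts[2]), as.integer(sec[1]), frac)
}

# Vectorized wrapper returning a plain character vector.
normalize_times <- function(x) {
  vapply(x, normalize_time, character(1), USE.NAMES = FALSE)
}

normalize_times(c("8:15", "1:22:30", ":22.34"))
# "00:08:15.0" "01:22:30.0" "00:00:22.34"
```

Reading a two-field stamp as minutes:seconds suits transcripts. A clock time such as "3:00" would more naturally be hours:minutes, so the package's actual rules may differ from this sketch.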

Usage

rm_time(text.var, trim = !extract, clean = TRUE, pattern = "@rm_time",
  replacement = "", extract = FALSE,
  dictionary = getOption("regex.library"), ...)
rm_transcript_time(text.var, trim = !extract, clean = TRUE,
  pattern = "@rm_transcript_time", replacement = "", extract = FALSE,
  dictionary = getOption("regex.library"), ...)
as_time(x, as.POSIXlt = FALSE, millisecond = TRUE)
as_time2(x, ...)
ex_time(text.var, trim = !extract, clean = TRUE, pattern = "@rm_time",
  replacement = "", extract = TRUE,
  dictionary = getOption("regex.library"), ...)
ex_transcript_time(text.var, trim = !extract, clean = TRUE,
  pattern = "@rm_transcript_time", replacement = "", extract = TRUE,
  dictionary = getOption("regex.library"), ...)

Arguments

text.var

The text variable.

trim

logical. If TRUE removes leading and trailing white spaces.

clean

logical. If TRUE extra white spaces and escaped characters will be removed.

pattern

A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector (see Details for additional information). The default, @rm_time, uses the rm_time regex from the regular expression dictionary supplied via the dictionary argument.

replacement

Replacement for matched pattern.

extract

logical. If TRUE the times are extracted into a list of vectors.

dictionary

A dictionary of canned regular expressions to search within if pattern begins with "@rm_".

...

Other arguments passed to gsub.

x

A list with extracted time stamps.

as.POSIXlt

logical. If TRUE the output will be converted with as.POSIXlt.

millisecond

logical. If TRUE milliseconds are retained. If FALSE they are rounded and added to seconds.
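The millisecond argument says that with FALSE the fractional part is rounded into the seconds. A minimal base-R sketch of that rounding follows, assuming the input is already in HH:MM:SS.OS form; round_stamp is a hypothetical name, not a qdapRegex function. Working through total seconds lets a carry such as 59.6 seconds roll over into the next minute.

```r
# Hypothetical helper (not part of qdapRegex): round the fractional seconds
# of an HH:MM:SS.OS stamp into whole seconds, carrying into minutes/hours.
round_stamp <- function(stamp) {
  p <- as.numeric(strsplit(stamp, ":", fixed = TRUE)[[1]])
  stopifnot(length(p) == 3)
  # R's round() sends exact halves to the even digit (33.5 -> 34, 32.5 -> 32).
  total <- as.integer(round(p[1] * 3600 + p[2] * 60 + p[3]))
  sprintf("%02d:%02d:%02d",
          total %/% 3600L, (total %% 3600L) %/% 60L, total %% 60L)
}

round_stamp("00:09:33.75")  # "00:09:34"
round_stamp("00:00:59.6")   # "00:01:00"
```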

Value

Returns a character string with time removed.
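To show roughly what removal plus the trim and clean steps amounts to, here is a base-R approximation. The toy_time regex and the remove_and_tidy helper are hypothetical stand-ins, simpler than the canned rm_time pattern, and this clean step only collapses white space (it does not handle escaped characters). The last line shows the extract = TRUE shape: a list with one character vector per input string.

```r
# Toy time regex (an assumption, simpler than the canned "@rm_time" one):
# optional hours, ":MM", optional ":SS", optional fractional seconds.
toy_time <- "\\d{0,2}:\\d{2}(?::\\d{2})?(?:\\.\\d+)?"

# Hypothetical helper: remove matches, then collapse and trim white space.
remove_and_tidy <- function(text, pattern, replacement = "",
                            trim = TRUE, clean = TRUE) {
  out <- gsub(pattern, replacement, text, perl = TRUE)
  if (clean) out <- gsub("\\s+", " ", out, perl = TRUE)  # collapse runs of spaces
  if (trim) out <- trimws(out)                             # strip both ends
  out
}

x <- c("At 3:00 we'll meet up and leave by 4:30:20", "no time here")
remove_and_tidy(x, toy_time)
# "At we'll meet up and leave by" "no time here"

# Extraction instead of removal: a list of character vectors, one per string.
regmatches(x, gregexpr(toy_time, x, perl = TRUE))
```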

Details

The default regular expression used by rm_time finds times with no AM/PM. This behavior can be altered by using a secondary regular expression from the regex_usa data (or another dictionary) via pattern = "@rm_time2". See Examples for usage.
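Switching from the default pattern to "@rm_time2" amounts to looking a regex up by name. The sketch below imitates that with a toy named list. toy_dict, resolve_pattern and extract_times are hypothetical, and both regexes are simplified assumptions, not the regex_usa entries.

```r
# Toy dictionary (assumption): a plain pattern and an AM/PM-aware pattern.
toy_dict <- list(
  rm_time  = "\\d{0,2}:\\d{2}(?::\\d{2})?(?:\\.\\d+)?",
  rm_time2 = "\\d{1,2}:\\d{2}(?::\\d{2})?\\s*[AaPp]\\.?[Mm]\\.?"
)

# Hypothetical lookup: patterns beginning with "@rm_" are fetched from the
# dictionary by name; anything else is used as a literal regular expression.
resolve_pattern <- function(pattern, dictionary) {
  if (!startsWith(pattern, "@rm_")) return(pattern)
  key <- substring(pattern, 2)
  if (is.null(dictionary[[key]])) stop("no canned regex named ", key)
  dictionary[[key]]
}

extract_times <- function(text, pattern, dictionary = toy_dict) {
  rx <- resolve_pattern(pattern, dictionary)
  regmatches(text, gregexpr(rx, text, perl = TRUE))
}

x <- c("I'm getting 3:04 AM just fine, but...",
       "Some time has 12:04 with no AM/PM after it",
       "Some time has 12:04 a.m. or the form 1:22 pm")
extract_times(x, "@rm_time")   # bare times; any AM/PM suffix is not captured
extract_times(x, "@rm_time2")  # only times followed by AM/PM
```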

References

The time regular expression was taken from: http://stackoverflow.com/a/25111133/1000343

Examples


# NOT RUN {
x <- c("R uses 1:5 for 1, 2, 3, 4, 5.",
    "At 3:00 we'll meet up and leave by 4:30:20",
    "We'll meet at 6:33.", "He ran it in :22.34")

rm_time(x)
ex_time(x)

## With AM/PM
x <- c(
    "I'm getting 3:04 AM just fine, but...",
    "for 10:47 AM I'm getting 0:47 AM instead.",
    "no time here",
    "Some time has 12:04 with no AM/PM after it",
    "Some time has 12:04 a.m. or the form 1:22 pm"
)

ex_time(x)
ex_time(x, pattern = "@rm_time2")
rm_time(x, pattern = "@rm_time2")
ex_time(x, pattern = pastex("@rm_time2", "@rm_time"))

# Convert to standard format
as_time(ex_time(x))
as_time(ex_time(x), as.POSIXlt = TRUE)
as_time(ex_time(x), as.POSIXlt = FALSE, millisecond = FALSE) 

# Transcript specific time stamps
x2 <- c(
    '08:15 8 minutes and 15 seconds	00:08:15.0',
    '3:15 3 minutes and 15 seconds	not 1:03:15.0',
    '01:22:30 1 hour 22 minutes and 30 seconds	01:22:30.0',
    '#00:09:33-5# 9 minutes and 33.5 seconds	00:09:33.5',
    '00:09.33,75 9 minutes and 33.5 seconds	00:09:33.75'
)

rm_transcript_time(x2)
(out <- ex_transcript_time(x2))

as_time(out)
as_time(out, as.POSIXlt = TRUE)
as_time(out, millisecond = FALSE)

# }
# NOT RUN {
if (!require("pacman")) install.packages("pacman")
pacman::p_load(chron)
lapply(as_time(out), chron::times)
lapply(as_time(out, millisecond = FALSE), chron::times)
# }
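The chron example needs an extra package. A base-R alternative, not used in the original examples, is strptime with the %OS conversion, which reads fractional seconds. Because the stamps carry no date, the resulting POSIXlt values are assigned the current date, which may not be what you want. The stamps below are typed in by hand in HH:MM:SS.OS layout rather than produced by as_time.

```r
# Stamps in HH:MM:SS.OS form (typed by hand for this sketch).
stamps <- c("00:08:15.0", "01:22:30.0", "00:09:33.75")

# %OS reads seconds including any fractional part; tz is fixed for reproducibility.
parsed <- strptime(stamps, format = "%H:%M:%OS", tz = "UTC")

# Seconds since midnight, keeping the fractional part.
secs <- parsed$hour * 3600 + parsed$min * 60 + parsed$sec
secs  # 495, 4950 and 573.75 seconds
```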
