tabulog (version 0.1.1)

parse_logs: Parse Log Files

Description

Parse log records, or a log file, using a provided template and a set of classes.

Usage

parse_logs(text, template, classes = list(), ...)

parse_logs_file(text_file, config_file, formatters = list(), ...)

Arguments

text

Character vector; each element a log record

template

Template string

classes

A named list of parsers or regex strings for use within the template string

...

Other arguments passed on to regexpr for matching regular expressions.

text_file

Filename (or readable connection) containing log text

config_file

Filename (or readable connection) containing template file

formatters

Named list of formatter functions used to format the fields captured by each class

Value

A data.frame with one column for each field identified in the template string. For each record in the passed text, the fields are extracted and formatted using the parser objects in default_classes() and classes.

Details

template should only be a template string, such as '{{ip ip_address}} [{{date access_date}}]...'.

config_file should be a YAML file or connection with the following fields:

  • template: Template String

  • classes: Named list of regex strings for building classes
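
As a rough illustration of the two fields above, a config file might look like the following sketch. The template is borrowed from the Examples section; the date regex is an unescaped adaptation of the one used there, and the sketch assumes that the ip and int classes are supplied by default_classes(), which this page does not state explicitly.

```yaml
# Hypothetical config file for parse_logs_file(); values are illustrative only.
# template: the template string, with fields written as {{class fieldName}}
template: '{{ip ipAddress}} - [{{date accessDate}}] {{int status }}'
# classes: named list of regex strings, keyed by the class names used in the template
classes:
  date: '[0-3][0-9]/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}'
```

Single-quoted YAML strings need no backslash escaping, which is why the regex is written more simply here than in an R string literal.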

text should be a character vector, with each element representing a log record

text_file should be a file or connection that can be split (with readLines) into a character vector of records

classes should be a named list of parser objects, where names match names of classes in template string, or a similarly named list of regex strings for coercing into parsers

formatters should be a named list of functions, where names match names of classes in template string, for properly formatting fields once they have been captured

Examples

# NOT RUN {
# Template string with two fields
template <- '{{ip ipAddress}} - [{{date accessDate}}] {{int status }}'

# Two simple log records
logs <- c(
  '192.168.1.10 - [26/Jul/2019:11:41:10 -0500] 200',
  '192.168.1.11 - [26/Jul/2019:11:41:21 -0500] 404'
)

# A formatter for the date field
myFormatters <- list(date = function(x) lubridate::as_datetime(x, format = '%d/%b/%Y:%H:%M:%S %z'))
# A parser class for the date field
date_parser <- parser(
  '[0-3][0-9]\\/[A-Z][a-z]{2}\\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\\+|\\-][0-9]{4}',
  myFormatters$date,
  'date'
)

# Parse the logs from raw data
parse_logs(logs, template, list(date=date_parser))

# Write the logs and the template to files, then parse
logfile <- tempfile()
templatefile <- tempfile()
writeLines(logs, logfile)
yaml::write_yaml(list(template=template, classes=list(date=date_parser)), templatefile)
parse_logs_file(logfile, templatefile, myFormatters)
file.remove(logfile)
file.remove(templatefile)

# }
