tabulog v0.1.1

0

Monthly downloads

0th

Percentile

Parsing Semi-Structured Log Files into Tabular Format

Convert semi-structured log files (such as 'Apache' access.log files) into a tabular format (data.frame) using a standard template system.

Readme

An Example in R

Let's again say you have the example logs in the file accesslog.txt.

log_file <- 'vignettes/accesslog.txt'
cat(readr::read_file(log_file))
10.0.0.8 - - [2019-01-01:10:58:12 -0500] "https://mysite.com/index.html"
173.28.102.33 - - [2019-01-01:10:58:25 -0500] "https://mysite.com/login"

We first define the template as before.

template <- '{{ ip ip_address }} - - [{{ date date_time }}] "{{ url URL }}"'

We then need to define our classes. ip and url are builtins with the package, but dates come in a variety of formats so we must explicitly define ours here. Note you can see all builtins using default_classes()

date_parser <- parser(
  '[0-9]{4}\\-[0-9]{2}\\-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\\-\\+][0-9]{4}',
  function(x) lubridate::as_datetime(x, format = '%Y-%m-%d:%H:%M:%S %z'),
  name = 'date'
)
date_parser
## Parser: date
## ------------
## Matches:
##   [0-9]{4}\-[0-9]{2}\-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\-\+][0-9]{4}
## Formatter:
##   function (x) 
##   lubridate::as_datetime(x, format = "%Y-%m-%d:%H:%M:%S %z")
default_classes()[c('ip', 'url')]
## $ip
## Parser: ip
## ----------
## Matches:
##   [0-9]{1,3}(\.[0-9]{1,3}){3}
## Formatter:
##   .Primitive("(")
## 
## $url
## Parser: url
## -----------
## Matches:
##   (-|(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+)
## Formatter:
##   .Primitive("(")

Both ip and url require no formatting, so they have the identity function, (( in R), as their formatter.

To get our final output in tabular format, we simply make the follow call to parse_logs.

# Naming the date_parser 'date' in the list tells Tabulog to use it to parse
# the field with class 'date' in the template.
parse_logs(readLines(log_file), template, classes = list(date = date_parser))
##      ip_address           date_time                           URL
## 1      10.0.0.8 2019-01-01 15:58:12 https://mysite.com/index.html
## 2 173.28.102.33 2019-01-01 15:58:25      https://mysite.com/login

Note that we only had to pass our custom class date. The builtin classes ip and url were included by default.

A more elegant and portable way of completing this task would be to define the template and the custom class in the same file, which can be ported to other Tabulog libraries in other languages, leaving only the formatters to be defined in the R script.

First, we define the template and the classes in a yaml file

template_file <- 'vignettes/accesslog_template.yml'
cat(readr::read_file(template_file))
template: '{{ ip ip_address }} - - [{{ date date_time }}] "{{ url URL }}"'
classes:
  date: '[0-9]{4}\-[0-9]{2}\-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\-\+][0-9]{4}'

Next, we define the formatters for each of our classes. Here we only have one, but we still put it in a named list, with the name matching the name of the class in the template file.

formatters <- list(
  date = function(x) lubridate::as_datetime(x, format = '%Y-%m-%d:%H:%M:%S %z')
)

Finally, we make one call to parse_logs_file.

parse_logs_file(log_file, template_file, formatters)
##      ip_address           date_time                           URL
## 1      10.0.0.8 2019-01-01 15:58:12 https://mysite.com/index.html
## 2 173.28.102.33 2019-01-01 15:58:25      https://mysite.com/login

Functions in tabulog

Name Description
default_classes Default parser classes
parse_logs Parse Log Files
format.parser Encode for printing
parser Parser Objects
print.parser Print
formatter Formatters
name Parser Names
No Results!

Vignettes of tabulog

Name
accesslog.txt
accesslog_template.yml
tabulog.Rmd
No Results!

Last month downloads

Details

Type Package
License MIT + file LICENSE
Encoding UTF-8
LazyData true
VignetteBuilder knitr
RoxygenNote 6.1.1
NeedsCompilation no
Packaged 2019-08-08 23:26:40 UTC; nar
Repository CRAN
Date/Publication 2019-08-09 13:00:02 UTC
suggests knitr , lubridate , readr
imports yaml
Contributors Austin Nar

Include our badge in your README

[![Rdoc](http://www.rdocumentation.org/badges/version/tabulog)](http://www.rdocumentation.org/packages/tabulog)