WebAnalytics (version 0.9.13)

logFileRead: Given a list of file names, read them as log files

Description

This function reads a file, parsing it for the fields specified, and normalises the values that have been read.

The log file is assumed to be space delimited, which is the case for Apache and IIS.
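As a rough illustration of what "space delimited" means here, the sketch below reads a few invented log lines with base R's read.table. The sample lines and the column names are made up for this example, and this is not a description of how logFileRead is implemented internally.

```r
# Three invented, space-delimited log lines (not real data)
sampleLines <- c(
  "2023-01-15 10:30:00 10.0.0.1 /index.html 200 125",
  "2023-01-15 10:30:01 10.0.0.2 /about.html 404 12",
  "2023-01-15 10:30:02 10.0.0.1 /search 200 310"
)

# Column names chosen for this sketch only
sampleCols <- c("date", "time", "userip", "url", "httpcode", "elapsedms")

# Split each line on single spaces and name the resulting columns
sampleDf <- read.table(text = sampleLines, sep = " ", header = FALSE,
                       col.names = sampleCols, stringsAsFactors = FALSE)

str(sampleDf)
```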

Usage

logFileRead(fileName, 
  columnList=c("MSTimestamp", "clientip", "url", "httpcode", "elapsed"), 
  logTimeZone = "", 
  timeFormat = "")

Value

The function returns a data frame that contains the contents of the file.

Arguments

fileName

The name, including path, of the file to read

columnList

The columns in the file, in order. Columns are:

Name             Status    Description
ApacheTimestamp  Optional  Apache log format timestamp
MSTimestamp      Optional  IIS log format timestamp
servername       Optional  Name of the web server
serverip         Optional  IP of the server
httpop           Optional  HTTP verb
url              Required  Path part of the request
parms            Optional  Query string
port             Optional  TCP/IP port that the request arrived on
username         Optional  User name logged by the web server
userip           Optional  IP that the request was seen to originate from
useragent        Optional  User agent string in the request
httpcode         Required  HTTP response code
windowscode      Optional  Windows return code recorded by IIS
windowssubcode   Optional  Windows sub code recorded by IIS
responsebytes    Optional  Number of bytes in the HTTP response
requestbytes     Optional  Number of bytes in the HTTP request
elapsedms        Optional  Request elapsed time in milliseconds
elapsedus        Optional  Request elapsed time in microseconds (will be rounded to milliseconds)
elapseds         Optional  Request elapsed time in seconds (not recommended, will be expanded to milliseconds)
jsessionid       Optional  User session identifier

One timestamp and one elapsed time column name must be specified.
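The rule above, together with the two Required columns, can be sketched as a small pre-flight check. checkColumnList is a hypothetical helper written for this illustration and is not part of WebAnalytics. It reads "one" as "exactly one" and only knows the column names listed in the table above.

```r
# Hypothetical helper, not part of the WebAnalytics package
checkColumnList <- function(columnList) {
  timestampCols <- c("ApacheTimestamp", "MSTimestamp")
  elapsedCols   <- c("elapsedms", "elapsedus", "elapseds")
  requiredCols  <- c("url", "httpcode")

  problems <- character(0)

  # Assumption: "one timestamp" is read as exactly one
  if (sum(columnList %in% timestampCols) != 1) {
    problems <- c(problems, "exactly one timestamp column is needed")
  }
  # Assumption: "one elapsed time column" is read as exactly one
  if (sum(columnList %in% elapsedCols) != 1) {
    problems <- c(problems, "exactly one elapsed time column is needed")
  }
  # Columns marked Required in the table
  missingCols <- setdiff(requiredCols, columnList)
  if (length(missingCols) > 0) {
    problems <- c(problems, paste("missing required column:", missingCols))
  }

  problems
}

# A column list that satisfies the checks: returns character(0)
okResult <- checkColumnList(c("MSTimestamp", "userip", "url", "httpcode", "elapsedms"))

# No elapsed time column and no httpcode: returns two messages
badResult <- checkColumnList(c("MSTimestamp", "userip", "url"))
```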

The Apache URL is handled partly in the fix data procedure in the config file because it wraps the operation and URL path in one field. The IIS URL does not need this additional parsing.
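To show why the Apache field needs extra parsing, the sketch below splits an invented Apache request string into the verb, path and query string using base R only. It is an illustration of the idea, not the fix data procedure from the config file. The variable names mirror the httpop, url and parms columns above for readability.

```r
# Invented Apache request field: verb, request target and protocol in one string
request <- "GET /app/page.jsp?id=42 HTTP/1.1"

# Split on single spaces: verb, target, protocol
parts  <- strsplit(request, " ", fixed = TRUE)[[1]]
httpop <- parts[1]
target <- parts[2]

# Separate the path from the query string at the first "?"
qpos <- regexpr("?", target, fixed = TRUE)
if (qpos > 0) {
  url   <- substr(target, 1, qpos - 1)
  parms <- substr(target, qpos + 1, nchar(target))
} else {
  url   <- target
  parms <- ""
}

c(httpop = httpop, url = url, parms = parms)
```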

logTimeZone

The timezone to use to adjust the timestamps in the log. This is used primarily for IIS logs where the log may be either UTC or local time.
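The base R sketch below shows why the zone matters: the same wall-clock string denotes different instants depending on the zone it is read in. The timestamp and the "America/New_York" zone are arbitrary choices for this example (the zone name assumes a standard time zone database is available), and the code does not reproduce what logFileRead does with logTimeZone.

```r
# An invented timestamp with no zone information, as it might appear in a log
stamp <- "2023-01-15 10:30:00"

# Interpret the same text as UTC and as New York local time
asUtc     <- as.POSIXct(stamp, tz = "UTC")
asNewYork <- as.POSIXct(stamp, tz = "America/New_York")

# The two readings are 5 hours apart in January
offsetHours <- as.numeric(difftime(asNewYork, asUtc, units = "hours"))

# Show the UTC instant as New York wall-clock time
format(asUtc, tz = "America/New_York", usetz = TRUE)
```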

timeFormat

If the timestamp in the log is not in the default format for IIS or Apache, this can be used to override the timestamp parsing. The format is an R strptime format string.
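A minimal sketch of a strptime format string, using base R only. The timestamp layout below is invented for the example and is not claimed to be a format that any web server emits. A string such as customFormat is the kind of value that would be passed as timeFormat.

```r
# Invented, non-default timestamp layout: day/month/year with dots in the time
rawStamp     <- "15/01/2023 10.30.00"
customFormat <- "%d/%m/%Y %H.%M.%S"

# Parse with the custom format; tz is set explicitly to keep the result stable
parsed <- as.POSIXct(strptime(rawStamp, format = customFormat, tz = "UTC"))

# Re-print in ISO-like form to confirm the fields were read as intended
normalised <- format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
normalised
```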

Author

Greg Hunt <greg@firmansyah.com>

Examples

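# \dontshow{
library(WebAnalytics)
# Create a clean scratch directory for the sample log
datd = paste0(tempdir(),"/minconfigtemp")
unlink(datd, recursive = TRUE)
dir.create(datd)
logfile = paste0(datd,"/log.log")
# Copy the first 100 lines of the packaged sample log into it
fileConn = gzfile(system.file("extdata", "compressed.log", package = "WebAnalytics"))
writeLines(readLines(fileConn,n=100),con=logfile)
close(fileConn)
# Limit data.table to a single thread
data.table::setDTthreads(threads = 1)
# }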

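# Pick a log file name from the data directory
logFileName = logFileNamesGetLast(dataDirectory=datd, 
  directoryNames=c(".", "."), 
  fileNamePattern="*[.]log")[[1]]

# Get the column list for the IIS log file
cols = logFileFieldsGetIIS(logFileName)

# Read the log into a data frame
logdf = logFileRead(logFileName, columnList=cols, 
            logTimeZone = "", timeFormat = "")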
