read.bin: File processing function for binary files.

Description

A function to process binary accelerometer files and convert the information into R objects.

Usage

read.bin(binfile, outfile = NULL, start = NULL, end = NULL, verbose = TRUE, do.temp = TRUE, do.volt = TRUE, calibrate = TRUE, downsample = NULL, blocksize, virtual = FALSE, mmap.load = (.Machine$sizeof.pointer >= 8), pagerefs = TRUE, ...)

Arguments

binfile

The filename of the binary file to process.

outfile

An optional filename specifying where to save the processed data object.

start

A representation of when in the file to begin processing; see Details.

end

A representation of when in the file to end processing; see Details.

verbose

A boolean variable indicating whether progress information should be printed during processing.

do.temp

A boolean variable indicating whether the temperature signal should be extracted.

do.volt

A boolean variable indicating whether the voltage signal should be extracted.

calibrate

A boolean variable indicating whether the raw accelerometer values and the light variable should be calibrated according to the calibration data in the headers.

downsample

A variable indicating the type of downsampling to apply to the data as it is loaded. Can take values:

NULL: (Default) No downsampling.

Single numeric: Reads every downsample-th value, starting from the first.

Length two numeric vector: Reads every downsample[1]-th value, starting from the downsample[2]-th.

Downsampling factors that are non-integer, or that do not divide 300, are allowed, but will lead to imprecise frequency calculations, leap seconds being introduced, and potential problems with other methods. Use with care.
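As an illustration of the indexing described above, the following base R sketch lists which observations would be kept for a given downsample setting. The helper kept_indices is hypothetical and not part of GENEAread; it assumes 1-based positions and only mirrors the description, not the package's internal code.

```r
# Hypothetical helper (not part of GENEAread): positions kept among n observations.
kept_indices <- function(n, downsample = NULL) {
  if (is.null(downsample)) return(seq_len(n))   # NULL: no downsampling
  step  <- downsample[1]                        # keep every step-th value
  first <- if (length(downsample) >= 2) downsample[2] else 1
  if (first > n) return(integer(0))             # nothing left to keep
  seq(from = first, to = n, by = step)
}

kept_indices(10, 3)        # 1 4 7 10
kept_indices(10, c(3, 2))  # 2 5 8
```

A factor that divides 300 keeps the same number of observations from every page under this scheme, which is consistent with the caution above about other factors.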

blocksize

Integer value giving maximum number of data pages to read in each pass. Defaults to 10000 for larger data files. Sufficiently small sizes will split very large data files to read chunk by chunk, reducing memory requirements for the read.bin function (without affecting the final object), but conversely possibly increasing processing time. Can be set to Inf for no splitting.
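To make the effect of blocksize concrete, here is a small base R sketch that groups page numbers into passes of at most blocksize pages. The helper page_blocks is hypothetical and not part of GENEAread; the actual chunking inside read.bin may differ in detail.

```r
# Hypothetical helper (not part of GENEAread): pages grouped by reading pass.
page_blocks <- function(npages, blocksize = 10000) {
  pages <- seq_len(npages)
  if (is.infinite(blocksize)) return(list(pages))   # Inf: a single pass, no splitting
  unname(split(pages, ceiling(pages / blocksize)))  # at most blocksize pages per pass
}

blocks <- page_blocks(25, blocksize = 10)
lengths(blocks)                   # 10 10 5
length(page_blocks(25, Inf))      # 1
```

Smaller blocks mean fewer pages are held in memory at once, at the cost of more passes.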

virtual

logical. If set TRUE, do not do any actual data reading. Instead construct a VirtualAccData object containing header information to allow use with get.intervals.

mmap.load

logical. If TRUE (the default on 64-bit R), use the mmap package to process the binfile.

pagerefs

A variable that can take three forms, and is considered only for mmap.load = TRUE:

NULL or FALSE: pagerefs are dynamically calculated for each record.

A vector giving sorted byte offsets for each record, for mmap reading of data files.

TRUE: (Default) a full page reference table is computed before any processing occurs.

Computing the full page reference table takes a little time, so the first read is slightly slower. However, it is safer than dynamic computation in the case of missing pages and high temperature variations. Further, once page references are calculated, future reads are much faster, so long as the previously computed references are supplied.

...

Any other optional arguments can be supplied that affect manual calibration and data processing. These are:

gain: a vector of 3 values for manual gain calibration of the raw (x,y,z) axes. If gain=NULL, the gain calibration values are taken from within the output file itself.

offset: a vector of 3 values for manual offset calibration of the raw (x,y,z) axes. If offset=NULL, the offset calibration values are taken from within the output file itself.

luxv: a value for manual lux calibration of the light meter. If luxv=NULL, the lux calibration value is taken from within the output file itself.

voltv: a value for manual volts calibration of the light meter. If voltv=NULL, the volts calibration value is taken from within the output file itself.

warn: if set to TRUE, give a warning if the input file is large, and require user confirmation.

Value

data.out: A 6 or 7 column matrix of the processed pages, the rows of which are the processed observations in order of processed pages. The matrix has columns (timestamp,x-axis,y-axis,z-axis,light,button) or (timestamp,x-axis,y-axis,z-axis,light,button,temperature) if do.temp=TRUE. The timestamp is stored as seconds since 1 Jan 1970, in the timezone that the data is recorded in.
page.timestamps: The timestamps as POSIXct representations (as opposed to those within the data.out array.)
freq: The effective sampling frequency (in Hz).
filename: The file name of the bin file.
page.numbers: The pages that were loaded.
call: The function call that the object was created with.
page.volts: The battery voltage associated with each loaded page, if do.volt is TRUE.
pagerefs: The page byte offsets that were computed.
header: File header output, as given by header.info.
data.out: A vector containing the timestamps of each page, using local seconds since 1970.
nobs: Number of observations per page, after downsampling.
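Because the timestamp column of data.out holds seconds since 1 Jan 1970 in the recording's own timezone, it can be turned into readable date-times with base R. The sketch below uses a made-up timestamp value and assumes that interpreting it with tz = "UTC" is the right way to avoid a second timezone shift, so that the printed clock time is the local time of the recording.

```r
# Illustrative value only: seconds since 1970-01-01, counted in the recording's local time.
ts <- 1338396600

# tz = "UTC" prevents R from applying the session's timezone offset on top.
when <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")
format(when, "%Y-%m-%d %H:%M:%S")   # "2012-05-30 16:50:00"
```

With a real object, the same conversion would be applied to the first column of data.out.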

Warning

Reading in an entire .bin file will take a long time if the file contains a lot of datasets. Reading in such files without downsampling can use up all available memory. See memory.limit.

This function is specific to the header structure in GENEActiv output files. By design, it should be compatible with all firmware and software versions to date (as of the current release). If the order or names of fields are changed in future .bin files, this function may have to be updated accordingly.

Details

The read.bin function reads in binary files compatible with the GENEActiv line of accelerometers, for further processing by the other functions in this package. Most of the default options are those required in the most common cases, though users are advised to consider setting start and end to smaller intervals and/or choosing some level of downsampling when working with data files longer than 24 hours.

The function reads in the desired analysis time window specified by start and end. For convenience, a variety of time window formats are accepted:

Large integers are read as page numbers in the dataset. Page numbers beyond the last page in the file are constrained to what is available. Note that the first page is page 1.

Small values (between 0 and 1) are taken as proportions of the data. For example, `start = 0.5` would specify that reading should begin at the midpoint of the data.

Strings are interpreted as dates and times using parse.time. In particular, times specified as "HH:MM" or "HH:MM:SS" are taken as the earliest time interval containing these times in the file. Strings with an integer prepended, using a space separator, are interpreted as that time after the appropriate number of midnights have passed; in other words, the appropriate time of day on the Nth *full* day. Days of the week and dates in "day/month", "day/month/year", "month-day", "year-month-day" are also handled. Note that the time is interpreted in the same time zone as the data recording itself.

Actual data reading proceeds by one of two methods, depending on whether mmap.load is TRUE or FALSE. With mmap.load = FALSE, data is read in line by line using readLines until blocksize is filled, and then processed. With mmap.load = TRUE, the mmap package is used to map the entire data file into address space, byte locations are calculated (depending on the setting of pagerefs), blocksize chunks of data are loaded, and then processed as raw vectors.

There are advantages and disadvantages to both methods. The mmap method is usually much faster, especially when only the final parts of the data are loaded; the readLines method has to process the entire file in such a case. On the other hand, mmap requires a large amount of memory address space, and so can fail on 32-bit systems. Finally, compressed bin files can only be read with the readLines method. Generally, if mmap reading fails, the function will attempt to catch the failure and reprocess the file with the readLines method, giving a warning.

Once data is loaded, calibration is performed using either values from the binary file or manually supplied values (using the gain, offset, luxv and voltv arguments).

Examples

binfile  = system.file("binfile/TESTfile.bin", package = "GENEAread")[1]

#Read in the entire file, calibrated
procfile<-read.bin(binfile)
print(procfile)
procfile$data.out[1:5,]

#Uncalibrated
procfile2<-read.bin(binfile, calibrate = FALSE)
procfile2$data.out[1:5,]

#Read in again, reusing already computed mmap pagerefs
procfile3<-read.bin(binfile, pagerefs = procfile2$pagerefs )

#Downsample by a factor of 10
procfilelo<-read.bin(binfile, downsample = 10)
print(procfilelo)
object.size(procfilelo) / object.size(procfile)

#Read in a 1 minute interval
procfileshort <- read.bin(binfile, start = "16:50", end = "16:51")
print(procfileshort)

##NOT RUN: Read, and save as a R workspace
#read.bin(binfile, outfile="tmp.Rdata")
#print(load("tmp.Rdata"))
#print(processedfile)
