read.ctd: Read a CTD data file

Description

Read a CTD data file, producing an object of type ctd.

Usage

read.ctd(file, type=NULL, columns=NULL, station=NULL, 
  monitor=FALSE, debug=getOption("oceDebug"), processingLog, ...)
read.ctd.sbe(file, columns=NULL, station=NULL, missing.value,
  monitor=FALSE, debug=getOption("oceDebug"), processingLog, ...)
read.ctd.woce(file, columns=NULL, station=NULL, missing.value=-999, 
  monitor=FALSE, debug=getOption("oceDebug"), processingLog, ...)
read.ctd.woce.other(file, columns=NULL, station=NULL, missing.value=-999, 
  monitor=FALSE, debug=getOption("oceDebug"), processingLog, ...)
read.ctd.odf(file, columns=NULL, station=NULL, missing.value=-999, 
  monitor=FALSE, debug=getOption("oceDebug"), processingLog, ...)
read.ctd.itp(file, columns=NULL, station=NULL, missing.value=-999, 
  monitor=FALSE, debug=getOption("oceDebug"), processingLog, ...)

Arguments

file

a connection or a character string giving the name of the file to load. For read.ctd.sbe() and read.ctd.woce(), this may be a wildcard (e.g. "*.cnv" or "*.csv") in which case the retu

type

if NULL, then the first line is studied, in order to determine the file type. If type="SBE19", then a Seabird 19 (or similar) CTD format is assumed. If type="WOCE" then a WOCE-exchange

debug

a flag that turns on debugging. Set to 1 to get a moderate amount of debugging information, or to 2 to get more.

columns

if NULL, then read.ctd tries to infer column names from the header. For SBE files only, the column argument can control the column selection. It is a list that names data types and the columns con

station

optional character string containing an identifying name (or number) for the station. (This can be useful if the routine cannot determine the name automatically, or if another name is preferred.)

missing.value

optional missing-value flag; data matching this value will be set to NA upon reading.

monitor

boolean, set to TRUE to provide an indication of progress. This is useful if file is a wildcard.

processingLog

if provided, the action item to be stored in the log. (Typically only provided for internal calls; the default that it provides is better for normal calls by a user.)

...

additional arguments, passed to called routines.

Value

An object of class "ctd", which is a list with elements detailed below. The most important elements are the station name and position, along with the profile data that are contained in the data frame named data. (Other elements in the list may be deleted in future versions of the package, if they prove to be of little use in practice, or if they prove to have been idiosyncratic features of the particular files used in early development of oce.)

data

a data frame containing the profile data. The column names are discovered from the header, and may differ from file to file. For example, some CTD instruments may have a fluorometer connected, others may not. The order of the columns may vary from case to case, and so it is important to refer to them by name. The following vectors are normally present: data$pressure, data$salinity, data$temperature, and data$sigmatheta. ($\sigma_\theta$ is calculated using swSigmaTheta.)

metadata

a list containing metadata items inferred from the file, such as the station name and position.

processingLog

a log of processing, in the standard oce format.

Implementation and extension

The functions attempt to infer a wide range of meta-information from file headers, but variations in these headers limit general application. For example, read.ctd.sbe handles water depths in any of the following forms, but ostensibly similar forms may not work.

"** DEPTH = 100"
"** Water Depth: 40 m"
"** Depth (m): 3447 "
"** Depth: 16"
"** Profondeur: 92"

If water depth cannot be inferred from the header, read.ctd sets it to the maximum recorded pressure, and issues a warning to that effect.
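
The header handling described above can be sketched with a regular expression. The following is a minimal illustration in base R, not the code in R/ctd.R: the helper name parseWaterDepth and its pattern are assumptions, chosen only to match the five forms listed above, and the fallback imitates the documented behaviour of using the maximum recorded pressure and issuing a warning.

```r
# Hypothetical helper (not part of oce): extract a water depth, in metres,
# from header lines resembling the forms listed above.
parseWaterDepth <- function(header, pressure) {
    # Match "**", an optional "Water", then "Depth" or "Profondeur",
    # an optional "(m)", a ":" or "=", and finally the number.
    pattern <- "^\\*\\*\\s*(water\\s+)?(depth|profondeur)\\s*(\\(m\\))?\\s*[:=]\\s*([0-9.]+)"
    hits <- grep(pattern, header, ignore.case = TRUE, perl = TRUE, value = TRUE)
    if (length(hits) > 0) {
        # Keep only the captured number from the first matching line.
        number <- sub(paste0(pattern, ".*$"), "\\4", hits[1],
                      ignore.case = TRUE, perl = TRUE)
        return(as.numeric(number))
    }
    # Fall back to the maximum recorded pressure, with a warning.
    warning("cannot infer water depth from header; using maximum pressure")
    max(pressure, na.rm = TRUE)
}
```

For example, parseWaterDepth("** Water Depth: 40 m", pressure) picks out the value 40, whereas a header with no recognizable depth line triggers the fallback. A form that differs only slightly from those listed (say, a different keyword) would not match this pattern, which illustrates why ostensibly similar headers may fail.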

Similar issues come up for essentially everything stored in CTD headers, and so if odd values are found (e.g. a station in the wrong hemisphere), there is a good chance that the format is not being handled correctly. Given the expense of collecting data, users are well-advised to check inferred values against the values in the data files, for at least one profile within a given cruise. Modifying the read.ctd code is not particularly difficult, and users are encouraged to examine the source code (in R/ctd.R) to see whether modification can help. Some experience with regular expressions and string manipulation may be needed; see regexpr and sub. Three sample files are provided with the package, in

system.file("extdata", "ctd.cnv", package="oce")
system.file("extdata", "d200321-001.ctd", package="oce")
system.file("extdata", "CTD_BCD2010666_01_01_DN.ODF", package="oce")

and an examination of these in relationship with the existing code should help users to understand the present implementation, providing insights on extending it for their own data.

In many cases, CTD instruments are set up to report dates in English. This can cause a problem for users running in different locales, since e.g. month names differ. Therefore, if you know your data file is written in American-English notation, you might want to do Sys.setlocale("LC_TIME", "en_US") before you try to read the data.
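
The locale adjustment can be wrapped so that the original setting is restored afterwards. The sketch below uses only base R; it substitutes the "C" locale for "en_US" because "C" also yields English month names and does not depend on which locales are installed. The date string is made up for illustration, and in practice the call to read.ctd would take the place of the strptime call.

```r
# Remember the current time locale so that it can be restored afterwards.
oldLocale <- Sys.getlocale("LC_TIME")

# Switch to the "C" locale, which uses English month names.
Sys.setlocale("LC_TIME", "C")

# Parse an English-language date of the kind found in many CTD headers
# (an invented example; a call to read.ctd would go here in real use).
startTime <- as.POSIXct(strptime("Jun 03 2003 14:05:10",
                                 "%b %d %Y %H:%M:%S", tz = "UTC"))

# Restore the original locale.
Sys.setlocale("LC_TIME", oldLocale)
```

If the parsing were attempted in a locale whose abbreviation for June is not "Jun", strptime would return NA, which is the kind of silent failure that the locale change is meant to avoid.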

Details

These functions read CTD datasets that have been stored in common formats, and could be extended to accommodate other formats if needed. The basic function is read.ctd, which analyzes some of the file contents, and then calls one of the following, any of which can be called directly.

read.ctd.sbe() reads files created by Seabird CTD instruments. These are recognized by a first line with first ten characters ``* Sea-Bird''.
read.ctd.woce() reads files stored in the exchange format used by the World Ocean Circulation Experiment (WOCE) (first 4 characters of the first line being ``CTD,''), and also in a rarer format (the first 3 characters being ``CTD'', followed by a blank or the end of the line).
read.ctd.woce.other() reads the format called ``CTD'' in the section of the archive websites named ``Other formats.'' These data are stored in filenames ending in .WCT, and they do not have a great deal of metadata (e.g. longitude), so the user is forced to infer such things from a separate file. Support for this data type is limited, e.g. requiring a header of a certain length and data columns in a certain order. Improvements are unlikely to be added to the function, since this data type seems to offer no advantages over the type handled by read.ctd.woce().
read.ctd.odf() reads files stored in Ocean Data Format, used in some Canadian hydrographic databases.
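
The first-line tests described above can be sketched as follows. This is an illustration in base R rather than the package's own dispatch code: the helper name guessCtdType is hypothetical, the returned strings follow the values documented for the type argument, and only the Seabird and WOCE prefixes stated above are covered (the text does not say how ODF or other files are recognized, so those yield NA here).

```r
# Hypothetical helper (not part of oce): guess the CTD file type from the
# first line of a file, using the prefixes described above.
guessCtdType <- function(firstLine) {
    if (substr(firstLine, 1, 10) == "* Sea-Bird") {
        "SBE19"                       # Seabird file
    } else if (substr(firstLine, 1, 4) == "CTD,") {
        "WOCE"                        # WOCE exchange format
    } else if (grepl("^CTD( |$)", firstLine)) {
        "WOCE"                        # rarer WOCE variant
    } else {
        NA_character_                 # not recognized by this sketch
    }
}
```

A typical use would be guessCtdType(readLines(file, n = 1)), with the result then deciding which of the format-specific readers to call.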

Different file types provide different meta-information. For example, the WOCE exchange format binds together the institute name and the initials of the chief scientist into a single string that read.ctd cannot parse, so both object@metadata$institute and object@metadata$scientist are left blank for WOCE files.

References

The Sea-Bird SBE 19plus profiler is described at http://www.seabird.com/products/spec_sheets/19plusdata.htm. The company recommends the use of their own software, and perhaps for this reason it is difficult to find a specification for the data files. Inspection of data files led to most of the code used in Oce. If the company ever publishes standards for the data formats, of course Oce will be adjusted. In the meantime, it does a reasonable job in many instances.

The WOCE-exchange format is described at http://woce.nodc.noaa.gov/woce_v3/wocedata_1/whp/exchange/exchange_format_desc.htm, and a sample file is at http://woce.nodc.noaa.gov/woce_v3/wocedata_1/whp/exchange/example_ct1.csv

The ODF format, used by the Canadian Department of Fisheries and Oceans, is described at http://slgo.ca/app-sgdo/en/docs_reference/documents.html, and this was used as a base for read.ctd.odf. However, it was only a starting point, for examination of data files revealed many variants in the names of the data columns. If anything odd happens with ODF files (e.g. if they cannot be plotted), the first thing to do is to reread the files with debug=1, to see if column names were converted properly.

Ice-tethered profiler (ITP) data are available at www.whoi.edu/itp; note that the present version only handles data in profiler-mode, not fixed-depth mode.

Examples

library(oce)
## Labrador Sea data, file 0001919.tar.gz from website
## http://www.nodc.noaa.gov/cgi-bin/OAS/prd/accession/download
d <- read.ctd.woce("*.csv")
data(coastlineWorld)
plot(coastlineWorld, clat=55, clon=-50, span=5000)
longitude <- sapply(d, function(stn) stn[['longitude']])
latitude <- sapply(d, function(stn) stn[['latitude']])
points(longitude, latitude, col='red')
