Transformed counts of faecal coliform bacteria in sea water at seven locations: Longreef, Bondi East, Port Hacking "50", and Port Hacking "100" (controls), and Bondi Offshore, Malabar Offshore, and North Head Offshore (outfalls). At each location, measurements were made at four depths: 0, 20, 40, and 60 metres.
The data sets are named SydColCount and SydColDisc.
Data frames with 5432 observations on the following 6 variables.
y: Transformed measures of the number of faecal coliform bacteria in a sea-water sample of some specified volume. The original measures were obtained by a repeated dilution process.
For SydColCount the transformation used was essentially
a square root transformation, with resulting values greater than 150
being set to NA. The results are putatively compatible
with a Poisson model for the emission probabilities.
For SydColDisc the data were discretised
using the cut() function with breaks given
by c(0,1,5,25,200,Inf) and labels equal to
c("lo","mlo","m","mhi","hi").
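As a rough illustration of the discretisation described above, the following sketch applies cut() with the quoted breaks and labels to a small vector of made-up values. The input vector x is hypothetical and is not drawn from the Sydney data. The source does not say which values of the right and include.lowest arguments were used, so the sketch assumes the cut() defaults.

```r
# Hypothetical input values; these are NOT taken from the Sydney data.
x <- c(0.5, 1, 3, 5, 20, 25, 150, 200, 1000)

# Breaks and labels as quoted in the text. cut() is left at its defaults
# (an assumption), so each interval is closed on the right: (0,1], (1,5], ...
y <- cut(x,
         breaks = c(0, 1, 5, 25, 200, Inf),
         labels = c("lo", "mlo", "m", "mhi", "hi"))

print(data.frame(x = x, y = y))
print(table(y))
```

Under these defaults a value of exactly 0 falls outside the first interval and becomes NA unless include.lowest = TRUE is supplied, so the handling of zeros in the real data may differ from this sketch.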
Note that in the SydColDisc data there are 180 fewer missing values (NAs) in the y column than in the SydColCount data. This is because in forming the SydColCount data (transforming the original data to a putative Poisson distribution) values that were greater than 150 were set equal to NA, and there were 180 such values.
locn: A factor with levels “LngRf” (Longreef), “BondiE” (Bondi East), “PH50” (Port Hacking 50), “PH100” (Port Hacking 100), “BondiOff” (Bondi Offshore), “MlbrOff” (Malabar Offshore) and “NthHdOff” (North Head Offshore).
depth: A factor with levels “0” (0 metres), “20” (20 metres), “40” (40 metres) and “60” (60 metres).
ma.com: A factor with levels no and yes,
indicating whether the Malabar sewage outfall had been commissioned.
nh.com: A factor with levels no and yes,
indicating whether the North Head sewage outfall had been commissioned.
bo.com: A factor with levels no and yes,
indicating whether the Bondi Offshore sewage outfall had been commissioned.
The observations corresponding to each location-depth combination constitute a time series. The sampling interval is ostensibly 1 week; distinct time series are ostensibly synchronous. The measurements were made over a 194 week period. See Turner et al. (1998) for more detail.
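The layout described above (7 locations, 4 depths, 194 weekly observations per series) can be sketched on a synthetic skeleton with the same dimensions. This is only an illustration of the structure: the week column is a hypothetical index that the real data frames are not stated to contain, and the row order of the real data may differ.

```r
# Synthetic skeleton only. Level names follow the text; "week" is a
# hypothetical index column introduced for illustration.
locn  <- c("LngRf", "BondiE", "PH50", "PH100", "BondiOff", "MlbrOff", "NthHdOff")
depth <- c("0", "20", "40", "60")
skel  <- expand.grid(week = 1:194, depth = depth, locn = locn,
                     stringsAsFactors = TRUE)

# Total number of rows: 7 locations x 4 depths x 194 weeks.
print(nrow(skel))

# One list element per location-depth combination, each holding one series.
series <- split(skel, list(skel$locn, skel$depth))
print(length(series))
print(unique(sapply(series, nrow)))
```

The row count of the skeleton agrees with the 5432 observations stated for the data frames, and splitting on locn and depth gives one series per combination.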
T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson. Hidden Markov chains in generalized linear models. Canadian J. Statist., vol. 26, pp. 107-125, 1998.
Rolf Turner. Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis, vol. 52, pp. 4147-4160, 2008, doi:10.1016/j.csda.2008.01.029.
# Select out a subset of four locations:
loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
SCC4 <- SydColCount[SydColCount$locn %in% loc4,]
SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
rownames(SCC4) <- 1:nrow(SCC4)