makeTPPSplineMats.data.frame: Make the spline basis matrices and data needed to fit Tensor Product P-splines.

Description

Prepares the fixed and random P-spline basis matrices, and associated information, that are needed to fit Tensor Product P-splines (TPPS) as described by Rodriguez-Alvarez et al. (2018). This function is called internally by addSpatialModel.asrtests, addSpatialModelOnIC.asrtests and chooseSpatialModelOnIC.asrtests when fitting TPPS models for local spatial variation. Two methods for creating and storing the basis functions are available, selected by asreml.option. The function is most likely to be called directly when mbf has been used in creating an asreml.object and the object is to be used in a session subsequent to the one in which it was created.

Usage

# S3 method for data.frame
makeTPPSplineMats(data, sections = NULL, 
                  row.covar, col.covar, 
                  nsegs = NULL, nestorder = c(1,1), 
                  degree = c(3,3), difforder = c(2,2),
                  rotateX = FALSE, theta = c(0,0), 
                  asreml.option = "grp", mbf.env = sys.frame(), 
                  ...)

Value

A list of length equal to the number of sections is produced. Each of its components is a list with 8 or 9 components, according to the asreml.option. The component named data.plus is the input data.frame to which have been added the columns required to fit the TPPS model (the data.frame stored in the data component holds only the covariates from data). The components for a section are as follows.

data = the input data frame augmented with structures required to fit tensor product splines in asreml-R. This data frame can be used to fit the TPS model.

Added columns:
- TP.col, TP.row = column and row coordinates
- TP.CxR = combined index for use with smooth x smooth term
- TP.C.n for n=1:diff.c = X parts of column spline for use in random model (where diff.c is the order of column differencing)
- TP.R.n for n=1:diff.r = X parts of row spline for use in random model (where diff.r is the order of row differencing)
- TP.CR.n for n=1:(diff.c*diff.r) = interaction between the two X parts for use in fixed model. The first variate is a constant term which should be omitted from the model when the constant (1) is present. If all elements are included in the model then the constant term should be omitted, e.g. y ~ -1 + TP.CR.1 + TP.CR.2 + TP.CR.3 + TP.CR.4 + other terms...
- when asreml="grp" or "sepgrp", the spline basis functions are also added into the data frame. Column numbers for each term are given in the grp list structure.
mbflist = list that can be used in call to asreml (so long as Z matrix data frames extracted with right names, eg BcZ<stub>.df)
BcZ.df = mbf data frame mapping onto smooth part of column spline, last column (labelled TP.col) gives column index
BrZ.df = mbf data frame mapping onto smooth part of row spline, last column (labelled TP.row) gives row index
BcrZ.df = mbf data frame mapping onto smooth x smooth term, last column (labelled TP.CxR) maps onto col x row combined index
dim = list structure, holding dimension values relating to the model:
1. "diff.c" = order of differencing used in column dimension
2. "nbc" = number of random basis functions in column dimension
3. "nbcn" = number of nested random basis functions in column dimension used in smooth x smooth term
4. "diff.r" = order of differencing used in row dimension
5. "nbr" = number of random basis functions in row dimension
6. "nbrn" = number of nested random basis functions in row dimension used in smooth x smooth term
trace = list of trace values for ZGZ' for the random TPspline terms, where Z is the design matrix and G is the known diagonal variance matrix derived from eigenvalues. This can be used to rescale the spline design matrix (or equivalently variance components).
grp = list structure, only added for setting asreml="grp". For asreml="grp", provides column indexes for each of the 5 random components of the 2D splines in data.plus. Dimensions of the components can be derived from the values in the dim item.
data.plus = the input data.frame to which have been added the columns required to fit tensor product splines in asreml-R. This data.frame can be used to fit the TPS model. For multiple sections, this data.frame will occur in the component for each section. If asreml.option is set to mbf, then this component will have the attribute mbf.env that specifies the environment to which the data.frames containing the spline bases have been assigned.
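
The note on the TP.CR.n columns can be illustrated with a small base-R sketch. It assumes a response named y (hypothetical) and second-order differencing in both dimensions, so that there are diff.c*diff.r = 4 fixed interaction columns. It only builds the two alternative fixed formulae and fits nothing.

```r
# Assumed for illustration: second-order differencing in both dimensions
diff.c <- 2
diff.r <- 2

# Names of the fixed interaction covariates added to the data
tpcr <- paste0("TP.CR.", seq_len(diff.c * diff.r))

# Option 1: keep the model constant and drop the constant variate TP.CR.1
f.const <- reformulate(tpcr[-1], response = "y")

# Option 2: include all TP.CR variates and drop the model constant
f.noconst <- reformulate(tpcr, response = "y", intercept = FALSE)

print(f.const)
print(f.noconst)
```

Either formula would then be extended with the other fixed terms of the analysis.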

Arguments

data: A data.frame that holds the spline bases for a section. It is indexed by columns named col and row.
sections: A single character string that specifies the name of the column in the data.frame that contains the factor that identifies different sections of the data to which separate spatial models are to be fitted.
row.covar: A single character string nominating a numeric column in the data.frame that contains the values of a covariate indexing the rows of the grid.
col.covar: A single character string nominating a numeric column in the data.frame that contains the values of a covariate indexing the columns of the grid.
nsegs: A pair of numeric values giving the number of segments into which the column and row ranges are to be split, respectively, for fitting a P-spline model (TPPS) (each value specifies the number of internal knots + 1). If not specified, then (number of unique values - 1) is used in each dimension; for a grid layout with equal spacing, this gives a knot at each data value. If sections is not NULL and the grid differs between the sections, then nsegs will differ between the sections.
nestorder: A numeric of length 2. The order of nesting for column and row dimensions, respectively, in fitting a P-spline model (TPPS). A value of 1 specifies no nesting, a value of 2 generates a spline with half the number of segments in that dimension, etc. The number of segments in each direction must be a multiple of the order of nesting.
degree: A numeric of length 2. The degree of polynomial spline to be used for column and row dimensions respectively, in fitting a P-spline (TPPS).
difforder: A numeric of length 2. The order of differencing for column and row dimensions, respectively, in fitting a P-spline (TPPS).
rotateX: A logical indicating whether to rotate the eigenvectors of the penalty matrix, as described by Piepho, Boer and Williams (2022), when fitting a P-spline (TPPS). Setting rotateX to TRUE results in a search for an optimized rotation under a model that omits the random spline interaction terms. If ngridangles is set to NULL, the optimal rotation is found using an optimizer (nloptr::bobyqa). Otherwise, the optimal rotation is found by exploring the fit over a two-dimensional grid of rotation angle pairs. The optimization seeks to optimize the criterion nominated in which.rotacriterion. Rotation of the eigenvectors is only relevant for difforder values greater than 1 and has only been implemented for difforder equal to 2.
theta: A numeric of length 2. The angle (in degrees) to be used in rotating the eigenvectors of the penalty matrix of a P-spline (TPPS).
asreml.option: A single character string specifying whether the grp or mbf methods are to be used to supply externally formed covariate matrices to asreml when fitting a P-spline (TPPS). Compared to the mbf method, the grp method is somewhat faster, but creates large asrtests.objects for which the time it takes to save them can exceed any gains in execution speed. The grp method adds columns to the data.frame containing the data. On the other hand, the mbf method adds only the fixed covariates to data and stores the random covariates in the environment of the internal function that calls the spline-fitting function; there are three smaller data.frames for each section that are not stored in the asreml.object resulting from the fitted model.
mbf.env: An environment specifying the environment to which the data.frames containing the spline bases are to be assigned. If mbf.env is NULL, the data.frames will not be assigned.
...: Further arguments passed to tpsmmb from package TPSbits.
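
The documented default for nsegs and the constraint on nestorder can be checked before calling the function. The sketch below uses base R only. The grid dimensions and the helper name check.nest are hypothetical, and the code mirrors the documented rules rather than the package's internal code.

```r
# Hypothetical 10-row by 15-column grid with equally spaced, centred covariates
dat <- expand.grid(cRow = 1:10 - 5.5, cColumn = 1:15 - 8)

# Documented default: (number of unique values - 1) in each dimension,
# given in the order column, row
nsegs <- c(length(unique(dat$cColumn)) - 1,
           length(unique(dat$cRow)) - 1)
print(nsegs)

# Documented constraint: the number of segments must be a multiple of the
# order of nesting in each dimension
check.nest <- function(nsegs, nestorder) all(nsegs %% nestorder == 0)

ok.none  <- check.nest(nsegs, nestorder = c(1, 1))  # no nesting
ok.23    <- check.nest(nsegs, nestorder = c(2, 3))  # both divide exactly
bad.42   <- check.nest(nsegs, nestorder = c(4, 2))  # 4 does not divide the column segments
print(c(ok.none, ok.23, bad.42))
```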

Author

Chris Brien

Details

The objects are formed using the function tpsmmb from the R package TPSbits authored by Sue Welham (2022). This function has been extended to allow for sections (see below) and to allow rotation of the penalty matrix for the linear component of the interaction terms in TPPCS models (for more information about rotation see Piepho, Boer and Williams, 2022).
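
For orientation, the one-dimensional ingredients controlled by nsegs, degree and difforder can be sketched with the splines package that ships with R. This is the standard P-spline construction (a B-spline basis on equally spaced knots plus a difference penalty), not the code of tpsmmb. The helper name bbase and the example values are assumptions made for illustration.

```r
library(splines)

# B-spline basis on equally spaced knots (helper name is hypothetical)
bbase <- function(x, nseg, degree = 3) {
  xl <- min(x)
  dx <- (max(x) - xl) / nseg
  # extend the knots 'degree' segments beyond each end of the data range
  knots <- xl + dx * (-degree:(nseg + degree))
  splineDesign(knots, x, ord = degree + 1, outer.ok = TRUE)
}

x <- as.numeric(1:10)                # ten equally spaced positions
B <- bbase(x, nseg = length(x) - 1)  # default-style nsegs: unique values - 1
print(dim(B))                        # rows = positions, cols = nseg + degree

# Difference penalty of order difforder
difforder <- 2
D <- diff(diag(ncol(B)), differences = difforder)
P <- crossprod(D)

# The penalty leaves 'difforder' directions unpenalized (constant and linear);
# the remaining directions carry the smooth part treated as random
ev <- eigen(P, symmetric = TRUE)$values
n.penalized <- sum(ev > 1e-8)
print(n.penalized)
```

In a mixed-model formulation the unpenalized directions correspond to the fixed (X) parts and the penalized ones to the random (Z) parts. How tpsmmb scales and stores them is not shown here.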

Each combination of a row.covar and a col.covar does not have to specify a single observation; for example, to fit a local spatial variation model to the main units of a split-unit design, each combination would correspond to a main unit and all subunits of the main unit would have the same combination.
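
The split-unit situation can be pictured with a small base-R sketch. The layout (4 x 5 main units, each with 3 subunits) and the column names are hypothetical. Every subunit of a main unit carries the same pair of covariate values, so each combination of the row and column covariates occurs more than once.

```r
# Hypothetical layout: 4 rows x 5 columns of main units, 3 subunits each
dat <- expand.grid(Subunit = factor(1:3),
                   Row = factor(1:4),
                   Column = factor(1:5))

# Centred numeric covariates that index the main units only
dat$cRow <- as.numeric(dat$Row) - mean(seq_len(nlevels(dat$Row)))
dat$cColumn <- as.numeric(dat$Column) - mean(seq_len(nlevels(dat$Column)))

# Each (cRow, cColumn) combination is shared by the 3 subunits of a main unit
reps <- table(dat$cRow, dat$cColumn)
print(reps)
```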

The data for an experiment can be divided into sections, and the spline bases and associated data will be produced for each section. If there is more than one section, then a list is returned that has a component for each section. The contents of the component for a section are listed under Value.

References

Piepho, H.-P., Boer, M. P. & Williams, E. R. (2022). Two-dimensional P-spline smoothing for spatial analysis of plant breeding trials. Biometrical Journal, 64, 835-857.

Rodriguez-Alvarez, M. X., Boer, M. P., van Eeuwijk, F. A., & Eilers, P. H. C. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics, 23, 52-71.

Welham, S. J. (2022). TPSbits: Creates Structures to Enable Fitting and Examination of 2D Tensor-Product Splines using ASReml-R. Version 1.0.0. https://mmade.org/tpsbits/

Examples

if (FALSE) {

data(Wheat.dat)

#Add row and column covariates
Wheat.dat <- within(Wheat.dat, 
                    {
                      cColumn <- dae::as.numfac(Column)
                      cColumn <- cColumn  - mean(unique(cColumn))
                      cRow <- dae::as.numfac(Row)
                      cRow <- cRow - mean(unique(cRow))
                    })

#Set up the matrices
tps.XZmat <- makeTPPSplineMats(Wheat.dat, 
                                row.covar = "cRow", col.covar = "cColumn")
}