A Markov random field smooth over a set of discrete areas is defined using a set of area labels, and
a neighbourhood structure for the areas. The covariate of the smooth is the vector of area labels
corresponding to each observation. This covariate should be a factor, or capable of being coerced to a factor.
The neighbourhood structure is supplied in the xt argument to s. This must contain at least one of
the elements polys, nb or penalty.
- polys contains the polygons defining the geographic areas.
It is a list with as many elements as there are geographic areas. names(polys) must correspond to
the levels of the argument of the smooth, in any order (i.e. it gives the area labels).
polys[[i]] is a 2 column matrix the rows of which specify the vertices of the polygon(s)
defining the boundary of the ith area. A boundary may be made up of several closed loops: these must
be separated by NA rows. A polygon within another is treated as a hole. The first polygon in
any polys[[i]] should not be a hole. An example of the structure is provided by columb.polys
(which contains an artificial hole in its second element, for illustration). Any list elements
with duplicate names are combined into a single NA separated matrix.
Plotting of the smooth is not possible without a polys object.
If polys is the only element of xt provided, then the neighbourhood structure is
computed from it automatically. To count as neighbours, polygons must exactly share one or more
vertices.
- nb is a named list defining the neighbourhood structure. names(nb) must correspond to the
levels of the covariate of the smooth (i.e. the area labels), but can be in any order. nb[[i]]
is a vector indexing the neighbours of the ith area. All indices are relative to nb itself, but
can be translated using names(nb).
If no penalty is provided then it is computed automatically from this list. The ith row of
the penalty matrix will be zero everywhere, except in the ith column, which will contain the number
of neighbours of the ith geographic area, and the columns corresponding to those geographic
neighbours, which will each contain -1.
- penalty, if supplied, is used as the penalty matrix. It should be positive semi-definite.
Its row and column names should correspond to the levels of the covariate.
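The rule for deriving a penalty from a neighbour list can be sketched in a few lines of base R. This is an illustration of the description above, not the package's own code: the four-area list nb and the helper nb_to_penalty are hypothetical names invented for the example.

```r
# Hypothetical neighbour list for four areas arranged in a row: a - b - c - d.
# Each element indexes the neighbours of that area, relative to nb itself.
nb <- list(a = 2, b = c(1, 3), c = c(2, 4), d = 3)

# Illustrative helper (not part of mgcv): build the penalty matrix described
# above, with neighbour counts on the diagonal and -1 for each neighbour.
nb_to_penalty <- function(nb) {
  n <- length(nb)
  S <- matrix(0, n, n, dimnames = list(names(nb), names(nb)))
  for (i in seq_len(n)) {
    S[i, i] <- length(nb[[i]])  # number of neighbours of area i
    S[i, nb[[i]]] <- -1         # columns of the neighbours of area i
  }
  S
}

S <- nb_to_penalty(nb)
print(S)

# Smallest eigenvalue; non-negative (up to rounding) for a positive
# semi-definite matrix.
min_eig <- min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)
```

A matrix built this way is symmetric whenever the neighbour relation is symmetric, and each row sums to zero. A matrix of this form, with row and column names matching the area labels, is the kind of object that could be supplied as the penalty element of xt.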
If no basis dimension is supplied then the constructor produces a full rank MRF, with a coefficient for each
geographic area. Otherwise a low rank approximation is obtained based on truncation of the parameterization given in
Wood (2017) Section 5.4.2. See Wood (2017, section 5.8.1).
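As a minimal sketch of the two cases, the following fits a full rank and a reduced rank MRF to the Columbus crime data shipped with mgcv (columb and columb.polys), supplying only polys in xt so that the neighbourhood structure is computed automatically. The choice k = 20 is arbitrary and only for illustration.

```r
library(mgcv)

data(columb)        # data frame with a district label for each observation
data(columb.polys)  # list of polygons, named by district

# Neighbourhood information for the smooth: polygons only
xt <- list(polys = columb.polys)

# Full rank MRF: no basis dimension supplied
b_full <- gam(crime ~ s(district, bs = "mrf", xt = xt),
              data = columb, method = "REML")

# Reduced rank MRF: basis dimension set via k (value chosen for illustration)
b_low <- gam(crime ~ s(district, bs = "mrf", k = 20, xt = xt),
             data = columb, method = "REML")

# The reduced rank fit uses fewer coefficients than the full rank fit
n_full <- length(coef(b_full))
n_low <- length(coef(b_low))
```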
Note that smooths of this class have a built in plot method, and that the utility function in.out
can be useful for working with discrete area data. The plot method has two schemes, scheme==0 is colour,
scheme==1 is grey scale.
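As a small sketch of how in.out might be used with area data, the following tests a few points against a boundary stored in the same layout as an element of polys: a 2 column matrix of vertices, with closed loops separated by NA rows. The square boundary, its hole and the test points are all made up for the example.

```r
library(mgcv)

# Hypothetical area: a 4 x 4 square with a 2 x 2 square hole in the middle.
# The outer loop comes first and the hole second, separated by an NA row.
outer_loop <- cbind(c(0, 4, 4, 0, 0), c(0, 0, 4, 4, 0))
hole_loop <- cbind(c(1, 3, 3, 1, 1), c(1, 1, 3, 3, 1))
bnd <- rbind(outer_loop, c(NA, NA), hole_loop)

# Points to test: one in the area proper, one in the hole, one outside
pts <- rbind(c(0.5, 0.5), c(2, 2), c(5, 5))

# Logical vector, one element per row of pts
inside <- in.out(bnd, pts)
print(inside)
```

Only the first point should be reported as inside, since the second falls in the hole and the third lies beyond the outer loop.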
The situation in which there are areas with no data requires special handling. You should set drop.unused.levels=FALSE in
the model fitting function, gam, bam or gamm, having first ensured that any fixed effect
factors do not contain unobserved levels. Also make sure that the basis dimension is set to ensure that the total number of
coefficients is less than the number of observations.
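A rough sketch of this situation, using the Columbus data from mgcv: the first five rows are dropped to mimic districts with no observations, which is an artificial scenario invented for the example. It assumes district is stored as a factor, so that the subset keeps all of the original levels, and the value k = 20 is an arbitrary choice that keeps the coefficient count below the number of remaining observations.

```r
library(mgcv)

data(columb)
data(columb.polys)
xt <- list(polys = columb.polys)

# Artificial scenario: pretend the districts in the first five rows have no data.
# Subsetting a data frame leaves the factor levels of district unchanged.
sub <- columb[-(1:5), ]

# Keep the unobserved districts as levels when the model frame is built
b <- gam(crime ~ s(district, bs = "mrf", k = 20, xt = xt),
         data = sub, method = "REML", drop.unused.levels = FALSE)

# Predictions for every district, including those with no observations
all_districts <- data.frame(
  district = factor(levels(columb$district), levels = levels(columb$district))
)
fitted_all <- predict(b, newdata = all_districts)
```

Predictions for the unobserved districts are then determined by the fitted values of their neighbours through the MRF penalty rather than by data of their own.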