Select a spatially balanced sample from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites. For technical details, see Stevens and Olsen (2004).

```
grts(
  sframe,
  n_base,
  stratum_var = NULL,
  seltype = NULL,
  caty_var = NULL,
  caty_n = NULL,
  aux_var = NULL,
  legacy_var = NULL,
  legacy_sites = NULL,
  legacy_stratum_var = NULL,
  legacy_caty_var = NULL,
  legacy_aux_var = NULL,
  mindis = NULL,
  maxtry = 10,
  n_over = NULL,
  n_near = NULL,
  wgt_units = NULL,
  pt_density = NULL,
  DesignID = "Site",
  SiteBegin = 1,
  sep = "-",
  projcrs_check = TRUE
)
```

`grts()` returns the sampling design sites and additional information about the sampling design. More specifically, it returns a list with five elements:

`sites_legacy`

An sf object containing legacy sites. This is `NULL` if legacy sites were not included in the sample.

`sites_base`

An sf object containing the base sites. This is `NULL` if `n_base` equals the number of legacy sites.

`sites_over`

An sf object containing the reverse hierarchically ordered replacement sites. This is `NULL` if no reverse hierarchically ordered replacement sites were included in the sample.

`sites_near`

An sf object containing the nearest neighbor replacement sites. This is `NULL` if no nearest neighbor replacement sites were included in the sample.

`design`

A list documenting the specifications of this sampling design. This can be checked to verify your sampling design ran as intended. Its elements are listed below.

`call`

The original function call.

`stratum_var`

The name of the stratification variable in `sframe`. This equals `NULL` if no stratification is used.

`stratum`

The unique strata. This equals `"None"` if the sampling design is unstratified.

`n_base`

The base sample size per stratum.

`seltype`

The selection type per stratum.

`caty_var`

The name of the unequal probability variable in `sframe`. This equals `NULL` if no unequal probability variable is used.

`caty_n`

The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals `NULL` when `seltype` is not `"unequal"`.

`aux_var`

The name of the proportional probability (auxiliary) variable in `sframe`. This equals `NULL` if no proportional probability variable is used.

`legacy`

A logical variable indicating whether legacy sites were included in the sample.

`legacy_stratum_var`

The name of the stratification variable in `legacy_sites`. Omitted if legacy sites are not used. This equals `NULL` if legacy sites were used but no stratification variable is used.

`legacy_caty_var`

The name of the unequal probability variable in `legacy_sites`. Omitted if legacy sites are not used. This equals `NULL` if legacy sites were used but no unequal probability variable is used.

`legacy_aux_var`

The name of the proportional probability (auxiliary) variable in `legacy_sites`. Omitted if legacy sites are not used. This equals `NULL` if legacy sites were used but no proportional probability variable is used.

`mindis`

The minimum distance requirement desired. This is `NULL` when no minimum distance requirement was applied.

`n_over`

The reverse hierarchically ordered replacement site sample sizes per stratum. If `seltype` is `"unequal"`, this represents the expected sample sizes. This is `NULL` when no reverse hierarchically ordered replacement sites were selected.

`n_near`

The number of nearest neighbor replacement sites desired. This is `NULL` when no nearest neighbor replacement sites were selected.

When non-`NULL`, the `sites_legacy`, `sites_base`, `sites_over`, and `sites_near` objects contain the original columns in `sframe` and include a few additional columns. These additional columns are:

`siteID`

A site identifier (as named using the `DesignID` and `SiteBegin` arguments to `grts()`).

`siteuse`

Whether the site is a legacy site (`Legacy`), base site (`Base`), reverse hierarchically ordered replacement site (`Over`), or nearest neighbor replacement site (`Near`).

`replsite`

The replacement site ordering. `replsite` is `None` if the site is not a replacement site, `Next` if it is the next reverse hierarchically ordered replacement site to use, or `Near_`, where the word following `_` indicates the ordering of sites closest to the originally sampled site.

`lon_WGS84`

Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.

`lat_WGS84`

Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.

`X`

Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).

`Y`

Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).

`stratum`

A stratum indicator. `stratum` is `None` if the sampling design was unstratified. If the sampling design was stratified, `stratum` indicates the stratum.

`wgt`

The design weight.

`ip`

The site's original inclusion probability (the reciprocal of `wgt`).

`caty`

An unequal probability grouping indicator. `caty` is `None` if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, `caty` indicates the unequal probability level.

`aux`

The auxiliary proportional probability variable. This column is only returned if `seltype` was `"proportional"` in the original sampling design.
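The relationship between `wgt` and `ip` can be illustrated with a little arithmetic. The sketch below assumes an unstratified, equal probability design on a finite (point) frame with no legacy or replacement sites, in which case each site's inclusion probability is the base sample size divided by the number of frame elements. The frame size `N` and sample size `n_base` are illustrative values, not output from `grts()`. The final block runs only if spsurvey is installed and shows where these columns live in an actual design object, assuming the `NE_Lakes` example data shipped with the package.

```r
# Illustrative values (assumptions, not taken from a real design)
N <- 200       # number of points in a hypothetical finite sampling frame
n_base <- 50   # base sample size

# Under equal inclusion probabilities, each point is equally likely to be selected
ip <- n_base / N   # inclusion probability
wgt <- 1 / ip      # design weight is the reciprocal of ip

# Each sampled site represents wgt frame elements, so the weights sum to N
total_wgt <- n_base * wgt

# With spsurvey installed, the same columns can be inspected on a real sample
if (requireNamespace("spsurvey", quietly = TRUE)) {
  library(spsurvey)
  samp <- grts(NE_Lakes, n_base = n_base)
  print(head(samp$sites_base[, c("siteID", "siteuse", "stratum", "wgt", "ip")]))
}
```

For unequal or proportional designs the inclusion probabilities differ among sites, so this simple `n_base / N` shortcut no longer applies, although `ip` remains the reciprocal of `wgt`.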

If any columns in `sframe` contain these names, those columns from `sframe` will be automatically prefixed with `sframe_` in the `sites` object. When output is printed, a summary of site counts by the levels in `stratum_var` and `caty_var` is shown.

- sframe
A sampling frame as an `sf` object. The coordinate system for `sframe` must be projected (not geographic). If m or z values are in `sframe`'s geometry, they are silently dropped (i.e., only x-coordinates and y-coordinates are preserved).

- n_base
The base sample size required. If the sampling design is unstratified, this is a single numeric value. If the sampling design is stratified, this is a named vector or list whose names represent each stratum and whose values represent each stratum's sample size. These names must match the values of the stratification variable represented by `stratum_var`. Legacy sites are considered part of the base sample, so the value for `n_base` should be equal to the number of legacy sites plus the number of desired non-legacy sites.

- stratum_var
A character string containing the name of the column from `sframe` that identifies stratum membership for each element in `sframe`. If `stratum_var` equals `NULL`, the sampling design is unstratified and all elements in `sframe` are eligible to be selected in the sample. The default is `NULL`.

- seltype
A character string or vector indicating the inclusion probability type, which must be one of the following: `"equal"` for equal inclusion probabilities; `"unequal"` for unequal inclusion probabilities according to a categorical variable specified by `caty_var`; and `"proportional"` for inclusion probabilities proportional to a positive auxiliary variable specified by `aux_var`. If the sampling design is unstratified, `seltype` is a single character string. If the sampling design is stratified, `seltype` is a named vector whose names represent each stratum and whose values represent each stratum's inclusion probability type. `seltype`'s default value tries to match the intended inclusion probability type: if `caty_var` and `aux_var` are not specified, `seltype` is `"equal"`; if `caty_var` is specified, `seltype` is `"unequal"`; and if `aux_var` is specified, `seltype` is `"proportional"`.

- caty_var
A character string containing the name of the column from `sframe` that represents the unequal probability variable.

- caty_n
A named numeric vector (or list of named numeric vectors) indicating the expected sample size for each level of `caty_var`, the unequal probability variable. If the sampling design is unstratified, `caty_n` is a named vector whose names represent each level of `caty_var` and whose values represent each level's expected sample size. The sum of `caty_n` must equal `n_base`. If the sampling design is stratified and the expected sample sizes are the same among strata, `caty_n` is a named vector whose names represent each level of `caty_var` and whose values represent each level's expected sample size; these expected sample sizes are applied to all strata. The sum of `caty_n` must equal each stratum's value in `n_base`. If the sampling design is stratified and the expected sample sizes differ among strata, `caty_n` is a list where each element is named as a stratum in `n_base`. Each stratum's list element is a named vector whose names represent each level of `caty_var` and whose values represent each level's expected sample size (within the stratum). The sum of the values in each stratum's list element must equal that stratum's value in `n_base`.

- aux_var
A character string containing the name of the column from `sframe` that represents the proportional (to size) inclusion probability variable (auxiliary variable). This auxiliary variable must be positive, and the resulting inclusion probabilities are proportional to the values of the auxiliary variable. Larger values of the auxiliary variable result in higher inclusion probabilities.

- legacy_var
This argument can be used instead of `legacy_sites` when `sframe` has a `POINT` or `MULTIPOINT` geometry (i.e., a finite sampling frame). When `legacy_var` is used, it is a character string containing the name of the column from `sframe` that represents whether each site is a legacy site. For legacy sites, the values of the `legacy_var` column must be character strings that act as a legacy site identifier. For non-legacy sites, the values of the `legacy_var` column must be `NA`. Using this approach, `legacy_stratum_var`, `legacy_caty_var`, and `legacy_aux_var` are not required and should not be used (because `legacy_var` represents a column in `sframe`). `spsurvey` assumes that the legacy sites were selected from a previous sampling design that incorporated randomness into site selection and that the legacy sites are elements of the current sampling frame.

- legacy_sites
An sf object with a `POINT` or `MULTIPOINT` geometry representing the legacy sites. spsurvey assumes that the legacy sites were selected from a previous sampling design that incorporated randomness into site selection and that the legacy sites are elements of the current sampling frame. If `sframe` has a `POINT` or `MULTIPOINT` geometry, the observations in `legacy_sites` should not also be in `sframe` (i.e., duplicates are not removed). Thus, `sframe` and `legacy_sites` together compose the current sampling frame. If m or z values are in `legacy_sites`' geometry, they are silently dropped (i.e., only x-coordinates and y-coordinates are preserved).

- legacy_stratum_var
A character string containing the name of the column from `legacy_sites` that identifies stratum membership for each element of `legacy_sites`. This argument is required when the sampling design is stratified, and its levels must be contained in the levels of the `stratum_var` variable. The default value of `legacy_stratum_var` is `stratum_var`, so `legacy_stratum_var` need only be specified explicitly when the name of the stratification variable in `legacy_sites` differs from `stratum_var`.

- legacy_caty_var
A character string containing the name of the column from `legacy_sites` that identifies the unequal probability variable for each element of `legacy_sites`. This argument is required when the sampling design uses unequal selection probabilities, and its categories must be contained in the levels of the `caty_var` variable. The default value of `legacy_caty_var` is `caty_var`, so `legacy_caty_var` need only be specified explicitly when the name of the unequal probability variable in `legacy_sites` differs from `caty_var`.

- legacy_aux_var
A character string containing the name of the column from `legacy_sites` that identifies the proportional probability variable for each element of `legacy_sites`. This argument is required when the sampling design uses proportional selection probabilities, and the values of the `legacy_aux_var` variable must be positive. The default value of `legacy_aux_var` is `aux_var`, so `legacy_aux_var` need only be specified explicitly when the name of the proportional probability variable in `legacy_sites` differs from `aux_var`.

- mindis
A numeric value indicating the desired minimum distance between sampled sites. If the sampling design is stratified and `mindis` is a numeric value, the minimum distance is applied to all strata. If the sampling design is stratified and different minimum distances are desired among strata, then `mindis` is a list whose names match the names of `n_base` and whose values are the minimum distance for the corresponding stratum. If a minimum distance is not desired for a particular stratum, then the corresponding value in `mindis` should be `0` or `NULL` (which is equivalent to `0`). The units of `mindis` must represent the units in `sframe`. A warning is returned if the minimum distance could not be reached after `maxtry` attempts. If legacy sites are used, the minimum distance requirement (and subsequent warning if `maxtry` attempts are reached) is enforced for all base sites that are not legacy sites; that is, the minimum distance is enforced for these sites by comparing distances against all base sites (legacy and non-legacy).

- maxtry
The maximum number of attempts to apply the minimum distance algorithm to obtain the desired minimum distance between sites. Each iteration takes roughly as long as the standard GRTS algorithm. Successive iterations will always contain at least as many sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops when the minimum distance requirement is met or there are `maxtry` iterations. The default maximum number of iterations is `10`.

- n_over
The number of reverse hierarchically ordered (rho) replacement sites. If the sampling design is unstratified, then `n_over` is an integer specifying the number of rho replacement sites desired. If the sampling design is stratified, then `n_over` is a vector (or list) whose names match the names of `n_base` and whose values indicate the number of rho replacement sites for each stratum. If replacement sites are not desired for a particular stratum, then the corresponding value in `n_over` should be `0` or `NULL` (which is equivalent to `0`). If the sampling design is stratified but the number of `n_over` sites is the same in each stratum, `n_over` can be a single value, which is used for each stratum. Note that if the sampling design has unequal selection probabilities (`seltype = "unequal"`), then `n_over` sites are given the same proportion of `caty_n` values as `n_base`.

- n_near
The number of nearest neighbor (nn) replacement sites. If the sampling design is unstratified, `n_near` is an integer from `1` to `10` specifying the number of nn replacement sites to be selected for each base site. If the sampling design is stratified but the same number of nn replacement sites is desired for each stratum, `n_near` is an integer from `1` to `10` specifying the number of nn replacement sites to be selected for each base site. If the sampling design is stratified and a different number of nn replacement sites is desired for each stratum, `n_near` is a vector (or list) whose names represent strata and whose values are integers from `1` to `10` specifying the number of nn replacement sites to be selected for each base site in the stratum. If replacement sites are not desired for a particular stratum, then the corresponding value in `n_near` should be `0` or `NULL` (which is equivalent to `0`). For infinite sampling frames, the distance between a site and its nn depends on `pt_density`. The larger `pt_density`, the closer the nearest neighbors.

- wgt_units
The units used to compute the design weights. These units must be standard units as defined by the `set_units()` function in the units package. The default units match the units of the sf object.

- pt_density
A positive integer controlling the density of the GRTS approximation for infinite sampling frames. The GRTS approximation for infinite sampling frames vastly improves computational efficiency by generating many finite points and selecting a sample from the points. `pt_density` represents the density of finite points per unit to use in the approximation. More specifically, for each stratum, the number of points used in the approximation equals `pt_density * (n_base + n_over)`. A larger value of `pt_density` means a closer approximation to the infinite sampling frame but less computational efficiency. The default value of `pt_density` is `10`. Note that when used with `caty_n`, the unequal inclusion probabilities generated from this approach are also approximations.

- DesignID
A character string indicating the naming structure for each site's identifier selected in the sample, which is matched with `SiteBegin` and included as a variable in the sf object in the function's output. The default is `"Site"`.

- SiteBegin
A number indicating the first integer to use to match with `DesignID` while creating each site's identifier selected in the sample. Successive sites are given successive integers. The default starting number is `1` and the number of digits is equal to the number of digits in `n_base + n_over`. For example, if `n_base` is 50 and `n_over` is 0, then the default site identifiers are `Site-01` to `Site-50`.

- sep
A character string that acts as a separator between `DesignID` and `SiteBegin`. The default is `"-"`.

- projcrs_check
A check for whether the coordinates are projected. If `TRUE`, an error is returned if coordinates are not projected (i.e., they are geographic or NA). If `FALSE`, the check is not performed, which means that the crs in `sframe` (and `legacy_sites` if provided) can be projected, geographic, or NA.
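As a sketch of how the stratified and unequal probability arguments fit together, the block below builds `n_base` and `caty_n` for a design stratified by one column with unequal probabilities by another, and checks the sum constraint described under `caty_n` before calling `grts()`. It assumes the `NE_Lakes` example data shipped with spsurvey, with `ELEV_CAT` levels `low` and `high` and `AREA_CAT` levels `small` and `large`. The sample sizes are arbitrary, and the helper `check_caty_n()` is a hypothetical name introduced here, not part of the package.

```r
# Base sample size per stratum; names must match the values of stratum_var
n_base <- c(low = 25, high = 15)

# Expected sample sizes per caty_var level, differing among strata
caty_n <- list(
  low  = c(small = 20, large = 5),
  high = c(small = 10, large = 5)
)

# Hypothetical helper: TRUE if every stratum's caty_n sums to its n_base
check_caty_n <- function(n_base, caty_n) {
  all(vapply(
    names(n_base),
    function(s) sum(caty_n[[s]]) == n_base[[s]],
    logical(1)
  ))
}
stopifnot(check_caty_n(n_base, caty_n))

# The call itself requires spsurvey and its NE_Lakes example data
if (requireNamespace("spsurvey", quietly = TRUE)) {
  library(spsurvey)
  samp <- grts(
    NE_Lakes,
    n_base = n_base,
    stratum_var = "ELEV_CAT",
    caty_var = "AREA_CAT",
    caty_n = caty_n
  )
  print(samp)
}
```

Because `caty_var` is supplied, `seltype` is left at its default and should resolve to `"unequal"` as described above. Checking the sums up front tends to give a clearer failure than discovering a mismatch inside the sampling call.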

Tony Olsen olsen.tony@epa.gov

`n_base` is the number of sites used to calculate the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, `n_base` is typically the number of sites in all panels that will be sampled in the same temporal period; `n_base` is not the total number of sites in all panels. The sum of `n_base` and `n_over` is equal to the total number of sites to be visited for all panels plus any replacement sites that may be required.
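A small worked example may make the panel bookkeeping concrete. The numbers below are hypothetical: three panels of 40 sites each, with one panel visited per year, plus 10 spare replacement sites. Under the description above, `n_base` is the number of sites sampled in the same temporal period, and `n_over` covers the remaining panel sites and the spares.

```r
# Hypothetical panel design (all numbers are illustrative assumptions)
sites_per_panel <- 40   # sites in each panel
n_panels <- 3           # panels visited in different years
n_replacement <- 10     # spare sites in case some cannot be sampled

# Sites sampled in the same temporal period (one panel per year here)
n_base <- sites_per_panel

# Total sites to be visited across all panels
n_total <- sites_per_panel * n_panels

# n_base + n_over must cover all panel sites plus replacements
n_over <- n_total + n_replacement - n_base

c(n_base = n_base, n_over = n_over)
```

These values would then be passed as `grts(sframe, n_base = n_base, n_over = n_over)`, so that the design weights reflect the 40 sites analyzed together rather than all 120 panel sites.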

Stevens Jr., Don L. and Olsen, Anthony R. (2004). Spatially balanced sampling
of natural resources. *Journal of the American Statistical Association*, 99(465), 262-278.

See also `irs` to select a sample that is not spatially balanced.

```
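library(spsurvey)

# Unstratified, equal probability sample of 100 lakes
samp <- grts(NE_Lakes, n_base = 100)
print(samp)

# Stratified sample: names of n_base must match the levels of ELEV_CAT
strata_n <- c(low = 25, high = 30)
samp_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)

# Unstratified sample with 5 reverse hierarchically ordered replacement sites
samp_over <- grts(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)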
```
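The examples above do not exercise the legacy site, minimum distance, or nearest neighbor options. The following sketch shows one call for each option. It assumes the `NE_Lakes` and `NE_Lakes_Legacy` example data distributed with spsurvey and an arbitrary minimum distance of 1600 (in the units of the `NE_Lakes` coordinate system). It is wrapped in a `requireNamespace()` check so that it is skipped when spsurvey is not installed.

```r
n_base <- 50
n_near <- 1

# One nearest neighbor replacement is requested per base site
expected_near_sites <- n_base * n_near

if (requireNamespace("spsurvey", quietly = TRUE)) {
  library(spsurvey)

  # Legacy sites supplied as a separate sf object; they count toward n_base
  samp_legacy <- grts(NE_Lakes, n_base = n_base, legacy_sites = NE_Lakes_Legacy)
  print(samp_legacy)

  # Request a minimum distance of 1600 between sampled sites
  samp_mindis <- grts(NE_Lakes, n_base = n_base, mindis = 1600)
  print(samp_mindis)

  # Nearest neighbor replacement sites are returned in sites_near;
  # per the n_near description, the row count should match expected_near_sites
  samp_near <- grts(NE_Lakes, n_base = n_base, n_near = n_near)
  print(nrow(samp_near$sites_near))
}
```

If the minimum distance cannot be met within `maxtry` attempts, `grts()` returns a warning rather than an error, so the resulting sample is still worth inspecting.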
