# MaximallySelectedStatisticsTests

##### Generalized Maximally Selected Statistics

Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.

- Keywords
- htest

##### Usage

```
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
```

##### Arguments

- formula
a formula of the form `y1 + ... + yq ~ x1 + ... + xp | block` where `y1`, …, `yq` and `x1`, …, `xp` are measured on arbitrary scales (nominal, ordinal or continuous with or without censoring) and `block` is an optional factor for stratification.

- data
an optional data frame containing the variables in the model formula.

- subset
an optional vector specifying a subset of observations to be used. Defaults to `NULL`.

- weights
an optional formula of the form `~ w` defining integer valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.

- object
an object inheriting from classes `"table"` or `"IndependenceProblem"`.

- teststat
a character, the type of test statistic to be applied: either a maximum statistic (`"maximum"`, default) or a quadratic form (`"quadratic"`).

- distribution
a character, the conditional null distribution of the test statistic can be approximated by its asymptotic distribution (`"asymptotic"`, default) or via Monte Carlo resampling (`"approximate"`). Alternatively, the functions `asymptotic` or `approximate` can be used. Computation of the null distribution can be suppressed by specifying `"none"`.

- minprob
a numeric, a fraction between 0 and 0.5 specifying that only cutpoints greater than the `minprob` \(\cdot\) 100% quantile of `x1`, …, `xp` are considered. Defaults to `0.1`.

- maxprob
a numeric, a fraction between 0.5 and 1 specifying that only cutpoints smaller than the `maxprob` \(\cdot\) 100% quantile of `x1`, …, `xp` are considered. Defaults to `1 - minprob`.

- ...
further arguments to be passed to `independence_test`.
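
As a hedged illustration of how these arguments combine, the sketch below narrows the cutpoint search to the central part of a single covariate. It assumes the coin package is installed and uses its bundled `treepipit` data; the `minprob` and `maxprob` values are picked for illustration only and are not a recommendation.

```r
library("coin")

## Only consider cutpoints between the 20% and 80% quantiles of
## 'coverstorey' (illustrative values)
mt <- maxstat_test(counts ~ coverstorey, data = treepipit,
                   teststat = "maximum",
                   minprob = 0.2, maxprob = 0.8)

statistic(mt)          # observed maximally selected statistic
pvalue(mt)             # p-value from the default asymptotic distribution
mt@estimates$estimate  # estimated cutpoint
```

Narrowing the quantile range excludes splits that would leave very few observations in one of the two groups.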

##### Details

`maxstat_test` provides generalized maximally selected statistics. The
family of maximally selected statistics encompasses a large collection of
procedures used for the estimation of simple cutpoint models including, but
not limited to, maximally selected \(\chi^2\) statistics, maximally
selected Cochran-Armitage statistics, maximally selected rank statistics and
maximally selected statistics for multiple covariates. A general description
of these methods is given by Hothorn and Zeileis (2008).

The null hypothesis of independence, or conditional independence given
`block`, between `y1`, …, `yq` and `x1`, …, `xp` is tested against cutpoint
alternatives. All possible partitions into two groups are evaluated for each
unordered covariate `x1`, …, `xp`, whereas only order-preserving binary
partitions are evaluated for ordered or numeric covariates. The cutpoint is
then a set of levels defining one of the two groups.
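
To make the idea of evaluating order-preserving binary partitions concrete, here is a minimal base R sketch for one numeric response and one numeric covariate. It is not coin's implementation: `max_selected_stat` is a hypothetical helper, the boundary handling of the quantile range is an assumption, and no p-value is computed. Each candidate cutpoint is scored by the absolute standardized sum of the responses in the lower group, using the conditional (permutation) mean and variance, and the largest score is reported.

```r
## Illustrative sketch only; 'max_selected_stat' is a hypothetical helper
## and is not part of coin.
max_selected_stat <- function(y, x, minprob = 0.1, maxprob = 1 - minprob) {
    n <- length(y)
    ## candidate cutpoints: observed x values inside the quantile range
    ## (assumed boundary handling), excluding max(x) so that both groups
    ## are non-empty
    lower <- unname(quantile(x, probs = minprob, type = 1))
    upper <- unname(quantile(x, probs = maxprob, type = 1))
    xs <- sort(unique(x))
    cuts <- xs[xs >= lower & xs <= upper & xs < max(x)]
    stats <- vapply(cuts, function(cp) {
        g <- x <= cp                      # order-preserving binary partition
        m <- sum(g)                       # size of the lower group
        ## permutation mean and variance of sum(y[g])
        mu <- m * mean(y)
        sigma2 <- m * (n - m) / (n * (n - 1)) * sum((y - mean(y))^2)
        abs(sum(y[g]) - mu) / sqrt(sigma2)
    }, numeric(1))
    best <- which.max(stats)
    list(statistic = stats[best], cutpoint = cuts[best])
}

## Constructed example: the response jumps from 0 to 1 once x exceeds 12
x <- 1:20
y <- as.numeric(x > 12)
res <- max_selected_stat(y, x)
res$cutpoint   # 12 for this constructed example
res$statistic
```

Because the maximum is taken over many correlated statistics, its null distribution differs from that of a single two-sample statistic, which is why a dedicated reference distribution is needed.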

If both response and covariate are univariable, say `y1` and `x1`, this
procedure is known as maximally selected \(\chi^2\) statistics (Miller and
Siegmund, 1982) when `y1` is a binary factor and `x1` is a numeric variable,
and as maximally selected rank statistics when `y1` is a rank transformed
numeric variable and `x1` is a numeric variable (Lausen and Schumacher,
1992). Lausen *et al.* (2004) introduced maximally selected statistics for a
univariable numeric response and multiple numeric covariates `x1`, …, `xp`.

If, say, `y1` and/or `x1` are ordered factors, the default scores,
`1:nlevels(y1)` and `1:nlevels(x1)` respectively, can be altered using the
`scores` argument (see `independence_test`); this argument can also be used
to coerce nominal factors to class `"ordered"`. If both, say, `y1` and `x1`
are ordered factors, a linear-by-linear association test is computed and the
direction of the alternative hypothesis can be specified using the
`alternative` argument. The particular extension to the case of a
univariable binary factor response and a univariable ordered covariate was
given by Betensky and Rabinowitz (1999) and is known as maximally selected
Cochran-Armitage statistics.

The conditional null distribution of the test statistic is used to obtain
\(p\)-values and an asymptotic approximation of the exact distribution is
used by default (`distribution = "asymptotic"`). Alternatively, the
distribution can be approximated via Monte Carlo resampling by setting
`distribution` to `"approximate"`. See `asymptotic` and `approximate` for
details.
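
The two ways of obtaining the reference distribution can be compared with a short sketch. It assumes coin 1.3-0 or later, where the number of Monte Carlo replicates is set through the `nresample` argument of `approximate`; the replicate count and the seed are arbitrary illustrative choices.

```r
library("coin")

## Asymptotic approximation (the default)
mt_asy <- maxstat_test(counts ~ coverstorey, data = treepipit,
                       distribution = "asymptotic")

## Monte Carlo resampling; 'nresample' and the seed are illustrative
set.seed(1)
mt_mc <- maxstat_test(counts ~ coverstorey, data = treepipit,
                      distribution = approximate(nresample = 10000))

pvalue(mt_asy)
pvalue(mt_mc)
```

The resampling result varies with the seed and the number of replicates, so fixing the seed keeps it reproducible.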

##### Value

An object inheriting from class `"IndependenceTest"`.

##### Note

Starting with coin version 1.1-0, maximum statistics and quadratic forms
can no longer be specified using `teststat = "maxtype"` and
`teststat = "quadtype"` respectively (as was used in versions prior to
0.4-5).

##### References

Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected
\(\chi^2\) statistics for \(k \times 2\) tables.
*Biometrics* **55**(1), 317–320.
doi:10.1111/j.0006-341X.1999.00317.x

Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally
selected rank statistics. *Computational Statistics & Data Analysis*
**43**(2), 121–137. doi:10.1016/S0167-9473(02)00225-6

Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected
statistics. *Biometrics* **64**(4), 1263–1269.
doi:10.1111/j.1541-0420.2008.00995.x

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of
optimal selected prognostic factors. *Biometrical Journal* **46**(3),
364–374. doi:10.1002/bimj.200310030

Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics.
*Biometrics* **48**(1), 73–85. doi:10.2307/2532740

Miller, R. and Siegmund, D. (1982). Maximally selected chi square
statistics. *Biometrics* **38**(4), 1011–1016.
doi:10.2307/2529881

Müller, J. and Hothorn, T. (2004). Maximally selected
two-sample statistics as a new tool for the identification and assessment of
habitat factors with an application to breeding bird communities in oak
forests. *European Journal of Forest Research* **123**(3), 219–228.
doi:10.1007/s10342-004-0035-5

##### Examples

```
library("coin")

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate   # estimated cutpoint

## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
## Note: the 'sphase' data come from the TH.data package
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))
```

*Documentation reproduced from package coin, version 1.3-1, License: GPL-2*