Ball (version 1.3.7)

bd.test: Ball Divergence based Equality of Distributions Test

Description

Performs the nonparametric two-sample or \(K\)-sample ball divergence test for equality of multivariate distributions.

Usage

bd.test(x, ...)

# S3 method for default
bd.test(x, y = NULL, num.permutations = 99, distance = FALSE, size = NULL, seed = 4, num.threads = 1, kbd.type = "sum", ...)

# S3 method for formula
bd.test(formula, data, subset, na.action, ...)

Arguments

x

a numeric vector, matrix, data.frame, dist object or list containing vector, matrix, or data.frame.

...

further arguments to be passed to or from methods.

y

a numeric vector, matrix or data.frame.

num.permutations

the number of permutation replications. When num.permutations equals 0, the function returns only the sample version of ball divergence. Default: num.permutations = 99

distance

if distance = TRUE, x will be considered as a distance matrix. Default: distance = FALSE

size

a vector recording the sample size of each group.

seed

the random seed.

num.threads

the number of threads. Default: num.threads = 1

kbd.type

a character value controlling the output information. Setting kbd.type = "sum", kbd.type = "summax", or kbd.type = "max" displays the corresponding statistic value and \(p\)-value of the \(K\)-sample test procedure. Note that this argument only influences the result printed in the R console. Default: kbd.type = "sum"

formula

a formula of the form response ~ group where response gives the data values and group a vector or factor of the corresponding groups.

data

an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

Value

bd.test returns a list with class "htest" containing the following components:

statistic

ball divergence statistic.

p.value

the p-value for the test.

replicates

permutation replications of the test statistic.

size

sample sizes.

complete.info

a list containing multiple statistic values and their corresponding \(p\)-values.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating what type of test was performed.

data.name

description of data.
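
As an illustration of the components listed above, the following sketch runs a two-sample test and reads a few fields from the returned list. It assumes the Ball package is installed (the call is skipped otherwise), and the simulated data and the seed are arbitrary choices for the example.

```r
# Sketch: inspect the "htest" object returned by bd.test.
# Assumes the Ball package is installed; otherwise nothing is run.
if (requireNamespace("Ball", quietly = TRUE)) {
  set.seed(1)                       # arbitrary seed for the simulated data
  x <- rnorm(50)
  y <- rnorm(50, mean = 1)
  res <- Ball::bd.test(x = x, y = y, num.permutations = 99)

  print(class(res))                 # expected to include "htest"
  print(res$statistic)              # ball divergence statistic
  print(res$p.value)                # permutation p-value
  print(res$size)                   # sample sizes of the groups
  print(head(res$replicates))       # first few permutation replicates
}
```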

Details

bd.test performs ball divergence based multivariate nonparametric tests for the two-sample or K-sample problem. If only x is given, the statistic is computed from the original pooled samples, stacked in a matrix where each row is a multivariate observation, or from the distance matrix when distance = TRUE. The first size[1] rows of x are the first sample, the next size[2] rows of x are the second sample, and so on. If x is a list, its elements are taken as the samples to be compared, and hence have to be numeric vectors, matrices, or data.frames.
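
The input layouts described in this paragraph can be sketched as follows. The group sizes, the simulated data, and the use of Euclidean distance via dist() are assumptions made for illustration. The three calls are meant to show the accepted input forms, not to claim that their printed results are identical. The calls to bd.test are skipped when the Ball package is not installed.

```r
# Sketch: three ways to hand the same three groups to bd.test.
set.seed(1)                                   # arbitrary seed
g1 <- matrix(rnorm(40 * 2), ncol = 2)
g2 <- matrix(rnorm(50 * 2), ncol = 2)
g3 <- matrix(rnorm(60 * 2, mean = 1), ncol = 2)

# (1) Pooled matrix: rows stacked group by group, plus a size vector.
pooled <- rbind(g1, g2, g3)
size <- c(nrow(g1), nrow(g2), nrow(g3))

# (2) List input: one element per group.
groups <- list(g1, g2, g3)

# (3) Distance matrix of the pooled sample, in the same row order.
#     Euclidean distance is an assumption chosen for this example.
dmat <- as.matrix(dist(pooled))

# Row ranges implied by `size`: group k occupies rows start[k]:end[k].
end <- cumsum(size)
start <- end - size + 1
print(data.frame(group = seq_along(size), start = start, end = end))

if (requireNamespace("Ball", quietly = TRUE)) {
  print(Ball::bd.test(x = pooled, size = size))
  print(Ball::bd.test(x = groups))
  print(Ball::bd.test(x = dmat, size = size, distance = TRUE))
}
```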

The test is based on the sample version of ball divergence (see bd) and is implemented as a permutation test with num.permutations replications. When num.permutations = 0, the function simply returns the test statistic.
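
The permutation procedure can be pictured with a small base R sketch. This is not the package's implementation: perm_test and mean_gap are hypothetical helpers, the stand-in statistic is an absolute difference of means rather than ball divergence, and the p-value rule (1 + count) / (1 + replications) is a common convention assumed here, since this page does not state the rule the package uses.

```r
# Sketch of a generic permutation test on a pooled sample split by `size`.
# `stat_fun` is a hypothetical stand-in; it is NOT ball divergence.
perm_test <- function(pooled, size, stat_fun, num.permutations = 99, seed = 4) {
  set.seed(seed)
  labels <- rep(seq_along(size), times = size)   # group label of each row
  observed <- stat_fun(pooled, labels)
  if (num.permutations == 0) {
    return(list(statistic = observed))           # statistic only, no p-value
  }
  # Recompute the statistic after shuffling the group labels.
  replicates <- replicate(num.permutations, stat_fun(pooled, sample(labels)))
  # Assumed convention: count the observed value among the replicates.
  p.value <- (1 + sum(replicates >= observed)) / (1 + num.permutations)
  list(statistic = observed, p.value = p.value, replicates = replicates)
}

# Stand-in statistic: absolute difference between the two group means.
mean_gap <- function(values, labels) {
  abs(mean(values[labels == 1]) - mean(values[labels == 2]))
}

x <- c(rnorm(50), rnorm(50, mean = 1))
res <- perm_test(x, size = c(50, 50), stat_fun = mean_gap)
print(res$statistic)
print(res$p.value)
```

With 99 replications, the smallest p-value this convention can produce is 1 / 100 = 0.01, which is one reason the number of replications limits the resolution of the test.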

References

Pan, Wenliang; Tian, Yuan; Wang, Xueqin; Zhang, Heping. Ball Divergence: Nonparametric two sample test. Ann. Statist. 46 (2018), no. 3, 1109--1137. doi:10.1214/17-AOS1579. https://projecteuclid.org/euclid.aos/1525313077

Jin, Zhu, Wenliang Pan, Wei Zheng, and Xueqin Wang (2018). Ball: An R package for detecting distribution difference and association in metric spaces. arXiv preprint arXiv:1811.03750. URL http://arxiv.org/abs/1811.03750.

See Also

bd

Examples

# NOT RUN {
################# Quick Start #################
x <- rnorm(50)
y <- rnorm(50, mean = 1)
# plot(density(x))
# lines(density(y), col = "red")
# ball divergence:
bd.test(x = x, y = y)

################# Quick Start #################
x <- matrix(rnorm(100), nrow = 50, ncol = 2)
y <- matrix(rnorm(100, mean = 3), nrow = 50, ncol = 2)
# Hypothesis test with Standard Ball Divergence:
bd.test(x = x, y = y)

################# Simulated Non-Hilbert data #################
data("bdvmf")
# }
# NOT RUN {
library(scatterplot3d)
scatterplot3d(bdvmf[["x"]], color = bdvmf[["group"]], 
              xlab = "X1", ylab = "X2", zlab = "X3")
# }
# NOT RUN {
# calculate geodesic distance between samples:
Dmat <- nhdist(bdvmf[["x"]], method = "geodesic")
# hypothesis test with BD:
bd.test(x = Dmat, size = c(150, 150), num.permutations = 99, distance = TRUE)

################# Non-Hilbert Real Data #################
# load data:
data("macaques")
# number of female and male Macaca fascicularis:
table(macaques[["group"]])
# calculate Riemannian shape distance matrix:
Dmat <- nhdist(macaques[["x"]], method = "riemann")
# hypothesis test with BD:
bd.test(x = Dmat, num.permutations = 99, size = c(9, 9), distance = TRUE)

################  K-sample Test  #################
n <- 150
bd.test(rnorm(n), size = c(40, 50, 60))
# alternative input method:
x <- lapply(c(40, 50, 60), rnorm)
bd.test(x)

################  Formula interface  ################
## Two-sample test
bd.test(extra ~ group, data = sleep)
## K-sample test
bd.test(Sepal.Width ~ Species, data = iris)
# }