
sortinghat (version 0.1)

simdata_t: Generates random variates from K multivariate Student's t populations.

Description

We generate $n_k$ observations $(k = 1, \ldots, K)$ from each of $K$ multivariate Student's t distributions such that the means all lie at the same Euclidean distance from the origin, with that distance scaled by $\Delta \ge 0$.
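
The Description does not say how the equidistant means are constructed, and simdata_t() itself takes the centroids directly. One simple construction, assumed here purely for illustration, places the $k$th mean at $\Delta e_k$, where $e_k$ is the $k$th standard basis vector of $\mathbb{R}^p$. With $\Delta = 3$ and $p = 2$ this reproduces the centroids c(3, 0) and c(0, 3) used in the first example. The helper name equidistant_centroids is hypothetical.

```r
# Hypothetical helper, not part of sortinghat: build K centroids in R^p
# that all lie at Euclidean distance delta from the origin by placing
# delta on the k-th coordinate axis (requires K <= p).
equidistant_centroids <- function(K, p, delta) {
  stopifnot(K <= p, delta >= 0)
  lapply(seq_len(K), function(k) {
    mu <- numeric(p)  # zero vector of length p
    mu[k] <- delta    # k-th coordinate set to delta
    mu
  })
}

centroids <- equidistant_centroids(K = 2, p = 2, delta = 3)
str(centroids)                                    # list of c(3, 0) and c(0, 3)
sapply(centroids, function(mu) sqrt(sum(mu^2)))   # 3 3
```

Other constructions (for example, points spread evenly on a circle) also satisfy the equal-distance condition; the axis-aligned choice is just the simplest.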

Usage

simdata_t(n, centroid, cov, df, seed = NULL)

Arguments

n
a vector (of length K) of the sample sizes for each population
centroid
a vector or a list (of length K) of centroid vectors
cov
a symmetric matrix or a list (of length K) of symmetric covariance matrices
df
a vector (of length K) of the degrees of freedom for each population
seed
seed for random number generation (if NULL, the seed is not set)

Details

Let $\Pi_k$ denote the $k$th population with a $p$-dimensional multivariate Student's t distribution, $T_p(\mu_k, \Sigma_k, c_k)$, where $\mu_k$ is the population location vector, $\Sigma_k$ is the positive-definite scale matrix (referred to here as the covariance matrix; for $c_k > 2$ the actual covariance of $\Pi_k$ is $\frac{c_k}{c_k - 2}\Sigma_k$), and $c_k$ is the degrees of freedom.
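
As an illustration of this distribution, the base-R sketch below draws from $T_p(\mu, \Sigma, c)$ using the standard normal/chi-square mixture $X = \mu + Z/\sqrt{W/c}$, with $Z \sim N_p(0, \Sigma)$ and $W \sim \chi^2_c$ independent. The function name rmvt_sketch is hypothetical, and this is not claimed to be how simdata_t() is implemented; for real work, a tested generator such as rmvt() from the mvtnorm package is preferable. The last line checks empirically that, for $c > 2$, the sample covariance is close to $\frac{c}{c-2}\Sigma$ rather than $\Sigma$ itself.

```r
# Hypothetical sketch (base R only): draw n variates from T_p(mu, Sigma, df)
# via the mixture X = mu + Z / sqrt(W / df),
# with Z ~ N_p(0, Sigma) and W ~ chi^2_df independent.
rmvt_sketch <- function(n, mu, Sigma, df) {
  p <- length(mu)
  R <- chol(Sigma)                        # upper triangular, t(R) %*% R == Sigma
  Z <- matrix(rnorm(n * p), n, p) %*% R   # each row is N_p(0, Sigma)
  W <- rchisq(n, df = df)                 # one chi-square draw per row
  X <- Z / sqrt(W / df)                   # row i is divided by sqrt(W[i] / df)
  sweep(X, 2, mu, "+")                    # shift every row by mu
}

set.seed(1)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
x <- rmvt_sketch(1e5, mu = c(3, 0), Sigma = Sigma, df = 10)
colMeans(x)      # close to c(3, 0)
cov(x) / Sigma   # each entry close to 10 / (10 - 2) = 1.25
```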

Smaller values of $c_k$ produce heavier tails and, therefore, more outlying observations on average; as $c_k \to \infty$, $T_p(\mu_k, \Sigma_k, c_k)$ approaches the multivariate normal distribution $N_p(\mu_k, \Sigma_k)$.
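
One way to quantify the effect of $c_k$ on the tails, assumed here for illustration: call an observation outlying when its squared Mahalanobis distance $D^2 = (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k)$ exceeds the 99th percentile it would have under normality. For $T_p(\mu_k, \Sigma_k, c_k)$, $D^2/p$ follows an $F(p, c_k)$ distribution, so the expected proportion of such outliers can be computed exactly in base R.

```r
# Expected proportion of T_p(mu, Sigma, c) draws whose squared Mahalanobis
# distance exceeds the normal-theory 99th percentile, for several c.
# Uses the standard result D^2 / p ~ F(p, c).
p <- 2
q <- qchisq(0.99, df = p)            # 99th percentile of D^2 under normality
df_values <- c(3, 5, 10, 30, 100)
tail_prob <- pf(q / p, df1 = p, df2 = df_values, lower.tail = FALSE)
names(tail_prob) <- df_values
round(tail_prob, 4)

# Normal limit (c -> Inf): equals 0.01 by construction
pchisq(q, df = p, lower.tail = FALSE)
```

The proportions decrease toward 0.01 as $c_k$ grows, which is the "more outlying observations for small $c_k$" behaviour described above.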

The number of populations, K, is determined from the length of the vector of sample sizes, n. The centroid vectors and covariance matrices can each be given as a list of length K. If a single covariance matrix is given (as a matrix or as a list with one element), all populations share that common covariance matrix; the same logic applies to the population centroids. The degrees of freedom can be given either as a numeric vector of length K or as a single value, which is then replicated K times.
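
The recycling rules above can be sketched in base R as follows. recycle_args is a hypothetical helper written for illustration; it is not taken from the sortinghat source, which may validate its inputs differently.

```r
# Hypothetical helper (not sortinghat's code): expand a single centroid
# vector, covariance matrix, or df value to K copies, following the
# recycling rules described above.
recycle_args <- function(n, centroid, cov, df) {
  K <- length(n)                                        # number of populations
  if (!is.list(centroid)) centroid <- list(centroid)    # bare vector -> list
  if (!is.list(cov)) cov <- list(cov)                   # bare matrix -> list
  if (length(centroid) == 1) centroid <- rep(centroid, K)
  if (length(cov) == 1) cov <- rep(cov, K)
  if (length(df) == 1) df <- rep(df, K)
  stopifnot(length(centroid) == K, length(cov) == K, length(df) == K)
  list(centroid = centroid, cov = cov, df = df)
}

args <- recycle_args(n = c(10, 10, 10), centroid = c(0, 0),
                     cov = diag(2), df = 4)
lengths(args)   # centroid 3, cov 3, df 3
```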

Examples

library(sortinghat)
# Generates 10 observations from each of two multivariate t populations
# with equal covariance matrices and equal degrees of freedom.
centroid_list <- list(c(3, 0), c(0, 3))
cov_identity <- diag(2)
data_generated <- simdata_t(n = c(10, 10), centroid = centroid_list,
                            cov = cov_identity, df = 4, seed = 42)
dim(data_generated$x)
table(data_generated$y)

# Generates 10 observations from each of three multivariate t populations
# with unequal covariance matrices and unequal degrees of freedom.
set.seed(42)
centroid_list <- list(c(-3, -3), c(0, 0), c(3, 3))
cov_list <- list(cov_identity, 2 * cov_identity, 3 * cov_identity)
data_generated2 <- simdata_t(n = c(10, 10, 10), centroid = centroid_list,
                             cov = cov_list, df = c(4, 6, 10))
dim(data_generated2$x)
table(data_generated2$y)
