sortinghat (version 0.1)

simdata_guo: Generates data from K multivariate normal populations having the covariance structure of Guo et al. (2007).

Description

We generate $n_k$ observations $(k = 1, \ldots, K)$ from each of $K$ multivariate normal distributions. Let the $k$th population have a $p$-dimensional multivariate normal distribution, $N_p(\mu_k, \Sigma_k)$ with mean vector $\mu_k$ and positive-definite covariance matrix $\Sigma_k$. Each covariance matrix $\Sigma_k$ consists of block-diagonal autocorrelation matrices.
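As an illustration of what drawing from $N_p(\mu_k, \Sigma_k)$ involves, the following is a minimal base-R sketch using a Cholesky factor of the covariance matrix. The helper name `rmvn_sketch` is hypothetical and is not part of sortinghat; the package may well use a different sampler internally.

```r
# Hypothetical helper (not part of sortinghat): draws n rows from N_p(mu, Sigma)
# by transforming standard normal draws with a Cholesky factor of Sigma.
rmvn_sketch <- function(n, mu, Sigma) {
  p <- length(mu)
  R <- chol(Sigma)                    # upper triangular, t(R) %*% R equals Sigma
  Z <- matrix(rnorm(n * p), n, p)     # n x p matrix of independent N(0, 1) draws
  sweep(Z %*% R, 2, mu, "+")          # rows now have covariance Sigma and mean mu
}

set.seed(1)
Sigma <- matrix(c(1, 0.9, 0.9, 1), 2, 2)
x <- rmvn_sketch(n = 5, mu = c(0, 10), Sigma = Sigma)
dim(x)  # 5 2
```

The Cholesky approach requires a positive-definite covariance matrix, which matches the assumption stated above.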

Usage

simdata_guo(n, mean, block_size, num_blocks, rho,
    sigma2 = 1, seed = NULL)

Arguments

n
a vector (of length K) of the sample sizes for each population
mean
a vector or a list (of length K) of mean vectors
block_size
a vector (of length K) of the sizes of the square block matrices for each population. See details.
num_blocks
a vector (of length K) giving the number of block matrices for each population. See details.
rho
a vector (of length K) of the values of the autocorrelation parameter for each class covariance matrix
sigma2
a vector (of length K) of the variance coefficients for each class covariance matrix
seed
seed for random number generation (If NULL, does not set seed)

Value

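  • named list containing:
      • x: a matrix of the generated observations, with one row per observation and one column per feature
      • y: a vector giving the population from which each row of x was generated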

Details

The $k$th class covariance matrix is defined as $$\Sigma_k = \Sigma^{(\rho)} \oplus \Sigma^{(-\rho)} \oplus \ldots \oplus \Sigma^{(\rho)},$$ where $\oplus$ denotes the direct sum and the $(i,j)$th entry of $\Sigma^{(\rho)}$ is $$\Sigma_{ij}^{(\rho)} = { \rho^{|i - j|} }.$$

The matrix $\Sigma^{(\rho)}$ is referred to as a block. Its dimensions are given by the block_size argument, and the number of blocks is given by the num_blocks argument.

Each matrix $\Sigma_k$ is generated by the cov_block_autocorrelation function.
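To make the covariance formula concrete, here is a rough base-R sketch that builds the matrix exactly as the formula is written, with the sign of $\rho$ alternating from block to block. The helpers `autocorr_block` and `direct_sum` are hypothetical names introduced for illustration; this is not the source of cov_block_autocorrelation, whose behavior may differ. Treating sigma2 as a scalar multiplier of the whole matrix is also an assumption.

```r
# Hypothetical helper: one block with (i, j)th entry rho^|i - j|.
autocorr_block <- function(block_size, rho) {
  idx <- seq_len(block_size)
  rho^abs(outer(idx, idx, "-"))
}

# Hypothetical helper: direct sum of square matrices (block-diagonal matrix).
direct_sum <- function(blocks) {
  p <- sum(vapply(blocks, nrow, integer(1)))
  out <- matrix(0, p, p)
  offset <- 0
  for (b in blocks) {
    idx <- offset + seq_len(nrow(b))
    out[idx, idx] <- b
    offset <- offset + nrow(b)
  }
  out
}

block_size <- 3
num_blocks <- 3
rho <- 0.9
sigma2 <- 1  # assumed here to scale the whole covariance matrix

# Signs alternate as in the formula: rho, -rho, rho, ...
signs <- rep_len(c(1, -1), num_blocks)
blocks <- lapply(signs, function(s) autocorr_block(block_size, s * rho))
Sigma <- sigma2 * direct_sum(blocks)
dim(Sigma)  # 9 9
```

Entries outside the diagonal blocks are zero, so features in different blocks are uncorrelated.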

The number of populations, K, is determined from the length of the vector of sample sizes, n. The mean vectors can be given in a list of length K. If one mean is given (as a vector or a list having 1 element), then all populations share this common mean.

The block sizes can be given as a numeric vector or a single value, in which case the block size is replicated K times. The same logic applies to num_blocks, rho, and sigma2.

For each class, the number of features, p, is computed as block_size * num_blocks. The values for p must agree for each class.
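The recycling and dimension rules above can be sketched as follows. The helper `expand_args` is hypothetical and only mirrors the description; it is not the package's own argument handling, and it does not validate that the inputs have length 1 or K.

```r
# Hypothetical helper mirroring the rules described above: recycle single
# values to length K, then require the same number of features p per class.
expand_args <- function(n, block_size, num_blocks) {
  K <- length(n)
  block_size <- rep_len(block_size, K)
  num_blocks <- rep_len(num_blocks, K)
  p <- block_size * num_blocks
  if (length(unique(p)) != 1) {
    stop("block_size * num_blocks must be the same for every class")
  }
  list(K = K, block_size = block_size, num_blocks = num_blocks, p = p[1])
}

# Three classes with different block layouts but a common p.
args_ok <- expand_args(n = c(15, 15, 15),
                       block_size = c(2, 4, 8),
                       num_blocks = c(8, 4, 2))
args_ok$p  # 16
```

A call such as `expand_args(c(10, 10), block_size = c(2, 3), num_blocks = 4)` would stop with an error, because the two classes would have 8 and 12 features.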

The block-diagonal covariance matrix with autocorrelated blocks was popularized by Guo et al. (2007) for studying classification of high-dimensional data.

References

Guo, Y., Hastie, T., & Tibshirani, R. (2007). "Regularized linear discriminant analysis and its application in microarrays," Biostatistics, 8, 1, 86-100.

Examples

# Generates 10 observations from each of two multivariate normal populations
# having a block-diagonal autocorrelation structure.
block_size <- 3
num_blocks <- 3
p <- block_size * num_blocks
means_list <- list(seq_len(p), -seq_len(p))
data <- simdata_guo(n = c(10, 10), mean = means_list, block_size = block_size,
                    num_blocks = num_blocks, rho = 0.9, seed = 42)
dim(data$x)
table(data$y)

# Generates 15 observations from each of three multivariate normal
# populations having block-diagonal autocorrelation structures. The
# covariance matrices are unequal.
p <- 16
block_size <- c(2, 4, 8)
num_blocks <- p / block_size
rho <- c(0.1, 0.5, 0.9)
sigma2 <- 1:3
mean_list <- list(rep.int(-5, p), rep.int(0, p), rep.int(5, p))

set.seed(42)
data2 <- simdata_guo(n = c(15, 15, 15), mean = mean_list,
                    block_size = block_size, num_blocks = num_blocks,
                    rho = rho, sigma2 = sigma2)
dim(data2$x)
table(data2$y)