huge (version 1.0.3)

huge.generator: Data generator

Description

Generates data from multivariate normal distributions with different graph structures, including "random", "hub", "cluster" and "band".

Usage

huge.generator(n = 200, d = 50, graph = "random", v = NULL, u = NULL, 
g = NULL, prob = NULL, vis = FALSE, verbose = TRUE)

Arguments

n
The number of observations (sample size). The default value is 200.
d
The number of variables (dimension). The default value is 50.
graph
The graph structure with 4 options: "random", "hub", "cluster" and "band".
v
The off-diagonal elements of the precision matrix, controlling the magnitude of partial correlations with u. The default value is 0.3.
u
A positive number added to the diagonal elements of the precision matrix to control the magnitude of partial correlations. The default value is 0.1.
g
For "cluster" or "hub" graph, g is the number of hubs or clusters in the graph. The default value is about d/20 if d >= 40 and 2 if d < 40. For "band" graph, g is the bandwidth.
prob
For "random" graph, it is the probability that a pair of nodes has an edge. The default value is 3/d. For "cluster" graph, it is the probability that a pair of nodes has an edge in each cluster. The default value is
vis
Visualize the adjacency matrix of the true graph structure, the graph pattern, the covariance matrix and the empirical covariance matrix. The default value is FALSE.
verbose
If verbose = FALSE, tracing information printing is disabled. The default value is TRUE.

Value

An object with S3 class "sim" is returned:

  • data: The n by d matrix for the generated data
  • sigma: The covariance matrix for the generated data
  • omega: The precision matrix for the generated data
  • sigmahat: The empirical covariance matrix for the generated data
  • theta: The adjacency matrix of true graph structure (in sparse matrix representation) for the generated data
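
As a rough illustration of how the returned components might be inspected, the sketch below generates a small hub graph and prints the dimensions of each element. It assumes the huge package is installed (the code is skipped otherwise); the object name sim and the particular values of n, d and g are chosen only for this example.

```r
# Sketch: inspect the components of the returned "sim" object.
# Assumes the huge package is installed; the block is skipped if it is not.
if (requireNamespace("huge", quietly = TRUE)) {
  set.seed(1)  # make the simulated data reproducible
  sim <- huge::huge.generator(n = 100, d = 20, graph = "hub", g = 4,
                              verbose = FALSE)

  print(class(sim))          # S3 class of the returned object
  print(dim(sim$data))       # n by d data matrix
  print(dim(sim$sigma))      # d by d covariance matrix
  print(dim(sim$omega))      # d by d precision matrix
  print(dim(sim$sigmahat))   # d by d empirical covariance matrix
  print(dim(sim$theta))      # d by d adjacency matrix of the true graph
}
```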

Details

Given the adjacency matrix theta, the graph patterns are generated as follows:

(I) "random": Each pair of off-diagonal elements is randomly set to theta[i,j]=theta[j,i]=1 for i!=j with probability prob, and 0 otherwise. This results in about d*(d-1)*prob/2 edges in the graph.

(II) "hub": The rows/columns are evenly partitioned into g disjoint groups. Each group is associated with a "center" row i in that group. Each pair of off-diagonal elements is set to theta[i,j]=theta[j,i]=1 for i!=j if j also belongs to the same group as i, and 0 otherwise. This results in d - g edges in the graph.

(III) "cluster": The rows/columns are evenly partitioned into g disjoint groups. Each pair of off-diagonal elements is set to theta[i,j]=theta[j,i]=1 for i!=j with probability prob if both i and j belong to the same group, and 0 otherwise. This results in about g*(d/g)*(d/g-1)*prob/2 edges in the graph.

(IV) "band": The off-diagonal elements are set to theta[i,j]=1 if 1<=|i-j|<=g, and 0 otherwise. This results in (2d-1-g)*g/2 edges in the graph.

The adjacency matrix theta has all diagonal elements equal to 0. To obtain a positive definite precision matrix, the smallest eigenvalue of theta*v is computed. Let e denote this smallest eigenvalue; the precision matrix is then set to theta*v+(|e|+0.1+u)I. The covariance matrix is then computed from the precision matrix and used to generate multivariate normal data.
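
The construction described above can be sketched in base R for the "band" case. This is an illustration of the documented steps, not the package's source code: the variable names (gap, n_edges, expected_edges, chol_factor, x) are made up for the example, the diagonal shift uses the argument u, and the actual implementation may differ in details such as rescaling or the sampler used.

```r
# Base-R sketch of the "band" construction described in Details.
# Illustrative only: not the package's actual source code.
set.seed(42)

d <- 10    # number of variables
g <- 3     # bandwidth
v <- 0.3   # off-diagonal magnitude (documented default)
u <- 0.1   # extra diagonal shift (documented default)
n <- 200   # sample size

# Adjacency matrix: theta[i, j] = 1 when 1 <= |i - j| <= g, zero diagonal.
idx <- seq_len(d)
gap <- abs(outer(idx, idx, "-"))
theta <- (gap >= 1 & gap <= g) * 1

# The edge count should match (2d - 1 - g) * g / 2.
n_edges <- sum(theta) / 2
expected_edges <- (2 * d - 1 - g) * g / 2

# Shift the diagonal so that the precision matrix is positive definite.
e <- min(eigen(theta * v, symmetric = TRUE, only.values = TRUE)$values)
omega <- theta * v + (abs(e) + 0.1 + u) * diag(d)

# Covariance matrix, then multivariate normal samples via a Cholesky factor.
sigma <- solve(omega)
chol_factor <- chol(sigma)                       # sigma = t(chol_factor) %*% chol_factor
x <- matrix(rnorm(n * d), n, d) %*% chol_factor  # rows are draws from N(0, sigma)
sigmahat <- cov(x)                               # empirical covariance matrix
```

Because theta*v has zero trace, its smallest eigenvalue e is negative, so adding (|e|+0.1+u) to the diagonal leaves the smallest eigenvalue of omega at 0.1+u, which is strictly positive.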

References

1. Tuo Zhao and Han Liu. HUGE: A Package for High-dimensional Undirected Graph Estimation. Technical Report, Carnegie Mellon University, 2010.
2. Jerome Friedman, Trevor Hastie and Robert Tibshirani. Applications of the lasso and grouped lasso to the estimation of sparse graphical models. Technical Report, Stanford University, 2010.

See Also

huge and huge-package

Examples

## band graph with bandwidth 3
L = huge.generator(graph = "band", g = 3)
plot(L)

## random sparse graph
L = huge.generator(vis = TRUE)

## random dense graph
L = huge.generator(prob = 0.5, vis = TRUE)

## hub graph with 6 hubs
L = huge.generator(graph = "hub", g = 6, vis = TRUE)

## cluster graph with 8 clusters
L = huge.generator(graph = "cluster", g = 8, vis = TRUE)
