sim_data() generates a simulated dataset D = L + S + Z for
experimentation with Principal Component Pursuit (PCP) algorithms.
Usage
sim_data(
n = 100,
p = 10,
r = 3,
sparse_nonzero_idxs = NULL,
sigma = 0.05,
seed = 42
)
Value
A list containing:
D: The observed data matrix, where D = L + S + Z.
L: The ground truth rank-r low-rank matrix.
S: The ground truth sparse matrix.
Z: The ground truth dense (Gaussian) noise matrix.
Arguments
n, p
(Optional) A pair of integers specifying the simulated dataset's
number of observations n (rows) and variables p (columns). By default,
n = 100 and p = 10.
r
(Optional) An integer specifying the rank of the simulated dataset's
low-rank component. Intuitively, this is the number of latent patterns
governing the simulated dataset. Must satisfy r <= min(n, p).
By default, r = 3.
sparse_nonzero_idxs
(Optional) An integer vector with
length(sparse_nonzero_idxs) <= n * p specifying the
indices of the non-zero elements in the sparse component. By default,
sparse_nonzero_idxs = NULL, in which case it is defined to be the
vector seq(1, n * p, n + 1) (placing sparse noise along the diagonal
of the simulated dataset).
sigma
(Optional) A double specifying the standard deviation of the
dense (Gaussian) noise component Z. By default, sigma = 0.05.
seed
(Optional) An integer specifying the seed for random number
generation. By default, seed = 42.
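The default for sparse_nonzero_idxs relies on R's column-major indexing: in an n x p matrix, stepping through linear indices in increments of n + 1 moves one row down and one column across at each step. The following base R sketch checks this for the default dimensions. It does not call sim_data(); the names idxs, pos, and on_diagonal are illustrative only.

```r
# Default dimensions from the Usage section.
n <- 100
p <- 10

# Default sparse indices: every (n + 1)-th element in column-major order.
idxs <- seq(1, n * p, n + 1)

# Convert the linear indices to (row, column) pairs for an n x p matrix.
pos <- arrayInd(idxs, .dim = c(n, p))

# TRUE when every selected entry has row index equal to column index.
on_diagonal <- all(pos[, 1] == pos[, 2])
```

With these defaults, idxs has min(n, p) = 10 elements, one per diagonal entry.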
Details
The data is simulated as follows:
L <- matrix(runif(n * r), n, r) %*% matrix(runif(r * p), r, p)
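As a quick check on the low-rank construction, the sketch below rebuilds L from the expression above using the default n, p, and r, then inspects its dimensions and numerical rank. It runs outside sim_data(), so the seed here only makes the sketch repeatable; it is not claimed to reproduce the L that sim_data() returns.

```r
# Fix the RNG state so this sketch is repeatable (assumption: standalone use,
# not the seeding scheme used inside sim_data()).
set.seed(42)

n <- 100
p <- 10
r <- 3

# Product of an n x r and an r x p matrix with Uniform(0, 1) entries.
L <- matrix(runif(n * r), n, r) %*% matrix(runif(r * p), r, p)

# L has n rows and p columns.
L_dim <- dim(L)

# Numerical rank via a QR decomposition; at most r by construction.
L_rank <- qr(L)$rank
```

Because L is a product of factors with inner dimension r, its rank cannot exceed r, and with random uniform factors it is r in practice. This is why r must not exceed min(n, p).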