sgs (version 0.3.5)

gen_toy_data: Generate toy data.

Description

Generates different types of datasets, which can then be fitted using sparse-group SLOPE.

Usage

gen_toy_data(
  p,
  n,
  rho = 0,
  seed_id = 2,
  grouped = TRUE,
  groups,
  noise_level = 1,
  group_sparsity = 0.1,
  var_sparsity = 0.5,
  orthogonal = FALSE,
  data_mean = 0,
  data_sd = 1,
  signal_mean = 0,
  signal_sd = sqrt(10)
)

Value

A list containing:

y

The response vector.

X

The input matrix.

true_beta

The true values of \(\beta\) used to generate the response.

true_grp_id

Indices of the groups that are non-zero in true_beta.

Arguments

p

The number of input variables.

n

The number of observations.

rho

Correlation coefficient. Must be in range \([0,1]\).

seed_id

Seed to be used to generate the data matrix \(X\).

grouped

A logical flag indicating whether grouped data is required.

groups

If grouped=TRUE, the grouping structure is required. Each input variable should have a group id.

noise_level

Defines the level of noise (\(\sigma\)) to be used in generating the response vector \(y\).

group_sparsity

Defines the level of group sparsity. Must be in the range \([0,1]\).

var_sparsity

Defines the level of variable sparsity. Must be in the range \([0,1]\). If grouped=TRUE, this defines the level of sparsity within each group, not globally.

orthogonal

A logical flag indicating whether the input matrix should be orthogonal.

data_mean

Defines the mean of input predictors.

data_sd

Defines the standard deviation of input predictors.

signal_mean

Defines the mean of the signal (\(\beta\)).

signal_sd

Defines the standard deviation of the signal (\(\beta\)).

Details

The data is generated under a Gaussian linear model. The generated data can be grouped, and sparsity can be imposed at the group level, the variable level, or both.
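To make the model concrete, a Gaussian linear model takes the form \(y = X\beta + \varepsilon\) with \(\varepsilon \sim N(0, \sigma^2 I)\), where \(\sigma\) corresponds to noise_level. The following is a minimal base-R sketch of that idea and is not the package's implementation. It assumes uncorrelated predictors (rho = 0), and it assumes that group_sparsity and var_sparsity are the proportions of active groups and of active variables within an active group; the rounding rule and all variable names are hypothetical.

```r
# Illustrative sketch only: base R, NOT the sgs implementation.
set.seed(1)                        # hypothetical seed
n <- 50                            # number of observations
p <- 12                            # number of input variables
groups <- rep(1:4, each = 3)       # hypothetical grouping: 4 groups of 3 variables
group_sparsity <- 0.5              # ASSUMPTION: proportion of groups that are active
var_sparsity <- 0.5                # ASSUMPTION: proportion of active variables per active group
noise_level <- 1                   # sigma in the linear model

# Input matrix with i.i.d. Gaussian entries (data_mean = 0, data_sd = 1)
X <- matrix(rnorm(n * p, mean = 0, sd = 1), nrow = n, ncol = p)

# Pick the active groups
group_ids <- unique(groups)
n_active_groups <- max(1, round(group_sparsity * length(group_ids)))
active_groups <- sort(group_ids[sample.int(length(group_ids), n_active_groups)])

# Within each active group, pick the active variables and draw their signal
true_beta <- numeric(p)
for (g in active_groups) {
  idx <- which(groups == g)
  n_active <- max(1, round(var_sparsity * length(idx)))
  active_idx <- idx[sample.int(length(idx), n_active)]
  # signal_mean = 0, signal_sd = sqrt(10), matching the defaults listed under Usage
  true_beta[active_idx] <- rnorm(n_active, mean = 0, sd = sqrt(10))
}

# Response: y = X beta + Gaussian noise
y <- as.vector(X %*% true_beta + rnorm(n, mean = 0, sd = noise_level))
```

In this sketch every non-zero coefficient lies in one of the active groups, which is the sparse-group structure that the true_beta and true_grp_id outputs describe.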

Examples

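# load the package that provides gen_toy_data
library(sgs)
# specify a grouping structure: 100 groups of sizes 3 to 7, covering 500 variables
groups = c(rep(1:20, each=3),
          rep(21:40, each=4),
          rep(41:60, each=5),
          rep(61:80, each=6),
          rep(81:100, each=7))
# generate data with p=500 variables and n=400 observations
data = gen_toy_data(p=500, n=400, groups=groups, seed_id=3)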
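Because each input variable should have a group id, the grouping vector needs one entry per variable. A quick base-R check of the grouping used in the example above might look like the following; the helper name group_sizes is illustrative only.

```r
# Rebuild the grouping vector from the example and check it against p = 500
groups <- c(rep(1:20, each = 3),
            rep(21:40, each = 4),
            rep(41:60, each = 5),
            rep(61:80, each = 6),
            rep(81:100, each = 7))

p <- 500
length(groups) == p          # one group id per input variable
length(unique(groups))       # number of distinct groups
group_sizes <- table(groups) # size of each group
range(group_sizes)           # smallest and largest group size
```

If the length of groups differs from p, the grouping does not assign an id to every variable and should be corrected before generating data.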
