data_generation: Data generation function for examples.

Description

This function generates several types of simulated data with different correlation structures. The generated data are used in the examples of other functions in the package, such as those for functional canonical correlation analysis and functional least angle regression.

Usage

data_generation(seed, nsamples = 80, hyper = NULL, var_type = c('f', 'm'), cor_type = 1:6, uncorr = TRUE, nVar = 8)

Arguments

seed

Seed for the random number generator.

nsamples

Sample size of the data to generate.

hyper

Hyperparameters of the Gaussian process (GP) that is used to build the covariance structure of the functional variables.

var_type

Type of variables to generate, either 'f' or 'm'. See Details for more information.

cor_type

Correlation structure of the generated variables. See Details for more information.

uncorr

Logical; whether the variables are built from linearly uncorrelated base variables. See Details for more information.

nVar

Number of base variables to generate. This is not necessarily the number of variables in the final output.
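The hyper argument controls the Gaussian process behind the functional covariates. As an illustration only, the sketch below draws smooth curves on a grid from a zero-mean GP with a squared-exponential kernel. The kernel form, the helper sample_gp_curves, and the parameter names ell (length scale) and sigma2 (signal variance) are assumptions and may not match the entries of hyper expected by data_generation.

```r
# Hypothetical sketch: draw functional covariates from a zero-mean GP
# with a squared-exponential kernel. The kernel and the names `ell` and
# `sigma2` are assumptions, not necessarily what `hyper` encodes.
sample_gp_curves <- function(n, t_grid, ell = 0.2, sigma2 = 1) {
  # Covariance K[i, j] = sigma2 * exp(-(t_i - t_j)^2 / (2 * ell^2))
  d2 <- outer(t_grid, t_grid, function(a, b) (a - b)^2)
  K <- sigma2 * exp(-d2 / (2 * ell^2))
  # Small diagonal jitter keeps the Cholesky factorisation numerically stable
  K <- K + diag(1e-6, length(t_grid))
  R <- chol(K)  # upper triangular factor with t(R) %*% R equal to K
  # Each row of Z %*% R is one curve whose covariance is K
  Z <- matrix(rnorm(n * length(t_grid)), nrow = n)
  Z %*% R
}

set.seed(1)
t_grid <- seq(0, 1, length.out = 50)
curves <- sample_gp_curves(n = 120, t_grid = t_grid)
dim(curves)  # 120 rows (samples) by 50 columns (grid points)
```

A smaller ell gives rougher curves; a larger sigma2 scales their amplitude.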

Value

x: List of covariates.
y: Response variable.
BetaT: True shapes of the functional coefficients and true values of the coefficients of the scalar variables.
bConst: Normalizing constants of the functional coefficients. The true functional coefficients are the shapes multiplied by the corresponding normalizing constants.
noise: Random noise.
mu: True intercept.

Details

var_type must be either 'f' or 'm'. If var_type='f', only functional variables are generated; if var_type='m', both functional and scalar variables are generated.

When uncorr is TRUE, a few linearly uncorrelated variables are generated first, which makes it easier to control the correlation structure of the variables through cor_type. To generate a large number of variables, set uncorr to FALSE.
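To make "linearly uncorrelated" concrete, the sketch below builds base variables whose sample correlations are exactly zero by centring random draws and orthogonalising them with a QR decomposition. This is one standard way to obtain such variables and only illustrates the idea behind uncorr = TRUE; it is not necessarily how data_generation builds them, and the helper uncorrelated_base is hypothetical.

```r
# Hypothetical sketch: p base variables with exactly zero sample correlation.
uncorrelated_base <- function(n, p) {
  stopifnot(n > p)  # orthogonalising p columns needs more samples than columns
  Z <- matrix(rnorm(n * p), nrow = n)
  Z <- scale(Z, center = TRUE, scale = FALSE)  # centre each column
  Q <- qr.Q(qr(Z))  # orthonormal columns spanning the same (centred) space
  Q * sqrt(n - 1)   # rescale so every column has unit sample variance
}

set.seed(1)
B <- uncorrelated_base(n = 120, p = 8)
round(cor(B), 10)  # the 8 x 8 identity matrix, up to rounding
```

Because the centred columns are orthogonal to the constant vector, so are the columns of Q, which is why their sample covariance is exactly the identity.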

cor_type takes values from 1 to 6 or from 1 to 4, depending on the choice of var_type. It is ONLY meaningful with the default number of variables (nVar=8) and linearly uncorrelated initial variables (uncorr=TRUE). Larger values of cor_type give more complicated correlation structures.

If no particular correlation structure is required for the variables, use cor_type=1.

nVar is the number of base variables to generate. Users are encouraged to modify the function to build their own data set. Alternatively, call this function repeatedly to obtain enough functional and scalar variables, and then regenerate the response variable yourself. Increasing the value of nVar may produce NaN values in the response variable.
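When variables from several calls are combined, the response has to be rebuilt. The sketch below assumes a scalar-on-function linear model, y_i = mu + sum_j integral x_ij(t) beta_j(t) dt + noise_i with beta_j(t) = bConst[j] * shape_j(t), which matches the components listed under Value but is an assumption; check the source of data_generation for the exact model. Functional covariates are assumed to be stored as matrices with one row per sample and one column per grid point. The helpers regenerate_response and trapezoid and the toy data are hypothetical, and scalar covariates (which would add a term z_ik * gamma_k) are omitted.

```r
# Hypothetical sketch: rebuild a scalar response from functional covariates.
trapezoid <- function(t, f) {
  # Trapezoidal rule for the integral of f over the grid t
  n <- length(f)
  sum(diff(t) * (f[-n] + f[-1]) / 2)
}

regenerate_response <- function(x_list, shapes, bConst, t_grid, mu, noise) {
  y <- rep(mu, nrow(x_list[[1]]))
  for (j in seq_along(x_list)) {
    beta_j <- bConst[j] * shapes[[j]]  # true coefficient curve
    y <- y + apply(x_list[[j]], 1, function(xi) trapezoid(t_grid, xi * beta_j))
  }
  y + noise
}

set.seed(1)
t_grid <- seq(0, 1, length.out = 101)
n <- 120
a <- rnorm(n)
b <- rnorm(n)
# Two toy functional covariates, each an n x length(t_grid) matrix
x_list <- list(outer(a, sin(2 * pi * t_grid)), outer(b, cos(2 * pi * t_grid)))
shapes <- list(sin(2 * pi * t_grid), rep(1, length(t_grid)))
bConst <- c(2, 0.5)
y <- regenerate_response(x_list, shapes, bConst, t_grid,
                         mu = 1, noise = rnorm(n, sd = 0.1))
```

In this toy case the first term integrates to a (since the integral of 2 sin^2(2 pi t) over [0, 1] is 1) and the second to 0, so the noise-free response is exactly 1 + a.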

Examples


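library(flars)
# 120 samples, functional variables only, built from 8 linearly
# uncorrelated base variables with the simplest correlation structure
dataL <- data_generation(seed = 1, uncorr = TRUE, nVar = 8, nsamples = 120,
                         var_type = 'f', cor_type = 1)
str(dataL, max.level = 1)  # top-level components described under Value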
