flars (version 1.0)

data_generation: Data generation function for examples.

Description

This function generates example data sets with several different correlation structures. The generated data can be used in the examples of other functions, such as those for functional canonical correlation analysis and functional least angle regression.

Usage

data_generation(seed,nsamples=80,hyper=NULL,var_type=c('f','m'), cor_type=1:6,uncorr=TRUE,nVar=8)

Arguments

seed
Set the seed for random numbers.
nsamples
Sample size of the data to generate.
hyper
Hyper parameters used in the Gaussian process (GP). GP is used for building the covariance structure of the functional variables.
var_type
Type of variables to generate, either 'f' or 'm'. See details for more information.
cor_type
Correlation structures. See details for more information.
uncorr
Whether the variables are built based on linearly uncorrelated variables. See details for more information.
nVar
Number of base variables to generate. Note that this is not the exact number of variables generated at the end.

Value

x
List of covariates.
y
Response variable.
BetaT
True shape of the functional coefficients and true values of the scalar variables.
bConst
Normalizing constants of the functional coefficients. True functional coefficients are the shape times the corresponding normalizing constant.
noise
Random noise.
mu
True intercept.
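
As a rough illustration of the returned components, the sketch below repeats the call from the Examples section and inspects the result. It assumes that the flars package is installed and that the return value is a named list holding the components listed above. The shape of each component is not documented here, so the sketch only prints an overview rather than relying on any particular structure.

```r
library(flars)

# Same call as in the Examples section.
dataL <- data_generation(seed = 1, uncorr = TRUE, nVar = 8, nsamples = 120,
                         var_type = 'f', cor_type = 1)

# Components documented under Value: x, y, BetaT, bConst, noise, mu.
names(dataL)

# One-level overview of each component (no particular shapes assumed).
str(dataL, max.level = 1)
```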

Details

var_type can be either 'f' or 'm'. If var_type='f', only functional variables are generated. If var_type='m', both functional and scalar variables are generated.
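
The two settings of var_type can be compared with a short sketch like the one below. It assumes the flars package is installed. The object names data_f and data_m are illustrative only, and cor_type=1 is used because it lies in the valid range for both variable types.

```r
library(flars)

# Functional variables only.
data_f <- data_generation(seed = 1, uncorr = TRUE, nVar = 8, nsamples = 120,
                          var_type = 'f', cor_type = 1)

# Functional and scalar variables together.
data_m <- data_generation(seed = 1, uncorr = TRUE, nVar = 8, nsamples = 120,
                          var_type = 'm', cor_type = 1)

# Compare how many covariate entries each call returns.
length(data_f$x)
length(data_m$x)
```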

When uncorr is TRUE, a few linearly uncorrelated variables are generated first, which makes it easier to control the correlation structure of the variables through cor_type. If you want to generate a large number of variables, uncorr should be FALSE.

cor_type takes values from 1 to 6 or from 1 to 4, depending on the choice of var_type. It is ONLY useful with the default number of variables (nVar=8) and linearly uncorrelated initial variables (uncorr=TRUE). A larger value of cor_type means a more complicated correlation structure.

If no correlation restriction is required for the variables, we can use cor_type=1.

nVar is the number of base variables generated. It is recommended that users modify the function to produce their own data sets. Another option is to call this function repeatedly until enough functional and scalar variables have been obtained. The response variable can be re-generated by the user. Increasing the value of this argument may give NaN for the response variable.
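
The repeated-call approach mentioned above might look like the following sketch. It assumes that the flars package is installed and that the x component is an ordinary R list, so that two covariate lists can be joined with c(). The names d1, d2 and x_all are hypothetical. Variables from different calls are generated separately, so cor_type does not control the correlation between them, and the response would still have to be re-generated by the user (not shown).

```r
library(flars)

# Two separate draws with different seeds and the same sample size.
d1 <- data_generation(seed = 1, uncorr = TRUE, nVar = 8, nsamples = 120,
                      var_type = 'f', cor_type = 1)
d2 <- data_generation(seed = 2, uncorr = TRUE, nVar = 8, nsamples = 120,
                      var_type = 'f', cor_type = 1)

# Join the two covariate lists into one larger list (assumes x is a list).
x_all <- c(d1$x, d2$x)
length(x_all)
```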

Examples

library(flars)
# Generate 120 samples of functional variables with the simplest correlation structure.
dataL = data_generation(seed = 1, uncorr = TRUE, nVar = 8, nsamples = 120,
                        var_type = 'f', cor_type = 1)
