
dual.spls (version 0.1.4)

d.spls.simulate: Simulation of a data set

Description

The function d.spls.simulate simulates G mixtures of nondes Gaussians and builds from them a data set of predictors X and a response y, such that X can be divided into G groups and the values of y depend on the values of X.

Usage

d.spls.simulate(n=200,p=100,nondes=50,sigmaondes=0.05,sigmay=0.5,int.coef=1:5)

Value

A list with the following elements:

X

the concatenated predictors matrix.

y

the response vector.

y0

the noise-free response vector, i.e. y before the noise of level sigmay is added.

sigmay

the uncertainty on y.

sigmaondes

the standard deviation of the Gaussians.

G

the number of groups.

Arguments

n

a positive integer. n is the number of observations. Default value is 200.

p

a numeric vector of length G. p is the number of variables in each group. Default value is 100.

nondes

a numeric vector of length G. nondes is the number of Gaussians in each mixture. Default value is 50.

sigmaondes

a numeric vector of length G. sigmaondes is the standard deviation of the Gaussians for each group \(g\). Default value is 0.05.

sigmay

a real value. sigmay is the uncertainty on y. Default value is 0.5.

int.coef

a numeric vector of the coefficients of the linear combination used to construct the response vector y. Default value is 1:5.

Author

Louna Alsouki, François Wahl

Details

The predictors matrix X is a concatenation of G predictor sub-matrices. Each sub-matrix is computed from a mixture of Gaussians, i.e. by summing Gaussians of the form: $$A \exp{(-\frac{(\textrm{xech}-\mu)^2}{2 \sigma^2})},$$ where

  • \(A\) is a numeric vector of random values between 0 and 1,

  • xech is an element from the sequence of \(p(g)\) equally spaced values from 0 to 1. \(p(g)\) is the number of variables of the sub matrix \(g\), for \(g \in \{1, \dots, G\}\),

  • \(\mu\) is a random value in \([0,1]\) representing the mean of the Gaussians,

  • \(\sigma\) is a positive real value specified by the user and representing the standard deviation of the Gaussians.
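The construction above can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside dual.spls: the helper name `simulate_mixture_row` is hypothetical, and the sketch assumes that every observation draws its own amplitudes \(A\) and means \(\mu\) uniformly on \([0,1]\).

```r
# Hypothetical helper (not part of dual.spls): one row of a predictor sub-matrix,
# built as a sum of nondes Gaussians evaluated on a grid of p points.
simulate_mixture_row <- function(p, nondes, sigma) {
  xech <- seq(0, 1, length.out = p)  # p equally spaced values from 0 to 1
  A    <- runif(nondes)              # assumed: amplitudes uniform on [0, 1]
  mu   <- runif(nondes)              # assumed: means uniform on [0, 1]
  # p x nondes matrix whose column m is exp(-(xech - mu[m])^2 / (2 * sigma^2))
  bumps <- exp(-outer(xech, mu, "-")^2 / (2 * sigma^2))
  as.vector(bumps %*% A)             # weighted sum over the nondes Gaussians
}

set.seed(1)
n <- 5
p <- 50
# Stack n independent rows into an n x p matrix
X <- t(replicate(n, simulate_mixture_row(p, nondes = 20, sigma = 0.05)))
dim(X)
```

A small sigma gives narrow peaks and a large sigma gives smooth, broad curves, which is what the sigmaondes argument controls for each group.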

The response vector y is a linear combination of sums of the predictors, to which noise of level sigmay is added. It is computed as follows:

$$y_i= \sigma_y \times V_i +\sum_{g=1}^G \sum_{k=1}^K \textrm{int.coef}_k \times \textrm{sum}X^{g}_{ik},$$ where

  • \(G\) is the number of predictor sub-matrices,

  • \(i\) is the index of the observation,

  • \(V\) is a normally distributed vector with mean 0 and unit standard deviation,

  • \(K\) is the length of the vector int.coef,

  • \(\textrm{sum}X^{g}\) is a matrix of \(n\) rows and \(K\) columns. Column \(k\) holds, for each row of the sub-matrix \(X^g\), the sum over a selected part of that row. The columns of \(X^g\) are split into equal parts, one for each of the \(K\) columns of \(\textrm{sum}X^{g}\).
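The formula for y can be illustrated with a short base R sketch. It is a reading of the description above rather than the package's own code: the helper `block_sums` is hypothetical, the "parts" are assumed to be contiguous blocks of columns of near-equal width, and uniform random matrices stand in for the simulated sub-matrices.

```r
# Hypothetical helper (not part of dual.spls): n x K matrix whose column k is the
# row-wise sum over the k-th block of columns of Xg.
# Assumption: the blocks are contiguous and of near-equal width.
block_sums <- function(Xg, K) {
  block <- ceiling(seq_len(ncol(Xg)) * K / ncol(Xg))  # block index in 1..K per column
  sapply(seq_len(K), function(k) rowSums(Xg[, block == k, drop = FALSE]))
}

set.seed(2)
n <- 100
int.coef <- 1:5
sigmay <- 0.5
# Stand-in sub-matrices for G = 2 groups with 50 and 100 variables
X_list <- list(matrix(runif(n * 50), n, 50), matrix(runif(n * 100), n, 100))

# Noise-free response: sum over groups g and blocks k of int.coef[k] * sumX^g[i, k]
y0 <- Reduce(`+`, lapply(X_list, function(Xg) {
  as.vector(block_sums(Xg, length(int.coef)) %*% int.coef)
}))

# Noisy response: add sigmay times a standard normal vector V
y <- y0 + sigmay * rnorm(n)
```

Under this reading, y0 corresponds to the y0 element of the returned list and y to the y element.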

Examples

### load the dual.spls library
library(dual.spls)

#### one predictors matrix (G = 1)
### parameters
n <- 100
p <- 50
nondes <- 20
sigmaondes <- 0.5
data1 <- d.spls.simulate(n=n, p=p, nondes=nondes, sigmaondes=sigmaondes)

Xa <- data1$X
ya <- data1$y

### plot every observation (row of Xa) as a curve
plot(Xa[1,], type='l', ylim=c(0,max(Xa)), main='Data', ylab='Xa', col=1)
for (i in 2:n){ lines(Xa[i,], col=i) }

#### two predictors matrices (G = 2)
### parameters
n <- 100
p <- c(50,100)
nondes <- c(20,30)
sigmaondes <- c(0.05,0.02)
data2 <- d.spls.simulate(n=n, p=p, nondes=nondes, sigmaondes=sigmaondes)

Xb <- data2$X
# split the concatenated matrix back into its two sub-matrices
X1 <- Xb[,(1:p[1])]
X2 <- Xb[,(p[1]+1):(p[1]+p[2])]
yb <- data2$y

### plot the concatenated data
plot(Xb[1,], type='l', ylim=c(0,max(Xb)), main='Data', ylab='Xb', col=1)
for (i in 2:n){ lines(Xb[i,], col=i) }

### plot the first sub-matrix
plot(X1[1,], type='l', ylim=c(0,max(X1)), main='Data X1', ylab='X1', col=1)
for (i in 2:n){ lines(X1[i,], col=i) }

### plot the second sub-matrix
plot(X2[1,], type='l', ylim=c(0,max(X2)), main='Data X2', ylab='X2', col=1)
for (i in 2:n){ lines(X2[i,], col=i) }
