make_2classification: Data Simulation for 2 stages

Description

Generates simulated data for a two-stage setting. The outcomes are generated from a pattern mixture model using a latent variable with 4 categories. Within each category, X follows a multivariate normal distribution, and each category is assigned a vector of optimal treatments V. Specifically, the class centroids are drawn from a multivariate normal distribution with mean 0 and standard deviation 5. The feature vectors X are simulated from a multivariate normal distribution with pinfo+pnoise dimensions, and the centroid of each subject's class is added to the first pinfo dimensions of X.
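The feature-generation step described above can be sketched in base R. This is only an illustration of the description, not the package's actual implementation: the identity covariance, the uniform class membership, and the variable name `cls` are assumptions made for the sketch.

```r
# Illustrative sketch only; not the package's own code.
set.seed(1)
n_cluster <- 4
pinfo <- 3
pnoise <- 2
n_sample <- 200

# Centroids: one row per latent class, entries drawn from N(0, 5^2).
centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = 5),
                    nrow = n_cluster, ncol = pinfo)

# Latent class of each subject (assumed uniform over the classes).
cls <- sample(n_cluster, n_sample, replace = TRUE)

# Features: independent standard normals in all pinfo + pnoise dimensions
# (identity covariance is an assumption).
X <- matrix(rnorm(n_sample * (pinfo + pnoise)), nrow = n_sample)

# Shift the first pinfo columns by the centroid of each subject's class.
info_cols <- seq_len(pinfo)
X[, info_cols] <- X[, info_cols] + centroids[cls, , drop = FALSE]

dim(X)  # n_sample rows, pinfo + pnoise columns
```

Only the first pinfo columns carry information about the latent class; the remaining pnoise columns stay pure noise.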

Each latent category is then assigned an optimal treatment pair $y=(A_1^*, A_2^*)$ from (1,1), (1,-1), (-1,-1), (-1,1). The observed treatment assignments $A=(A_1,A_2)$ are completely random, each taking the value 1 or -1 with probability 0.5, and the outcomes are generated as $R_1=0$, $R_2=A'y+N(0,1)$. Since $A'y = A_1A_1^* + A_2A_2^*$, the mean total outcome $R_1+R_2$ reaches its optimal value of $2$ when the assigned treatments equal the optimal treatments of the subject's latent group in both stages.
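The outcome model can be checked numerically with a short base R sketch. It is written directly from the formulas in this section rather than taken from the package; the uniform class membership and the names `opt`, `cls`, and `both` are assumptions for illustration.

```r
# Illustrative sketch of the outcome model; not the package's own code.
set.seed(2)
n_sample <- 10000

# The four optimal-treatment pairs (A1*, A2*), one per latent class.
opt <- rbind(c(1, 1), c(1, -1), c(-1, -1), c(-1, 1))

cls <- sample(4, n_sample, replace = TRUE)  # latent class (assumed uniform)
y <- opt[cls, ]                             # optimal treatments per subject

# Observed treatments: independently +1 or -1 with probability 0.5.
A <- matrix(sample(c(-1, 1), 2 * n_sample, replace = TRUE), ncol = 2)

R1 <- rep(0, n_sample)
R2 <- rowSums(A * y) + rnorm(n_sample)      # A'y + N(0, 1)

# Subjects whose observed treatments are optimal in both stages.
both <- A[, 1] == y[, 1] & A[, 2] == y[, 2]

mean(R1[both] + R2[both])  # should be close to 2
mean(R1 + R2)              # should be close to 0 under random assignment
```

About a quarter of the subjects receive the optimal treatment in both stages by chance, which is why the overall mean outcome stays near 0 while the mean among those subjects is near 2.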

Usage

make_2classification(n_cluster, pinfo, pnoise, n_sample, centroids = 0)

Arguments

n_cluster

Number of clusters.

pinfo

Number of informative variables, i.e. the dimension of the centroids that determine the latent class of each sample.

pnoise

Number of noise variables.

n_sample

Number of samples to generate.

centroids

For a training set, do not supply centroids; they are generated randomly by the function. For a testing set, supply the same centroids that were used for the training set. It is a matrix of dimension n_cluster by pinfo.
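A minimal sketch of the training/testing pattern described for `centroids` follows. It assumes the DTRlearn package is installed and provides `make_2classification`; the object names `train_set` and `test_set` and the chosen sizes are hypothetical.

```r
# Assumes the DTRlearn package provides make_2classification.
if (requireNamespace("DTRlearn", quietly = TRUE)) {
  library(DTRlearn)

  # Training set: centroids are left at the default and generated randomly.
  train_set <- make_2classification(n_cluster = 4, pinfo = 10, pnoise = 10,
                                    n_sample = 100)

  # Testing set: reuse the training centroids so both sets share
  # the same latent class structure.
  test_set <- make_2classification(n_cluster = 4, pinfo = 10, pnoise = 10,
                                   n_sample = 500,
                                   centroids = train_set$centroids)

  dim(test_set$X)  # n_sample rows, pinfo + pnoise columns
}
```

Reusing the centroids matters because a model trained on one set of class centers would otherwise be evaluated on data drawn from unrelated centers.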

Value

X

Feature variable matrix of dimension n_sample by pinfo+pnoise, generated from a multivariate normal distribution. The noise variables have mean 0 and standard deviation 1. The informative variables are shifted so that they are centered at the randomly generated centroids.

A

List of 2; A[[1]] and A[[2]] are the treatment assignment vectors for stages 1 and 2.

y

List of 2; y[[1]] and y[[2]] are the true optimal treatments for stages 1 and 2.

R

List of 2; R[[1]] is a vector of n_sample zeros and R[[2]] is the vector of final outcomes.

centroids

Centers of each cluster, drawn from a pinfo-dimensional multivariate normal distribution.

Examples


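library(DTRlearn)  # package assumed to provide the functions used below

# Simulation settings
n_cluster <- 5
pinfo <- 10
pnoise <- 10
n_sample <- 50
example2 <- make_2classification(n_cluster, pinfo, pnoise, n_sample)

# Per-stage vectors passed as the pi argument, set to a constant 1 here
pi <- list()
pi[[2]] <- pi[[1]] <- rep(1, n_sample)

# Fit the three learners on the simulated two-stage data
set.seed(3)
modelO <- Olearning(example2$X, example2$A, example2$R, n_sample, 2, pi)
modelP <- Plearning(example2$X, example2$A, example2$R, n_sample, 2, pi)
modelQ <- Qlearning(example2$X, example2$A, example2$R, 2)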
