DTRlearn (version 1.0)

make_classification: Data Simulation for single stage

Description

Generates simulated data for a single-stage trial. The outcomes are generated from a pattern mixture model using a latent variable with 4 categories. Within each category, X follows a multivariate normal distribution, and each category is assigned an optimal treatment y. Specifically, the class centroids are drawn from a multivariate normal distribution with mean 0 and standard deviation 5. The centroids are then added to the first pinfo dimensions of the feature vectors X, which are simulated from a multivariate normal distribution with pinfo+pnoise dimensions.

Optimal treatments y are then assigned to each latent category. The observed treatment assignments A are completely random, taking the values 1 and -1 with probability 0.5 each, and the outcomes are generated as: $R_1=0, R_2= Ay+N(0,1)$.
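The generating mechanism described above can be sketched in base R. This is an illustrative re-implementation, not the DTRlearn source: the function name simulate_single_stage is hypothetical, and the way optimal treatments are assigned to classes (alternating 1 and -1) and the uniform sampling of latent classes are assumptions, since the description does not specify them.

```r
# Hypothetical sketch of the data-generating process; not the package code.
simulate_single_stage <- function(n_cluster, pinfo, pnoise, n_sample,
                                  centroids = NULL) {
  # Draw class centroids from N(0, 5^2) unless they are supplied
  # (for example, when building a test set that matches a training set).
  if (is.null(centroids)) {
    centroids <- matrix(rnorm(n_cluster * pinfo, mean = 0, sd = 5),
                        n_cluster, pinfo)
  }
  # Assumption: each latent class has a fixed optimal treatment in {1, -1};
  # alternating labels keep the rule identical across calls.
  y_by_class <- rep(c(1, -1), length.out = n_cluster)
  # Assumption: latent classes are sampled uniformly at random.
  cls <- sample.int(n_cluster, n_sample, replace = TRUE)
  # Standard normal features in pinfo + pnoise dimensions.
  X <- matrix(rnorm(n_sample * (pinfo + pnoise)), n_sample, pinfo + pnoise)
  # Shift the first pinfo columns by the centroid of each sample's class.
  X[, seq_len(pinfo)] <- X[, seq_len(pinfo), drop = FALSE] +
    centroids[cls, , drop = FALSE]
  y <- y_by_class[cls]                             # true optimal treatment
  A <- sample(c(1, -1), n_sample, replace = TRUE)  # randomized assignment
  R <- A * y + rnorm(n_sample)                     # R_2 = A*y + N(0, 1)
  list(X = X, A = A, y = y, R = R, centroids = centroids)
}

set.seed(1)
sim <- simulate_single_stage(n_cluster = 4, pinfo = 2, pnoise = 3,
                             n_sample = 200)
dim(sim$X)
```

Under these assumptions, a sample that received its optimal treatment (A equal to y) has an expected outcome of 1, and a sample that did not has an expected outcome of -1.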

Usage

make_classification(n_cluster, pinfo, pnoise, n_sample, centroids = 0)

Arguments

n_cluster
number of clusters (latent classes).
pinfo
number of informative variables, i.e. the dimension of the centroids associated with the latent class of each sample.
pnoise
number of noise variables.
n_sample
number of samples to generate.
centroids
For a training set, do not assign centroids; they are generated randomly by the function. For a testing set, pass the same set of centroids as the training set. It is a matrix of dimension n_cluster by pinfo.

Value

  • X: The feature variable matrix, an n_sample by pinfo+pnoise matrix generated from a multivariate normal distribution. The noise variables have mean 0 and standard deviation 1. The informative variables are shifted so that they are centered at the randomly generated centroids.
  • A: The treatment assignment vector.
  • y: The true optimal treatment.
  • R: The outcome vector.
  • centroids: The class centroids, drawn from a pinfo-dimensional multivariate normal distribution.

References

This function borrows the idea from the comparable Python function make_classification in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification

See Also

make_2classification for generating simulation data for 2 stages.

Examples

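library(DTRlearn)

# Simulate a training set; the centroids are generated randomly.
n_cluster <- 10
pinfo <- 10
pnoise <- 20
example1 <- make_classification(n_cluster, pinfo, pnoise, 100)
# Simulate a test set that reuses the training centroids.
test <- make_classification(n_cluster, pinfo, pnoise, 100, example1$centroids)

# Fit O-learning, predict treatments for the test set, and average the
# outcomes of test samples whose observed treatment matches the prediction.
model1 <- Olearning_Single(example1$X, example1$A, example1$R)
Atp <- predict(model1, test$X)
V1 <- mean(test$R[Atp == test$A])

# Fit a weighted SVM with an 'rbf' kernel and evaluate it the same way.
model2 <- wsvm(example1$X, example1$A, example1$R, 'rbf', 0.05)
Atp <- predict(model2, test$X)
V2 <- mean(test$R[Atp == test$A])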
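
The quantity mean(R[Atp == A]) used in the example can be checked against known rules in base R, without fitting any model. The sketch below simulates A, y and R directly from the outcome formula in the Description; the helper value_of and the three comparison rules are hypothetical names introduced only for illustration.

```r
set.seed(42)
n <- 5000
y <- sample(c(1, -1), n, replace = TRUE)  # true optimal treatment
A <- sample(c(1, -1), n, replace = TRUE)  # randomized treatment assignment
R <- A * y + rnorm(n)                     # outcome: A*y + N(0, 1)

# Hypothetical helper: mean outcome among samples whose observed
# treatment agrees with the treatment recommended by `rule`.
value_of <- function(rule, A, R) mean(R[rule == A])

v_oracle <- value_of(y, A, R)   # rule that always recommends y
v_worst  <- value_of(-y, A, R)  # rule that always recommends the opposite
v_random <- value_of(sample(c(1, -1), n, replace = TRUE), A, R)  # coin flip

c(oracle = v_oracle, worst = v_worst, random = v_random)
```

Because the outcome has mean 1 when A equals y and mean -1 otherwise, the oracle rule should score near 1, the opposite rule near -1, and a coin-flip rule near 0. Values such as V1 and V2 in the example can be read on that scale.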
