
Generate two n x p data sets, a training set and a test set, together with outcome vectors y and yte of length n that give the class labels of the training and test observations.
CountDataSet(n, p, K, param, sdsignal)
n: Number of observations desired.
p: Number of features desired. Note that 30% of the features will differ between classes, though some of those differences may be small.
K: Number of classes desired. The function requires n >= 4K, i.e., there must be at least 4 observations per class on average.
param: The dispersion parameter for the negative binomial distribution, which is parameterized by "mu" and "size" as in the R function rnbinom. That is, Y ~ NB(mu, param) means that E(Y) = mu and Var(Y) = mu + mu^2/param. So when param is very large the distribution is essentially Poisson, and when param is small there is substantial overdispersion relative to the Poisson distribution.
sdsignal: The extent to which the classes differ. If this equals zero, there are no class differences; if it is large, the classes are very different.
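The param description can be checked empirically. The sketch below uses only base R's rnbinom: it draws a large sample for a small and a very large size and compares the sample mean and variance with mu and mu + mu^2/param. The helper nb_moments is hypothetical, written only for illustration.

```r
# Empirical check of the rnbinom(mu, size) parameterization:
# E(Y) = mu and Var(Y) = mu + mu^2 / size.
nb_moments <- function(mu, param, nsim = 2e5) {
  y <- rnbinom(nsim, mu = mu, size = param)
  c(mean = mean(y), var = var(y), theory_var = mu + mu^2 / param)
}

set.seed(1)
mu <- 10
small <- nb_moments(mu, param = 1)    # strong overdispersion: theory_var = 110
large <- nb_moments(mu, param = 1e6)  # essentially Poisson: theory_var is about 10
print(rbind(small, large))
```

With param = 1 the sample variance should land near 110, roughly eleven times the mean, while with param = 1e6 it should be close to the mean, as for a Poisson variable.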
x: n x q data matrix of training observations. q may be less than p because features with zero total counts are removed.
y: Class labels for the n observations in x.
xte: n x q data matrix of test observations; the q features are those with > 0 total counts in x, so q <= p.
yte: Class labels for the n observations in xte.
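A minimal sketch of the feature filtering described for x and xte, assuming it amounts to keeping the columns with colSums(x) > 0 and applying the same column selection to the test matrix. The helper drop_zero_features is hypothetical, not a function exported by the package.

```r
# Hypothetical helper: drop features with zero total count in the TRAINING
# matrix and keep the same columns in the test matrix, so q <= p.
drop_zero_features <- function(x, xte) {
  keep <- colSums(x) > 0
  list(x = x[, keep, drop = FALSE], xte = xte[, keep, drop = FALSE])
}

# Toy counts (made up): column 2 is all zeros in the training data.
x   <- matrix(c(3, 0, 5,
                1, 0, 2), nrow = 2, byrow = TRUE)
xte <- matrix(c(4, 7, 0,
                0, 1, 1), nrow = 2, byrow = TRUE)
out <- drop_zero_features(x, xte)
dim(out$x)    # 2 2
dim(out$xte)  # 2 2: column 2 is dropped even though it is nonzero in xte
```

Filtering on the training totals only keeps the two matrices column-aligned, which is what a classifier trained on x and applied to xte needs.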
This is based in part on a function in the DESeq Bioconductor package (Anders and Huber 2010 Genome Biology) for generating a simulated RNA sequencing data set.
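To make the simulation setup concrete, here is a simplified, hypothetical negative binomial simulator. It is not the code used by CountDataSet or DESeq. The baseline log-mean distribution N(3, 1), the N(0, sdsignal^2) class-specific log fold changes, and the balanced random label assignment are all assumptions. Only the 30% fraction of differing features, the n >= 4K requirement, and the rnbinom(mu, size = param) parameterization come from the description above. For brevity it generates a training set only.

```r
# Hypothetical simplified simulator (NOT the PoiClaClu/DESeq implementation).
sim_nb_counts <- function(n, p, K, param, sdsignal, frac_diff = 0.3) {
  stopifnot(n >= 4 * K)                              # same requirement as CountDataSet
  y <- sample(rep(seq_len(K), length.out = n))       # roughly balanced class labels
  base_log_mean <- rnorm(p, mean = 3, sd = 1)        # assumed baseline log means
  lfc <- matrix(0, nrow = K, ncol = p)               # class-by-feature log fold changes
  diff_feat <- sample(p, round(frac_diff * p))       # ~30% of features differ by class
  lfc[, diff_feat] <- rnorm(K * length(diff_feat), sd = sdsignal)
  # n x p matrix of means: baseline plus the fold change of each row's class
  mu <- exp(sweep(lfc[y, , drop = FALSE], 2, base_log_mean, "+"))
  x <- matrix(rnbinom(n * p, mu = mu, size = param), nrow = n)
  list(x = x, y = y, mu = mu)
}

set.seed(1)
sim <- sim_nb_counts(n = 20, p = 100, K = 4, param = 10, sdsignal = 2)
dim(sim$x)    # 20 100
table(sim$y)  # 5 observations in each of the 4 classes
```

Setting sdsignal = 0 in this sketch makes every row of mu identical, mirroring the documented behavior that a zero signal means no class differences.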
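library(PoiClaClu)  # package providing CountDataSet and PoissonDistance
set.seed(1)
# Simulate 20 observations on 100 features in 4 classes
dat <- CountDataSet(n=20, p=100, sdsignal=2, K=4, param=10)
# Poisson dissimilarities between the training observations
dd <- PoissonDistance(dat$x, type="mle", transform=TRUE)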