
NeEDS4BigData (version 1.0.1)

GenGLMdata: Generate data for Generalised Linear Models

Description

Function to simulate big data under linear, logistic and Poisson regression for sampling. For linear regression, the covariate data X are generated from a Normal, Multivariate Normal or Uniform distribution; for logistic regression, from an Exponential, Normal, Multivariate Normal or Uniform distribution; and for Poisson regression, from a Normal or Uniform distribution.

Usage

GenGLMdata(Dist,Dist_Par,No_Of_Var,Beta,N,family)

Value

The output of GenGLMdata is a list containing

Complete_Data: a matrix holding the response Y and the covariates X

Arguments

Dist

a character value naming the covariate distribution: "Normal", "MVNormal", "Uniform" or "Exponential" (the distributions allowed depend on family; see Details)

Dist_Par

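a list of parameters for the distribution used to generate the covariates X, e.g. Mean, Variance and, for linear regression, Error_Variance as in the Examples; NULL is passed for "Uniform" in the Examples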

No_Of_Var

number of covariates, excluding the intercept

Beta

a vector of model parameters, including the intercept (length No_Of_Var + 1 in the Examples)

N

the number of observations in the big data

family

a character value selecting the Generalised Linear Model: "linear", "logistic" or "poisson" regression
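For the "MVNormal" case, the Dist_Par list in the Examples supplies a mean vector (Mean) and a covariance matrix (Variance). The base-R sketch below shows one standard way such a list could be turned into an N x p covariate matrix, using a Cholesky factor of the covariance. It is an illustration under assumptions only: draw_mvnormal is a hypothetical helper, not part of NeEDS4BigData, and the package may generate its covariates differently.

```r
# Hypothetical helper (not part of NeEDS4BigData): draw N rows of multivariate
# normal covariates from a list shaped like the MVNormal Dist_Par example.
draw_mvnormal <- function(N, dist_par) {
  p <- length(dist_par$Mean)
  R <- chol(dist_par$Variance)              # upper triangular, t(R) %*% R == Variance
  Z <- matrix(rnorm(N * p), nrow = N, ncol = p)  # independent standard normals
  X <- Z %*% R                              # each row now has covariance Variance
  sweep(X, 2, dist_par$Mean, "+")           # shift every row by the mean vector
}
```

With the values from the Examples, draw_mvnormal(5000, list(Mean = rep(0, 2), Variance = diag(rep(2, 2)))) would return a 5000 x 2 matrix whose columns are independent N(0, 2) draws.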

Details

Big data are generated for three types of Generalised Linear Model: "linear", "logistic" and "poisson" regression.

Covariate generation is limited by regression type: for linear regression, X can be drawn from the normal, multivariate normal or uniform distribution; for logistic regression, from the exponential, normal, multivariate normal or uniform distribution; and for Poisson regression, from the normal or uniform distribution.
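As a rough illustration of the three response types, the base-R sketch below draws Normal covariates, forms the linear predictor with the intercept as the first element of beta, and then generates Y through an identity link with Gaussian error (linear), a logit link (logistic) or a log link (Poisson). The function sim_glm_data and its arguments mean, sd and error_sd are hypothetical; the choice of canonical links and the column layout of the returned matrix are assumptions, not a description of how GenGLMdata works internally.

```r
# Hypothetical sketch (not the NeEDS4BigData implementation): simulate big data
# for linear, logistic or Poisson regression with Normal covariates.
sim_glm_data <- function(N, beta, family = c("linear", "logistic", "poisson"),
                         mean = 0, sd = 1, error_sd = 1) {
  family <- match.arg(family)
  p <- length(beta) - 1                           # covariates, excluding intercept
  X <- matrix(rnorm(N * p, mean = mean, sd = sd), nrow = N, ncol = p)
  colnames(X) <- paste0("X", seq_len(p))
  eta <- drop(cbind(1, X) %*% beta)               # linear predictor, intercept first
  Y <- switch(family,
    linear   = eta + rnorm(N, sd = error_sd),     # identity link, Gaussian error
    logistic = rbinom(N, size = 1, prob = plogis(eta)),  # logit link
    poisson  = rpois(N, lambda = exp(eta))        # log link
  )
  cbind(Y, X)                                     # columns: Y, X1, ..., Xp
}
```

For example, sim_glm_data(5000, c(-1, 2, 1), "logistic") returns a 5000 x 3 matrix with columns Y, X1 and X2, where Y contains only 0s and 1s. Swapping rnorm for runif or rexp in the line that builds X would mimic the other covariate distributions.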


Examples

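library(NeEDS4BigData)

# Two covariates, intercept -1, slopes 2 and 1, N = 5000 observations
No_Of_Var <- 2; Beta <- c(-1, 2, 1); N <- 5000

# Linear regression with multivariate normal covariates
Dist <- "MVNormal"
Dist_Par <- list(Mean = rep(0, No_Of_Var), Variance = diag(rep(2, No_Of_Var)),
                 Error_Variance = 0.5)
Family <- "linear"
Results <- GenGLMdata(Dist, Dist_Par, No_Of_Var, Beta, N, Family)

# Logistic regression with normal covariates
Dist <- "Normal"; Dist_Par <- list(Mean = 0, Variance = 1)
Family <- "logistic"
Results <- GenGLMdata(Dist, Dist_Par, No_Of_Var, Beta, N, Family)

# Poisson regression with uniform covariates (Dist_Par passed as NULL)
Dist <- "Uniform"; Family <- "poisson"
Results <- GenGLMdata(Dist, NULL, No_Of_Var, Beta, N, Family)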
