
gRapHD (version 0.1.3)

minForest: Minimum forest

Description

Returns the forest that minimises the -2*log-likelihood, AIC, or BIC.
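One way to picture such a search is a greedy, Kruskal-style procedure in the spirit of Chow and Liu (1968): candidate edges are ranked by how much they improve the criterion, and an edge is added only if it improves the criterion and does not close a cycle. The sketch below is base R for continuous variables only. It is an illustration under stated assumptions, not the gRapHD implementation: the helper growForest, the toy data, and the BIC penalty of log(n) for one parameter per edge are all made up for the example.

```r
# Toy data: a and b are correlated, c and d are independent noise.
set.seed(4)
n <- 200
z <- rnorm(n)
dataset <- cbind(a = z + rnorm(n, sd = 0.5),
                 b = z + rnorm(n, sd = 0.5),
                 c = rnorm(n),
                 d = rnorm(n))
p <- ncol(dataset)

# All candidate edges (lower index in column 1) and their LR weight,
# i.e. the drop in -2*log-likelihood from adding the edge: -n*log(1 - r^2).
pairs <- t(combn(p, 2))
r <- cor(dataset)[pairs]
lr <- -n * log(1 - r^2)

# Hypothetical helper: add edges in decreasing order of weight, skipping
# edges that would close a cycle and stopping once the weight is not positive.
growForest <- function(weight) {
  comp <- seq_len(p)                 # component label of each vertex
  keep <- integer(0)
  for (i in order(weight, decreasing = TRUE)) {
    if (weight[i] <= 0) break        # no remaining edge improves the criterion
    u <- pairs[i, 1]
    v <- pairs[i, 2]
    if (comp[u] != comp[v]) {        # different components, so no cycle
      comp[comp == comp[v]] <- comp[u]
      keep <- c(keep, i)
    }
  }
  pairs[keep, , drop = FALSE]
}

treeLR    <- growForest(lr)            # no penalty: weights stay positive
forestBIC <- growForest(lr - log(n))   # assumed BIC penalty per edge
```

Without a penalty every weight is positive, so edges keep being added until all vertices are connected. With a penalty, weak edges get a non-positive weight and are left out, which is how a forest with several components can arise.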

Usage

minForest(dataset,homog=TRUE,forbEdges=NULL,stat="BIC",...)

Arguments

dataset
matrix or data frame (nrow(dataset) observations and ncol(dataset) variables).
homog
TRUE for homogeneous covariance structure, FALSE for heterogeneous. This is only meaningful with mixed models. Default is homogeneous (TRUE).
forbEdges
edges that should not be considered, given as a matrix with 2 columns: each row represents one edge, and each column holds one of the two vertices of that edge. Default is NULL.
stat
measure to be minimized: LR (-2*log-likelihood), AIC, or BIC. Default is BIC. It can also be a user defined function with format: FUN(newEdge,varType,numCat, dataset); where the parameters va
...
arguments to be passed to the user function in stat.
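
As a minimal sketch of the forbEdges format described above, the following builds a two-column matrix of vertex indices in base R. The toy data and the particular edges are illustrative assumptions. The call to minForest is guarded so that the snippet also runs where gRapHD is not installed.

```r
# Toy data: 100 observations of 5 continuous variables (illustrative only).
set.seed(1)
dataset <- matrix(rnorm(500), nrow = 100, ncol = 5)

# Each row is one forbidden edge; each column holds one end vertex,
# given as a column index of `dataset`.
forbEdges <- rbind(c(1, 2),
                   c(3, 5))

# Basic sanity checks on the format.
stopifnot(is.matrix(forbEdges), ncol(forbEdges) == 2,
          all(forbEdges >= 1), all(forbEdges <= ncol(dataset)))

# Only attempted when the package is available.
if (requireNamespace("gRapHD", quietly = TRUE)) {
  m <- gRapHD::minForest(dataset, forbEdges = forbEdges, stat = "BIC")
  print(m$edges)
}
```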

Value

A list containing:

  • edges: matrix with 2 columns, each row representing one edge, and each column one of the vertices in the edge. Column 1 contains the vertex with lower index.
  • p: number of variables (vertices) in the model.
  • stat.minForest: measure used (LR, AIC, or BIC).
  • statSeq: vector with value of stat.minForest for each edge.
  • vertNames: vector with the original vertices' names.
  • numCat: vector with number of levels for each variable (0 if continuous).
  • homog: TRUE if the covariance is homogeneous.
  • numP: vector with number of estimated parameters for each edge.
  • minForest: first and last edges found with minForest.
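
The edge matrix and the vertex count are enough to recover basic structure of the result. The sketch below uses hypothetical values for p and edges, shaped as described above, rather than real minForest output; it builds an adjacency matrix, counts components, and lists isolated vertices.

```r
# Hypothetical result fields: p vertices and a 2-column edge matrix
# with the lower vertex index in column 1.
p <- 6
edges <- rbind(c(1, 2),
               c(2, 3),
               c(4, 5))

# Symmetric adjacency matrix of the forest.
adj <- matrix(0L, nrow = p, ncol = p)
adj[edges] <- 1L
adj[edges[, 2:1, drop = FALSE]] <- 1L

# A forest has no cycles, so components = vertices - edges.
nComponents <- p - nrow(edges)

# Vertices with no incident edge are isolated.
isolated <- which(rowSums(adj) == 0)
```

When the number of edges equals p - 1, there is a single component and the forest is a tree.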

Details

Returns the tree or forest that minimizes the -2*log-likelihood, AIC, or BIC. If the log-likelihood is used, the result is a tree; if AIC or BIC is used, the result is a tree or a forest.

The dataset contains variables (vertices) in the columns, and observations in the rows. The result has vertices numbered according to the column indexes in dataset. All discrete variables must be factors. All factor levels must be represented in the data. Missing values are not allowed.
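
The data requirements above can be checked, and often fixed, in base R before calling minForest. The following is a minimal sketch on made-up data. The data frame raw and its columns are assumptions for illustration, and dropping incomplete rows is only one possible way to handle missing values.

```r
# Illustrative mixed data: two continuous columns and one discrete column
# stored as character, with one missing value.
set.seed(2)
raw <- data.frame(x = rnorm(8),
                  y = rnorm(8),
                  g = c("a", "b", "a", "b", "a", "b", "a", NA),
                  stringsAsFactors = FALSE)

# 1. Missing values are not allowed: keep complete rows only.
clean <- raw[complete.cases(raw), , drop = FALSE]

# 2. Discrete variables must be factors ("c" is declared but never observed).
clean$g <- factor(clean$g, levels = c("a", "b", "c"))

# 3. Every factor level must occur in the data: drop the unused level.
clean <- droplevels(clean)

# Final checks before passing `clean` on.
stopifnot(!anyNA(clean), is.factor(clean$g), all(table(clean$g) > 0))
```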

References

Chow, C.K. and Liu, C.N. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, Vol. IT-14, 3:462-7.

Edwards, D., de Abreu, G.C.G. and Labouriau, R. (2009). High-dimensional Mixed Graphical Models Using Minimal AIC and BIC forests. BMC Bioinformatics. (submitted).

Examples

  set.seed(7,kind="Mersenne-Twister")
  dataset <- matrix(rnorm(1000),nrow=100,ncol=10)
  m <- minForest(dataset,stat="BIC")

  ##############################################################################
  # Example with continuous variables
  data(dsCont)
  # Arguments used in the call below:
  #          1. homog: has no effect here (only meaningful with mixed models)
  #          2. forbEdges=NULL: no forbidden edges
  #          3. stat="LR": the measure used is the LR (the result is a tree)
  m1 <- minForest(dsCont,homog=TRUE,forbEdges=NULL,stat="LR")
  plot(m1,numIter=1000)

  ##############################################################################
  # Example with discrete variables
  data(dsDiscr)
  # Arguments used in the call below:
  #          1. homog: has no effect here (only meaningful with mixed models)
  #          2. forbEdges=NULL: no forbidden edges
  #          3. stat="LR": the measure used is the LR (the result is a tree)
  m1 <- minForest(dsDiscr,homog=TRUE,forbEdges=NULL,stat="LR")
  plot(m1,numIter=1000)

  ##############################################################################
  # Example with mixed variables
  data(dsMixed)
  # Arguments used in the call below:
  #          1. homog=TRUE: the covariance is treated as homogeneous
  #          2. forbEdges=NULL: no forbidden edges
  #          3. stat="LR": the measure used is the LR (the result is a tree)
  m1 <- minForest(dsMixed,homog=TRUE,forbEdges=NULL,stat="LR")
  plot(m1,numIter=1000)
  
  ##############################################################################
  # Example using a user defined function
  # The function userFun calculates the same edge weights as the option
  # stat="LR", so the final result is the same with either option.
  userFun <- function(newEdge,numCat,dataset)
  {
    sigma <- var(dataset[,newEdge])
    v <- nrow(dataset)*log(prod(diag(sigma))/det(sigma))
    return(c(v,1))
  }
  
  data(dsCont)
  m <- minForest(dsCont,stat="LR")
  m1 <- minForest(dsCont,stat=userFun)
  identical(m$edges,m1$edges)
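
For a pair of continuous variables, the weight computed by userFun can be checked by hand: the determinant of a 2x2 covariance matrix is the product of the two variances times (1 - r^2), so n*log(prod(diag(sigma))/det(sigma)) equals -n*log(1 - r^2), where r is the sample correlation. The sketch below verifies this on simulated data in base R. The data and the argument values are made up for illustration, and the second returned element is presumably the number of parameters for the edge, which the source does not state explicitly.

```r
# Simulated pair of correlated continuous variables (illustrative only).
set.seed(3)
n <- 50
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)
dataset <- cbind(x, y)

# Same function as in the example above.
userFun <- function(newEdge, numCat, dataset) {
  sigma <- var(dataset[, newEdge])
  v <- nrow(dataset) * log(prod(diag(sigma)) / det(sigma))
  return(c(v, 1))
}

# Weight of the edge between columns 1 and 2 (numCat = 0 marks continuous).
w <- userFun(c(1, 2), numCat = c(0, 0), dataset = dataset)

# Closed form in terms of the sample correlation.
r <- cor(x, y)
all.equal(w[1], -n * log(1 - r^2))
```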
