
catnet (version 1.00.0)

cnSearchSA: Stochastic Network Search

Description

This function provides an MLE-based network search in the space of node orders by Simulated Annealing. Given a sample from an unknown categorical network, it returns a list of catNetwork objects, with complexity up to a specified maximum, that best fit the data.

Usage

cnSearchSA(data, perturbations, 
	maxParentSet, maxComplexity=0, 
	parentsPool=NULL, fixedParentsPool=NULL, 
	selectMode = "BIC", 
	tempStart=1, tempCoolFact=0.9, tempCheckOrders=10, maxIter=100, 
	orderShuffles=1, stopDiff=0, priorSearch = NULL, echo=FALSE)

Arguments

data
a matrix in row-nodes format or a data.frame in column-nodes format
perturbations
a binary matrix with the same dimensions as data. A value of 1 marks the node in the corresponding sample as perturbed
maxParentSet
an integer, maximal number of parents per node
maxComplexity
an integer, maximal network complexity for the search
parentsPool
a list of parent sets to choose from
fixedParentsPool
a list of parent sets to choose from
selectMode
a character string, the network selection criterion for the optimization, such as "AIC" or "BIC"
tempStart
a numerical value, the initial temperature for the annealing
tempCoolFact
a numerical value, the temperature multiplicative decreasing factor
tempCheckOrders
an integer, the number of iterations (orders to be searched) at constant temperature
maxIter
an integer, the total number of iterations, thus orders, to be processed
orderShuffles
an integer, the number of order shuffles per iteration, with 0 indicating a random order at each iteration
stopDiff
a numerical value, the epsilon used by the stopping criterion
priorSearch
a catNetworkEvaluate object from a previous search
echo
a logical that turns progress and debug output on or off

Value

  • A catNetworkEvaluate object.

Details

The data can be a matrix of character categories with rows specifying the node-variables and columns assumed to be independent samples from an unknown network, or a data.frame with columns specifying the nodes and rows being the samples.
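The two accepted layouts can be illustrated with a small base R sketch. The node names, category labels, and the perturbed entry are made up for illustration; only the shapes (nodes in rows for a matrix, nodes in columns for a data.frame, and a binary perturbations matrix with the dimensions of the matrix data) follow the descriptions above.

```r
# Hypothetical toy data: 3 nodes (rows) by 5 samples (columns),
# with categories stored as character strings
m <- matrix(c("a",  "b",  "a",  "a",  "b",
              "x",  "x",  "y",  "x",  "y",
              "lo", "hi", "hi", "lo", "lo"),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("N1", "N2", "N3"), NULL))

# The same data in column-nodes format: one column per node, one row per sample
df <- as.data.frame(t(m), stringsAsFactors = TRUE)

# A binary perturbations matrix with the dimensions of m;
# here node N1 is marked as perturbed in sample 2 (an arbitrary choice)
pert <- matrix(0L, nrow = nrow(m), ncol = ncol(m), dimnames = dimnames(m))
pert["N1", 2] <- 1L
```

Either m or df describes the same sample; pert is shaped to accompany the matrix form.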

The number of categories for each node is obtained from the data. It is the user's responsibility to make sure the data can be categorized reasonably. Numerical data is forcibly coerced to integer, which may result in NA entries or in too many node categories; in both cases the function will fail.
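One way to avoid the coercion problems described above is to bin numerical data into a few integer categories before calling the search. The following is a minimal base R sketch, not part of catnet: the helper name discretize, the choice of three quantile-based bins, and the simulated data are all assumptions made for illustration.

```r
# Hypothetical numeric data: 3 nodes (rows) by 8 samples (columns)
set.seed(1)
x <- matrix(rnorm(24), nrow = 3, dimnames = list(c("A", "B", "C"), NULL))

# Illustrative helper: bin one node's values into k categories
# using that node's own empirical quantiles as break points
discretize <- function(v, k = 3) {
  breaks <- quantile(v, probs = seq(0, 1, length.out = k + 1))
  as.integer(cut(v, breaks = breaks, include.lowest = TRUE))
}

# Apply per node (row); apply() returns samples in rows, so transpose back
xcat <- t(apply(x, 1, discretize))
```

The result xcat keeps the row-nodes layout and contains only the integer codes 1 to 3, with no NA entries. Quantile breaks assume the values of a node are not heavily tied; with many ties the breaks may coincide and cut() will fail.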

The function returns a list of networks, one for each possible complexity within the specified range. Stochastic optimization, based on the criterion of maximizing the likelihood, is carried out on the network with complexity closest to, but not above, maxComplexity. If maxComplexity is not specified, so that the function is called with the default value of zero, then maxComplexity is set to the complexity of a network in which every node has the maximum number of parents, maxParentSet. The selectMode parameter sets the selection criterion for the network on which the maximum likelihood optimization is carried out. "BIC" is the default choice; any value other than "AIC" or "BIC" results in the maximum complexity criterion being used, which selects the network with complexity given by maxComplexity.

The parameters tempStart, tempCoolFact and tempCheckOrders control the Simulated Annealing schedule.

tempStart is the starting temperature of the annealing process.

tempCoolFact is the cooling factor from one temperature step to the next. It is a number between 0 and 1, inclusive. For example, if tempStart is the temperature in the first step, tempStart*tempCoolFact will be the temperature in the second.

tempCheckOrders is the number of proposals to be checked, in other words the number of order selections from the current order's neighborhood, at each step before the temperature is decreased.
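The multiplicative cooling rule can be written out in a few lines of base R. This sketch only tabulates the temperatures implied by the rule above, using the default values of tempStart and tempCoolFact from the Usage section; the number of steps, nSteps, is an arbitrary choice for illustration.

```r
tempStart    <- 1     # default initial temperature
tempCoolFact <- 0.9   # default cooling factor
nSteps       <- 5     # number of temperature steps to tabulate (illustrative)

# The temperature at step s is tempStart * tempCoolFact^(s - 1)
temps <- tempStart * tempCoolFact^(seq_len(nSteps) - 1)
round(temps, 4)
# values: 1, 0.9, 0.81, 0.729, 0.6561
```

A cooling factor close to 1 lowers the temperature slowly, while a factor of exactly 1 keeps it constant.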

maxIter is the maximum number of orders to be checked. For example, if maxIter is 40 and tempCheckOrders is 4, then 10 temperature-decreasing steps will eventually be performed.

orderShuffles is a number that controls the extent of the order neighborhoods. Each newly proposed order is obtained from the last accepted one by orderShuffles switches of two node indices.

stopDiff is a stopping criterion. If, at the current temperature, no likelihood improvement of at least stopDiff is found after tempCheckOrders orders have been checked, then the SA stops and the function exits. Setting this parameter to zero guarantees that all maxIter order searches are exhausted.

priorSearch is the result of a previous search. This parameter allows a new search to be initiated from the best order found so far. Thus a chain of searches with varying parameters can be constructed, providing greater adaptability and user control.
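How these parameters interact can be sketched with a toy annealing loop over node orders in base R. This is not the catnet implementation: the functions proposeOrder and toySA, the stand-in score function, and in particular the Metropolis-style acceptance rule are assumptions made for illustration. Only the roles of maxIter, tempCheckOrders, orderShuffles and stopDiff follow the descriptions above.

```r
# Propose a neighbor of `ord` by `orderShuffles` swaps of two positions;
# orderShuffles = 0 draws a completely random order instead
proposeOrder <- function(ord, orderShuffles = 1) {
  if (orderShuffles == 0) return(sample(ord))
  for (s in seq_len(orderShuffles)) {
    ij <- sample(length(ord), 2)   # two distinct positions
    ord[ij] <- ord[rev(ij)]        # swap them
  }
  ord
}

toySA <- function(score, start, tempStart = 1, tempCoolFact = 0.9,
                  tempCheckOrders = 10, maxIter = 100,
                  orderShuffles = 1, stopDiff = 0) {
  cur <- start; curScore <- score(cur)
  best <- cur;  bestScore <- curScore
  temp <- tempStart
  iter <- 0
  while (iter < maxIter) {
    bestBefore <- bestScore
    # Check up to tempCheckOrders proposals at the current temperature
    for (k in seq_len(min(tempCheckOrders, maxIter - iter))) {
      iter <- iter + 1
      cand <- proposeOrder(cur, orderShuffles)
      candScore <- score(cand)
      # Assumed acceptance rule (Metropolis): always accept improvements,
      # accept worse orders with a probability that shrinks with temperature
      if (candScore >= curScore ||
          runif(1) < exp((candScore - curScore) / temp)) {
        cur <- cand; curScore <- candScore
      }
      if (curScore > bestScore) { best <- cur; bestScore <- curScore }
    }
    # Stop if this temperature step improved the best score by less than stopDiff
    if (bestScore - bestBefore < stopDiff) break
    temp <- temp * tempCoolFact
  }
  list(order = best, score = bestScore, iterations = iter)
}

# Stand-in for the likelihood: penalize positions that differ from 1, 2, ..., n
score <- function(ord) -sum(ord != seq_along(ord))

set.seed(1)
res <- toySA(score, start = sample(6))
res$iterations   # 100: with stopDiff = 0 all maxIter proposals are used
```

With a positive stopDiff the loop may exit early, after the first temperature step that fails to improve the best score by at least that amount.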

See the vignettes for more details on the algorithm.

See Also

cnSearchOrder, cnSearchSAcluster

Examples

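library(catnet)
# Generate a random categorical network and draw a sample from it
cnet <- cnRandomCatnet(numnodes=12, maxParents=3, numCategories=2)
psamples <- cnSamples(object=cnet, numsamples=100)
# Search the space of node orders with the default annealing schedule
nets <- cnSearchSA(data=psamples, perturbations=NULL,
	maxParentSet=2, maxComplexity=36)
# Look up a network in the search result by the complexity of the true network
cc <- cnComplexity(object=cnet)
cnFind(object=nets, complexity=cc)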
