SDR (version 0.7.0.0)

NMEEF_SD: Non-dominated Multi-objective Evolutionary algorithm for Extracting Fuzzy rules in Subgroup Discovery (NMEEF-SD)

Description

Performs a subgroup discovery task by executing the NMEEF-SD algorithm.

Usage

NMEEF_SD(paramFile = NULL, training = NULL, test = NULL,
         output = c("optionsFile.txt", "rulesFile.txt", "testQM.txt"),
         seed = 0, nLabels = 3, nEval = 10000, popLength = 100,
         mutProb = 0.1, crossProb = 0.6, RulesRep = "can",
         Obj1 = "CSUP", Obj2 = "CCNF", Obj3 = "null", minCnf = 0.6,
         reInitCoverage = "yes", porcCob = 0.5, StrictDominance = "yes",
         targetVariable = NA, targetClass = "null")

Arguments

paramFile
The path of the parameters file. Use NULL to supply the data through the training and test keel variables instead.
training
A keel class variable with training data.
test
A keel class variable with test data.
output
A character vector with the paths where the information file, the rules file and the test quality measures file are stored, respectively.
seed
An integer to set the seed used to generate random numbers.
nLabels
Number of fuzzy labels defined in the datasets.
nEval
An integer to set the maximum number of evaluations in the evolutionary process.
popLength
An integer to set the number of individuals in the population.
mutProb
Sets the mutation probability. A number in [0,1].
crossProb
Sets the crossover probability. A number in [0,1].
RulesRep
Representation used in the rules. "can" for canonical rules, "dnf" for DNF rules.
Obj1
Sets the Objective number 1. See Objective values for more information about the possible values.
Obj2
Sets the Objective number 2. See Objective values for more information about the possible values.
Obj3
Sets the Objective number 3. See Objective values for more information about the possible values.
minCnf
Sets the minimum confidence that a rule in the Pareto front must have to be returned. A number in [0,1].
reInitCoverage
Sets whether the algorithm must perform the reinitialization based on coverage when it is needed. A string with "yes" or "no".
porcCob
Sets the maximum percentage of variables that participate in the rules generated in the reinitialization based on coverage. A number in [0,1].
StrictDominance
Sets whether the comparison between individuals must be done by strict dominance or not. A string with "yes" or "no".
targetVariable
The name or index position of the target variable (or class). It must be a categorical one.
targetClass
A string specifying the value of the target variable. Use "null" to search for all possible values.

Value

The algorithm shows the following results in the console:
  1. The parameters used in the algorithm.
  2. The rules generated.
  3. The test quality measures of every rule and the global results.
The algorithm also saves those results in the files specified in the output parameter of the algorithm or in the outputData parameter of the parameters file.

How does this algorithm work?

NMEEF-SD is a multi-objective genetic algorithm based on the NSGA-II approach. The algorithm first makes a selection based on binary tournament and saves the selected individuals in an offspring population. Then, NMEEF-SD applies the genetic operators to the individuals in the offspring population.

To generate the population that participates in the next iteration of the evolutionary process, NMEEF-SD calculates the dominance among all individuals (joining the main population and the offspring) and then applies the NSGA-II fast sort algorithm to order the population by fronts of dominance: the first front is the non-dominated (or Pareto) front, the second contains the individuals dominated by one individual, the third contains those dominated by two, and so on.

To promote diversity, NMEEF-SD has a mechanism that reinitializes the population based on coverage if the Pareto front does not evolve during a 5 [...]. At the end of the evolutionary process, the algorithm returns only the individuals in the Pareto front that have a confidence greater than a minimum confidence level.
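The dominance ordering and the final confidence filter described above can be sketched in a few lines of R. This is only an illustration and not the code used inside SDR: the helper names dominates and sort_fronts are hypothetical, all objectives are assumed to be maximised, the reading of "strict" dominance is an assumption, and the fronts are built by plain repeated peeling rather than with the bookkeeping of the NSGA-II fast sort.

```r
# TRUE when objective vector `a` dominates `b` (all objectives maximised).
# strict = TRUE : `a` must be better than `b` in every objective (assumed meaning).
# strict = FALSE: `a` must be no worse everywhere and better in at least one.
dominates <- function(a, b, strict = FALSE) {
  if (strict) all(a > b) else all(a >= b) && any(a > b)
}

# Assigns a front index to every row of `objs` (one row per individual,
# one column per objective) by repeatedly peeling off the individuals
# that are not dominated by any other unassigned individual.
sort_fronts <- function(objs, strict = FALSE) {
  n <- nrow(objs)
  front <- integer(n)              # 0 means "not assigned yet"
  current <- 1L
  while (any(front == 0L)) {
    remaining <- which(front == 0L)
    for (i in remaining) {
      dominated <- any(vapply(
        remaining,
        function(j) j != i && dominates(objs[j, ], objs[i, ], strict),
        logical(1)
      ))
      if (!dominated) front[i] <- current
    }
    current <- current + 1L
  }
  front
}

# Toy population: column 1 is a support-like objective, column 2 is confidence.
objs <- rbind(c(0.9, 0.5), c(0.5, 0.8), c(0.4, 0.4), c(0.1, 0.1))
fronts <- sort_fronts(objs)

# Keep only Pareto-front individuals whose confidence exceeds minCnf.
minCnf <- 0.6
returned <- which(fronts == 1L & objs[, 2] > minCnf)
```

In this toy population the first two individuals form the Pareto front, but only the second one passes the confidence threshold.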

Parameters file structure

The paramFile argument points to a file that holds the parameters NMEEF-SD needs to work. This file must contain at least the following parameters (separated by a carriage return):
  • algorithm Specifies the algorithm to execute. In this case, "NMEEFSD"
  • inputData Specifies two paths of KEEL files for training and test. If only the name of the file is given, the path will be the working directory.
  • seed Sets the seed for the random number generator
  • nLabels Sets the number of fuzzy labels to create when reading the files
  • nEval Sets the maximum number of rule evaluations after which the genetic process stops
  • popLength Sets the number of individuals of the main population
  • ReInitCob Sets whether NMEEF-SD does the reinitialization based on coverage. Values: "yes" or "no"
  • crossProb Crossover probability of the genetic algorithm. Value in [0,1]
  • mutProb Mutation probability of the genetic algorithm. Value in [0,1]
  • RulesRep Representation of each chromosome of the population. "can" for canonical representation. "dnf" for DNF representation.
  • porcCob Sets the maximum percentage of variables that participate in a rule when doing the reinitialization based on coverage. Value in [0,1]
  • Obj1 Sets the objective number 1.
  • Obj2 Sets the objective number 2.
  • Obj3 Sets the objective number 3.
  • minCnf Minimum confidence for returning a rule of the Pareto front. Value in [0,1]
  • StrictDominance Sets whether the comparison of individuals when calculating dominance must use strict dominance or not. Values: "yes" or "no"
  • targetClass Value of the target variable to search for subgroups. The target variable is always the last variable. Use null to search for every value of the target variable
An example of a parameter file could be:
algorithm = NMEEFSD
inputData = "irisd-10-1tra.dat" "irisd-10-1tra.dat" "irisD-10-1tst.dat"
outputData = "irisD-10-1-INFO.txt" "irisD-10-1-Rules.txt" "irisD-10-1-TestMeasures.txt"
seed = 1
RulesRep = can
nLabels = 3
nEval = 500
popLength = 51
crossProb = 0.6
mutProb = 0.1
ReInitCob = yes
porcCob = 0.5
Obj1 = comp
Obj2 = unus
Obj3 = null
minCnf = 0.6
StrictDominance = yes
targetClass = Iris-setosa
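
A parameters file like the one above can also be generated from R. The following is a minimal sketch: the file name nmeefsd_params.txt and the variable names are hypothetical, the values are copied from the example, and the read-back step is only a sanity check, not the way SDR parses the file. The final call is left commented out because it needs the SDR package and the KEEL data files.

```r
# Parameters as a named character vector; values copied from the example file.
params <- c(
  algorithm       = "NMEEFSD",
  inputData       = '"irisd-10-1tra.dat" "irisd-10-1tra.dat" "irisD-10-1tst.dat"',
  outputData      = '"irisD-10-1-INFO.txt" "irisD-10-1-Rules.txt" "irisD-10-1-TestMeasures.txt"',
  seed            = "1",
  RulesRep        = "can",
  nLabels         = "3",
  nEval           = "500",
  popLength       = "51",
  crossProb       = "0.6",
  mutProb         = "0.1",
  ReInitCob       = "yes",
  porcCob         = "0.5",
  Obj1            = "comp",
  Obj2            = "unus",
  Obj3            = "null",
  minCnf          = "0.6",
  StrictDominance = "yes",
  targetClass     = "Iris-setosa"
)

# Write one "name = value" pair per line to a temporary file.
param_path <- file.path(tempdir(), "nmeefsd_params.txt")
writeLines(paste(names(params), params, sep = " = "), param_path)

# Read the file back as key/value pairs to check what was written
# (illustrative parsing only, not SDR's own reader).
param_lines <- readLines(param_path)
keys   <- trimws(sub("=.*$", "", param_lines))
values <- trimws(sub("^[^=]*=", "", param_lines))
parsed <- setNames(values, keys)

# With SDR installed and the data files in the working directory:
# library(SDR)
# NMEEF_SD(paramFile = param_path)
```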

Objective values

You can use the following quality measures in the ObjX parameters of the parameter file with these values:
  • Unusualness -> unus
  • Crisp Support -> csup
  • Crisp Confidence -> ccnf
  • Fuzzy Support -> fsup
  • Fuzzy Confidence -> fcnf
  • Coverage -> cove
  • Significance -> sign
If you do not want to use an objective, you must specify null
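
To give an intuition for some of these measures, the sketch below computes coverage, support, confidence and unusualness for a single crisp rule on a hypothetical toy data set. It assumes the definitions commonly found in the subgroup discovery literature; the formulas SDR applies internally (in particular the fuzzy variants and the normalisation of support) may differ.

```r
# Hypothetical toy data: for each example, whether the rule antecedent
# covers it and whether it belongs to the target class.
covered <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
target  <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

n        <- length(covered)        # total number of examples
n_cov    <- sum(covered)           # examples covered by the rule
n_target <- sum(target)            # examples of the target class
n_both   <- sum(covered & target)  # covered examples of the target class

# Assumed (literature) definitions, not necessarily SDR's exact formulas.
coverage    <- n_cov / n
support     <- n_both / n
confidence  <- n_both / n_cov      # NaN when the rule covers no example
unusualness <- coverage * (confidence - n_target / n)
```

Under these assumed definitions, a rule that covers half of the examples with a confidence of 0.75 against a class frequency of 0.5 has an unusualness of 0.125.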

Details

This function sets the last variable that appears in the KEEL file as the target variable. To change the target variable, you can use changeTargetVariable. The target variable MUST be categorical; if it is not, the function throws an error.

If paramFile is set to anything other than NULL, the rest of the parameters are ignored and the algorithm tries to read the specified file. See "Parameters file structure" if you want to use a parameters file.

References

Carmona, C., Gonzalez, P., del Jesus, M., & Herrera, F. (2010). NMEEF-SD: Non-dominated Multi-objective Evolutionary algorithm for Extracting Fuzzy rules in Subgroup Discovery.

Examples

NMEEF_SD(paramFile = NULL,
               training = habermanTra,
               test = habermanTst,
               output = c("optionsFile.txt", "rulesFile.txt", "testQM.txt"),
               seed = 0,
               nLabels = 3,
               nEval = 300,
               popLength = 100,
               mutProb = 0.1,
               crossProb = 0.6,
               RulesRep = "can",
               Obj1 = "CSUP",
               Obj2 = "CCNF",
               Obj3 = "null",
               minCnf = 0.6,
               reInitCoverage = "yes",
               porcCob = 0.5,
               StrictDominance = "yes",
               targetClass = "positive"
               )
## Not run: 
#       NMEEF_SD(paramFile = NULL,
#                training = habermanTra,
#                test = habermanTst,
#                output = c("optionsFile.txt", "rulesFile.txt", "testQM.txt"),
#                seed = 0,
#                nLabels = 3,
#                nEval = 300,
#                popLength = 100,
#                mutProb = 0.1,
#                crossProb = 0.6,
#                RulesRep = "can",
#                Obj1 = "CSUP",
#                Obj2 = "CCNF",
#                Obj3 = "null",
#                minCnf = 0.6,
#                reInitCoverage = "yes",
#                porcCob = 0.5,
#                StrictDominance = "yes",
#                targetClass = "null"
#                )
#      ## End(Not run)
