visualpred (version 0.1.2)

famdcontour: Contour plots and FAMD function for classification modeling

Description

This function produces visual graphics by means of FAMD, the Factorial Analysis for Mixed Data (interval and categorical variables). The dependent classification variable is set as a supplementary variable. Machine learning algorithm predictions are presented as filled contours.

Usage

famdcontour(dataf=dataf,listconti,listclass,vardep,proba="",
title="",title2="",depcol="",listacol="",alpha1=0.7,alpha2=0.7,alpha3=0.7,
classvar=1,intergrid=0,selec=0,
modelo="glm",nodos=3,maxit=200,decay=0.01,
sampsize=400,mtry=2,nodesize=10,ntree=400,
ntreegbm=500,shrink=0.01,
bag.fraction=1,n.minobsinnode=10,C=100,
gamma=10,Dime1="Dim.1",Dime2="Dim.2",
offsetx=0.1,offsety=0.1)

Value

A list with the following objects:

graph1

plot of points on FAMD first two dimensions

graph2

plot of points and contour curves

graph3

plot of points and variables

graph4

plot of points, variables and contour curves

graph5

plot of points colored by fitted probability

graph6

plot of points colored by absolute difference

df1

data frame used for graph1

df2

data frame used for contour curves

df3

data frame used for variable names

df4

data frame for use in famdcontourlabel

listconti

interval variables used-selected

listclass

class variables used-selected

...

color schemes and other parameters

Arguments

dataf

data frame.

listconti

Interval variables to use, in format c("var1","var2",...).

listclass

Class variables to use, in format c("var1","var2",...).

vardep

Dependent binary classification variable.

proba

vector of probability predictions obtained externally (optional)

title

plot main title

title2

plot subtitle

depcol

vector of two colors for points

listacol

vector of colors for labels

alpha1

alpha transparency for majority class

alpha2

alpha transparency for minority class

alpha3

alpha transparency for fit probability plots

classvar

1 if dependent variable categories are plotted as supplementary

intergrid

scale of grid for contour: 0 if automatic

selec

1 if stepwise logistic variable selection is required, 0 if not.

modelo

name of model: "glm", "gbm", "rf", "nnet", "svm".

nodos

nnet: nodes

maxit

nnet: iterations

decay

nnet: decay

sampsize

rf: sampsize

mtry

rf: mtry

nodesize

rf: nodesize

ntree

rf: ntree

ntreegbm

gbm: ntree

shrink

gbm: shrink

bag.fraction

gbm: bag.fraction

n.minobsinnode

gbm: n.minobsinnode

C

svm Radial: C

gamma

svm Radial: gamma

Dime1, Dime2

FAMD Dimensions to consider. Dim.1 and Dim.2 by default.

offsetx

margin control for contour in percent of rangex, default=0.1

offsety

margin control for contour in percent of rangey, default=0.1

Details

The FAMD algorithm from the FactoMineR package is used to compute point coordinates on the chosen dimensions (Dim.1 and Dim.2 by default). The minority category of the dependent variable is shown in red and the majority category in green. The color scheme can be changed with depcol and listacol, and transparency with the alpha values.

Predictive modeling

Logistic regression (glm) is the default predictive model. The algorithms nnet, rf, gbm and svm-RBF can also be used, with basic parameter settings. Alternatively, a vector of fitted probabilities obtained externally from another algorithm can be passed with proba=nameofvector; in that case the contour curves are computed from this vector. Before performing predictive modeling, selec=1 can be used to select variables with a stepwise BIC logistic regression. By default selec=0 (all input variables are used). It is recommended to carry out variable selection before calling famdcontour and to pass only the useful variables as input.
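As an illustration of the proba argument, the sketch below computes fitted probabilities outside famdcontour and passes them in. The helper minority_proba is hypothetical (it is not part of visualpred), and the choice to supply the probability of the minority class is an assumption about how proba is oriented; check it against your own data before relying on the plots. The call to famdcontour only runs if visualpred is installed.

```r
# Hypothetical helper (not part of visualpred): fit a logistic regression
# outside famdcontour and return the fitted probability of the minority class.
minority_proba <- function(df, predictors, dep) {
  counts <- table(df[[dep]])
  minority <- names(counts)[which.min(counts)]
  y <- as.integer(as.character(df[[dep]]) == minority)
  model_df <- cbind(df[predictors], y = y)
  fit <- glm(y ~ ., data = model_df, family = binomial())
  unname(predict(fit, type = "response"))
}

# Toy data to show the helper on its own.
set.seed(1)
toy <- data.frame(v1 = rnorm(200), v2 = rnorm(200), v3 = rnorm(200))
toy$cls <- ifelse(toy$v1 + rnorm(200) > 1, "yes", "no")
p_toy <- minority_proba(toy, c("v1", "v2", "v3"), "cls")

# Passing an external vector to famdcontour (assumes proba expects the
# minority-class probability, one value per row of dataf).
if (requireNamespace("visualpred", quietly = TRUE)) {
  data("breastwisconsin1", package = "visualpred")
  vars <- c("clump_thickness", "uniformity_of_cell_shape", "mitosis")
  p_ext <- minority_proba(breastwisconsin1, vars, "classes")
  res <- visualpred::famdcontour(dataf = breastwisconsin1, listconti = vars,
                                 listclass = c(""), vardep = "classes",
                                 proba = p_ext)
}
```

Any model that returns one probability per observation could replace the glm fit inside the helper.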

Contour curves

Contour curves are built by the following process: i) the chosen model is trained and fitted probabilities are obtained for all observations; ii) a grid of points is built on the two chosen FAMD dimensions; iii) the MBA package is used to interpolate the probability estimates over the grid, based on the previously fitted observations.
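The three steps above can be sketched with toy data. This is not the package's internal code: the first two principal components stand in for the FAMD coordinates, the 10% padding mirrors the offsetx and offsety defaults as an assumption, and the grid size of 50 is arbitrary. The interpolation step only runs if the MBA package is installed.

```r
set.seed(42)
n <- 300
x <- data.frame(v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n))
y <- rbinom(n, 1, plogis(1.5 * x$v1 - x$v2))

# Stand-in for the FAMD coordinates: first two principal components.
coords <- prcomp(x, scale. = TRUE)$x[, 1:2]
colnames(coords) <- c("Dim.1", "Dim.2")

# i) Train the model and obtain fitted probabilities for all observations.
fit <- glm(y ~ ., data = cbind(x, y = y), family = binomial())
p_hat <- unname(predict(fit, type = "response"))

# ii) Build a grid over the two dimensions, padded by 10% of each range.
pad <- 0.1
rx <- range(coords[, "Dim.1"])
ry <- range(coords[, "Dim.2"])
gx <- seq(rx[1] - pad * diff(rx), rx[2] + pad * diff(rx), length.out = 50)
gy <- seq(ry[1] - pad * diff(ry), ry[2] + pad * diff(ry), length.out = 50)
grid <- expand.grid(Dim.1 = gx, Dim.2 = gy)

# iii) Interpolate the fitted probabilities over the grid with MBA.
if (requireNamespace("MBA", quietly = TRUE)) {
  est <- MBA::mba.points(cbind(coords, p_hat), as.matrix(grid),
                         verbose = FALSE)$xyz.est
  grid$prob <- est[, 3]
}
```

The resulting grid, with its interpolated prob column, is the kind of table a filled contour plot can be drawn from.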

Variable representation

To represent interval variables, categories of class variables, and points in the same plot, the coordinates of the interval variables are projected proportionally onto the range of the two dimensions. Since the space of input variables usually has more than two dimensions, points sometimes overlap; a frequency variable is used, and alpha values may be adjusted, to avoid misreading which dependent variable category (color) is present at a location.

Troubleshooting

  • Check for missing values. Missing values are not allowed.

  • By default selec=0. Setting selec=1 may sometimes result in no variables being selected; an error message is shown in this case.

  • Models with only two input variables could lead to plot generation problems.

  • Be sure that the variables named in listconti are all numeric.

  • If a numeric variable is constant (takes a single value), the process stops, because min-max standardization is performed on numeric variables and NaN values are generated.

  • The dependent variable cannot be named x, y, z, x1 or x2.

  • When there are only categorical variables as input, use mcacontour instead.
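Several of the checks listed above can be run before calling famdcontour. The sketch below uses only base R; check_famd_inputs and minmax are hypothetical helpers written for illustration, not functions from visualpred.

```r
# Min-max standardization divides by (max - min), so a constant column
# gives 0/0, which is NaN.
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
minmax(c(5, 5, 5))

# Hypothetical pre-flight check: returns a character vector of problems found.
check_famd_inputs <- function(df, listconti, vardep) {
  problems <- character(0)
  used <- df[, c(listconti, vardep), drop = FALSE]
  if (anyNA(used)) {
    problems <- c(problems, "missing values present")
  }
  non_num <- listconti[!vapply(df[listconti], is.numeric, logical(1))]
  if (length(non_num) > 0) {
    problems <- c(problems,
                  paste("not numeric:", paste(non_num, collapse = ", ")))
  }
  num <- setdiff(listconti, non_num)
  is_const <- vapply(df[num],
                     function(v) length(unique(v[!is.na(v)])) <= 1,
                     logical(1))
  if (any(is_const)) {
    problems <- c(problems,
                  paste("constant:", paste(num[is_const], collapse = ", ")))
  }
  if (vardep %in% c("x", "y", "z", "x1", "x2")) {
    problems <- c(problems, "dependent variable has a reserved name")
  }
  problems
}

# Toy data frame that triggers every check.
toy <- data.frame(a = c(1, 2, NA, 4), b = c(5, 5, 5, 5),
                  k = c("u", "v", "u", "v"), y = c(0, 1, 0, 1))
problems <- check_famd_inputs(toy, c("a", "b", "k"), "y")
print(problems)
```

An empty result means none of these particular problems were detected; it does not guarantee that famdcontour will run without error.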

References

Pages J. (2004). Analyse factorielle de donnees mixtes. Revue Statistique Appliquee. LII (4). pp. 93-111.

Examples

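library(visualpred)

# Load the example data shipped with the package
data(breastwisconsin1)
dataf <- breastwisconsin1

# Interval (numeric) inputs; no class inputs are used in this example
listconti <- c("clump_thickness", "uniformity_of_cell_shape", "mitosis")
listclass <- c("")
vardep <- "classes"

result <- famdcontour(dataf = dataf, listconti, listclass, vardep)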