Learn R Programming

DataVisualizations (version 1.2.3)

PDEscatter: Scatter Density Plot

Description

Pareto density estimation (PDE) [Ultsch, 2005] used for a scatter density plot.

Usage

PDEscatter(x,y,SampleSize,

na.rm=FALSE,PlotIt=TRUE,ParetoRadius,sampleParetoRadius, NrOfContourLines=20,Plotter='native', DrawTopView = TRUE, xlab="X", ylab="Y", main="PDEscatter", xlim, ylim, Legendlab_ggplot="value")

Value

List of:

X

Numeric vector [1:m],m<=n, first feature used in the plot or the kernels used

Y

Numeric vector [1:m],m<=n, second feature used in the plot or the kernels used

Densities

Numeric vector [1:m],m<=n, Number of points within the ParetoRadius of each point, i.e. density information

Matrix3D

1:n,1:3] marix of x,y and density information

ParetoRadius

ParetoRadius used for PDEscatter

Handle

Handle of the plot object. Information-string if native R plot is used.

Arguments

x

Numeric vector [1:n], first feature (for x axis values)

y

Numeric vector [1:n], second feature (for y axis values)

SampleSize

Numeric m, positiv scalar, maximum size of the sample used for calculation. High values increase runtime significantly. The default is that no sample is drawn

na.rm

Function may not work with non finite values. If these cases should be automatically removed, set parameter TRUE

ParetoRadius

Numeric, positiv scalar, the Pareto Radius. If omitted (or 0), calculate by paretoRad.

sampleParetoRadius

Numeric, positiv scalar, maximum size of the sample used for estimation of "kernel", should be significantly lower than SampleSize because requires distance computations which is memory expensive

PlotIt

TRUE: plots with function call

FALSE: Does not plot, plotting can be done using the list element Handle

-1: Computes density only, does not perfom any preperation for plotting meaning that Handle=NULL

NrOfContourLines

Numeric, number of contour lines to be drawn. 20 by default.

Plotter

String, name of the plotting backend to use. Possible values are: "native", "ggplot", "plotly"

DrawTopView

Boolean, True means contur is drawn, otherwise a 3D plot is drawn. Default: TRUE

xlab

String, title of the x axis. Default: "X", see plot() function

ylab

String, title of the y axis. Default: "Y", see plot() function

main

string, the same as "main" in plot() function

xlim

see plot() function

ylim

see plot() function

Legendlab_ggplot

String, in case of Plotter="ggplot" label for the legend. Default: "value"

Author

Felix Pape

Details

The PDEscatter function generates the density of the xy data as a z coordinate. Afterwards xyz will be plotted either as a contour plot or a 3d plot. It assumens that the cases of x and y are mapped to each other meaning that a cbind(x,y) operation is allowed. This function plots the PDE on top of a scatterplot. Variances of x and y should not differ by extreme numbers, otherwise calculate the percentiles on both first. If DrawTopView=FALSE only the plotly option is currently available. If another option is chosen, the method switches automatically there.

The method was succesfully used in [Thrun, 2018; Thrun/Ultsch 2018].

PlotIt=FALSE is usefull if one likes to perform adjustements like axis scaling prior to plotting with ggplot2 or plotly. In the case of "native"" the handle returns NULL because the basic R functon plot() is used

References

[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, (Ultsch, A. & Huellermeier, E. Eds., 10.1007/978-3-658-20540-9), Doctoral dissertation, Heidelberg, Springer, ISBN: 978-3658205393, 2018.

[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A. : Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech,, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.

[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Werrnecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.

Examples

Run this code
#taken from [Thrun/Ultsch, 2018]
data("ITS")
data("MTY")
Inds=which(ITS<900&MTY<8000)
plot(ITS[Inds],MTY[Inds],main='Bimodality is not visible in normal scatter plot')
# \donttest{
PDEscatter(ITS[Inds],MTY[Inds],xlab = 'ITS in EUR',

ylab ='MTY in EUR' ,main='Pareto Density Estimation indicates Bimodality' )
# }
# \dontshow{

PDEscatter(ITS[1:10],MTY[1:10],xlab = 'ITS in EUR',

ylab ='MTY in EUR' ,main='Pareto Density Estimation indicates Bimodality' )
# }

Run the code above in your browser using DataLab