DataVisualizations

Table of Contents

1. Introduction
2. Installation
3. Additional Resources
4. References

1. Introduction

“Exploratory data analysis is detective work” [Tukey, 1977, p.2]. This package provides graphical tools for finding ‘quantitative indications’ that lead to a better understanding of the data at hand. “As all detective stories remind us, many of the circumstances surrounding a crime are accidental or misleading. Equally, many of the indications to be discerned in bodies of data are accidental or misleading” [Tukey, 1977, p.3]. The solution is to compare many different graphical tools, with the goal of either finding agreement among them or generating a hypothesis that is then confirmed with statistical methods. This package serves as a starting point.

The DataVisualizations package offers various visualization methods and graphical tools for data analysis, including:

  • Synoptic visualizations of data: overviews of a whole dataset, such as Pixelmatrices.
  • Distribution analysis and visualization: visual distribution analysis for one- or higher-dimensional data, including MD plots and Pareto density estimation (PDE).
  • Spatial visualizations: for example choropleth maps.
  • Visual analysis of clusters, correlations, distances and projections: for example Silhouette plots for clusterings, or Shepard diagrams for projections.
  • Other visualizations: for example ABC barplots, error plots and more.

Examples of synoptic visualizations:

Get a synoptic view of the data with a Pixelmatrix:

library(DataVisualizations)
data("Lsun3D")
Pixelmatrix(Lsun3D$Data)

The Pixelmatrix can be used as a shortcut for visualizing correlations between many variables:

n=nrow(Lsun3D$Data)
# Append five random variables; each must have n rows to match the data
Data=cbind(Lsun3D$Data,runif(n),rnorm(n),rt(n,2),rlnorm(n),rchisq(n,2))
Header=c('x','y','z','uniform','gauss','t','log-normal','chi')
cc=cor(Data,method='spearman')
# Zero the diagonal so the self-correlations do not dominate the color scale
diag(cc)=0
Pixelmatrix(cc,YNames = Header,XNames = Header,main = 'Spearman Coeffs')
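
The same kind of correlation matrix can also be queried numerically. The following is a minimal base-R sketch on synthetic data (the data and the variable names are illustrative assumptions, not part of the package). It zeroes the diagonal as above and then looks up the most strongly correlated pair of variables:

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Synthetic data: 'y' depends monotonically on 'x', 'noise' is independent
Data <- cbind(x = x, y = exp(x) + rnorm(n, sd = 0.1), noise = runif(n))

cc <- cor(Data, method = "spearman")  # rank-based, captures monotone relations
diag(cc) <- 0                         # hide the trivial self-correlations

# Position of the largest absolute off-diagonal coefficient
idx <- which(abs(cc) == max(abs(cc)), arr.ind = TRUE)[1, ]
strongest <- c(rownames(cc)[idx["row"]], colnames(cc)[idx["col"]])
print(strongest)
```

Because Spearman's coefficient works on ranks, the nonlinear but monotone relation between `x` and `y` is still picked up as a strong correlation.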

Examples of distribution analysis:

InspectVariable provides a summary of the most important plots for one-dimensional distribution analysis: a histogram, a continuous density estimate, a QQ-plot, and a boxplot:

data(ITS)
InspectVariable(ITS)
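
For readers without the package at hand, the four views named above can be approximated in base R. This is only a rough sketch under stated assumptions: the data are synthetic, and `density()` uses a Gaussian kernel rather than the Pareto density estimation used by the package, so the result is an analogue and not the output of InspectVariable.

```r
if (!interactive()) pdf(NULL)  # no file output when run as a script

set.seed(7)
# Synthetic right-skewed variable standing in for real data
x <- rlnorm(500, meanlog = 3, sdlog = 0.5)

op <- par(mfrow = c(2, 2))                  # 2 x 2 panel of plots
h <- hist(x, breaks = "FD", main = "Histogram")
d <- density(x)                             # Gaussian kernel estimate, not PDE
plot(d, main = "Density estimate")
qq <- qqnorm(x, main = "QQ-plot vs. normal")
qqline(x)                                   # reference line through the quartiles
b <- boxplot(x, main = "Boxplot")
par(op)                                     # restore previous graphics settings
```

Comparing the four panels side by side follows the approach described in the introduction: a feature such as skewness should show up consistently in several views before it is taken seriously.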

The MD plot visualizes the densities of several variables at once. It combines the syntax of ggplot2 with Pareto density estimation and additional functionality that is useful from a data scientist’s point of view:

library(ggplot2) # provides ylim() and ggtitle()
data(MTY)
Data=cbind(ITS,MTY)
MDplot(Data)+ylim(0,6000)+ggtitle('Two Features with MTY Capped')

Create density scatter plots in 2D:

DensityScatter(ITS, MTY, xlab = 'ITS in EUR', ylab ='MTY in EUR', xlim = c(0,1200), ylim = c(0,15000), main='Pareto Density Estimation indicates Bimodality')

Examples of visual cluster analysis:

The heatmap of the distances, ordered by clusters, provides a synoptic view of the intra- and intercluster distances. Examples and interpretations of heatmaps and Silhouette plots are presented in [Thrun, 2018A, 2018B].

data("Lsun3D")
Heatmap(Lsun3D$Data,Lsun3D$Cls,method = 'euclidean')

Silhouette plot of a clustering:

Silhouetteplot(Lsun3D$Data,Lsun3D$Cls,PlotIt = TRUE)
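
To make clear what a Silhouette plot displays, the sketch below computes silhouette widths by hand in base R. It follows the standard definition: for each point, a is the mean distance to the other points of its own cluster, b is the smallest mean distance to the points of any other cluster, and the width is (b - a) / max(a, b). The data and the helper name `silhouette_width` are illustrative assumptions. This is not the package's implementation.

```r
set.seed(42)
# Two well-separated synthetic 2D clusters (illustrative data, not Lsun3D)
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2))
cls <- rep(1:2, each = 20)
D <- as.matrix(dist(X))

# Hypothetical helper: silhouette width of every point
silhouette_width <- function(D, cls) {
  sapply(seq_along(cls), function(i) {
    same <- cls == cls[i]
    same[i] <- FALSE                       # exclude the point itself
    if (!any(same)) return(0)              # singleton cluster: width set to 0
    a <- mean(D[i, same])                  # cohesion within own cluster
    others <- setdiff(unique(cls), cls[i])
    b <- min(sapply(others, function(k) mean(D[i, cls == k])))  # nearest other cluster
    (b - a) / max(a, b)
  })
}

s <- silhouette_width(D, cls)
print(tapply(s, cls, mean))                # average width per cluster
```

Widths close to 1 indicate points that sit well inside their cluster, while values near 0 or below indicate points between clusters or likely misassignments.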

InspectDistances shows the most important plots of the distribution of distances of the data. The distance distribution in the input space can be bimodal, indicating a distinction between the inter- versus intracluster distances. This can serve as an indication of distance-based cluster structures (see [Thrun, 2018A, 2018B]).

InspectDistances(Lsun3D$Data,method="euclidean")
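
Why a bimodal distance distribution hints at cluster structure can be checked on a small synthetic example. The sketch below uses base R only and made-up data with known class labels (an assumption for illustration, since in practice the labels are unknown). It splits the pairwise distances into intracluster and intercluster distances:

```r
set.seed(3)
# Two synthetic clusters in 3D (illustrative data, not Lsun3D)
X <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
           matrix(rnorm(150, mean = 6), ncol = 3))
cls <- rep(1:2, each = 50)

D <- as.matrix(dist(X, method = "euclidean"))
upper <- upper.tri(D)                       # count each pair of points once
same_cluster <- outer(cls, cls, "==")

intra <- D[upper & same_cluster]            # distances within clusters
inter <- D[upper & !same_cluster]           # distances between clusters

print(summary(intra))
print(summary(inter))
```

In this example the two groups of distances form two distinct modes of the overall distance distribution. When the labels are unknown, only the combined distribution is visible, which is why bimodality is an indication of cluster structure rather than proof.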

2. Installation

Installation using CRAN

Install automatically with all dependencies via

install.packages("DataVisualizations",dependencies = TRUE)

Installation using Github

Please note that dependencies have to be installed manually.

remotes::install_github("Mthrun/DataVisualizations")

Installation using RStudio

Please note that dependencies have to be installed manually.

Tools -> Install Packages -> Repository (CRAN) -> DataVisualizations

3. Additional Resources

Tutorial Examples

The tutorial with several examples can be found in the vignette on CRAN.

4. References

[Thrun, 2018A] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018.

[Thrun, 2018B] Thrun, M. C.: Cluster Analysis of Per Capita Gross Domestic Products, Entrepreneurial Business and Economics Review (EBER), Vol. 7(1), pp. 217-231, https://doi.org/10.15678/EBER.2019.070113, 2019.

[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.

[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, https://doi.org/10.1371/journal.pone.0238835, 2020.

[Tukey, 1977] Tukey, J. W.: Exploratory data analysis, Addison-Wesley Publishing Company, United States, ISBN: 0-201-07616-0, 1977.

Install

install.packages('DataVisualizations')

Monthly Downloads

634

Version

1.3.2

License

GPL-3

Maintainer

Michael Thrun

Last Published

October 10th, 2023

Functions in DataVisualizations (1.3.2)

  • ClassBoxplot: Creates Boxplot plot for all classes
  • BimodalityAmplitude: Bimodality Amplitude
  • ClassPDEplotMaxLikeli: Create PDE plot for all classes with maximum likelihood
  • CCDFplot: plot Complementary Cumulative Distribution Function (CCDF) in Log/Log uses ecdf, CCDF(x) = 1-cdf(x)
  • ClassMDplot: Class MDplot for Data w.r.t. all classes
  • DualaxisClassplot: Dualaxis Classplot
  • ClassPDEplot: PDE Plot for all classes
  • ClassErrorbar: ClassErrorbar
  • DataVisualizations-package: tools:::Rd_package_title("DataVisualizations")
  • CombineCols: Combine vectors of various lengths
  • Crosstable: Crosstable plot
  • GoogleMapsCoordinates: Google Maps with marked coordinates
  • DensityContour: Contour plot of densities
  • Classplot: Classplot
  • InspectStandardization: QQplot of Data versus Normalized Data
  • Fanplot: The fan plot
  • Heatmap: Heatmap for Clustering
  • InspectVariable: Visualization of Distribution of one variable
  • Multiplot: Plot multiple ggplots objects in one panel
  • DefaultColorSequence: Default color sequence for plots
  • DensityScatter: Scatter plot with densities
  • HeatmapColors: Default color sequence for plots
  • FundamentalData_Q1_2018: Fundamental Data of the 1st Quarter in 2018
  • JitterUniqueValues: Jitters Unique Values
  • ITS: Income Tax Share
  • ShepardDensityscatter: Shepard PDE scatter
  • OptimalNoBins: Optimal Number Of Bins
  • InspectBoxplots: Inspect Boxplots
  • InspectCorrelation: Inspect the Correlation
  • MDplot4multiplevectors: Mirrored Density plot (MD-plot) for Multiple Vectors
  • Lsun3D: Lsun3D inspired by FCPS [Thrun/Ultsch, 2020] introduced in [Thrun, 2018]
  • PlotProductratio: Product-Ratio Plot
  • MTY: Municipal Income Tax Yield
  • MAplot: Minus versus Add plot
  • Pixelmatrix: Plot of a Pixel Matrix
  • PmatrixColormap: P-Matrix colors
  • DualaxisLinechart: DualaxisLinechart
  • Plot3D: 3D plot of points
  • world_country_polygons: world_country_polygons
  • zplot: Plotting for 3 dimensional data
  • PDEplot: PDE plot
  • MDplot: Mirrored Density plot (MD-plot)
  • Slopechart: Slope Chart
  • ParetoRadius: ParetoRadius for distributions
  • Piechart: The pie chart
  • InspectDistances: Inspection of Distance-Distribution
  • ParetoDensityEstimation: Pareto Density Estimation V3
  • SignedLog: Signed Log
  • Sheparddiagram: Draws a Shepard Diagram
  • RobustNorm_BackTrafo: Transforms the Robust Normalization back
  • RobustNormalization: RobustNormalization
  • StatPDEdensity: Pareto Density Estimation
  • Silhouetteplot: Silhouette plot of classified data.
  • Worldmap: plots a world map by country codes
  • categoricalVariable: A categorical Feature.
  • InspectScatterplots: Pairwise scatterplots and optimal histograms
  • estimateDensity2D: estimateDensity2D
  • stat_pde_density: Calculate Pareto density estimation for ggplot2 plots
  • PlotGraph2D: PlotGraph2D
  • PlotMissingvalues: Plot of the Amount Of Missing Values
  • QQplot: QQplot with a Linear Fit
  • ROC: ROC plot
  • ChoroplethPostalCodesAndAGS_Germany: Postal Codes and AGS of Germany for a Choropleth Map
  • Choroplethmap: Plots the Choropleth Map
  • ABCbarplot: Barplot with Sorted Data Colored by ABCanalysis
  • AccountingInformation_PrimeStandard_Q3_2019: Accounting Information in the Prime Standard in Q3 in 2019 (AI_PS_Q3_2019)