HistDAWass

(Histogram-valued Data analysis using Wasserstein metric)

In this document we describe the main features of the HistDAWass package. The name is an acronym for Histogram-valued Data analysis using Wasserstein metric. The implemented classes and functions support the analysis of data tables that contain a histogram in each cell instead of a classical numeric value.

What is the L2 Wasserstein metric?

Given two probability density functions f and g, with cumulative distribution functions F and G and corresponding quantile functions Qf and Qg (the inverses of F and G), the L2 Wasserstein distance is

$$d_W(f,g)=\sqrt{\int\limits_0^1{(Q_f(p) - Q_g(p))^2 dp}}$$
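For histograms the integral above can be evaluated exactly, because each quantile function is piecewise linear. The following is a minimal base-R sketch, not the package's own code; the helper name `wass_l2` and the representation of a histogram as bin bounds `x` plus cumulative probabilities `p` are assumptions made for illustration, and `p` is assumed to be strictly increasing.

```r
# Base-R sketch (not the package's code): a histogram is given by bin bounds x
# and cumulative probabilities p, with mass spread uniformly inside each bin,
# so its quantile function is piecewise linear through the points (p, x).
wass_l2 <- function(x1, p1, x2, p2) {
  pp <- sort(unique(c(p1, p2)))            # common breakpoints on [0, 1]
  d  <- approx(p1, x1, xout = pp)$y -      # Q_f(p) - Q_g(p) at each breakpoint
        approx(p2, x2, xout = pp)$y
  w  <- diff(pp)                           # width of each sub-interval
  a  <- head(d, -1)                        # difference at the left end
  b  <- tail(d, -1)                        # difference at the right end
  # The difference is linear on each sub-interval, so the integral of its
  # square over that sub-interval is exactly w * (a^2 + a*b + b^2) / 3.
  sqrt(sum(w * (a^2 + a * b + b^2) / 3))
}

f <- list(x = c(0, 1, 2), p = c(0, 0.3, 1))
g <- list(x = c(1, 2, 3), p = c(0, 0.3, 1))   # f shifted right by 1
d_fg <- wass_l2(f$x, f$p, g$x, g$p)           # a pure shift by 1 gives distance 1
```

A pure shift changes every quantile by the same amount, which is why the distance equals the size of the shift (up to floating-point rounding).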

The implemented classes are those described in the following table

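library(HistDAWass)
# A histogram with bins [0,1] and [1,2]: x holds the bin bounds,
# p the cumulative probabilities reached at those bounds
mydist <- distributionH(x = c(0, 1, 2), p = c(0, 0.3, 1))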

From raw data to histograms

data2hist functions
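To illustrate what turning raw data into a histogram involves, here is a small sketch that uses only base R's `hist()`. It is not the package's `data2hist` function, whose arguments are not shown on this page; the helper name `raw_to_hist` is hypothetical. It returns bin bounds `x` and cumulative probabilities `p`, the same pair of vectors passed to `distributionH` above.

```r
# Hypothetical helper: bin raw values and return bin bounds plus cumulative
# probabilities, i.e. the (x, p) description of a histogram.
raw_to_hist <- function(values, breaks = "Sturges") {
  h <- hist(values, breaks = breaks, plot = FALSE)   # base-R binning, no plot
  list(
    x = h$breaks,                                    # bin bounds
    p = c(0, cumsum(h$counts) / sum(h$counts))       # cumulative relative frequencies
  )
}

h <- raw_to_hist(c(0.5, 1.5, 1.5, 2.5), breaks = 0:3)
```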

Basic statistics for a distributionH (A histogram)

  • mean
    • the mean of a histogram
  • standard deviation
    • the standard deviation of a histogram
  • skewness
    • the third standardized moment of a histogram
  • kurtosis
    • the fourth standardized moment of a histogram
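These four statistics can be worked out by hand if a histogram is read as a mixture of uniform densities, one per bin. The sketch below does this in base R; it is an illustration under that assumption, not the code behind the package's `meanH`, `stdH`, `skewH` and `kurtH` methods, and the helper name `hist_moments` is hypothetical. Every bin is assumed to have positive width.

```r
# Moments of a histogram seen as a mixture of uniform densities (one per bin).
hist_moments <- function(x, p) {
  a <- head(x, -1)                         # lower bound of each bin
  b <- tail(x, -1)                         # upper bound of each bin
  w <- diff(p)                             # probability mass of each bin
  # k-th raw moment: mass-weighted sum of the raw moments of uniform(a, b)
  raw <- function(k) sum(w * (b^(k + 1) - a^(k + 1)) / ((k + 1) * (b - a)))
  m1 <- raw(1); m2 <- raw(2); m3 <- raw(3); m4 <- raw(4)
  v   <- m2 - m1^2                                     # variance
  mu3 <- m3 - 3 * m1 * m2 + 2 * m1^3                   # third central moment
  mu4 <- m4 - 4 * m1 * m3 + 6 * m1^2 * m2 - 3 * m1^4   # fourth central moment
  c(mean = m1, sd = sqrt(v), skewness = mu3 / v^1.5, kurtosis = mu4 / v^2)
}

stats <- hist_moments(x = c(0, 1, 2), p = c(0, 0.3, 1))
```

The kurtosis here is the plain fourth standardized moment, with no subtraction of 3.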

Basic statistics for a MatH (A matrix of histogram-valued data)

  • The average histogram of a column

    • It is an average histogram that minimizes the sum of squared Wasserstein distances.
  • The standard deviation of a variable

    • It is a number that measures the dispersion of a set of histograms.
  • The covariance matrix of a MatH

    • It is a matrix that measures the covariances among a set of histogram variables.
  • The correlation matrix of a MatH

    • It is a matrix that measures the correlations among a set of histogram variables.
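The average histogram and the standard deviation described above can be sketched in base R. Under the L2 Wasserstein distance, the histogram that minimizes the sum of squared distances is the one whose quantile function is the pointwise average of the individual quantile functions. The code below assumes the `(x, p)` representation with strictly increasing `p`, and assumes that the standard deviation is the square root of the mean squared distance from that average; it does not call the package, and all object names are illustrative.

```r
# Three histograms as (bin bounds x, cumulative probabilities p)
hists <- list(
  list(x = c(0, 1, 2), p = c(0, 0.3, 1)),
  list(x = c(1, 2, 4), p = c(0, 0.5, 1)),
  list(x = c(2, 3),    p = c(0, 1))
)

# Evaluate every quantile function on the union of all probability levels
pp <- sort(unique(unlist(lapply(hists, function(h) h$p))))
Q  <- sapply(hists, function(h) approx(h$p, h$x, xout = pp)$y)  # rows: levels, cols: histograms

# Average histogram: pointwise mean of the quantile functions
bary_x <- rowMeans(Q)

# Exact squared L2 distance between two piecewise-linear quantile functions
sq_dist <- function(q1, q2) {
  d <- q1 - q2
  a <- head(d, -1)
  b <- tail(d, -1)
  sum(diff(pp) * (a^2 + a * b + b^2) / 3)
}

d2   <- apply(Q, 2, sq_dist, q2 = bary_x)  # squared distance of each histogram from the average
sd_w <- sqrt(mean(d2))                     # dispersion of the set of histograms
```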

Visualization

plot of a distributionH

plot of a MatH

plot of a HTS

Data Analysis methods

Clustering

  • Kmeans

  • Adaptive distance based Kmeans

  • Fuzzy cmeans

  • Fuzzy cmeans based on adaptive Wasserstein distances

  • Kohonen batch self organizing maps

  • Kohonen batch self organizing maps with Wasserstein adaptive distances

  • Hierarchical clustering
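The idea behind K-means for histograms can be approximated without the package. If every histogram is described by its quantiles on a fine, evenly spaced grid of probabilities, the squared Euclidean distance between two rows, divided by the grid size, approximates the squared L2 Wasserstein distance. The sketch below uses `stats::kmeans` on such a quantile matrix as a stand-in; it is not `WH_kmeans`, and the toy data and helper `make_hist` are invented for illustration.

```r
set.seed(1)

# Toy data: two groups of histograms, located around 0 and around 10
make_hist <- function(center) list(x = center + c(-1, 0, 1), p = c(0, 0.5, 1))
hists <- lapply(c(0, 0.2, 0.4, 10, 10.2, 10.4), make_hist)

# Quantiles at the midpoints of 50 equal-width probability intervals
probs <- (seq_len(50) - 0.5) / 50
Q <- t(sapply(hists, function(h) approx(h$p, h$x, xout = probs)$y))  # one row per histogram

# Ordinary K-means on the quantile vectors (stand-in for a Wasserstein K-means)
fit <- kmeans(Q, centers = 2, nstart = 10)
fit$cluster
```

Each cluster center is itself a vector of averaged quantiles, so it can be read as the average histogram of its cluster.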

Dimension reduction techniques

  • Principal components analysis of a single histogram variable

  • Principal components analysis of a set of histogram variables (using Multiple Factor Analysis)
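One way to picture a principal components analysis of a single histogram variable is a centered PCA on a table of quantiles, one row per histogram. The sketch below takes that reading as an assumption and uses base R's `prcomp`; it is not the code of `WH.1d.PCA`, and the data are invented.

```r
# Toy histograms that differ in location (centers) and in spread (spreads)
centers <- c(0, 1, 2, 3, 4)
spreads <- c(1, 2, 1, 3, 2)
hists <- Map(function(m, s) list(x = m + s * c(-1, 0, 1), p = c(0, 0.5, 1)),
             centers, spreads)

# Table of quantiles: one row per histogram, one column per probability level
probs <- c(0, 0.25, 0.5, 0.75, 1)
Q <- t(sapply(hists, function(h) approx(h$p, h$x, xout = probs)$y))
colnames(Q) <- paste0("Q", probs)

# Centered, non-standardized PCA of the quantile table
pca <- prcomp(Q, center = TRUE, scale. = FALSE)
explained <- pca$sdev^2 / sum(pca$sdev^2)   # share of variance per component
```

In this toy set the histograms vary only in location and spread, so two components carry essentially all of the variance.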

Methods for Histogram time series

Smoothing

  • Moving averages

  • Exponential smoothing
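Both smoothers can be sketched on quantile functions, since a weighted average of quantile functions is again a valid quantile function. The base-R code below assumes every histogram in the series is described by quantiles at the same probability levels; the function names `moving_average` and `exp_smooth` are hypothetical and are not the package's `HTS.moving.averages` or `HTS.exponential.smoothing`.

```r
# Histogram time series: one row per time point, columns are the quantiles
# at the probability levels 0, 0.5 and 1
Q <- rbind(c(0, 1, 2),
           c(2, 3, 4),
           c(1, 2, 3),
           c(3, 4, 5))

# Moving average of order k: mean of the last k quantile functions
moving_average <- function(Q, k) {
  n <- nrow(Q)
  t(sapply(k:n, function(i) colMeans(Q[(i - k + 1):i, , drop = FALSE])))
}

# Exponential smoothing: S_t = alpha * Q_t + (1 - alpha) * S_(t-1), with S_1 = Q_1
exp_smooth <- function(Q, alpha) {
  S <- Q
  for (i in seq_len(nrow(Q))[-1]) {
    S[i, ] <- alpha * Q[i, ] + (1 - alpha) * S[i - 1, ]
  }
  S
}

ma <- moving_average(Q, k = 2)
es <- exp_smooth(Q, alpha = 0.5)
```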

Predicting

  • KNN prediction of histogram time series

Linear regression

A two-component model for linear regression using the least squares method
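This page gives no formula for the two components. A plausible reading, stated here as an assumption, is that each histogram is split into its mean and its centered quantile function, because the squared L2 Wasserstein distance then splits into a location part and a shape part. The base-R check below verifies that identity numerically for two histograms; the helper names are illustrative and nothing here calls `WH.regression.two.components`.

```r
f <- list(x = c(0, 1, 2), p = c(0, 0.3, 1))
g <- list(x = c(1, 2, 4), p = c(0, 0.5, 1))

# Quantiles of both histograms at the union of their probability levels
pp <- sort(unique(c(f$p, g$p)))
qf <- approx(f$p, f$x, xout = pp)$y
qg <- approx(g$p, g$x, xout = pp)$y

# Exact integral over [0, 1] of a piecewise-linear function (trapezoid rule)
integrate_pl <- function(q) sum(diff(pp) * (head(q, -1) + tail(q, -1)) / 2)

# Exact integral over [0, 1] of the square of a piecewise-linear function
integrate_sq <- function(d) {
  a <- head(d, -1)
  b <- tail(d, -1)
  sum(diff(pp) * (a^2 + a * b + b^2) / 3)
}

mf <- integrate_pl(qf)   # mean of f (integral of its quantile function)
mg <- integrate_pl(qg)   # mean of g

total    <- integrate_sq(qf - qg)                # squared Wasserstein distance
location <- (mf - mg)^2                          # component 1: means
shape    <- integrate_sq((qf - mf) - (qg - mg))  # component 2: centered quantile functions
```

The cross term vanishes because a centered quantile function integrates to zero, so `total` equals `location + shape` up to rounding.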

Install

install.packages('HistDAWass')

Monthly Downloads

258

Version

1.0.3

License

GPL (>= 2)

Maintainer

Antonio Irpino

Last Published

June 5th, 2019

Functions in HistDAWass (1.0.3)

China_Month

A monthly climatic dataset of China
HTS-class

Class HTS
BLOOD

Blood dataset for Histogram data analysis
Age_Pyramids_2014

Age pyramids of all the countries of the World in 2014
China_Seas

A seasonal climatic dataset of China
BloodBRITO

Blood dataset from Brito P. for Histogram data analysis
Center.cell.MatH

Method Center.cell.MatH Centers all the cells of a matrix of distributions
DouglasPeucker

Ramer-Douglas-Peucker algorithm for curve fitting with a PolyLine
HTS.exponential.smoothing

Smoothing with exponential smoothing of a histogram time series
Agronomique

Agronomique data
OzoneFull

Full Ozone dataset for Histogram data analysis
TMatH-class

Class TMatH
TdistributionH-class

Class TdistributionH
WH.bind.row

Method WH.bind.row
RetHTS

A histogram-valued dataset of returns
WH.1d.PCA

Principal components analysis of histogram variable based on Wasserstein distance
WH.correlation

Method WH.correlation
ShortestDistance

Shortest distance from a point to a 2D segment
WH.MultiplePCA

Principal components analysis of a set of histogram variables based on Wasserstein distance
HistDAWass-package

Histogram-Valued Data Analysis
MatH-class

Class MatH.
WH.mat.prod

Method WH.mat.prod
WH.correlation2

Method WH.correlation2
OzoneH

Complete Ozone dataset for Histogram data analysis
WH.var.covar

Method WH.var.covar
WH.var.covar2

Method WH.var.covar2
WH_kmeans

K-means of a dataset of histogram-valued data
WassSqDistH

Method WassSqDistH
WH.bind

Method WH.bind
WH.bind.col

Method WH.bind.col
WH.regression.two.components

Multiple regression analysis for histogram variables based on a two component model and L2 Wasserstein distance
WH.vec.sum

Method WH.vec.sum
WH_adaptive.kmeans

K-means of a dataset of histogram-valued data using adaptive Wasserstein distances
WH.vec.mean

Method WH.vec.mean
dotpW

Method dotpW
[

extract from a MatH Method [
WH_adaptive_fcmeans

Fuzzy c-means with adaptive distances for histogram-valued data
crwtransform

Method crwtransform: returns the centers and the radii of bins of a distribution
compQ

Method compQ
HTS.moving.averages

Smoothing with moving averages of a histogram time series
kurtH

Method kurtH: computes the kurtosis of a distribution
meanH

Method meanH: computes the mean of a distribution
rQQ

Method rQQ
register

Method register
*-methods

Method *
get.s

Method get.s: the standard deviation of a distribution
HTS.predict.knn

K-NN predictions of a histogram time series
WH.SSQ

Method WH.SSQ
WH.SSQ2

Method WH.SSQ2
WH.mat.sum

Method WH.mat.sum
WH.plot_multiple_Spanish.funs

Plotting Spanish fun plots for Multiple factor analysis of Histogram Variables
WH_2d_Kohonen_maps

Batch Kohonen self-organizing 2d maps for histogram-valued data
distributionH-class

Class distributionH.
data2hist

From real data to distributionH.
get.histo

Method get.histo: show the distribution with bins
WH_2d_Adaptive_Kohonen_maps

Batch Kohonen self-organizing 2d maps using adaptive distances for histogram-valued data
get.m

Method get.m: the mean of a distribution
is.registeredMH

Method is.registeredMH
minus

Method -
WH.plot_multiple_indivs

Plot histograms of individuals after a Multiple factor analysis of Histogram Variables
WH.regression.GOF

Goodness of Fit indices for Multiple regression of histogram variables based on a two component model and L2 Wasserstein distance
get.MatH.varnames

Method get.MatH.varnames
get.MatH.main.info

Method get.MatH.main.info
compP

Method compP
checkEmptyBins

Method checkEmptyBins
get.MatH.ncols

Method get.MatH.ncols
get.MatH.stats

Method get.MatH.stats
plot-HTS

Method plot for a histogram time series
WH.regression.two.components.predict

Multiple regression analysis for histogram variables based on a two component model and L2 Wasserstein distance
registerMH

Method registerMH
plot-MatH

Method plot for a matrix of histograms
plot-TdistributionH

plot for a TdistributionH object
show

Method show for distributionH
show-MatH

Method show for MatH
WH_fcmeans

Fuzzy c-means of a dataset of histogram-valued data
get.cell.MatH

Method get.cell.MatH Returns the histogram in a cell of a matrix of distributions
get.distr

Method get.distr: show the distribution
get.MatH.rownames

Method get.MatH.rownames
WH_hclust

Hierarchical clustering of histogram data
get.MatH.nrows

Method get.MatH.nrows
plot-distributionH

plot for a distributionH object
plotPredVsObs

A function for comparing observed vs predicted histograms
set.cell.MatH

Method set.cell.MatH assign a histogram to a cell of a matrix of histograms
stdH

Method stdH: computes the standard deviation of a distribution
plot_errors

A function for plotting functions of errors
subsetHTS

Method subsetHTS: extract a subset of a histogram time series
+

Method +
stations_coordinates

Stations coordinates of China_Month and China_Seas datasets
skewH

Method skewH: computes the skewness of a distribution