
hermiter

What does hermiter do?

hermiter is an R package that facilitates the estimation of the probability density function and cumulative distribution function in univariate and bivariate settings using Hermite series based estimators. In addition, hermiter allows the estimation of the quantile function in the univariate case and nonparametric correlation coefficients in the bivariate case. The package is applicable to streaming, batch and grouped data. The core methods of the package are written in C++ for speed.

These estimators are particularly useful in the sequential setting (both stationary and non-stationary data streams). In addition, they are useful in efficient, one-pass batch estimation, which is particularly relevant in the context of large data sets. Finally, the Hermite series based estimators are applicable in decentralized (distributed) settings in that estimators formed on subsets of the data can be consistently merged. The Hermite series based estimators have the distinct advantage of being able to estimate the full density function, distribution function and quantile function (univariate setting) along with the Spearman Rho and Kendall Tau correlation coefficients (bivariate setting) in an online manner. The theoretical and empirical properties of most of these estimators have been studied in depth in the articles below. The investigations demonstrate that the Hermite series based estimators are particularly effective in distribution function, quantile function and Spearman correlation estimation.

A summary of the estimators and algorithms in hermiter can be found in the article below.

Features

Univariate

  • fast batch estimation of pdf, cdf and quantile function
  • consistent merging of estimates
  • fast sequential estimation of pdf, cdf and quantile function on streaming data
  • adaptive sequential estimation on non-stationary streams via exponential weighting
  • provides online, O(1) time complexity estimates of arbitrary quantiles (e.g. the median) at any point in time, along with probability densities and cumulative probabilities at arbitrary x
  • uses small and constant memory for the estimator
  • provides a very compact, simultaneous representation of the pdf, cdf and quantile function that can be efficiently stored and communicated using, e.g., the saveRDS and readRDS functions

Bivariate

  • fast batch estimation of bivariate pdf, cdf and nonparametric correlation coefficients (Spearman Rho and Kendall Tau)
  • consistent merging of estimates
  • fast sequential estimation of bivariate pdf, cdf and nonparametric correlation coefficients on streaming data
  • adaptive sequential estimation on non-stationary bivariate streams via exponential weighting
  • provides online, O(1) time complexity estimates of bivariate probability densities and cumulative probabilities at arbitrary points, x
  • provides online, O(1) time complexity estimates of the Spearman and Kendall rank correlation coefficients

  • uses small and constant memory for the estimator

Installation

The release version of hermiter can be installed from CRAN with:

install.packages("hermiter")

The development version of hermiter can be installed using devtools with:

devtools::install_github("MikeJaredS/hermiter")

Load Package

In order to utilize the hermiter package, the package must be loaded using the following command:

library(hermiter)

Construct Estimator

A hermite_estimator S3 object is constructed as below. The argument, N, adjusts the number of terms in the Hermite series based estimator and controls the trade-off between bias and variance. A lower N value implies a higher bias but lower variance and vice versa for higher values of N. The argument, standardize, controls whether or not to standardize observations before applying the estimator. Standardization usually yields better results and is recommended for most estimation settings.

A univariate estimator is constructed as follows (note that the default estimator type is univariate, so this argument does not need to be explicitly set):

hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "univariate")

Similarly for constructing a bivariate estimator:

hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate")
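To make the role of N concrete, the base R sketch below implements the classical Hermite series density estimator. The orthonormal Hermite functions h_0, ..., h_N are computed by their three-term recurrence. The coefficients are the sample means a_k = (1/n) sum_i h_k(x_i), and the density estimate is sum_k a_k h_k(x).

This is an illustration under that assumption, not hermiter's C++ implementation. The helper names (hermite_functions, hermite_coefficients, hermite_density, standardized_density) are hypothetical. The standardized variant assumes that standardization means fitting on (x - mean) / sd and rescaling back. That is the idea behind standardize = TRUE, but not necessarily its exact implementation.

```r
# Sketch of a Hermite series density estimator in base R (illustration only;
# function names are hypothetical, not part of hermiter's API).

# Orthonormal Hermite functions h_0, ..., h_N at each element of x, using
# h_k(x) = sqrt(2/k) x h_{k-1}(x) - sqrt((k-1)/k) h_{k-2}(x).
# Returns a length(x) x (N + 1) matrix; column k + 1 holds h_k.
hermite_functions <- function(x, N) {
  h <- matrix(0, nrow = length(x), ncol = N + 1)
  h[, 1] <- pi^(-1/4) * exp(-x^2 / 2)
  if (N >= 1) h[, 2] <- sqrt(2) * x * h[, 1]
  if (N >= 2) {
    for (k in 2:N) {
      h[, k + 1] <- sqrt(2 / k) * x * h[, k] - sqrt((k - 1) / k) * h[, k - 1]
    }
  }
  h
}

# Coefficients a_k = (1/n) * sum_i h_k(x_i): only N + 1 numbers are kept,
# however many observations there are.
hermite_coefficients <- function(obs, N) colMeans(hermite_functions(obs, N))

# Density estimate f(x) = sum_k a_k h_k(x).
hermite_density <- function(coefs, x) {
  as.vector(hermite_functions(x, length(coefs) - 1) %*% coefs)
}

# Assumed standardized variant: fit on z = (obs - m) / s, then map back with
# f_X(x) = f_Z((x - m) / s) / s.
standardized_density <- function(obs, N, x) {
  m <- mean(obs)
  s <- sd(obs)
  coefs <- hermite_coefficients((obs - m) / s, N)
  hermite_density(coefs, (x - m) / s) / s
}

set.seed(1)
coefs <- hermite_coefficients(rnorm(10000), N = 10)
pdf_at_points <- hermite_density(coefs, c(-1, 0, 1))  # compare dnorm(c(-1, 0, 1))
pdf_at_mean <- standardized_density(rnorm(10000, mean = 5, sd = 2), N = 10, x = 5)
```

Raising N adds higher-order, more oscillatory terms. This lowers bias, but each extra coefficient also carries its own sampling noise into the estimate, which is the bias-variance trade-off described above.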

Batch Estimator Updating

A hermite_estimator object can be initialized with a batch of observations as below.

For univariate observations:

observations <- rlogis(n=1000)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, observations = 
                                   observations)

For bivariate observations:

observations <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate", observations = 
                                   observations)

Sequential Estimator Updating

In the sequential setting, observations are revealed one at a time. A hermite_estimator object can be updated sequentially with a single new observation by utilizing the update_sequential method. Note that when updating the Hermite series based estimator sequentially, observations are also standardized sequentially if the standardize argument is set to TRUE in the constructor.

Standard syntax

For univariate observations:

observations <- rlogis(n=1000)
hermite_est <- hermite_estimator(N=10, standardize=TRUE)
for (idx in seq_along(observations)) {
  hermite_est <- update_sequential(hermite_est,observations[idx])
}

For bivariate observations:

observations <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate")
for (idx in seq_len(nrow(observations))) {
  hermite_est <- update_sequential(hermite_est,observations[idx,])
}

Piped syntax

For univariate observations:

library(magrittr) # provides the %>% pipe operator
observations <- rlogis(n=1000)
hermite_est <- hermite_estimator(N=10, standardize=TRUE)
for (idx in seq_along(observations)) {
  hermite_est <- hermite_est %>% update_sequential(observations[idx])
}

For bivariate observations:

observations <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate")
for (idx in seq_len(nrow(observations))) {
  hermite_est <- hermite_est %>% update_sequential(observations[idx,])
}
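A base R sketch shows why each update_sequential call is cheap. With unstandardized data, the coefficients are running means, so each new observation changes them via a_k <- a_k + (h_k(x) - a_k) / n. The cost of this step does not grow with n.

The sketch also keeps a running mean and variance with Welford's algorithm. These are the kind of moments that sequential standardization relies on, but whether hermiter uses exactly this update is an assumption. All names are hypothetical, and the compact hermite_functions helper is repeated so that the block runs on its own.

```r
# Orthonormal Hermite functions h_0..h_N (three-term recurrence); one row per x.
hermite_functions <- function(x, N) {
  h <- matrix(0, nrow = length(x), ncol = N + 1)
  h[, 1] <- pi^(-1/4) * exp(-x^2 / 2)
  if (N >= 1) h[, 2] <- sqrt(2) * x * h[, 1]
  if (N >= 2) for (k in 2:N) {
    h[, k + 1] <- sqrt(2 / k) * x * h[, k] - sqrt((k - 1) / k) * h[, k - 1]
  }
  h
}

# Hypothetical estimator state: count, coefficients, running mean and the
# running sum of squared deviations (m2) used by Welford's algorithm.
new_state <- function(N) list(n = 0, coefs = numeric(N + 1), mean = 0, m2 = 0)

update_state <- function(state, x) {
  state$n <- state$n + 1
  # Coefficients are running means of h_k(x_i): O(N) work per observation.
  h_x <- as.vector(hermite_functions(x, length(state$coefs) - 1))
  state$coefs <- state$coefs + (h_x - state$coefs) / state$n
  # Welford's update of the running mean and m2.
  delta <- x - state$mean
  state$mean <- state$mean + delta / state$n
  state$m2 <- state$m2 + delta * (x - state$mean)
  state
}

set.seed(2)
obs <- rlogis(1000)
state <- new_state(N = 10)
for (x in obs) state <- update_state(state, x)
running_var <- state$m2 / (state$n - 1)
```

In this unstandardized sketch, the sequential coefficients coincide with a batch fit on the same data. With sequential standardization, early observations are scaled using moments estimated from fewer points, so sequential and batch results can be expected to differ slightly.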

Merging Hermite Estimators

Hermite series based estimators can be consistently combined/merged in both the univariate and bivariate settings. In particular, when standardize = FALSE, the results obtained from combining/merging distinct hermite_estimators updated on subsets of a data set are exactly equal to those obtained by constructing a single hermite_estimator and updating on the full data set (corresponding to the concatenation of the aforementioned subsets). This holds true for the pdf, cdf and quantile results in the univariate case and the pdf, cdf and nonparametric correlation results in the bivariate case. When standardize = TRUE, the equivalence is no longer exact, but is accurate enough to be practically useful. Combining/merging hermite_estimators is illustrated below.

For the univariate case:

observations_1 <- rlogis(n=1000)
observations_2 <- rlogis(n=1000)
hermite_est_1 <- hermite_estimator(N=10, standardize=TRUE, 
                                   observations = observations_1)
hermite_est_2 <- hermite_estimator(N=10, standardize=TRUE, 
                                   observations = observations_2)
hermite_est_merged <- merge_hermite(list(hermite_est_1,hermite_est_2))

For the bivariate case:

observations_1 <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
observations_2 <- matrix(data = rnorm(2000),nrow = 1000, ncol=2)
hermite_est_1 <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate", 
                                 observations = observations_1)
hermite_est_2 <- hermite_estimator(N=10, standardize=TRUE, 
                                 est_type = "bivariate", 
                                 observations = observations_2)
hermite_est_merged <- merge_hermite(list(hermite_est_1,hermite_est_2))

The ability to combine/merge estimators is particularly useful in applications involving grouped data (see package vignette).
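The base R sketch below shows why merging is exact when standardize = FALSE. It uses hypothetical names and is not hermiter's merge_hermite. Unstandardized coefficients are averages of h_k(x_i), so the full-data average is the count-weighted average of the subset averages.

Counts, means and variances also merge exactly, using the standard pairwise update for the sum of squared deviations. Standardized coefficients, by contrast, depend on each subset's own mean and standard deviation. That is consistent with the equivalence being only approximate in that case.

```r
# Orthonormal Hermite functions h_0..h_N (three-term recurrence); one row per x.
hermite_functions <- function(x, N) {
  h <- matrix(0, nrow = length(x), ncol = N + 1)
  h[, 1] <- pi^(-1/4) * exp(-x^2 / 2)
  if (N >= 1) h[, 2] <- sqrt(2) * x * h[, 1]
  if (N >= 2) for (k in 2:N) {
    h[, k + 1] <- sqrt(2 / k) * x * h[, k] - sqrt((k - 1) / k) * h[, k - 1]
  }
  h
}

# Per-subset summary: count, unstandardized coefficients, mean and m2
# (sum of squared deviations from the mean).
summarize_subset <- function(obs, N) {
  m <- mean(obs)
  list(n = length(obs), coefs = colMeans(hermite_functions(obs, N)),
       mean = m, m2 = sum((obs - m)^2))
}

# Hypothetical pairwise merge: count-weighted coefficients plus the standard
# parallel update of the mean and m2.
merge_pair_sketch <- function(s1, s2) {
  n <- s1$n + s2$n
  delta <- s2$mean - s1$mean
  list(n = n,
       coefs = (s1$n * s1$coefs + s2$n * s2$coefs) / n,
       mean = s1$mean + delta * s2$n / n,
       m2 = s1$m2 + s2$m2 + delta^2 * s1$n * s2$n / n)
}

set.seed(3)
obs_1 <- rlogis(1000)
obs_2 <- rlogis(1500)
merged <- merge_pair_sketch(summarize_subset(obs_1, 10),
                            summarize_subset(obs_2, 10))
full <- summarize_subset(c(obs_1, obs_2), 10)
```

To merge a list of summaries (for example, one per group), the pairwise step can be folded over the list with Reduce(merge_pair_sketch, summaries).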

Estimate univariate pdf, cdf and quantile function

The central advantage of Hermite series based estimators is that they can be updated in a sequential/one-pass manner, as above. Afterwards, probability densities and cumulative probabilities at arbitrary x values, along with arbitrary quantiles, can be obtained. The hermite_estimator object only maintains a small and fixed number of coefficients and thus uses minimal memory. The syntax to calculate probability densities, cumulative probabilities and quantiles in the univariate setting is presented below.

Standard syntax

observations <- rlogis(n=2000)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 observations = observations)

x <- seq(-15,15,0.1)
pdf_est <- dens(hermite_est,x)
cdf_est <- cum_prob(hermite_est,x)

p <- seq(0.05,0.95,0.05)
quantile_est <- quant(hermite_est,p)

Piped syntax

observations <- rlogis(n=2000)
hermite_est <- hermite_estimator(N=10, standardize=TRUE, 
                                 observations = observations)

x <- seq(-15,15,0.1)
pdf_est <- hermite_est %>% dens(x)
cdf_est <- hermite_est %>% cum_prob(x)

p <- seq(0.05,0.95,0.05)
quantile_est <- hermite_est %>% quant(p)
actual_pdf <- dlogis(x)
actual_cdf <- plogis(x)
df_pdf_cdf <- data.frame(x,pdf_est,cdf_est,actual_pdf,actual_cdf)

actual_quantiles <- qlogis(p)
df_quant <- data.frame(p,quantile_est,actual_quantiles)

Comparing Estimated versus Actual

library(ggplot2)
ggplot(df_pdf_cdf,aes(x=x)) + geom_line(aes(y=pdf_est, colour="Estimated")) +
  geom_line(aes(y=actual_pdf, colour="Actual")) +
  scale_colour_manual("", 
                      breaks = c("Estimated", "Actual"),
                      values = c("blue", "black")) + ylab("Probability Density")

ggplot(df_pdf_cdf,aes(x=x)) + geom_line(aes(y=cdf_est, colour="Estimated")) +
  geom_line(aes(y=actual_cdf, colour="Actual")) +
  scale_colour_manual("", 
                      breaks = c("Estimated", "Actual"),
                      values = c("blue", "black")) +
  ylab("Cumulative Probability")

ggplot(df_quant,aes(x=actual_quantiles)) + geom_point(aes(y=quantile_est),
                                                      color="blue") +
  geom_abline(slope=1,intercept = 0) +xlab("Theoretical Quantiles") +
  ylab("Estimated Quantiles")

Convenience functions

There are also generic methods for summarizing and plotting univariate densities and distribution functions, as illustrated below.

h_dens <- density(hermite_est)
print(h_dens)
plot(h_dens)

h_cdf <- hcdf(hermite_est)
print(h_cdf)
plot(h_cdf)
summary(h_cdf)

Finally, the following convenience functions provide syntax familiar from the corresponding base R functions.

quantile(hermite_est)

median(hermite_est)

IQR(hermite_est)
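All of the quantities above come from one small coefficient vector. The base R sketch below uses hypothetical names and assumes that standardization means fitting on (x - mean) / sd. It shows one way to get from coefficients to a cdf and to quantiles:

1. Evaluate the series density on a grid.
2. Accumulate it with the trapezoidal rule.
3. Rescale it to end at 1, clip it to [0, 1] and force it to be non-decreasing.
4. Invert it by linear interpolation.

hermiter's function list includes routines for integrals of the Hermite functions (e.g. hermite_integral_val), so its cdf presumably avoids this grid approximation. Treat the sketch as a numerical stand-in only.

```r
# Orthonormal Hermite functions h_0..h_N (three-term recurrence); one row per x.
hermite_functions <- function(x, N) {
  h <- matrix(0, nrow = length(x), ncol = N + 1)
  h[, 1] <- pi^(-1/4) * exp(-x^2 / 2)
  if (N >= 1) h[, 2] <- sqrt(2) * x * h[, 1]
  if (N >= 2) for (k in 2:N) {
    h[, k + 1] <- sqrt(2 / k) * x * h[, k] - sqrt((k - 1) / k) * h[, k - 1]
  }
  h
}

# Hypothetical helper: pdf and cdf on a grid plus quantiles at probabilities p.
hermite_cdf_quantile <- function(obs, N, p, grid = seq(-15, 15, by = 0.01)) {
  m <- mean(obs)
  s <- sd(obs)
  coefs <- colMeans(hermite_functions((obs - m) / s, N))
  # Density on the original scale: f_X(x) = f_Z((x - m) / s) / s.
  pdf <- as.vector(hermite_functions((grid - m) / s, N) %*% coefs) / s
  # Cumulative trapezoidal rule: cdf[j] approximates the integral up to grid[j].
  cdf <- c(0, cumsum(diff(grid) * (head(pdf, -1) + tail(pdf, -1)) / 2))
  # A truncated series need not integrate to exactly 1 and can dip below 0,
  # so rescale the total mass to 1, clip to [0, 1] and enforce monotonicity.
  cdf <- cdf / tail(cdf, 1)
  cdf <- cummax(pmin(pmax(cdf, 0), 1))
  # Invert by linear interpolation; tied cdf values are collapsed by averaging.
  quantiles <- approx(cdf, grid, xout = p, ties = mean)$y
  list(grid = grid, pdf = pdf, cdf = cdf, quantiles = quantiles)
}

set.seed(4)
p <- seq(0.05, 0.95, by = 0.05)
res <- hermite_cdf_quantile(rlogis(2000), N = 10, p = p)
median_est <- res$quantiles[which.min(abs(p - 0.5))]
```

Because quantiles are read off the same cdf, any p in (0, 1) can be queried later without revisiting the data. This mirrors the O(1) quantile claim in the feature list.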

Estimate bivariate pdf, cdf and nonparametric correlation

The aforementioned suitability of Hermite series based estimators in sequential and one-pass batch estimation settings extends to the bivariate case. Probability densities and cumulative probabilities can be obtained at arbitrary points. The syntax to calculate probability densities and cumulative probabilities along with the Spearman and Kendall correlation coefficients in the bivariate setting is presented below.

Standard syntax

# Prepare bivariate normal data
sig_x <- 1
sig_y <- 1
num_obs <- 4000
rho <- 0.5
observations_mat <- mvtnorm::rmvnorm(n=num_obs,mean=rep(0,2),
          sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), 
          nrow=2,ncol=2, byrow = TRUE))

hermite_est <- hermite_estimator(N = 30, standardize = TRUE, 
                                 est_type = "bivariate", 
                                 observations = observations_mat) 
vals <- seq(-5,5,by=0.25)
x_grid <- as.matrix(expand.grid(X=vals, Y=vals))
pdf_est <- dens(hermite_est,x_grid)
cdf_est <- cum_prob(hermite_est,x_grid)
spear_est <- spearmans(hermite_est)
kendall_est <- kendall(hermite_est)

Piped syntax

sig_x <- 1
sig_y <- 1
num_obs <- 4000
rho <- 0.5
observations_mat <- mvtnorm::rmvnorm(n=num_obs,mean=rep(0,2),
        sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), 
          nrow=2, ncol=2, byrow = TRUE))

hermite_est <- hermite_estimator(N = 30, standardize = TRUE, 
                                 est_type = "bivariate", 
                                 observations = observations_mat) 

vals <- seq(-5,5,by=0.25)
x_grid <- as.matrix(expand.grid(X=vals, Y=vals))
pdf_est <- hermite_est %>% dens(x_grid, clipped = TRUE)
cdf_est <- hermite_est %>% cum_prob(x_grid, clipped = TRUE)
spear_est <- hermite_est %>% spearmans()
kendall_est <- hermite_est %>% kendall()
actual_pdf <-mvtnorm::dmvnorm(x_grid,mean=rep(0,2),
            sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), 
                           nrow=2,ncol=2, byrow = TRUE))
actual_cdf <- rep(NA,nrow(x_grid))
for (row_idx in seq_len(nrow(x_grid))) {
  actual_cdf[row_idx] <-  mvtnorm::pmvnorm(lower = c(-Inf,-Inf),
    upper=as.numeric(x_grid[row_idx,]),mean=rep(0,2),sigma = matrix(c(sig_x^2, 
        rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), nrow=2,ncol=2,byrow = TRUE))
}
actual_spearmans <- cor(observations_mat,method = "spearman")[1,2]
actual_kendall <- cor(observations_mat,method = "kendall")[1,2]
df_pdf_cdf <- data.frame(x_grid,pdf_est,cdf_est,actual_pdf,actual_cdf)
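The bivariate estimator extends the same idea to products of Hermite functions. The base R sketch below is an illustration with hypothetical names, and it omits standardization. The coefficients are a_jk = (1/n) sum_i h_j(x_i) h_k(y_i), and the density estimate is sum_j sum_k a_jk h_j(x) h_k(y).

In this form, only the (N + 1) x (N + 1) coefficient matrix has to be kept. That is what keeps one-pass and sequential estimation cheap.

```r
# Orthonormal Hermite functions h_0..h_N (three-term recurrence); one row per x.
hermite_functions <- function(x, N) {
  h <- matrix(0, nrow = length(x), ncol = N + 1)
  h[, 1] <- pi^(-1/4) * exp(-x^2 / 2)
  if (N >= 1) h[, 2] <- sqrt(2) * x * h[, 1]
  if (N >= 2) for (k in 2:N) {
    h[, k + 1] <- sqrt(2 / k) * x * h[, k] - sqrt((k - 1) / k) * h[, k - 1]
  }
  h
}

# Coefficient matrix A with A[j + 1, k + 1] = mean of h_j(x_i) * h_k(y_i).
bivariate_coefficients <- function(obs, N) {
  crossprod(hermite_functions(obs[, 1], N), hermite_functions(obs[, 2], N)) /
    nrow(obs)
}

# Density estimate at each row (x, y) of pts: sum_jk A[j, k] h_j(x) h_k(y).
bivariate_density <- function(coef_mat, pts) {
  N <- nrow(coef_mat) - 1
  hx <- hermite_functions(pts[, 1], N)
  hy <- hermite_functions(pts[, 2], N)
  rowSums((hx %*% coef_mat) * hy)
}

set.seed(5)
obs <- matrix(rnorm(40000), ncol = 2)   # 20000 independent N(0, 1) pairs
coef_mat <- bivariate_coefficients(obs, N = 6)
# Estimate at the origin; the true density there is dnorm(0)^2.
pdf_origin <- bivariate_density(coef_mat, matrix(c(0, 0), ncol = 2))
```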

Comparing Estimated versus Actual

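library(ggplot2)
library(colorspace) # scale_fill_continuous_sequential()
library(patchwork)  # combining plots with + and plot_layout()
p1 <- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= actual_pdf)) +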
  scale_fill_continuous_sequential(palette="Oslo",
                                   breaks=seq(0,.2,by=.05),
                                   limits=c(0,.2))

p2 <- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= pdf_est)) +
  scale_fill_continuous_sequential(palette="Oslo",
                                   breaks=seq(0,.2,by=.05),
                                   limits=c(0,.2))

p1+ ggtitle("Actual PDF")+ theme(legend.title = element_blank()) + p2 +
  ggtitle("Estimated PDF") +theme(legend.title = element_blank()) +
  plot_layout(guides = 'collect')

p1 <- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= actual_cdf)) +
  scale_fill_continuous_sequential(palette="Oslo",
                       breaks=seq(0,1,by=.2),
                       limits=c(0,1))

p2 <- ggplot(df_pdf_cdf) + geom_tile(aes(X, Y, fill= cdf_est)) +
  scale_fill_continuous_sequential(palette="Oslo",
                                   breaks=seq(0,1,by=.2),
                                   limits=c(0,1))

p1+ ggtitle("Actual CDF") + theme(legend.title = element_blank()) + p2 +
  ggtitle("Estimated CDF") + theme(legend.title = element_blank())+
  plot_layout(guides = 'collect')

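Spearman's correlation coefficient results ("Actual" is the sample coefficient computed by cor() on the simulated observations):

             Spearman's Correlation
Actual       0.453
Estimated    0.447

Kendall correlation coefficient results (computed the same way):

             Kendall Correlation
Actual       0.312
Estimated    0.308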
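For context, the "Actual" figures above are sample rank correlations of the simulated data. For a bivariate normal with Pearson correlation rho, the population values have classical closed forms: rho_S = (6/pi) arcsin(rho/2) and tau = (2/pi) arcsin(rho). The short base R snippet below evaluates them for the rho = 0.5 used in the example. The helper names are illustrative only.

```r
# Population rank correlations of a bivariate normal with Pearson correlation rho.
spearman_from_pearson <- function(rho) (6 / pi) * asin(rho / 2)
kendall_from_pearson <- function(rho) (2 / pi) * asin(rho)

rho <- 0.5
spearman_pop <- spearman_from_pearson(rho)  # approximately 0.483
kendall_pop <- kendall_from_pearson(rho)    # 1/3, since asin(0.5) = pi/6
```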

Applying to stationary data (sequential setting)

Univariate Example

Another useful application of the hermite_estimator class is to obtain pdf, cdf and quantile function estimates on streaming data. The speed of estimation allows the pdf, cdf and quantile functions to be estimated in real time. We illustrate this below for cdf and quantile estimation with a sample Shiny application. The key benefit is that the full functions, not just a fixed set of summaries, are kept up to date, so any arbitrary quantile can be evaluated at any point in time. We include a stub for reading streaming data that generates micro-batches of standard exponential i.i.d. random data. This stub can easily be swapped out for a method reading micro-batches from a Kafka topic or similar.

The Shiny sample code below can be pasted into a single app.R file and run directly.

# Not Run. Copy and paste into app.R and run.
library(shiny)
library(hermiter)
library(ggplot2)
library(magrittr)

ui <- fluidPage(
    titlePanel("Streaming Statistics Analysis Example: Exponential 
               i.i.d. stream"),
    sidebarLayout(
        sidebarPanel(
            sliderInput("percentile", "Percentile:",
                        min = 0.01, max = 0.99,
                        value = 0.5, step = 0.01)
        ),
        mainPanel(
           plotOutput("plot"),
           textOutput("quantile_text")
        )
    )
)

server <- function(input, output) {
    values <- reactiveValues(hermite_est = 
                                 hermite_estimator(N = 10, standardize = TRUE))
    x <- seq(-15, 15, 0.1)
    # Note that the stub below could be replaced with code that reads streaming 
    # data from various sources, Kafka etc.  
    read_stream_stub_micro_batch <- reactive({
        invalidateLater(1000)
        new_observation <- rexp(10)
        return(new_observation)
    })
    updated_cdf_calc <- reactive({
        micro_batch <- read_stream_stub_micro_batch()
        for (idx in seq_along(micro_batch)) {
            values[["hermite_est"]] <- isolate(values[["hermite_est"]]) %>%
                update_sequential(micro_batch[idx])
        }
        cdf_est <- isolate(values[["hermite_est"]]) %>%
            cum_prob(x, clipped = TRUE)
        df_cdf <- data.frame(x, cdf_est)
        return(df_cdf)
    })
    updated_quantile_calc <- reactive({
        values[["hermite_est"]]  %>% quant(input$percentile)
    })
    output$plot <- renderPlot({
        ggplot(updated_cdf_calc(), aes(x = x)) + geom_line(aes(y = cdf_est)) +
            ylab("Cumulative Probability")
    }
    )
    output$quantile_text <- renderText({ 
        return(paste(input$percentile * 100, "th Percentile:", 
                     round(updated_quantile_calc(), 2)))
    })
}
shinyApp(ui = ui, server = server)

Applying to non-stationary data (sequential setting)

Univariate Example

The hermite_estimator is also applicable to non-stationary data streams. A weighted form of the Hermite series based estimator can be applied to handle this case. The estimator will adapt to the new distribution and "forget" the old distribution as illustrated in the example below. In this univariate example, the distribution from which the observations are drawn switches from a Chi-square distribution to a logistic distribution and finally to a normal distribution. In order to use the exponentially weighted form of the hermite_estimator, the exp_weight_lambda argument must be set to a non-NA value. Typical values for this parameter are 0.01, 0.05 and 0.1. The lower the exponential weighting parameter, the slower the estimator adapts and vice versa for higher values of the parameter. However, variance increases with higher values of exp_weight_lambda, so there is a trade-off to bear in mind.

# Prepare Test Data (mutate() below comes from dplyr, which also provides %>%)
library(dplyr)
num_obs <- 2000
test <- rchisq(num_obs,5)
test <- c(test,rlogis(num_obs))
test <- c(test,rnorm(num_obs))
# Calculate theoretical pdf, cdf and quantile values for comparison
x <- seq(-15,15,by=0.1)
actual_pdf_chisq <- dchisq(x,5)
actual_pdf_logis <- dlogis(x)
actual_pdf_norm <- dnorm(x)
actual_cdf_chisq <- pchisq(x,5)
actual_cdf_logis <- plogis(x)
actual_cdf_norm <- pnorm(x)
p <- seq(0.05,0.95,by=0.05)
actual_quantiles_chisq <- qchisq(p,5)
actual_quantiles_logis <- qlogis(p)
actual_quantiles_norm <- qnorm(p)
# Construct Hermite Estimator 
h_est <- hermite_estimator(N=20,standardize = TRUE,exp_weight_lambda = 0.005)
# Loop through test data and update h_est to simulate observations arriving 
# sequentially
count <- 1
res <- data.frame()
res_q <- data.frame()
for (idx in seq_along(test)) {
  h_est <- h_est %>% update_sequential(test[idx])
  if (idx %% 100 == 0){
    # Distribution of the latest observation: 0 = chi-square, 1 = logistic,
    # 2 = normal. Using idx - 1 keeps idx = 2000 (the last chi-square draw)
    # and idx = 4000 (the last logistic draw) in the correct regime.
    regime <- (idx - 1) %/% num_obs
    if (regime == 0){
      actual_cdf_vals <- actual_cdf_chisq
      actual_pdf_vals <- actual_pdf_chisq
      actual_quantile_vals <- actual_quantiles_chisq
    }
    if (regime == 1){
      actual_cdf_vals <- actual_cdf_logis
      actual_pdf_vals <- actual_pdf_logis
      actual_quantile_vals <- actual_quantiles_logis
    }
    if (regime == 2){
      actual_cdf_vals <- actual_cdf_norm
      actual_pdf_vals <- actual_pdf_norm
      actual_quantile_vals <- actual_quantiles_norm
    }
    idx_vals <- rep(count,length(x))
    cdf_est_vals <- h_est %>% cum_prob(x, clipped=TRUE)
    pdf_est_vals <- h_est %>% dens(x, clipped=TRUE)
    quantile_est_vals <- h_est %>% quant(p)
    res <- rbind(res,data.frame(idx_vals,x,cdf_est_vals,actual_cdf_vals,
                                pdf_est_vals,actual_pdf_vals))
    res_q <- rbind(res_q,data.frame(idx_vals=rep(count,length(p)),p,
                                    quantile_est_vals,actual_quantile_vals))
    count <- count +1
  }
}
res <- res %>% mutate(idx_vals=idx_vals*100)
res_q <- res_q %>% mutate(idx_vals=idx_vals*100)
# Visualize Results for PDF (Not run, requires gganimate, gifski and transformr
# packages)
library(ggplot2)
library(gganimate)  # provides transition_states() and anim_save()
p <- ggplot(res,aes(x=x)) + geom_line(aes(y=pdf_est_vals, colour="Estimated")) +
  geom_line(aes(y=actual_pdf_vals, colour="Actual")) +
  scale_colour_manual("", 
                      breaks = c("Estimated", "Actual"),
                      values = c("blue", "black")) + 
  ylab("Probability Density") +
  transition_states(idx_vals,transition_length = 2,state_length = 1) +
  ggtitle('Observation index {closest_state}')
anim_save("pdf.gif",p)

# Visualize Results for CDF (Not run, requires gganimate, gifski and transformr
# packages)
p <- ggplot(res,aes(x=x)) + geom_line(aes(y=cdf_est_vals, colour="Estimated")) +
  geom_line(aes(y=actual_cdf_vals, colour="Actual")) +
  scale_colour_manual("", 
                      breaks = c("Estimated", "Actual"),
                      values = c("blue", "black")) +
  ylab("Cumulative Probability") + 
  transition_states(idx_vals, transition_length = 2,state_length = 1) +
  ggtitle('Observation index {closest_state}')
anim_save("cdf.gif", p)

# Visualize Results for Quantiles (Not run, requires gganimate, gifski and 
# transformr packages)
p <- ggplot(res_q,aes(x=actual_quantile_vals)) +
  geom_point(aes(y=quantile_est_vals), color="blue") +
  geom_abline(slope=1,intercept = 0) +xlab("Theoretical Quantiles") +
  ylab("Estimated Quantiles") + 
  transition_states(idx_vals,transition_length = 2, state_length = 1) +
  ggtitle('Observation index {closest_state}')
anim_save("quant.gif",p)
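The forgetting behaviour can be illustrated with an exponentially weighted moving average, a common way to realize exponential weighting. Each update is a <- (1 - lambda) * a + lambda * h(x), so an observation seen m steps ago carries weight proportional to (1 - lambda)^m.

The base R sketch below applies this to the Hermite coefficients and to a running mean and variance. It assumes this standard scheme and uses hypothetical names, so hermiter's exact exp_weight_lambda update may differ. In the usage, the weighted mean and the leading coefficient track a level shift, while the ordinary mean blends both regimes.

```r
# Orthonormal Hermite functions h_0..h_N (three-term recurrence); one row per x.
hermite_functions <- function(x, N) {
  h <- matrix(0, nrow = length(x), ncol = N + 1)
  h[, 1] <- pi^(-1/4) * exp(-x^2 / 2)
  if (N >= 1) h[, 2] <- sqrt(2) * x * h[, 1]
  if (N >= 2) for (k in 2:N) {
    h[, k + 1] <- sqrt(2 / k) * x * h[, k] - sqrt((k - 1) / k) * h[, k - 1]
  }
  h
}

# Hypothetical exponentially weighted state: coefficients, mean and variance.
ew_state <- function(N) list(coefs = numeric(N + 1), mean = 0, var = 1)

ew_update <- function(state, x, lambda) {
  h_x <- as.vector(hermite_functions(x, length(state$coefs) - 1))
  state$coefs <- (1 - lambda) * state$coefs + lambda * h_x
  # Exponentially weighted mean and variance (incremental form).
  delta <- x - state$mean
  state$mean <- state$mean + lambda * delta
  state$var <- (1 - lambda) * (state$var + lambda * delta^2)
  state
}

set.seed(6)
stream <- c(rnorm(1000, mean = 3), rnorm(1000, mean = 0))  # shift at obs 1001
state <- ew_state(N = 10)
for (x in stream) state <- ew_update(state, x, lambda = 0.02)
ew_mean <- state$mean       # follows the recent regime (near 0)
plain_mean <- mean(stream)  # mixes both regimes (near 1.5)
```

A larger lambda forgets faster but averages over fewer effective observations (roughly (2 - lambda) / lambda). This is the adaptivity versus variance trade-off described above.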

Bivariate Example

We illustrate tracking a non-stationary bivariate data stream with another sample Shiny application. The bivariate Hermite estimator uses the same exponential weighting scheme as in the univariate case and does not need to maintain a sliding window. We include a stub for reading streaming data that generates micro-batches of bivariate normal i.i.d. random data with a chosen Spearman's correlation coefficient. For bivariate normal data, a target Spearman correlation rho_S corresponds to the Pearson correlation rho = 2 sin(pi rho_S / 6), which sets the covariance matrix. This stub can again be readily swapped out for a method reading micro-batches from a Kafka topic or similar.

The Shiny sample code below can be pasted into a single app.R file and run directly.

# Not Run. Copy and paste into app.R and run.
library(shiny)
library(hermiter)
library(ggplot2)
library(magrittr)

ui <- fluidPage(
  titlePanel("Bivariate Streaming Statistics Analysis Example"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("spearmans", "True Spearman's Correlation:",
                  min = -0.9, max = 0.9,
                  value = 0, step = 0.1)
    ),
    mainPanel(
      plotOutput("plot"),
      textOutput("spearman_text")
    )
  )
)

server <- function(input, output) {
  values <- reactiveValues(hermite_est = 
                             hermite_estimator(N = 10, standardize = TRUE,
                                               exp_weight_lambda = 0.01,
                                               est_type="bivariate"))
  # Note that the stub below could be replaced with code that reads streaming 
  # data from various sources, Kafka etc.  
  read_stream_stub_micro_batch <- reactive({
    invalidateLater(1000)
    sig_x <- 1
    sig_y <- 1
    num_obs <- 100
    rho <- 2 *sin(pi/6 * input$spearmans)
    observations_mat <- mvtnorm::rmvnorm(n=num_obs,mean=rep(0,2), 
    sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2),
    nrow=2,ncol=2, byrow = TRUE))
    return(observations_mat)
  })
  updated_spear_calc <- reactive({
    micro_batch <- read_stream_stub_micro_batch()
    for (idx in seq_len(nrow(micro_batch))) {
      values[["hermite_est"]] <- isolate(values[["hermite_est"]]) %>%
        update_sequential(micro_batch[idx,])
    }
    spear_est <- isolate(values[["hermite_est"]]) %>%
      spearmans(clipped = TRUE)
    return(spear_est)
  })
  output$plot <- renderPlot({
    # sig_x and sig_y must be defined here: the ones in the stub are local to it
    sig_x <- 1
    sig_y <- 1
    vals <- seq(-5,5,by=0.25)
    x_grid <- as.matrix(expand.grid(X=vals, Y=vals))
    rho <- 2 *sin(pi/6 * input$spearmans)
    actual_pdf <- mvtnorm::dmvnorm(x_grid,mean=rep(0,2), 
    sigma = matrix(c(sig_x^2,rho*sig_x*sig_y,rho*sig_x*sig_y,sig_y^2), 
    nrow=2,ncol=2, byrow = TRUE))
    df_pdf <- data.frame(x_grid,actual_pdf)
    p1 <- ggplot(df_pdf) + geom_tile(aes(X, Y, fill= actual_pdf)) +
      scale_fill_gradient2(low="blue", mid="cyan", high="purple",
                           midpoint=.2,    
                           breaks=seq(0,.4,by=.1), 
                           limits=c(0,.4)) +
      ggtitle("True Bivariate Normal Density with matched Spearman's correlation") +
      theme(legend.title = element_blank()) 
    p1
  }
  )
  output$spearman_text <- renderText({ 
    return(paste("Spearman's Correlation Estimate from Hermite Estimator:", 
                 round(updated_spear_calc(), 1)))
  })
}
shinyApp(ui = ui, server = server)
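The stub's line rho <- 2 * sin(pi/6 * input$spearmans) inverts the bivariate normal relation rho_S = (6/pi) arcsin(rho/2). The quick base R check below confirms the round trip and shows that simulated data hits the requested Spearman correlation. The helper names are hypothetical. The Cholesky-based simulator is a standard substitute for mvtnorm::rmvnorm, used only to keep the block dependency-free.

```r
# Pearson correlation giving a target Spearman correlation for bivariate
# normal data, and the forward relation it inverts.
pearson_for_spearman <- function(rho_s) 2 * sin(pi / 6 * rho_s)
spearman_for_pearson <- function(rho) (6 / pi) * asin(rho / 2)

# Simulate n standard bivariate normal pairs with correlation rho: chol()
# returns upper-triangular R with t(R) %*% R = Sigma, so rows of Z %*% R
# have covariance Sigma.
simulate_bivariate_normal <- function(n, rho) {
  R <- chol(matrix(c(1, rho, rho, 1), nrow = 2))
  matrix(rnorm(2 * n), ncol = 2) %*% R
}

targets <- seq(-0.9, 0.9, by = 0.1)   # the slider's range and step
round_trip <- spearman_for_pearson(pearson_for_spearman(targets))

set.seed(7)
sim <- simulate_bivariate_normal(20000, pearson_for_spearman(0.7))
# stats:: is explicit because hermiter provides its own cor() wrapper.
sample_spearman <- stats::cor(sim[, 1], sim[, 2], method = "spearman")
```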

Citation Information

To cite this package, one can use the following code to generate the citation.

citation("hermiter")

This yields:

Michael S, Melvin V (2024). hermiter: Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate). R package version 2.3.1, https://github.com/MikeJaredS/hermiter.

Michael S, Melvin V (2023). “hermiter: R package for sequential nonparametric estimation.” Computational Statistics. https://doi.org/10.1007/s00180-023-01382-0.


Version: 2.3.1

License: MIT + file LICENSE

Maintainer: Michael Stephanou

Last Published: March 6th, 2024

Functions in hermiter (2.3.1)

hermite_estimator_univar

A class to sequentially estimate univariate pdfs, cdfs and quantile functions
hermite_function_sum_serial

Outputs the sum of orthonormal Hermite functions
hermite_int_upper

Convenience function to output a definite integral of the orthonormal Hermite functions
initialize_batch_bivar

Initializes the Hermite series based estimator with a batch of data
initialize_batch_univar

Initializes the Hermite series based estimator with a batch of data
hermite_integral_val

Outputs lower integral of the orthonormal Hermite functions
hermite_integral_val_upper

Outputs upper integral of the orthonormal Hermite functions
hermite_normalization_N

Convenience function to output Hermite normalization factors
median.hermite_estimator_univar

Estimates the median
hcdf.hermite_estimator_univar

Creates an object summarizing the CDF with associated generic methods print, plot and summary.
kendall.hermite_estimator_bivar

Estimates the Kendall rank correlation coefficient
hermite_int_full_domain

Outputs integral of the orthonormal Hermite functions on the full domain
hermite_normalization

Outputs Hermite normalization factors
hermite_polynomial

Outputs physicist version of Hermite Polynomials
print.hermite_estimator_univar

Prints univariate hermite_estimator object.
merge_pair.hermite_estimator_univar

Merges two Hermite estimators
hermite_int_lower

Convenience function to output a definite integral of the orthonormal Hermite functions
hermite_function_N

Convenience function to output orthonormal Hermite functions. The method calculates the orthonormal Hermite functions \(h_k(x)\) for \(k=0,\dots,N\) and the vector of values x.
merge_hermite

Merges a list of Hermite estimators
merge_hermite_bivar

Merges a list of bivariate Hermite estimators
hermite_function_sum_N

Convenience function to output the sum of orthonormal Hermite functions. The method calculates the sum of orthonormal Hermite functions \(\sum_{i} h_k(x_{i})\) for \(k=0,\dots,N\) over the vector of values x.
hermite_int_full

Convenience function to output the integral of the orthonormal Hermite functions on the full domain
kendall

Estimates the Kendall rank correlation coefficient
merge_standardized_helper_bivar

Internal method to merge a list of standardized bivariate Hermite estimators
quant

Estimates the quantiles at a vector of probability values
plot.hcdf_bivar

Plots the hcdf_bivar object as output by the hcdf function when evaluated on a hermite_estimator_bivar object.
merge_pair

Merges two Hermite estimators
merge_standardized_helper_univar

Internal method to merge a list of standardized Hermite estimators
plot.hdensity_univar

Plots the hdensity_univar object as output by the density function when evaluated on a hermite_estimator_univar object.
summary.hcdf_univar

Summarizes the hcdf_univar object as output by the hcdf function when evaluated on a hermite_estimator_univar object.
print.hcdf_bivar

Prints the hcdf_bivar object as output by the hcdf function when evaluated on a hermite_estimator_bivar object.
merge_moments_and_count_bivar

Internal method to consistently merge the number of observations, means and variances of two bivariate Hermite estimators
hermiter-package

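Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)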
hermite_polynomial_N

Convenience function to output physicist Hermite polynomials. The method calculates the physicist version of Hermite polynomials \(H_k(x)\) for \(k=0,\dots,N\) and the vector of values x.
print.hdensity_univar

Prints the hdensity_univar object as output by the density function when evaluated on a hermite_estimator_univar object.
print.hermite_estimator_bivar

Prints bivariate hermite_estimator object.
summary.hcdf_bivar

Summarizes the hcdf_bivar object as output by the hcdf function when evaluated on a hermite_estimator_bivar object.
summary.hermite_estimator_bivar

Summarizes bivariate hermite_estimator object.
merge_hermite_univar

Merges a list of Hermite estimators
merge_moments_and_count_univar

Internal method to consistently merge the number of observations, means and variances of two Hermite estimators
spearmans.hermite_estimator_bivar

Estimates the Spearman's rank correlation coefficient
print.hcdf_univar

Prints the hcdf_univar object as output by the hcdf function when evaluated on a hermite_estimator_univar object.
spearmans

Estimates the Spearman's rank correlation coefficient
summary.hermite_estimator_univar

Summarizes univariate hermite_estimator object.
update_sequential.hermite_estimator_univar

Updates the Hermite series based estimator sequentially
plot.hcdf_univar

Plots the hcdf_univar object as output by the hcdf function when evaluated on a hermite_estimator_univar object.
update_sequential.hermite_estimator_bivar

Updates the Hermite series based estimator sequentially
quantile.hermite_estimator_univar

Estimates the quantiles at a vector of probability values
update_sequential

Updates the Hermite series based estimator sequentially
merge_pair.hermite_estimator_bivar

Merges two bivariate Hermite estimators
print.hdensity_bivar

Prints the hdensity_bivar object as output by the density function when evaluated on a hermite_estimator_bivar object.
quant.hermite_estimator_univar

Estimates the quantiles at a vector of probability values
plot.hdensity_bivar

Plots the hdensity_bivar object as output by the density function when evaluated on a hermite_estimator_bivar object.
standardizeInputs

Standardizes the observation x and updates the online moment inputs
standardizeInputsEW

Standardizes the observation x and updates the online moment inputs
cum_prob

Estimates the cumulative probability at one or more x values
dens.hermite_estimator_univar

Estimates the probability density for a vector of x values
IQR

Estimates the Interquartile range (IQR)
IQR.default

Estimates the Interquartile range (IQR)
cum_prob.hermite_estimator_bivar

Estimates the cumulative probabilities for a matrix of 2-d x values
IQR.hermite_estimator_univar

Estimates the Interquartile range (IQR)
cor

A wrapper around the stats::cor function adding two additional methods, namely method = "hermite.spearman" and method = "hermite.kendall" (can be abbreviated). The input parameters and output value semantics closely match the stats::cor method for easy interchange. If neither the "hermite.spearman" nor the "hermite.kendall" method is selected, then this function will call stats::cor with the arguments provided.
dens

Estimates the probability density at one or more x values
dens.hermite_estimator_bivar

Estimates the probability densities for a matrix of 2-d x values
cum_prob.hermite_estimator_univar

Estimates the cumulative probability for a vector of x values
hermite_function

Outputs orthonormal Hermite functions
hermite_estimator

A class to sequentially estimate univariate and bivariate pdfs and cdfs along with quantile functions in the univariate setting and nonparametric correlations in the bivariate setting.
density.hermite_estimator_univar

Creates an object summarizing the PDF with associated generic methods print and plot.
gauss_hermite_quad_100

Calculates \(\int_{-\infty}^{\infty} f(x) e^{-x^2} dx\) using Gauss-Hermite quadrature with 100 terms.
hcdf.hermite_estimator_bivar

Creates an object summarizing the bivariate CDF with associated generic methods print, plot and summary.
density.hermite_estimator_bivar

Creates an object summarizing the bivariate PDF with associated generic methods print and plot.
hcdf

Creates an object summarizing the CDF with associated generic methods print, plot and summary.
hermite_estimator_bivar

A class to sequentially estimate bivariate pdfs, cdfs and nonparametric correlations
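Several entries above (gauss_hermite_quad_100, hermite_polynomial_N, hermite_normalization) rest on standard facts about Hermite polynomials. The base R sketch below is one way to reproduce them independently; it is not hermiter's implementation. It uses three standard results:

- Gauss-Hermite nodes and weights, computed with the Golub-Welsch eigenvalue method.
- The physicist recurrence H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x).
- The normalization 1 / sqrt(2^k k! sqrt(pi)), which turns H_k(x) exp(-x^2/2) into an orthonormal Hermite function.

Names ending in _sketch are hypothetical.

```r
# Golub-Welsch: for the weight exp(-x^2), nodes are the eigenvalues of the
# symmetric tridiagonal Jacobi matrix with off-diagonals sqrt(k / 2),
# k = 1..n-1, and weights are sqrt(pi) times the squared first eigenvector
# components.
gauss_hermite_rule_sketch <- function(n) {
  J <- matrix(0, n, n)
  off <- sqrt(seq_len(n - 1) / 2)
  J[cbind(1:(n - 1), 2:n)] <- off
  J[cbind(2:n, 1:(n - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# Approximates the integral of f(x) * exp(-x^2) over the real line;
# exact for polynomials f of degree up to 2n - 1.
gauss_hermite_quad_sketch <- function(f, n = 100) {
  rule <- gauss_hermite_rule_sketch(n)
  sum(rule$weights * f(rule$nodes))
}

# Physicist Hermite polynomial H_k at each element of x.
physicist_hermite_sketch <- function(x, k) {
  h_prev <- rep(1, length(x))   # H_0
  if (k == 0) return(h_prev)
  h <- 2 * x                     # H_1
  if (k >= 2) {
    for (j in 1:(k - 1)) {
      h_next <- 2 * x * h - 2 * j * h_prev   # H_{j+1}
      h_prev <- h
      h <- h_next
    }
  }
  h
}

# Factor making H_k(x) * exp(-x^2 / 2) orthonormal on the real line.
hermite_normalization_sketch <- function(k) 1 / sqrt(2^k * factorial(k) * sqrt(pi))
```

As a usage check, gauss_hermite_quad_sketch(function(x) x^2) should return sqrt(pi) / 2. Squaring physicist_hermite_sketch(x, 3) and applying the normalization should integrate to 1.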