
hpa (version 1.0.1)

hpaML: Semi-nonparametric maximum likelihood estimation

Description

This function performs semi-nonparametric maximum likelihood estimation via Hermite polynomial density approximation.

Usage

hpaML(x, pol_degrees = numeric(0), tr_left = numeric(0),
  tr_right = numeric(0), given_ind = logical(0),
  omit_ind = logical(0), x0 = numeric(0))

Arguments

x

numeric matrix whose rows are realizations of independent identically distributed random vectors and whose columns correspond to variables.

pol_degrees

non-negative integer vector of polynomial degrees.

tr_left

numeric matrix of left (lower) truncation limits. Note that tr_left rows are observations while columns are variables. If tr_left or tr_right is a single-row matrix then the same truncation limits, given by the first row of that matrix, are applied to all observations.

tr_right

numeric matrix of right (upper) truncation limits. Note that tr_right rows are observations while columns are variables. If tr_left or tr_right is a single-row matrix then the same truncation limits, given by the first row of that matrix, are applied to all observations.
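For instance, a single-row truncation matrix that applies the same limits to every observation of a two-variable sample could be built as follows (the limit values here are hypothetical):

```r
# Hypothetical truncation limits shared by all observations:
# a single-row matrix is applied to every row of x.
tr_left  <- matrix(c(0, -1), nrow = 1)  # lower limits for variables 1 and 2
tr_right <- matrix(c(5,  1), nrow = 1)  # upper limits for variables 1 and 2
```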

given_ind

logical vector indicating whether the corresponding component is conditioned on. By default it is a logical vector of FALSE values.

omit_ind

logical vector indicating whether the corresponding component is omitted. By default it is a logical vector of FALSE values.

x0

numeric vector of initial values for the optimization routine. Note that x0 = c(pol_coefficients[-1], mean, sd). For documentation of pol_coefficients, mean and sd see the dhpa function.
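For example, for a univariate density with pol_degrees = 2 the initial vector stacks the free polynomial coefficients followed by mean and sd (the coefficient values below are hypothetical):

```r
# x0 = c(pol_coefficients[-1], mean, sd) for pol_degrees = 2.
# The first polynomial coefficient is fixed for identification,
# so it is dropped from the optimization vector.
pol_coefficients <- c(1, 0.1, 0.1)   # hypothetical starting coefficients
x0 <- c(pol_coefficients[-1], 0, 1)  # c(a_1, a_2, mean = 0, sd = 1)
```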

Value

This function returns an object of class "hpaML". An object of class "hpaML" is a list containing the following components:

  • optim - optim function output.

  • x1 - numeric vector of distribution parameter estimates.

  • mean - density function mean vector estimate.

  • sd - density function sd vector estimate.

  • pol_coefficients - polynomial coefficient estimates.

  • tr_left - the same as tr_left input parameter.

  • tr_right - the same as tr_right input parameter.

  • omit_ind - the same as omit_ind input parameter.

  • given_ind - the same as given_ind input parameter.

  • cov_matrix - estimated covariance matrix of the parameter estimates.

  • results - numeric matrix representing estimation results.

  • log-likelihood - value of the log-likelihood function.

  • AIC - AIC value.

  • data - the same as x input parameter but without NA observations.

  • n_obs - number of observations.

Details

The semi-nonparametric approach is implemented via Hermite polynomial approximation of densities.

The Hermite polynomial density approximation approach was proposed by A. Gallant and D. W. Nychka in 1987. The main idea is to approximate an unknown distribution density with a Hermite polynomial of degree pol_degree. In this framework the Hermite polynomial density is the product of a squared polynomial and a normal density, rescaled to ensure integration to 1. Parameters mean and sd determine the means and standard deviations of the normal density functions that enter this product. For more information please refer to the literature listed below.
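As an illustration, here is a minimal univariate sketch of such a density (not the package's dhpa implementation): a squared polynomial multiplied by a normal density, with the product rescaled to integrate to 1.

```r
# Univariate sketch of a squared-polynomial-times-normal density.
# 'a' holds polynomial coefficients a_0, ..., a_K with a_0 = 1.
hermite_density <- function(x, a, mean = 0, sd = 1) {
  pol <- function(v) sapply(v, function(w) sum(a * w ^ (seq_along(a) - 1)))
  # Normalizing constant so that the density integrates to 1
  const <- integrate(function(v) pol(v) ^ 2 * dnorm(v, mean, sd),
                     -Inf, Inf)$value
  pol(x) ^ 2 * dnorm(x, mean, sd) / const
}

a <- c(1, 0.5, -0.2)  # hypothetical coefficients, degree K = 2
f <- hermite_density(seq(-3, 3, by = 0.1), a)
```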

The parameters mean, sd, given_ind and omit_ind should have the same length as the pol_degrees parameter.

The first polynomial coefficient (corresponding to zero powers) is set to 1 for identification purposes.

Standard errors and significance levels are derived under parametric assumptions, i.e. it is assumed that the true distribution and the distribution corresponding to the Hermite polynomial density are from the same family.

This function maximizes the log-likelihood function via optim, setting its method argument to "BFGS".
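The optimization step can be sketched for a degree-1 density as follows; the helper names dens and neg_ll are illustrative and do not come from the hpa package:

```r
# Hedged sketch: maximize the log-likelihood of a degree-1
# squared-polynomial-times-normal density via optim(method = "BFGS").
dens <- function(v, a1, m, s) {
  # Closed-form normalizing constant: E[(1 + a1*W)^2] for W ~ N(m, s^2)
  const <- 1 + 2 * a1 * m + a1 ^ 2 * (m ^ 2 + s ^ 2)
  (1 + a1 * v) ^ 2 * dnorm(v, m, s) / const
}
neg_ll <- function(par, x) {
  # par = c(a1, mean, log(sd)); the log link keeps sd positive
  -sum(log(dens(x, par[1], par[2], exp(par[3]))))
}
set.seed(1)
x <- rt(500, df = 5)
fit <- optim(c(0, 0, 0), neg_ll, x = x, method = "BFGS")
fit$par  # estimates of c(a1, mean, log(sd))
```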

References

A. Gallant and D. W. Nychka (1987) <doi:10.2307/1913241>

See Also

summary.hpaML, predict.hpaML, AIC.hpaML, logLik.hpaML

Examples

# NOT RUN {
##Approximate student (t) distribution

#Simulate 5000 realizations of student distribution with 5 degrees of freedom
n <- 5000
df <- 5
x <- matrix(rt(n, df), ncol = 1)
pol_degrees <- c(4)

#Apply pseudo maximum likelihood routine
ml_result <- hpa::hpaML(x = x, pol_degrees = pol_degrees)
summary(ml_result)

#Get predicted probabilities (density values) approximations
predict(ml_result)

##Approximate chi-squared distribution

#Simulate 5000 realizations of chi-squared distribution with 5 degrees of freedom
#Let's set truncation points at the sample minimum and maximum
n <- 5000
df <- 5
x <- matrix(rchisq(n, df), ncol = 1)
pol_degrees <- c(1)
tr_left <- as.vector(min(x))
tr_right <- as.vector(max(x))

#Apply pseudo maximum likelihood routine
ml_result <- hpa::hpaML(x = x, pol_degrees = pol_degrees, 
				tr_left = tr_left,
				tr_right = tr_right)
summary(ml_result)

#Get predicted probabilities (density values) approximations
predict(ml_result)

##Approximate multivariate student (t) distribution
##Note that calculations may take up to a minute

#Simulate 5000 realizations of three-dimensional student distribution with 5 degrees of freedom
library("mvtnorm")
cov_mat <- matrix(c(1, 0.25, -0.25, 0.25, 1, 0.25, -0.25, 0.25, 1), ncol = 3)
x <- rmvt(n = 5000, sigma = cov_mat, df = 5)

#Estimate approximating joint distribution parameters
model <- hpaML(x = x, pol_degrees = c(1,1,1))

#Get summary
summary(model)

#Get predicted values for joint density function
predict(model)

##Approximate student (t) distribution and plot densities approximated
##under different hermite polynomial degrees against 
##true density (of student distribution)

#Simulate 5000 realizations of t-distribution with 5 degrees of freedom
n <- 5000
df <- 5
x <- matrix(rt(n, df), ncol=1)

#Apply pseudo maximum likelihood routine
#Create matrix of lists where i-th element contains hpaML results for K=i
ml_result <- matrix(list(), 4, 1)
for(i in 1:4)
{
 ml_result[[i]] <- hpa::hpaML(x = x, pol_degrees = i)
}

#Generate test values
test_values <- seq(qt(0.001, df), qt(0.999, df), 0.001)
n0 <- length(test_values)

#t-distribution density function at test values points
true_pred <- dt(test_values, df)

#Create matrix of lists where i-th element contains densities predictions for K=i
PGN_pred <- matrix(list(), 4, 1)
for(i in 1:4)
{
  PGN_pred[[i]] <- predict(ml_result[[i]], 
                          newdata = matrix(test_values, ncol = 1))
}
#Plot the result
library("ggplot2")

#prepare the data
h <- data.frame("values" = rep(test_values,5),
                "predictions" = c(PGN_pred[[1]],PGN_pred[[2]],
                                  PGN_pred[[3]],PGN_pred[[4]],
                                  true_pred), 
                "Density" = c(
                  rep("K=1",n0), rep("K=2",n0),
                  rep("K=3",n0), rep("K=4",n0),
                  rep("t-distribution",n0))
                  )
                  
#build the plot
ggplot(h, aes(values, predictions)) + geom_point(aes(color = Density)) +
  theme_minimal() + theme(legend.position = "top", text = element_text(size=26),
                         legend.title=element_text(size=20), legend.text=element_text(size=28)) +
  guides(colour = guide_legend(override.aes = list(size=10))
  )

#Get informative estimates summary for K=4 (minimal AIC)
summary(ml_result[[4]])
# }
