MAINT.Data-package: Modelling and Analysing Interval Data

Description

MAINT.Data implements methodologies for modelling interval data by Normal and Skew-Normal distributions, considering four possible configuration structures for the variance-covariance matrix. It introduces a data class for representing interval data and includes functions and methods for the parametric modelling and analysis of interval data. It performs maximum likelihood and trimmed maximum likelihood estimation, statistical tests, as well as (M)ANOVA and Discriminant Analysis.

Details

In the classical model of multivariate data analysis, data are represented in a data array where n "individuals" (usually in rows) take exactly one value for each variable (usually in columns). Symbolic Data Analysis (see, e.g., Noirhomme-Fraiture and Brito (2011)) provides a framework in which new variable types make it possible to take directly into account the variability and/or uncertainty associated with each single "individual", by allowing multiple, possibly weighted, values for each variable. New variable types (interval, categorical multi-valued and modal variables) have been introduced.

We focus on the analysis of interval data, i.e., data where elements are described by variables whose values are intervals. Parametric inference methodologies based on probabilistic models for interval variables are developed in Brito and Duarte Silva (2012), where each interval is represented by its midpoint and log-range (the logarithm of its width), for which Normal and Skew-Normal (Azzalini and Dalla Valle (1996)) distributions are assumed. The intrinsic nature of interval variables leads to special structures of the variance-covariance matrix, which are represented by four different possible configurations.

MAINT.Data implements the proposed methodologies in R, introducing a data class for representing interval data. It includes functions for modelling and analysing interval data, in particular maximum likelihood and trimmed maximum likelihood (see, e.g., Hadi and Luceno (1997)) estimation, and statistical tests for the different configurations considered. Methods for (M)ANOVA and Discriminant Analysis (Duarte Silva and Brito (2015)) of this data class are also provided.
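To make the midpoint / log-range representation concrete, here is a minimal base-R sketch that converts interval bounds into midpoints and log-ranges and back. The helpers to_midp_logr() and from_midp_logr() are hypothetical names introduced for illustration; they are not part of MAINT.Data, whose IData() accepts several data layouts (see its Seq argument in the examples) and handles the representation itself, possibly with different conventions.

```r
# Hypothetical helper (not part of MAINT.Data): map interval bounds to
# the midpoint / log-range representation.
to_midp_logr <- function(lower, upper) {
  # A zero-width interval would give log(0) = -Inf, so reject it here.
  if (any(upper <= lower)) {
    stop("every upper bound must exceed its lower bound")
  }
  data.frame(MidP = (lower + upper) / 2,   # centre of each interval
             LogR = log(upper - lower))    # log of each interval's width
}

# Hypothetical inverse helper: recover the bounds from MidP and LogR.
from_midp_logr <- function(midp, logr) {
  half_width <- exp(logr) / 2
  data.frame(Lower = midp - half_width, Upper = midp + half_width)
}

# Two hypothetical temperature intervals: [-5, 3] and [10, 14]
ivals <- to_midp_logr(lower = c(-5, 10), upper = c(3, 14))
print(ivals)   # MidP is -1 and 12; LogR is log(8) and log(4)
print(from_midp_logr(ivals$MidP, ivals$LogR))   # round trip to the bounds
```

The logarithm maps the strictly positive interval width onto the whole real line, which makes Normal or Skew-Normal models for the log-ranges plausible; degenerate (zero-width) intervals have no finite log-range, so the sketch rejects them.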

Package:	MAINT.Data
Type:	Package
Version:	1.0.1
Date:	2016-12-29
License:	GPL-2
LazyLoad:	yes
LazyData:	yes

References

Azzalini, A. and Dalla Valle, A. (1996), The multivariate skew-normal distribution. Biometrika 83(4), 715--726.

Brito, P. and Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3--20.

Duarte Silva, A. P. and Brito, P. (2015), Discriminant analysis of interval data: An assessment of parametric and distance-based approaches. Journal of Classification 32(3), 516--541.

Hadi, A. S. and Luceno, A. (1997), Maximum trimmed likelihood estimators: a unified approach, examples, and algorithms. Computational Statistics and Data Analysis 25(3), 251--272.

Noirhomme-Fraiture, M. and Brito, P. (2011), Far Beyond the Classical Data Models: Symbolic Data Analysis. Statistical Analysis and Data Mining 4(2), 157--170.

Examples

library(MAINT.Data)

# Create an Interval-Data object containing the intervals for 899 observations
# on the temperatures by quarter in 60 Chinese meteorological stations.

ChinaT <- IData(ChinaTemp[1:8],VarNames=c("T1","T2","T3","T4"))

# Display the first and last observations

head(ChinaT)
tail(ChinaT)

# Print summary statistics

summary(ChinaT)

# Create a new data set containing only the winter (1st and 4th) quarter intervals

ChinaWT <- ChinaT[,c(1,4)]

# Estimate normal distribution parameters by maximum likelihood, assuming
# the classical (unrestricted) covariance configuration Case 1

ChinaWTE.C1 <- mle(ChinaWT,CovCase=1)
cat("Winter temperatures of China -- normal maximum likelihood estimation results:\n")
print(ChinaWTE.C1)
cat("Standard Errors of Estimators:\n") ; print(stdEr(ChinaWTE.C1))

# Estimate normal distribution parameters by maximum likelihood,
# assuming that one of the C2, C3 or C4 restricted covariance configuration cases holds

ChinaWTE.C234 <- mle(ChinaWT,CovCase=2:4)
cat("Winter temperatures of China -- normal maximum likelihood estimation results:\n")
print(ChinaWTE.C234)
cat("Standard Errors of Estimators:\n") ; print(stdEr(ChinaWTE.C234))

# Estimate normal distribution parameters robustly by fast maximum trimmed likelihood,
# assuming that one of the C2, C3 or C4 restricted covariance configuration cases holds

ChinaWTE.C234 <- fasttle(ChinaWT,CovCase=2:4)
cat("Winter temperatures of China -- normal maximum trimmed likelihood estimation results:\n")
print(ChinaWTE.C234)

# Estimate skew-normal distribution parameters

ChinaWTE.SkN <- mle(ChinaWT,Model="SKNormal")
cat("Winter temperatures of China -- Skew-Normal maximum likelihood estimation results:\n")
print(ChinaWTE.SkN)
cat("Standard Errors of Estimators:\n") ; print(stdEr(ChinaWTE.SkN))

# MANOVA tests assuming that configuration case 1 (unrestricted covariance)
# or 3 (MidPoints independent of Log-Ranges) holds.

ManvChinaWT.C13 <- MANOVA(ChinaWT,ChinaTemp$GeoReg,CovCase=c(1,3))
cat("Winter temperatures of China -- MANOVA by geographical regions results:\n")
print(ManvChinaWT.C13)

# Linear Discriminant Analysis

ChinaWT.lda <- lda(ManvChinaWT.C13)
cat("Winter temperatures of China -- linear discriminant analysis results:\n")
print(ChinaWT.lda)
cat("lda prediction results:\n")
print(predict(ChinaWT.lda,ChinaWT)$class)

# Estimate error rates by ten-fold cross-validation

CVlda <- DACrossVal(ChinaWT,ChinaTemp$GeoReg,TrainAlg=lda,
  CovCase=BestModel(H1res(ManvChinaWT.C13)),CVrep=1)
summary(CVlda[,,"Clerr"])
# Global error for each row of CVlda: class error rates weighted by the class sizes Nk
glberrors <-
  apply(CVlda[,,"Nk"]*CVlda[,,"Clerr"],1,sum)/apply(CVlda[,,"Nk"],1,sum)
cat("Average global classification error =",mean(glberrors),"\n")

# Robust Quadratic Discriminant Analysis

ChinaWT.rqda <- Robqda(ChinaWT,ChinaTemp$GeoReg)
cat("Winter temperatures of China -- robust quadratic discriminant analysis results:\n")
print(ChinaWT.rqda)
cat("robust qda prediction results:\n")
print(predict(ChinaWT.rqda,ChinaWT)$class)

# Create an Interval-Data object containing the intervals for characteristics
# of 27 car models.

Cars <- IData(CarsData[1:8],Seq="MidPLogR_VarbyVar",
  VarNames=c("Price","EngineCapacity","TopSpeed","Acceleration"))

# Display the first and last observations

head(Cars)
tail(Cars)

# Estimate normal distribution parameters

CarsNE <- mle(Cars)
cat("Cars data -- normal maximum likelihood estimation results:\n")
print(CarsNE)
cat("Standard Errors of Estimators:\n") ; print(stdEr(CarsNE))

# Estimate normal distribution parameters robustly by full maximum trimmed likelihood

CarsTE <- fulltle(Cars)
cat("Cars data -- normal maximum trimmed likelihood estimation results:\n")
print(CarsTE)

# Estimate parameters searching through Normal and Skew-Normal distributions.

CarsNSNE <- mle(Cars,Model="NrmandSKN")
cat("Cars data -- Maximum likelihood estimation results:\n")
print(CarsNSNE)
cat("Standard Errors of Estimators:\n") ; print(stdEr(CarsNSNE))
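The fasttle() and fulltle() calls in the examples compute maximum trimmed likelihood estimates with MAINT.Data's own algorithms. To illustrate the underlying idea (Hadi and Luceno (1997)) in the simplest setting only, a multivariate Normal with unrestricted covariance, here is a minimal base-R sketch built on a swapped-in technique: the concentration step ("C-step") introduced with the FAST-MCD algorithm. The function trimmed_mvn_cstep() and its arguments h, start and max_iter are hypothetical; the sketch ignores the restricted covariance configurations and the Skew-Normal model, and it is not the package's implementation.

```r
# Hypothetical helper (not MAINT.Data's fasttle()/fulltle()): C-step search
# for a trimmed likelihood estimate of a multivariate Normal with an
# unrestricted covariance matrix, keeping h of the n observations.
trimmed_mvn_cstep <- function(X, h, start = sample.int(nrow(X), h),
                              max_iter = 100L) {
  X <- as.matrix(X)
  p <- ncol(X)
  stopifnot(h > p, h <= nrow(X), length(start) == h)

  # Maximum likelihood mean and covariance (divisor h) of a subset of rows.
  fit_subset <- function(idx) {
    Xs <- X[idx, , drop = FALSE]
    mu <- colMeans(Xs)
    S <- crossprod(sweep(Xs, 2, mu)) / length(idx)
    list(center = mu, cov = S)
  }

  subset <- sort(start)
  for (iter in seq_len(max_iter)) {
    fit <- fit_subset(subset)
    # Keep the h observations closest to the current fit (Mahalanobis
    # distance); this step never decreases the trimmed log-likelihood.
    d2 <- mahalanobis(X, fit$center, fit$cov)
    new_subset <- sort(order(d2)[seq_len(h)])
    if (setequal(new_subset, subset)) break
    subset <- new_subset
  }

  fit <- fit_subset(subset)
  # Trimmed log-likelihood at the ML fit of the kept subset:
  # -h/2 * (p * log(2 * pi) + log|S| + p)
  logdet <- as.numeric(determinant(fit$cov)$modulus)
  loglik <- -0.5 * h * (p * log(2 * pi) + logdet + p)
  c(list(subset = subset, loglik = loglik), fit)
}

# Simulated data: 100 standard Normal rows plus 10 rows shifted to (20, 20).
set.seed(1)
X <- rbind(matrix(rnorm(200), ncol = 2),
           matrix(rnorm(20, mean = 20), ncol = 2))
# Deliberately start from a subset that contains all 10 shifted rows.
fit <- trimmed_mvn_cstep(X, h = 100, start = 11:110)
all(fit$subset <= 100)   # checks whether only the 100 clean rows were kept
fit$center
```

Each C-step can only increase the trimmed log-likelihood, so the loop stops at a local optimum; a practical search would repeat it from many random starting subsets and keep the fit with the largest loglik.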
