FuzzyImputationTest (version 0.4.0)

StatisticalMeasures: Calculation of statistical measures for errors of the imputed data.

Description

StatisticalMeasures calculates various statistical measures between the real and imputed data.

Usage

StatisticalMeasures(trueData, imputedData, imputedMask, ...)

Value

The output is a matrix with one column for each column of the input dataset, plus a column with the overall mean.

Arguments

trueData

Name of the input matrix (or data frame) with the true values of the variables.

imputedData

Name of the input matrix (or data frame) with the imputed values.

imputedMask

Matrix (or data frame) with logical values where TRUE indicates the cells with the imputed values.

...

Additional parameters passed to other functions.

Details

The procedure calculates different statistical measures between the real and imputed data for each column, namely:

  • TrueMean - the mean of the real values only in the cells marked as missing,

  • ImpMean - the mean only for the imputed values,

  • TrueSD - the standard deviation of the real values only in the cells marked as missing,

  • ImpSD - the standard deviation only for the imputed values,

  • GenMean - the mean of all the real data (given by trueData),

  • GenImpMean - the mean of the real data with the missing values replaced by their imputed counterparts (given by imputedData),

  • GenSD - the standard deviation of all the real data (given by trueData),

  • GenImpSD - the standard deviation of the real data with the missing values replaced by their imputed counterparts (given by imputedData),

  • AbsDiffTrueImpMean - the absolute difference between TrueMean and ImpMean,

  • AbsDiffTrueImpSD - the absolute difference between TrueSD and ImpSD,

  • AbsDiffGenImpMean - the absolute difference between GenMean and GenImpMean,

  • AbsDiffGenImpSD - the absolute difference between GenSD and GenImpSD.

To distinguish the real values from their imputed counterparts, the additional matrix imputedMask must be provided. In this matrix, the logical value TRUE marks the cells that hold imputed values, and FALSE marks all other cells. All three input datasets should be given as matrices or data frames.
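
As a rough illustration of how the mask separates the two groups of measures, the following base R sketch computes them for a single column. It is not the package's implementation: column_measures is a hypothetical helper introduced here, the toy vectors are made up, and the sketch assumes that the "Gen" measures use the whole column while the remaining ones use only the cells where the mask is TRUE.

```r
# Hypothetical helper (NOT part of FuzzyImputationTest): computes the measures
# listed above for one column, given the true values, the imputed values and
# the logical mask for that column.
column_measures <- function(true_col, imputed_col, mask_col) {
  # measures restricted to the cells flagged as imputed
  true_mean <- mean(true_col[mask_col])
  imp_mean  <- mean(imputed_col[mask_col])
  true_sd   <- sd(true_col[mask_col])
  imp_sd    <- sd(imputed_col[mask_col])
  # measures over the whole column
  gen_mean     <- mean(true_col)
  gen_imp_mean <- mean(imputed_col)
  gen_sd       <- sd(true_col)
  gen_imp_sd   <- sd(imputed_col)
  c(TrueMean = true_mean, ImpMean = imp_mean,
    TrueSD = true_sd, ImpSD = imp_sd,
    GenMean = gen_mean, GenImpMean = gen_imp_mean,
    GenSD = gen_sd, GenImpSD = gen_imp_sd,
    AbsDiffTrueImpMean = abs(true_mean - imp_mean),
    AbsDiffTrueImpSD   = abs(true_sd - imp_sd),
    AbsDiffGenImpMean  = abs(gen_mean - gen_imp_mean),
    AbsDiffGenImpSD    = abs(gen_sd - gen_imp_sd))
}

# toy data (made up): the second and fourth cells were missing and then imputed
true_col    <- c(1, 2, 3, 4, 5)
imputed_col <- c(1, 2.5, 3, 3.5, 5)
mask_col    <- c(FALSE, TRUE, FALSE, TRUE, FALSE)

res <- column_measures(true_col, imputed_col, mask_col)
print(round(res, 4))
```

Applying such a helper to every column and then averaging the results across columns would give a table comparable in spirit to the output described in the Value section, although the exact layout returned by StatisticalMeasures may differ.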

Examples

# seed PRNG

set.seed(1234)

# load the necessary libraries

library(FuzzyImputationTest)

library(FuzzySimRes)

# generate sample of trapezoidal fuzzy numbers with FuzzySimRes library

list1<-SimulateSample(20,originalPD="rnorm",parOriginalPD=list(mean=0,sd=1),
incrCorePD="rexp", parIncrCorePD=list(rate=2),
suppLeftPD="runif",parSuppLeftPD=list(min=0,max=0.6),
suppRightPD="runif", parSuppRightPD=list(min=0,max=0.6),
type="trapezoidal")

# convert fuzzy data into a matrix

matrix1 <- FuzzyNumbersToMatrix(list1$value)

# check starting values

head(matrix1)

# add some NAs to the matrix

matrix1NA <- IntroducingNA(matrix1,percentage = 0.1)

head(matrix1NA)

# impute missing values

matrix1DImp <- ImputationDimp(matrix1NA)

# find cells with NAs

matrix1Mask <- is.na(matrix1NA)

# calculate errors for the imputed values

StatisticalMeasures(matrix1,matrix1DImp,matrix1Mask)

