
yaImpute (version 1.0-21)

notablyDifferent: Finds observations with large differences between observed and imputed values

Description

This routine identifies observations with large errors as measured by scaled root mean square error (see rmsd.yai). A threshold is used to detect observations with large differences.

Usage

notablyDifferent(object,vars=NULL,threshold=NULL,p=.05,...)

Arguments

object
an object of class yai.
vars
a vector of character strings naming the variables to use; if NULL, the X-variables from object are used.
threshold
a threshold value; observations whose scaled differences exceed it are listed as notably different.
p
(1-p)*100 is the percentile point in the distribution of differences used to compute the threshold (used when threshold is NULL).
...
additional arguments passed to impute.yai.

Value

A named list of several items. In all cases vectors are named using the observation ids, which are the row names of the data used to build the yai object.
  • call: The call.
  • vars: The variables used (may be fewer than requested).
  • threshold: The threshold value.
  • notablyDifferent.refs: A sorted named vector of references that exceed the threshold.
  • notablyDifferent.trgs: A sorted named vector of targets that exceed the threshold.
  • rmsdS.refs: A sorted named vector of scaled RMSD references.
  • rmsdS.trgs: A sorted named vector of scaled RMSD targets.

Details

The scaled differences are computed as follows:
  1. A matrix of differences between observed and imputed values is computed for each observation (rows) and each variable (columns).
  2. These differences are scaled by dividing by the standard deviation of the observed values among the reference observations.
  3. The scaled differences are squared.
  4. Row means are computed resulting in one value for each observation.
  5. The square root of each of these values is taken.
These values are Euclidean distances between the target observations and their nearest references, as measured using the specified variables. Every variable used must have both observed and imputed values; generally, these will be the X-variables and not the Y-variables. When threshold is NULL, the function computes one by calling the quantile function with its default arguments and probs=1-p.
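
The five steps and the default threshold can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: the toy matrices, the choice of reference rows, and the helper name scaled_rmsd are all hypothetical, and the sketch computes the threshold over all observations pooled together, which the text above does not confirm.

```r
# Hypothetical toy data: observed and imputed values for five
# observations (rows) and two variables (columns).
observed <- matrix(c(5.1, 4.9, 6.3, 5.8, 7.1,
                     3.5, 3.0, 3.3, 2.7, 3.0),
                   ncol = 2,
                   dimnames = list(paste0("obs", 1:5), c("v1", "v2")))
imputed  <- matrix(c(5.0, 5.4, 6.1, 5.8, 6.0,
                     3.4, 3.9, 3.2, 2.7, 3.1),
                   ncol = 2,
                   dimnames = dimnames(observed))

# Assumption: these rows play the role of the reference observations.
ref_ids <- c("obs1", "obs3", "obs4")

# Hypothetical helper that mirrors steps 1 to 5 of the description.
scaled_rmsd <- function(observed, imputed, ref_ids) {
  diffs  <- observed - imputed                               # step 1
  ref_sd <- apply(observed[ref_ids, , drop = FALSE], 2, sd)  # sd among references
  scaled <- sweep(diffs, 2, ref_sd, "/")                     # step 2
  sqrt(rowMeans(scaled^2))                                   # steps 3, 4 and 5
}

rmsdS <- scaled_rmsd(observed, imputed, ref_ids)

# Default threshold: the (1-p)*100 percentile of the scaled differences.
p         <- 0.05
threshold <- quantile(rmsdS, probs = 1 - p)

# Observations whose scaled RMSD exceeds the threshold, sorted.
notable <- sort(rmsdS[rmsdS > threshold])
```

With p=.05 only the most extreme values pass the threshold; passing an explicit threshold instead skips the quantile step entirely.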

See Also

notablyDistant, plot.notablyDifferent, yai, grmsd

Examples

library(yaImpute)  # provides yai() and notablyDifferent()

data(iris)

set.seed(12345)

# form some test data
refs <- sample(rownames(iris),50)
x <- iris[,1:3]      # Sepal.Length Sepal.Width Petal.Length
y <- iris[refs,4:5]  # Petal.Width Species

# build an msn run, first build dummy variables for species.

sp1 <- as.integer(iris$Species=="setosa")
sp2 <- as.integer(iris$Species=="versicolor")
y2 <- data.frame(cbind(iris[,4],sp1,sp2),row.names=rownames(iris))
y2 <- y2[refs,]

names(y2) <- c("Petal.Width","Sp1","Sp2")

msn <- yai(x=x,y=y2,method="msn")

notablyDifferent(msn)
