
editrules (version 2.5-0)

errorLocalizer: Create a backtracker object for error localization

Description

Generate a backtracker object for error localization in numerical, categorical, or mixed data. This function generates the workhorse program that localizeErrors calls when method = "localizer".

Usage

errorLocalizer(E, x, ...)

## S3 method for class 'editset'
errorLocalizer(E, x, ...)

## S3 method for class 'editmatrix'
errorLocalizer(E, x, weight = rep(1, length(x)), maxadapt = length(x), maxweight = sum(weight), maxduration = 600, ...)

## S3 method for class 'editarray'
errorLocalizer(E, x, weight = rep(1, length(x)), maxadapt = length(x), maxweight = sum(weight), maxduration = 600, ...)

## S3 method for class 'editlist'
errorLocalizer(E, x, weight = rep(1, length(x)), maxadapt = length(x), maxweight = sum(weight), maxduration = 600, ...)

Arguments

E
an editset, editmatrix, editarray, or editlist object
x
a named numerical vector or list (if E is an editmatrix), a named character vector or list (if E is an editarray), or a named list if E is an
...
Arguments to be passed to other methods (e.g. reliability weights)
weight
a positive weight vector of length(x). The weights are assumed to be in the same order as the variables in x.
maxadapt
maximum number of variables to adapt
maxweight
maximum weight of a solution. If weights are not given, this equals the maximum number of variables to adapt.
maxduration
maximum time (in seconds) for $searchNext() and $searchAll(). It does not apply to $searchBest(); use $searchBest(maxduration=) instead.

Value

  • An object of class backtracker. Each call to $searchNext() yields a solution in the form of a list (see Details). Calling $searchBest() returns the lowest-weight solution; when multiple solutions with the same weight are found, $searchBest() picks one at random.
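The selection rule described for $searchBest() can be illustrated in base R with a few hand-made solution lists. The rule is that the lowest weight wins and ties are broken uniformly at random. This is only a sketch of the documented behaviour, not the editrules source. The function name pick_best() and the solution values are hypothetical.

```r
# Hand-made solutions in the shape described under Details:
# a weight w and a named logical vector adapt (illustrative values only)
solutions <- list(
  list(w = 2, adapt = c(p = FALSE, c = TRUE,  t = TRUE)),
  list(w = 1, adapt = c(p = TRUE,  c = FALSE, t = FALSE)),
  list(w = 1, adapt = c(p = FALSE, c = TRUE,  t = FALSE))
)

# Hypothetical helper: keep the lowest-weight solutions, pick one at random
pick_best <- function(solutions) {
  w <- vapply(solutions, function(s) s$w, numeric(1))
  ties <- which(w == min(w))
  # sample.int() avoids sample()'s special case when only one index is passed
  solutions[[ties[sample.int(length(ties), 1L)]]]
}

chosen <- pick_best(solutions)
chosen$w  # 1; which of the two weight-1 solutions is returned varies per call
```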

Details

The returned backtracker can be used to run a branch-and-bound algorithm that finds the smallest (weighted) number of variables in x that need to be adapted so that all restrictions in E can be satisfied. This is the generalized principle of Fellegi and Holt (1976).

The B&B tree is set up so that in one branch a variable is assumed correct and its value is substituted in E, while in the other branch the variable is assumed incorrect and is eliminated from E. See De Waal (2003), chapter 8, or De Waal, Pannekoek and Scholtus (2011) for a concise description of the B&B algorithm.
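The branching can be made concrete with a minimal base-R sketch for a purely categorical record. It reuses the household example from the Examples section. This is not the editrules implementation, and several simplifications are assumed:

- The helper names feasible() and localize() are hypothetical.
- Edits are written as plain R predicates rather than as an editarray.
- "Eliminating" a variable is approximated by letting it range over its whole domain, which is checked by brute-force enumeration.

The bound step prunes a branch as soon as its weight reaches the best weight found so far, or as soon as the variables fixed so far already make the edits unsatisfiable. Numerical edits, which need proper variable elimination on linear restrictions, are not covered.

```r
# Hypothetical helper: can the 'free' variables be filled in so that every
# edit holds while all other variables keep their observed values?
# Brute-force enumeration, so only suitable for tiny domains.
feasible <- function(x, free, domains, edits) {
  grids <- lapply(names(x), function(v) if (v %in% free) domains[[v]] else x[[v]])
  names(grids) <- names(x)
  cand <- expand.grid(grids, stringsAsFactors = FALSE)
  for (i in seq_len(nrow(cand))) {
    rec <- as.list(cand[i, , drop = FALSE])
    if (all(vapply(edits, function(e) e(rec), logical(1)))) return(TRUE)
  }
  FALSE
}

# Hypothetical branch-and-bound: each variable is either assumed correct
# (kept at its value) or assumed incorrect (freed, adding its weight)
localize <- function(x, weight, domains, edits) {
  vars <- names(x)
  n <- length(vars)
  best <- list(w = Inf, adapt = NULL)
  branch <- function(k, free, w) {
    if (w >= best$w) return(invisible(NULL))  # bound: cannot improve
    # Undecided variables are treated as free; if even that is infeasible,
    # no leaf below this node can satisfy the edits
    undecided <- if (k <= n) vars[k:n] else character(0)
    if (!feasible(x, c(free, undecided), domains, edits)) return(invisible(NULL))
    if (k > n) {
      best <<- list(w = w, adapt = setNames(vars %in% free, vars))
      return(invisible(NULL))
    }
    branch(k + 1, free, w)                          # vars[k] assumed correct
    branch(k + 1, c(free, vars[k]), w + weight[k])  # vars[k] assumed incorrect
  }
  branch(1, character(0), 0)
  best
}

# The categorical example from the Examples section, as R predicates
edits <- list(
  function(r) r$age != "under aged" || r$maritalStatus == "unmarried",
  function(r) !(r$maritalStatus %in% c("married", "widowed", "divorced")) ||
    !(r$positionInHousehold %in% c("marriage partner", "child"))
)
domains <- list(
  age = c("under aged", "adult"),
  maritalStatus = c("unmarried", "married", "widowed", "divorced"),
  positionInHousehold = c("marriage partner", "child", "other")
)
x <- c(age = "under aged", maritalStatus = "married", positionInHousehold = "child")

sol <- localize(x, weight = c(1, 1, 1), domains, edits)
sol$w                    # 1
names(which(sol$adapt))  # "maritalStatus"
```

Trying the "assumed correct" branch first tends to find low-weight leaves early, which makes the weight bound prune more of the tree.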

Every call to $searchNext() returns one solution list, consisting of

  • w: the solution weight.
  • adapt: a logical vector indicating, for each variable, whether it should be adapted (TRUE) or not (FALSE).
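A solution list can be read with a few lines of base R, as in the minimal sketch below. The sketch assumes the list has exactly the two elements above and that adapt is named after the variables in x. It also assumes the solution weight is the sum of the weights of the adapted variables. That is consistent with how maxweight is described, but this page does not state it outright. The values are typed in by hand from the weighted single-edit example (weights c(1, 1, 2)), not produced by errorLocalizer.

```r
x <- c(p = 755, c = 125, t = 200)
weight <- c(1, 1, 2)  # same order as the variables in x

# Illustrative solution list: adapt p only
sol <- list(w = 1, adapt = c(p = TRUE, c = FALSE, t = FALSE))

# Names of the variables that should be adapted
to_adapt <- names(sol$adapt)[sol$adapt]  # "p"

# Assumption: the solution weight is additive over the adapted variables
stopifnot(sum(weight[sol$adapt]) == sol$w)

# Mark the values to adapt as missing, e.g. before a separate imputation step
x_new <- x
x_new[names(x) %in% to_adapt] <- NA
```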

References

I.P. Fellegi and D. Holt (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association 71, pp. 17-35.

T. De Waal (2003). Processing of unsafe and erroneous data. PhD thesis, Erasmus Research Institute of Management, Erasmus University Rotterdam. http://www.cbs.nl/nl-NL/menu/methoden/onderzoek-methoden/onderzoeksrapporten/proefschriften/2008-proefschrift-de-waal.htm

T. De Waal, J. Pannekoek and S. Scholtus (2011). Handbook of Statistical Data Editing and Imputation. Wiley Handbooks in Survey Methodology.

See Also

errorLocalizer.mip, localizeErrors, checkDatamodel, violatedEdits

Examples

#### examples with numerical edits
# example with a single editrule
# p = profit, c = cost, t = turnover
E <- editmatrix(c("p + c == t"))
cp <- errorLocalizer(E, x=c(p=755, c=125, t=200))
# x obviously violates E. With all weights equal, changing any variable will do.
# first solution:
cp$searchNext()
# second solution:
cp$searchNext()
# third solution:
cp$searchNext()
# there are no more solutions, since changing more variables would increase the weight,
# so the result of the next statement is NULL:
cp$searchNext()

# Increasing the reliability weight of turnover yields 2 solutions:
cp <- errorLocalizer(E, x=c(p=755, c=125, t=200), weight=c(1,1,2))
# first solution:
cp$searchNext()
# second solution:
cp$searchNext()
# no more solutions available:
cp$searchNext()


# A case with two restrictions. The second restriction demands that
# c/t >= 0.6 (cost should be at least 60% of turnover)
E <- editmatrix(c(
        "p + c == t",
        "c - 0.6*t >= 0"))
cp <- errorLocalizer(E,x=c(p=755,c=125,t=200))
# Now there is only one solution, but two runs are needed to find it (the first solution returned has a higher weight)
cp$searchNext()
cp$searchNext()

# With the searchBest() function, the lowest-weight solution is found at once:
errorLocalizer(E,x=c(p=755,c=125,t=200))$searchBest()


# An example with missing data.
E <- editmatrix(c(
    "p + c1 + c2 == t",
    "c1 - 0.3*t >= 0",
    "p > 0",
    "c1 > 0",
    "c2 > 0",
    "t > 0"))
cp <- errorLocalizer(E,x=c(p=755, c1=50, c2=NA,t=200))
# (Note that e2 is violated.)
# There are two solutions. Both demand that c2 is adapted:
cp$searchNext()
cp$searchNext()

##### Examples with categorical edits
# 
# 3 variables, recording age class, position in household, and marital status:
# We define the datamodel and the rules
E <- editarray(c(
    "age %in% c('under aged','adult')",
    "maritalStatus %in% c('unmarried','married','widowed','divorced')",
    "positionInHousehold %in% c('marriage partner', 'child', 'other')",
    "if( age == 'under aged' ) maritalStatus == 'unmarried'",
    "if( maritalStatus %in% c('married','widowed','divorced')) !positionInHousehold %in% c('marriage partner','child')"
    )
)
E

# Let's define a record with an obvious error:
r <- c(age = 'under aged', maritalStatus='married', positionInHousehold='child')
# The age class and position in household are consistent, while the marital status conflicts with both.
# Therefore, changing only the marital status (instead of both age class and position in household)
# seems reasonable.
el <- errorLocalizer(E,r)
el$searchNext()
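The two-restriction numerical example above (p + c == t and c - 0.6*t >= 0) can be checked by hand in base R. The check adapts each variable on its own, solves the equality for its new value, and tests whether the second restriction then holds. Only adapting p works, which is consistent with the single solution reported for that example. This is a hand calculation for illustration, not a call into editrules, and the helper name ok() is hypothetical.

```r
x <- c(p = 755, c = 125, t = 200)

# Hypothetical helper: TRUE if record r satisfies both restrictions
ok <- function(r) {
  isTRUE(all.equal(r[["p"]] + r[["c"]], r[["t"]])) && r[["c"]] - 0.6 * r[["t"]] >= 0
}

# Adapt one variable at a time; p + c == t then fixes its new value
single <- sapply(list(
  p = replace(x, "p", x[["t"]] - x[["c"]]),  # p = 75
  c = replace(x, "c", x[["t"]] - x[["p"]]),  # c = -555
  t = replace(x, "t", x[["p"]] + x[["c"]])   # t = 880
), ok)
single  # p TRUE, c FALSE, t FALSE
```

Because the equality determines the adapted variable uniquely, these three cases cover every weight-1 solution. Adapting c or t alone breaks c - 0.6*t >= 0.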
