validate: Validates results after using `igate` or `categorical.igate`.

Description

Takes a new data frame to be used for validation and the causes and control bands obtained from igate or categorical.igate and returns all those observations that fall within these control bands.

Usage

validate(validation_df, target, causes, results_df, type = NULL)

Arguments

validation_df

Data frame to be used for validation. It is recommended to use a different data frame from the one passed to igate or categorical.igate. The same data frame can be used if only a sanity check of the results is needed. This data frame must contain the target variable as well as all the causes determined by igate or categorical.igate.

target

Target variable that was used in igate or categorical.igate.

causes

Causes determined by igate or categorical.igate. If you saved the results of igate or categorical.igate in an object results, simply use results$Causes here.

results_df

The data frame containing the results of igate or categorical.igate.

type

The type of igate that was performed: either "continuous" or "categorical". If not provided, the function tries to guess the correct type from the type of validation_df[[target]].
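
As a rough illustration of the guessing described for type, the sketch below infers the type from the class of the target column. The helper name guess_type is hypothetical, and the rule it applies (numeric means continuous, anything else means categorical) is an assumption, not necessarily what validate does internally.

```r
# Hypothetical helper: infer the igate type from the class of the target column.
# Assumption: numeric targets are "continuous", everything else is "categorical".
guess_type <- function(validation_df, target) {
  if (is.numeric(validation_df[[target]])) "continuous" else "categorical"
}

guess_type(iris, "Sepal.Length")  # numeric column
guess_type(iris, "Species")       # factor column
```

Passing type explicitly avoids relying on any such guess.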

Value

A list of three data frames is returned. The first data frame contains those observations in validation_df that fall into *all* the good control bands or into *all* the bad control bands specified in results_df. Its columns are target, then one column for each of the causes, and a new column expected_quality, which is "good" if the observation falls into all the good control bands and "bad" if it falls into all the bad control bands.
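
The selection behind the first data frame can be sketched in base R as follows. This is only an illustration for the good bands: the data frame bands and its column names are hypothetical (they are not claimed to match an igate results data frame), and treating the band limits as inclusive is an assumption.

```r
# Toy control bands; the column names below are hypothetical,
# not necessarily those of an igate results data frame.
bands <- data.frame(
  Cause      = c("Sepal.Width", "Petal.Length"),
  good_lower = c(2.5, 4.0),
  good_upper = c(3.5, 7.0),
  stringsAsFactors = FALSE
)

# Start with every row selected, then narrow down one cause at a time.
in_all_good <- rep(TRUE, nrow(iris))
for (i in seq_len(nrow(bands))) {
  x <- iris[[bands$Cause[i]]]
  in_all_good <- in_all_good & x >= bands$good_lower[i] & x <= bands$good_upper[i]
}

# Keep the target and the causes, and label the surviving rows.
good_obs <- iris[in_all_good, c("Sepal.Length", bands$Cause)]
good_obs$expected_quality <- rep("good", nrow(good_obs))
head(good_obs)
```

Because the conditions are combined with a logical AND, every additional cause can only shrink the set of selected observations.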

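The second data frame has three columns:

`Cause`	Each of the `causes`.
`Good_Count`	If we selected all those observations that fall into the good band of this cause, how many observations would we select?
`Bad_Count`	If we selected all those observations that fall into the bad band of this cause, how many observations would we select?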
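
The per-cause counts of the second data frame can be approximated with a short base R sketch. The bands data frame, its column names, and the helper count_in_band are hypothetical and only serve to illustrate the idea; inclusive band limits are again an assumption.

```r
# Toy control bands with hypothetical column names.
bands <- data.frame(
  Cause      = c("Sepal.Width", "Petal.Length"),
  good_lower = c(2.5, 4.0),
  good_upper = c(3.5, 7.0),
  bad_lower  = c(3.6, 1.0),
  bad_upper  = c(4.5, 2.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the rows of df whose value for `cause`
# lies in the closed interval [lower, upper].
count_in_band <- function(df, cause, lower, upper) {
  x <- df[[cause]]
  sum(x >= lower & x <= upper, na.rm = TRUE)
}

# Each cause is counted on its own, independently of the other causes.
counts <- data.frame(
  Cause      = bands$Cause,
  Good_Count = mapply(count_in_band, cause = bands$Cause,
                      lower = bands$good_lower, upper = bands$good_upper,
                      MoreArgs = list(df = iris)),
  Bad_Count  = mapply(count_in_band, cause = bands$Cause,
                      lower = bands$bad_lower, upper = bands$bad_upper,
                      MoreArgs = list(df = iris)),
  row.names = NULL
)
counts
```

Unlike the first data frame, these counts look at one cause at a time, which is what makes them useful for spotting a single overly restrictive band.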

The third data frame summarizes the first data frame: If type = "continuous" it has three columns:

`expected_quality`	Either `"good"` or `"bad"`.
`max_target`	The maximum value of `target` among the observations with "good" expected quality or with "bad" expected quality, respectively.

If type = "categorical" it has the following three columns:

`expected_quality`	Either `"good"` or `"bad"`.
`Category`	A list of the categories of the observations with expected quality "good" or "bad", respectively.

Details

If a value of Good_Count or Bad_Count in the second data frame is very low, this cause is excluding a lot of observations from the first data frame. Consider re-running validate with this cause removed from causes.
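
Removing a cause before re-running could look like the sketch below. The vector causes is a hypothetical stand-in for results$Causes, and "Petal.Width" is an arbitrary choice of the cause to drop.

```r
# Hypothetical vector of causes, standing in for results$Causes.
causes <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# Drop the cause whose Good_Count or Bad_Count was very low.
reduced_causes <- setdiff(causes, "Petal.Width")
reduced_causes
```

The shortened vector can then be supplied as the causes argument in a second call to validate.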

Examples

# NOT RUN {
library(igate)
# resultsIris is assumed to hold the results of an earlier igate run on iris.
validate(iris, target = "Sepal.Length", causes = resultsIris$Causes,
         results_df = resultsIris)
# }