igate or categorical.igate.
Takes a new data frame to be used for validation, together with the causes and control bands obtained from igate or categorical.igate, and returns all those observations that fall within these control bands.
validate(validation_df, target, causes, results_df, type = NULL)
validation_df | Data frame to be used for validation. It is recommended to use a different data frame from the one used in igate / categorical.igate. The same data frame can be used if only a sanity check of the results is needed. This data frame must contain the target variable as well as all the causes determined by igate / categorical.igate.
target | Target variable that was used in igate or categorical.igate.
causes | Causes determined by igate or categorical.igate. If you saved the results of igate / categorical.igate in an object results, simply use results$Causes here.
results_df | The data frame containing the results of igate or categorical.igate.
type | The type of igate that was performed: either "continuous" or "categorical". If not provided, the function will try to guess the correct type based on the type of validation_df[[target]].
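The exact rule used to guess type is not spelled out here. As a minimal sketch, assuming a numeric target means "continuous" and anything else means "categorical", the guess could look like the following. The helper guess_type is hypothetical and is not part of igate.

```r
# Hypothetical helper (not part of igate): guess the analysis type
# from the class of the target column. Assumption: numeric targets are
# treated as "continuous", everything else as "categorical".
guess_type <- function(validation_df, target) {
  if (is.numeric(validation_df[[target]])) "continuous" else "categorical"
}

guess_type(iris, "Sepal.Length")  # numeric column
guess_type(iris, "Species")       # factor column
```

Passing type explicitly avoids relying on any such guess.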
A list of three data frames is returned. The first data frame contains those observations in validation_df that fall into *all* the good or *all* the bad control bands specified in results_df. The columns are target, then one column for each of the causes, and a new column expected_quality, which is "good" if the observation falls into all the good control bands and "bad" if it falls into all the bad control bands.
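To illustrate what falling into all the good control bands means, here is a rough base R sketch. The bands data frame, its column names (Cause, good_lower, good_upper), the band limits, and the inclusive comparison are all assumptions made for illustration; they are not the actual layout of results_df.

```r
# Hypothetical control bands for two causes; the column names and the
# numeric limits are illustrative only, not igate output.
bands <- data.frame(
  Cause = c("Petal.Length", "Petal.Width"),
  good_lower = c(5.0, 1.8),
  good_upper = c(6.9, 2.5),
  stringsAsFactors = FALSE
)

# TRUE for rows of df that lie inside the band of every cause
# (bounds are assumed to be inclusive).
in_all_bands <- function(df, bands) {
  inside <- rep(TRUE, nrow(df))
  for (i in seq_len(nrow(bands))) {
    x <- df[[bands$Cause[i]]]
    inside <- inside & x >= bands$good_lower[i] & x <= bands$good_upper[i]
  }
  inside
}

# Keep the target, one column per cause, and label the selected rows.
good_rows <- iris[in_all_bands(iris, bands), c("Sepal.Length", bands$Cause)]
good_rows$expected_quality <- "good"
head(good_rows)
```

The same filter applied to the bad bands would yield the rows labelled "bad".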
The second data frame has three columns:
Cause | Each of the causes.
Good_Count | If we selected all those observations that fall into the good band of this cause, how many observations would we select?
Bad_count | The same count for the bad band of this cause.
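A per-cause count of this kind can be sketched in base R as follows. The bands data frame and its column names are hypothetical, and inclusive bounds are assumed; this only illustrates the idea of counting one cause at a time.

```r
# Hypothetical good bands for two causes (illustrative values only).
bands <- data.frame(
  Cause = c("Petal.Length", "Petal.Width"),
  good_lower = c(5.0, 1.8),
  good_upper = c(6.9, 2.5),
  stringsAsFactors = FALSE
)

# For each cause on its own, count the rows of iris inside its good band.
good_count <- vapply(seq_len(nrow(bands)), function(i) {
  x <- iris[[bands$Cause[i]]]
  sum(x >= bands$good_lower[i] & x <= bands$good_upper[i])
}, integer(1))

data.frame(Cause = bands$Cause, Good_Count = good_count)
```

Unlike the first data frame, each count here looks at a single cause and ignores the others.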
The third data frame summarizes the first data frame. If type = "continuous", it has three columns:
expected_quality | Either "good" or "bad".
max_target | The maximum value of target among the observations with "good" or "bad" expected quality, respectively.
If type = "categorical", it has the following three columns:
expected_quality | Either "good" or "bad".
Category | A list of categories of the observations with expected quality "good" or "bad", respectively.
If a value of Good_Count or Bad_count is very low in the second data frame, it means that this cause is excluding a lot of observations from the first data frame. Consider re-running validate with this cause removed from causes.
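One way to act on this advice is sketched below in base R. The counts data frame uses made-up numbers shaped like the second data frame, and the threshold of 5 is an arbitrary assumption; neither comes from an actual igate run.

```r
# Made-up per-cause counts, shaped like the second data frame.
counts <- data.frame(
  Cause = c("Petal.Length", "Petal.Width", "Sepal.Width"),
  Good_Count = c(46L, 34L, 3L),
  stringsAsFactors = FALSE
)
causes <- counts$Cause

# Flag causes whose good band keeps fewer rows than an arbitrary threshold.
too_strict <- counts$Cause[counts$Good_Count < 5]

# Drop the flagged causes; the result could be passed as `causes`
# in a second call to validate().
reduced_causes <- setdiff(causes, too_strict)
reduced_causes
```

What counts as "very low" depends on the size of the validation data, so the threshold should be chosen case by case.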
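# NOT RUN {
library(igate)
# resultsIris is assumed to hold the output of an earlier igate run, for example:
resultsIris <- igate(iris, target = "Sepal.Length")
validate(iris, target = "Sepal.Length", causes = resultsIris$Causes, results_df = resultsIris)
# }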