igate (version 0.3.3)

categorical.freqplot: Produces frequency plots (normed to density plots to account for different category sizes) for sanity check in categorical iGATE.

Description

This function takes a data frame, a categorical target variable and a vector of suspected sources of variation (ssv) and produces a density plot for each ssv and each category of the target variable. By default the plots are sent to the standard plotting device; if savePlots = TRUE they are saved as .png files to image_directory instead. A data frame of summary statistics is also returned, see Value for details. If you want to keep a copy of the plots, consider pointing image_directory at a new, empty folder before running.

Usage

categorical.freqplot(df, target, ssv = NULL,
  outlier_removal_ssv = TRUE, savePlots = FALSE,
  image_directory = tempdir())

Arguments

df

Data frame to be analysed.

target

Categorical target variable to be analysed.

ssv

A vector of suspected sources of variation. These are the variables in df which we believe might have an influence on the target variable and will be tested. If no list of ssv is provided, the test will be performed on all numeric variables.

outlier_removal_ssv

Logical. Should outlier removal be performed for each ssv (default: TRUE)?

savePlots

Logical. If FALSE (the default) frequency plots will be output to the standard plotting device. If TRUE, frequency plots will be saved to image_directory as png files.

image_directory

Directory to which plots should be saved. This is only used if savePlots = TRUE and defaults to the temporary directory of the current R session, i.e. tempdir(). To save plots to the current working directory set savePlots = TRUE and image_directory = getwd().
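
As a minimal sketch of the savePlots and image_directory arguments described above, the call below writes the plots to a dedicated folder instead of the plotting device. The folder name igate_freqplots is a hypothetical choice made for this illustration, and the call only runs if the igate package is installed.

```r
# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "igate_freqplots")
dir.create(out_dir, showWarnings = FALSE)

# Only attempt the call when igate is available.
if (requireNamespace("igate", quietly = TRUE)) {
  # Same call as in the Examples section, but with the plots saved to out_dir.
  res <- igate::categorical.freqplot(mtcars, target = "cyl",
    savePlots = TRUE, image_directory = out_dir)

  # List whatever .png files were written.
  print(list.files(out_dir, pattern = "\\.png$"))
}
```

Using a fresh folder keeps the saved plots separate from other files, which makes them easy to review or delete afterwards.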

Value

The density plots of each category of target against each ssv are sent to the standard plotting device or, if savePlots = TRUE, saved as .png files to image_directory. In addition, a data frame with the following columns is returned:

Causes The ssv that were analysed.
outliers_removed How many outliers (with respect to this ssv) have been removed before drawing the plot?
observations_retained After outlier removal was performed, how many observations were left and used to draw the plot?

Details

Frequency plots for each ssv against each category of the target are produced and either shown on the standard plotting device or, if savePlots = TRUE, saved to image_directory. A data frame with summary statistics is also produced, see Value for details.
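
To illustrate why the plots are scaled to densities rather than raw counts, here is a small base R sketch. It is not the igate implementation; it only assumes that each category gets a histogram of one ssv on shared bins, and the choice of mpg as the ssv is made up for this illustration.

```r
# Illustration only: base R, not the code used inside igate.
df <- mtcars
target <- "cyl"   # categorical target
ssv <- "mpg"      # one suspected source of variation (chosen for illustration)

# Shared breaks so that all categories are binned identically.
breaks <- pretty(range(df[[ssv]]), n = 10)

# One histogram object per category; plot = FALSE only computes the bins.
groups <- split(df[[ssv]], df[[target]])
hists <- lapply(groups, hist, breaks = breaks, plot = FALSE)

# Raw counts depend on category size, densities integrate to 1 per category.
sizes <- sapply(groups, length)
areas <- sapply(hists, function(h) sum(h$density * diff(h$breaks)))

# Draw the density-scaled histograms one below the other.
old_par <- par(mfrow = c(length(hists), 1))
for (level in names(hists)) {
  plot(hists[[level]], freq = FALSE,
       main = paste(target, "=", level), xlab = ssv)
}
par(old_par)
```

Because every histogram has unit area, a category with few observations is drawn on the same scale as a large one, so the shapes can be compared directly.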

Examples

# NOT RUN {
categorical.freqplot(mtcars, target = "cyl")

# }
