
For a numerical variable, the output includes
side-by-side boxplots grouped by dateGpBp (left),
a trace plot of the p1, p50, and p99 percentiles, grouped by dateGp (top right),
a trace plot of the mean and +-1 SD control limits, grouped by dateGp (middle right), and
a trace plot of the missing and zero rates, grouped by dateGp (bottom right).
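The per-group statistics behind these trace plots can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy data, the helper name SummariseGroup, the lack of weights, and the choice to compute the zero rate over all rows are assumptions made for illustration. The control limits are left out because this page does not say how they are computed.

```r
# Toy data: one numerical variable observed over three months (made-up values).
toy <- data.frame(
  month = rep(c("2017-01", "2017-02", "2017-03"), each = 4),
  x     = c(0, 2, 4, NA,   1, 1, 3, 7,   0, 0, 5, 10)
)

# Hypothetical helper: unweighted summaries for one date group.
# The zero rate is taken over all rows, including missing ones (an assumption).
SummariseGroup <- function(x) {
  q <- quantile(x, probs = c(0.01, 0.50, 0.99), na.rm = TRUE, names = FALSE)
  data.frame(
    p1       = q[1],
    p50      = q[2],
    p99      = q[3],
    mean     = mean(x, na.rm = TRUE),
    missRate = mean(is.na(x)),
    zeroRate = sum(x == 0, na.rm = TRUE) / length(x)
  )
}

# One row of summaries per month; row names carry the month.
overTime <- do.call(rbind, lapply(split(toy$x, toy$month), SummariseGroup))
print(overTime)
```

Each row of overTime corresponds to one point on each trace line; the real function additionally supports row weights through weightNm.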
For a categorical variable (including a numerical variable with no more than 2 unique levels not including NA), the output includes
a frequency bar plot (left), and
a grid of trace plots on categories' proportions over time (right).
If the variable contains more than kCategories categories, trace plots of only the kCategories largest categories will be plotted.
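The kCategories rule can be sketched in base R as follows. The toy data and the small kCategories value are made up, and the package may break ties or apply weights differently; this only shows the idea of keeping the most prevalent categories and tracking their proportions per date group.

```r
# Toy categorical data (made-up values); kCategories is 2 for illustration only.
toy <- data.frame(
  month = rep(c("2017-01", "2017-02"), each = 5),
  job   = c("admin", "admin", "tech", "retired", "student",
            "admin", "tech", "tech", "tech", "retired")
)
kCategories <- 2

# Rank categories by overall frequency and keep the kCategories most prevalent.
freq    <- sort(table(toy$job), decreasing = TRUE)
topCats <- names(freq)[seq_len(min(kCategories, length(freq)))]

# Proportion of each category within each month, restricted to the kept ones.
props <- prop.table(table(toy$month, toy$job), margin = 1)[, topCats, drop = FALSE]
print(props)
```

Proportions are computed against all rows in the month, so the kept columns need not sum to 1.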
In addition to plots, a data.table of summary statistics is generated, covering both global and over-time summaries.
PlotVar(dataFl, myVar, weightNm, dateNm, dateGp, dateGpBp = NULL,
  labelFl = NULL, highlightNms = NULL, skewOpt = NULL, kSample = 50000,
  fuzzyLabelFn = NULL, kCategories = 9)
dataFl
A data.table containing at least the following columns: myVar, weightNm, dateGp, dateGpBp; usually an output of the PrepData function.
myVar
Name of the variable to be plotted.
weightNm
Name of the variable containing row weights, or NULL for no weights (all rows receiving weight 1).
dateNm
Name of the column containing the date variable.
dateGp
Name of the variable that the time series plots should be grouped by. Options are NULL, "weeks", "months", "quarters", "years". See IDate for details. If NULL, then dateNm will be used as dateGp.
dateGpBp
Name of the variable the boxplots should be grouped by. Same options as dateGp. If NULL, then dateGp will be used.
labelFl
A data.table containing variable labels, or NULL for no labels; usually an output of PrepLabels.
highlightNms
Either NULL or a character vector of variables to receive a red label. Currently, NULL means all variables will get a black legend. This argument is ignored if labelFl is NULL.
skewOpt
Either a numeric constant or NULL. Default is NULL (no transformation). If numeric, say 5, then all box plots of a variable whose skewness exceeds 5 will be on a log10 scale if possible. A negative value of skewOpt will be converted to 3.
kSample
Either NULL or a positive integer. If an integer, indicates the sample size used both for drawing boxplots and for ordering numerical graphs. Setting kSample to a reasonable value (default is 50K) dramatically improves processing speed. Therefore, for larger datasets (e.g. > 10 percent of system memory), this parameter should not be set to NULL, or boxplots may take a very long time to render. This setting has no impact on the accuracy of the time series plots of quantiles, mean, SD, and missing and zero rates.
fuzzyLabelFn
Either NULL or a function of 2 parameters: a label file in the format of an output by PrepLabels and a string giving a variable name. The function should return the label corresponding to the variable given by the second parameter. This function should describe how fuzzy matching should be performed to find labels (see example below). If NULL, only exact matches will be returned.
kCategories
If a categorical variable has more than kCategories categories, trace plots of only the kCategories most prevalent categories are plotted.
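As an illustration of the two-parameter function that fuzzyLabelFn expects, here is a minimal sketch. The label table's column names (varCol, labelCol), the helper name FuzzyRoot, the suffix-stripping rule, and the empty-string fallback are all assumptions made for this example; check the actual output of PrepLabels before adapting it.

```r
# Hypothetical label table; the column names varCol and labelCol are assumptions.
labels <- data.frame(
  varCol   = c("age", "balance"),
  labelCol = c("Age of client", "Account balance"),
  stringsAsFactors = FALSE
)

# Illustrative fuzzy rule: strip a trailing "_<suffix>" from the variable name,
# then look up either the full name or the remaining root in the label table.
FuzzyRoot <- function(labelFl, myVar) {
  root <- sub("_[^_]*$", "", myVar)
  hit  <- labelFl$labelCol[labelFl$varCol %in% c(myVar, root)]
  # Returning "" when nothing matches is an assumption, not documented behavior.
  if (length(hit) > 0) hit[1] else ""
}

FuzzyRoot(labels, "balance_log")  # matched through the root "balance"
FuzzyRoot(labels, "age")          # exact match
```

A function of this shape would be passed as fuzzyLabelFn = FuzzyRoot, so that derived variables such as a log-transformed column can inherit the label of their source variable.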
A grob (i.e., ggplot grid) object. See the output p of the function PlotNumVar or PlotCatVar for details.
A data.table of summary statistics. See the output numVarSummary of the function PlotNumVar, or the output catVarSummary of the function PlotCatVar, for details.
Indicator of the variable's type, either "nmrcl" or "ctgrl".
Copyright 2017 Capital One Services, LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Functions that depend on this function: PrintPlots.
This function depends on: PlotCatVar, PlotNumVar, PrepData.
# NOT RUN {
library(otvPlots)  # package that provides PlotVar, PrepData and PrepLabels

## Prepare the example data and labels shipped with the package.
data(bankData)
bankData <- PrepData(bankData, dateNm = "date", dateGp = "months",
                     dateGpBp = "quarters")
data(bankLabels)
bankLabels <- PrepLabels(bankLabels)

## PlotVar will treat numerical and categorical data differently.
## Binary data is always treated as categorical.
plot(PlotVar(bankData, myVar = "duration", weightNm = NULL, dateNm = "date",
             dateGp = "months", dateGpBp = "quarters", labelFl = bankLabels)$p)
plot(PlotVar(bankData, myVar = "job", weightNm = NULL, dateNm = "date",
             dateGp = "months", dateGpBp = "quarters", labelFl = bankLabels)$p)
plot(PlotVar(bankData, myVar = "loan", weightNm = NULL, dateNm = "date",
             dateGp = "months", dateGpBp = "quarters", labelFl = bankLabels)$p)
# }
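
The effect of kSample on boxplots can be illustrated with a short base R sketch. The helper SampleRows and the simulated data are hypothetical and are not part of the package; the sketch only shows the general idea of drawing at most kSample rows before computing boxplot statistics, with NULL meaning that every row is used.

```r
set.seed(1)
toy <- data.frame(x = rnorm(200000))  # simulated data, for illustration only
kSample <- 50000

# Hypothetical helper: draw at most kSample rows without replacement.
# NULL (or a value at least as large as the data) keeps every row.
SampleRows <- function(dat, kSample = NULL) {
  if (is.null(kSample) || nrow(dat) <= kSample) return(dat)
  dat[sample.int(nrow(dat), kSample), , drop = FALSE]
}

boxData <- SampleRows(toy, kSample)
nrow(boxData)

# Five-number summaries from the sample and from the full data for comparison.
boxplot.stats(boxData$x)$stats
boxplot.stats(toy$x)$stats
```

Only the boxplot input is subsampled in this sketch, which mirrors the statement above that the time series plots are unaffected by kSample.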