This function prepares an input dataset for use by all plotting functions in this package, including the main function vlm. The input data dataFl must contain, at a minimum, a date column dateNm and a variable to be plotted. dataFl will be converted to the data.table class, and all changes are made to it by reference.
PrepData(dataFl, dateNm, selectCols = NULL, dropCols = NULL,
dateFt = "%d%h%Y", dateGp = NULL, dateGpBp = NULL, weightNm = NULL,
varNms = NULL, dropConstants = FALSE, ...)
dataFl: Either the name of an object that can be converted using as.data.table (e.g., a data frame), or a character string containing the name of a dataset that can be loaded using fread (e.g., a csv file). If the dataset is not in your working directory, then dataFl must include the (relative or absolute) path to the file.
dateNm: Name of the column containing the date variable.
selectCols: Either NULL, or a vector of names or indices of variables to read into memory; must include dateNm, weightNm (if not NULL), and all variables to be plotted. If both selectCols and dropCols are NULL, then all variables will be read in.
dropCols: Either NULL, or a vector of names or indices of variables not to read into memory. If both selectCols and dropCols are NULL, then all variables will be read in.
dateFt: strptime format of the date variable. The default is the SAS format "%d%h%Y", but input data in the R date format "%Y-%m-%d" will also be detected. Both formats are parsed automatically.
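The fallback between the two formats can be sketched in base R as follows. This is only an illustration: the helper name parse_date_guess is hypothetical and not part of otvPlots, and the package's actual detection logic may differ. Month abbreviations matched by %h depend on the locale (English is assumed here).

```r
# Hypothetical helper (not part of otvPlots): try the supplied strptime
# format first, then fall back to the R date format "%Y-%m-%d".
parse_date_guess <- function(x, dateFt = "%d%h%Y") {
  out <- as.Date(x, format = dateFt)
  if (all(is.na(out))) {
    # Nothing parsed with dateFt, so assume R-style dates
    out <- as.Date(x, format = "%Y-%m-%d")
  }
  out
}

sas_dates <- parse_date_guess(c("01JAN2017", "15MAR2017"))   # SAS-style input
iso_dates <- parse_date_guess(c("2017-01-01", "2017-03-15")) # R-style input
```

Both calls should yield the same Date vector, which is the behavior the dateFt description implies for the two supported formats.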
dateGp: Name of the variable that the time series plots should be grouped by. Options are NULL, "weeks", "months", "quarters", "years". See IDate for details. If NULL, then dateNm will be used as dateGp.
dateGpBp: Name of the variable that the boxplots should be grouped by. Same options as dateGp. If NULL, then dateGp will be used.
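To illustrate what grouping by "months" or "quarters" means, the sketch below maps each date to the first day of its period using base R. The helper names floor_month and floor_quarter are hypothetical, and it is an assumption that the package's IDate-based rounding produces equivalent groups; "weeks" is omitted because its convention depends on the rounding method used.

```r
# Hypothetical helpers (not part of otvPlots): map each date to the first day
# of its month or quarter, so that rows can be grouped by that value.
floor_month <- function(d) as.Date(format(d, "%Y-%m-01"))

floor_quarter <- function(d) {
  m <- as.integer(format(d, "%m"))
  first_month <- 3L * ((m - 1L) %/% 3L) + 1L  # 1, 4, 7, or 10
  as.Date(sprintf("%s-%02d-01", format(d, "%Y"), first_month))
}

dates <- as.Date(c("2017-01-15", "2017-02-03", "2017-05-20"))
month_gp   <- floor_month(dates)    # one group per calendar month
quarter_gp <- floor_quarter(dates)  # one group per calendar quarter
```

Here the first two dates fall in different months but the same quarter, so a time series grouped by "months" would show them as separate points while boxplots grouped by "quarters" would pool them.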
weightNm: Name of the variable containing row weights, or NULL for no weights (all rows receiving weight 1).
varNms: Either NULL or a vector of names or indices of variables to be plotted. If NULL, will default to all columns that are not dateNm or weightNm. Indices refer to the column positions after dropCols or selectCols have been applied, if applicable, and do not count dateGp and dateGpBp (which will be added to dataFl by the function PrepData).
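The default for varNms amounts to a set difference on the column names. A minimal sketch, using made-up column names rather than any real dataset:

```r
# Hypothetical column names, for illustration only
cols     <- c("date", "weight", "age", "balance")
dateNm   <- "date"
weightNm <- "weight"

# Default when varNms is NULL: every column except the date and weight columns
varNms <- setdiff(cols, c(dateNm, weightNm))

# With weightNm = NULL, c(dateNm, NULL) is just dateNm, so only the date
# column is excluded
varNms_noweight <- setdiff(cols, c(dateNm, NULL))
```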
dropConstants: Logical, indicates whether or not constant (all duplicated or NA) variables should be dropped from dataFl prior to plotting.
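One way to read "constant" is sketched below in base R. The helper is_constant is hypothetical, and treating a column with a single distinct non-missing value (possibly mixed with NA) as constant is an assumption; the package's exact rule may differ.

```r
# Hypothetical helper (not part of otvPlots): a column is treated as constant
# if it has at most one distinct non-NA value (assumption, see lead-in).
is_constant <- function(x) length(unique(x[!is.na(x)])) <= 1L

df <- data.frame(
  a = c(1, 1, 1),     # all duplicated: constant
  b = c(1, 2, NA),    # two distinct values: kept
  c = c(NA, NA, NA)   # all NA: constant
)

keep <- !vapply(df, is_constant, logical(1))
kept <- df[, keep, drop = FALSE]  # only column "b" remains
```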
...: Additional parameters to be passed to fread.
A data.table object, formatted for use by all plotting functions in the otvPlots package, including the main function vlm and the individual variable plotting function PlotVar.
Copyright 2017 Capital One Services, LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
If weights (weightNm) are provided, they are normalized so that they sum to the total sample size, and the normalized weights are used in all summary statistic calculations and plotting.
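The normalization described above can be written out directly. This is a sketch with made-up numbers, not the package's internal code:

```r
# Made-up raw weights and values, for illustration only
w <- c(2, 1, 1, 4)
x <- c(10, 20, 30, 40)

# Rescale so the weights sum to the sample size (here, 4 rows)
w_norm <- w * length(w) / sum(w)

total_weight <- sum(w_norm)

# Rescaling all weights by a common factor leaves weighted means unchanged
mean_raw  <- weighted.mean(x, w)
mean_norm <- weighted.mean(x, w_norm)
```

Because every weight is multiplied by the same constant, ratio-type statistics such as weighted means are unaffected; only totals and counts are put on the scale of the sample size.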
Functions that depend on this function: PlotBarplot, PlotRatesOverTime, PlotCatVar, SummaryStats, PlotMean, PlotQuantiles, PlotRates, PlotDist, PlotNumVar, PlotVar, PrintPlots, CalcR2, OrderByR2, vlm.
## Load the package that provides PrepData and bankData
library(otvPlots)
## Use the bankData dataset in this package
data(bankData)
bankData <- PrepData(bankData, dateNm = "date", dateGp = "months",
                     dateGpBp = "quarters")
## Columns have been assigned a plotting class (nmrcl/ctgrl)
str(bankData)