Calculates R2 of a linear model of the formula var ~ dateNm for each var of class nmrcl and returns a vector of variable names ordered by highest R2. The linear model can be calculated over a subset of dates; see the details of the parameter buildTm. Non-numerical variables are returned in alphabetical order after the sorted numerical variables.
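The ranking described above can be illustrated in base R. This is only a sketch under assumptions, not the package's implementation: `order_by_r2_sketch` and the `toy` data frame are hypothetical, and numeric columns are detected with `is.numeric` rather than the nmrcl class that PrepData assigns.

```r
# Hypothetical helper, not the otvPlots implementation: rank numeric columns
# by the R^2 of lm(var ~ date), then append the remaining columns alphabetically.
order_by_r2_sketch <- function(df, date_col) {
  x <- as.numeric(df[[date_col]])            # dates as days since 1970-01-01
  vars <- setdiff(names(df), date_col)
  is_num <- vapply(vars, function(v) is.numeric(df[[v]]), logical(1))
  r2 <- vapply(vars[is_num], function(v) {
    summary(lm(df[[v]] ~ x))$r.squared       # R^2 of var ~ date
  }, numeric(1))
  c(names(sort(r2, decreasing = TRUE)), sort(vars[!is_num]))
}

toy <- data.frame(
  date  = as.Date("2014-01-01") + 0:99,
  trend = 0:99 + rep(c(-1, 1), 50),          # strong linear trend over time
  flat  = rep(c(1, 5, 2, 4), 25),            # almost no trend over time
  grp   = rep(c("a", "b"), 50),              # non-numerical, goes last
  stringsAsFactors = FALSE
)
result <- order_by_r2_sketch(toy, "date")
```

In this toy data the trending column is ranked ahead of the flat one, and the character column follows the numeric ones.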
OrderByR2(dataFl, dateNm, buildTm = NULL, weightNm = NULL,
kSample = 50000)
dataFl: A data.table of data; must be the output of the PrepData function.
dateNm: Name of the column containing the date variable.
buildTm: Vector identifying the time period for ranking/anomaly detection (most likely the model build period). Allows a subset of the plotting time period to be used for anomaly detection.
Must be a vector of dates and is inclusive, i.e. buildTm[1] <= date <= buildTm[2] defines the time period.
Must be either NULL, a vector of length 2, or a vector of length 3.
If NULL, the entire dataset will be used for ranking/anomaly detection.
If a vector of length 2, the dates must be given as a character vector in the default R date format (e.g. "2017-01-30").
If a vector of length 3, the first two elements must contain dates in any strptime format, while the third element contains the strptime format (see strptime).
The following are equivalent ways of selecting all of 2014:
c("2014-01-01","2014-12-31")
c("01JAN2014","31DEC2014", "%d%h%Y")
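To see why the two forms select the same period, the following sketch parses both with base R's `as.Date` and applies the inclusive filter. The helper `parse_build_tm` is hypothetical and only mirrors the rules stated here; how the package parses buildTm internally is not shown in this page. Month abbreviations are locale dependent, so the sketch assumes the C locale.

```r
# Assumption: month abbreviations such as "JAN" are parsed in the C locale.
Sys.setlocale("LC_TIME", "C")

# Hypothetical helper: turn a buildTm-style vector into a pair of Dates.
parse_build_tm <- function(buildTm) {
  # A third element, if present, is the strptime format of the first two.
  fmt <- if (length(buildTm) == 3) buildTm[3] else "%Y-%m-%d"
  as.Date(buildTm[1:2], format = fmt)
}

a <- parse_build_tm(c("2014-01-01", "2014-12-31"))
b <- parse_build_tm(c("01JAN2014", "31DEC2014", "%d%h%Y"))

# Inclusive filter: buildTm[1] <= date <= buildTm[2]
dates <- as.Date(c("2013-12-31", "2014-01-01", "2014-12-31", "2015-01-01"))
in_window <- dates >= a[1] & dates <= a[2]
```

Both boundary dates fall inside the window, which is what "inclusive" means for this parameter.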
weightNm: Name of the variable containing row weights, or NULL for no weights (all rows receiving weight 1).
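As a rough illustration of the "no weights means weight 1" convention, the sketch below fits the same model with and without an all-ones weight column using base R's `lm`. The data frame and the column name `w` are hypothetical, and whether the package passes weights to `lm` in exactly this way is an assumption.

```r
set.seed(42)
df <- data.frame(day = 1:50, y = 2 * (1:50) + rnorm(50, sd = 5))
df$w <- rep(1, 50)   # hypothetical weight column: every row has weight 1

# R^2 without weights, and with weights that are all equal to 1
r2_unweighted <- summary(lm(y ~ day, data = df))$r.squared
r2_weighted   <- summary(lm(y ~ day, data = df, weights = w))$r.squared
```

With unit weights the two R2 values agree, so passing NULL and passing a column of ones should lead to the same ranking.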
kSample: Either NULL or a positive integer. If an integer, indicates the sample size for both drawing boxplots and ordering numerical graphs by \(R^2\). When the data is large, setting kSample to a reasonable value (the default is 50,000) dramatically improves processing speed. Therefore, for larger datasets (e.g. more than 10 percent of system memory), this parameter should not be set to NULL, or boxplots may take a very long time to render. This setting has no impact on the accuracy of time series plots of quantiles, mean, SD, and missing and zero rates.
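The effect of sampling on the R2 estimate can be sketched in base R. This assumes simple random sampling of rows without replacement, which may differ from what the package does internally; the simulated data frame is hypothetical.

```r
set.seed(1)                      # reproducible sample for this sketch
n <- 200000
df <- data.frame(day = seq_len(n), y = 0.01 * seq_len(n) + rnorm(n))
kSample <- 50000                 # the documented default

# Use all rows when kSample is NULL or not smaller than the data
idx <- if (is.null(kSample) || kSample >= nrow(df)) {
  seq_len(nrow(df))
} else {
  sample(nrow(df), kSample)
}

r2_full <- summary(lm(y ~ day, data = df))$r.squared
r2_samp <- summary(lm(y ~ day, data = df[idx, ]))$r.squared
```

On data like this the sampled R2 stays close to the full-data value while the model is fitted on a quarter of the rows.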
A vector of variable names sorted by R2 of lm of the formula var ~ dateNm (highest R2 to lowest).
Copyright 2017 Capital One Services, LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Functions that depend on this function: vlm.
# NOT RUN {
data(bankData)
bankData <- PrepData(bankData, dateNm = "date", dateGp = "months",
dateGpBp = "quarters")
OrderByR2(bankData, dateNm = "date")
# }