PDtoolkit (version 1.2.0)

univariate: Univariate analysis

Description

univariate returns univariate statistics for the risk factors supplied in the data frame db.
For numeric risk factors, the univariate report includes:

  • rf: Risk factor name.

  • rf.type: Risk factor class. This metric is always equal to numeric.

  • bin.type: Bin type - special or complete cases.

  • bin: Bin label. If the sc.method argument is equal to "together", then bin and bin.type have the same value. If the sc.method argument is equal to "separately", then bin will contain each special case that exists for the analyzed risk factor (e.g. NA, NaN, Inf) as a separate bin.

  • pct: Percentage of observations in each bin.

  • cnt.unique: Number of unique values per bin.

  • min: Minimum value.

  • p1, p5, p25, p50, p75, p95, p99: Percentile values.

  • avg: Mean value.

  • avg.se: Standard error of the mean.

  • max: Maximum value.

  • neg: Number of negative values.

  • pos: Number of positive values.

  • cnt.outliers: Number of outliers, i.e. records above or below Q75 ± 1.5 * IQR, where IQR = Q75 - Q25.

  • sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds sc.threshold, and 0 otherwise.
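
Several of the numeric metrics listed above can be approximated in base R. The following is a minimal sketch, not PDtoolkit's implementation: num_summary is a hypothetical helper, avg.se is assumed to be sd / sqrt(n), and the outlier count assumes the conventional Tukey fences (below Q25 - 1.5 * IQR or above Q75 + 1.5 * IQR), which may differ from the rule the package applies.

```r
# Hypothetical helper, not part of PDtoolkit: summarise the complete
# cases of a numeric vector.
num_summary <- function(x) {
  x <- x[is.finite(x)]  # drop NA, NaN, Inf and -Inf
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  # Assumption: conventional Tukey fences around Q25 and Q75
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  data.frame(
    cnt.unique   = length(unique(x)),
    min          = min(x),
    p50          = median(x),
    avg          = mean(x),
    avg.se       = sd(x) / sqrt(length(x)),  # assumed definition
    max          = max(x),
    neg          = sum(x < 0),
    pos          = sum(x > 0),
    cnt.outliers = sum(x < lower | x > upper)
  )
}

x <- c(-5, 1, 2, 3, 4, 5, 6, 7, 8, 100, NA, Inf)
res <- num_summary(x)
print(res)
```

With this toy vector, the two extreme values (-5 and 100) fall outside the assumed fences, while NA and Inf are excluded before any statistic is computed.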

For categorical risk factors, the univariate report includes:

  • rf: Risk factor name.

  • rf.type: Risk factor class. This metric is equal to one of: character, factor or logical.

  • bin.type: Bin type - special or complete cases.

  • bin: Bin label. If the sc.method argument is equal to "together", then bin and bin.type have the same value. If the sc.method argument is equal to "separately", then bin will contain each special case that exists for the analyzed risk factor (e.g. NA, NaN, Inf) as a separate bin.

  • pct: Percentage of observations in each bin.

  • cnt.unique: Number of unique values per bin.

  • sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds sc.threshold, and 0 otherwise.
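
The per-bin metrics for a categorical risk factor can be illustrated with base R alone. This is only a sketch under stated assumptions: the bin labels "special cases" and "complete cases" are chosen here for illustration, only NA is treated as a special case, and pct is computed as a share between 0 and 1; the labels and scaling used by PDtoolkit may differ.

```r
# Toy categorical risk factor with two missing values
x <- c("a", "b", "a", NA, "c", "a", NA, "b")

# Assumed bin labels; the labels used by PDtoolkit may differ
bin.type <- ifelse(is.na(x), "special cases", "complete cases")

# Share of observations in each bin
pct <- prop.table(table(bin.type))

# Number of unique values in each bin
cnt.unique <- tapply(x, bin.type, function(v) length(unique(v)))

print(pct)
print(cnt.unique)
```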

Usage

univariate(
  db,
  sc = c(NA, NaN, Inf, -Inf),
  sc.method = "together",
  sc.threshold = 0.2
)

Value

univariate returns a data frame with the univariate metrics described above for risk factors of class numeric, character, factor and logical.

Arguments

db

Data frame of risk factors supplied for univariate analysis.

sc

Vector of special case elements. Default values are c(NA, NaN, Inf, -Inf).

sc.method

Defines how special cases are treated: all together or in separate bins. Possible values are "together" and "separately". Default is "together".

sc.threshold

Threshold for special cases, expressed as a share of the total number of observations (the default of 0.2 corresponds to 20%). If sc.method is set to "separately", then the shares of the individual special cases are summed before being compared with the threshold.
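
The threshold check described above can be sketched in base R. This is an illustration rather than the package's code: sc_share and sc_indicator are hypothetical helpers, and the comparison is assumed to be a strict "greater than", following the wording of sc.ind in the Description.

```r
# Hypothetical helpers, not part of PDtoolkit.

# Share of each special case among all observations
sc_share <- function(x, sc = c(NA, NaN, Inf, -Inf)) {
  shares <- sapply(sc, function(s) mean(x %in% s))
  names(shares) <- paste(sc)  # "NA", "NaN", "Inf", "-Inf"
  shares
}

# 1 if the summed share of special cases exceeds the threshold, else 0
sc_indicator <- function(x, sc = c(NA, NaN, Inf, -Inf), sc.threshold = 0.2) {
  as.integer(sum(sc_share(x, sc)) > sc.threshold)
}

x <- c(1, 2, NA, 4, Inf, 6, 7, 8, 9, NaN)
print(sc_share(x))                         # one share per special case
print(sc_indicator(x))                     # default threshold of 0.2
print(sc_indicator(x, sc.threshold = 0.5)) # stricter threshold
```

Here %in% distinguishes NA from NaN for numeric input, so each special case gets its own share, and the shares are summed before the comparison, mirroring the "separately" behaviour described for sc.threshold.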

Examples

suppressMessages(library(PDtoolkit))
data(gcd)
gcd$age[100:120] <- NA
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual, y.type = "bina")[[2]]
gcd$age.bin <- as.factor(gcd$age.bin)
gcd$maturity.bin <- ndr.bin(x = gcd$maturity, y = gcd$qual, y.type = "bina")[[2]]
gcd$amount.bin <- ndr.bin(x = gcd$amount, y = gcd$qual, y.type = "bina")[[2]]
gcd$all.miss1 <- NaN
gcd$all.miss2 <- NA
gcd$tf <- sample(c(TRUE, FALSE), nrow(gcd), replace = TRUE)
# Create a date variable to confirm that the function does not process it
gcd$dates <- Sys.Date()
str(gcd)
univariate(db = gcd)
