smbinning (version 0.9)

smbinning.eda: Exploratory Data Analysis (EDA)

Description

Shows basic statistics for each characteristic (column) in a data frame. The report includes:

  • Field: Field name.

  • Type: Factor, numeric, integer, other.

  • Recs: Number of records.

  • Miss: Number of missing records.

  • Min: Minimum value.

  • Q25: First quartile. It splits off the lowest 25% of data from the highest 75%.

  • Q50: Median or second quartile. It cuts the data set in half.

  • Avg: Average value.

  • Q75: Third quartile. It splits off the lowest 75% of data from the highest 25%.

  • Max: Maximum value.

  • StDv: Standard deviation of a sample.

  • Neg: Number of negative values.

  • Pos: Number of positive values.

  • OutLo: Number of low outliers. Records below Q25-1.5*IQR, where IQR=Q75-Q25.

  • OutHi: Number of high outliers. Records above Q75+1.5*IQR, where IQR=Q75-Q25.
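
The outlier rule above can be sketched in base R. This is a minimal illustration, not the package's internal code: the vector `x` is made up, and the quartiles come from R's default `quantile()` method, which may differ from the method smbinning uses.

```r
# Hypothetical numeric characteristic with one missing value (not package data)
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 10, 40, NA)

# First and third quartiles, ignoring missing values
# (assumption: R's default quantile type 7)
q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
iqr <- q[2] - q[1]                 # IQR = Q75 - Q25

lower_fence <- q[1] - 1.5 * iqr    # Q25 - 1.5*IQR
upper_fence <- q[2] + 1.5 * iqr    # Q75 + 1.5*IQR

out_lo <- sum(x < lower_fence, na.rm = TRUE)  # count of low outliers
out_hi <- sum(x > upper_fence, na.rm = TRUE)  # count of high outliers
miss   <- sum(is.na(x))                       # count of missing records
```

With this vector, Q25 is 4.25 and Q75 is 8.75, so the fences are -2.5 and 15.5, and only the value 40 counts as a high outlier.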

Usage

smbinning.eda(df, rounding = 3, pbar = 1)

Arguments

df

A data frame.

rounding

Optional parameter that sets the number of decimal places shown in the output table. Default is 3.

pbar

Optional parameter that turns on or off a progress bar. Default value is 1.

Value

The command smbinning.eda returns a list with two data frames: eda, which lists each characteristic with basic statistics such as extreme values and quartiles, and edapct, which reports percentages of missing values and outliers, among others.

Examples

# NOT RUN {
library(smbinning) # Load package and its data

# Example: Exploratory data analysis of dataset
result <- smbinning.eda(smbsimdf1, rounding = 3) # Run the analysis once and keep the result
result$eda    # Table with basic statistics
result$edapct # Table with basic percentages
# }
