SmartEDA (version 0.3.10)

ExpNumStat: Summary statistics for numerical variables

Description

This function provides summary statistics for all numerical variables. It automatically scans each variable and selects only numeric/integer variables. If a target variable is supplied, the function also reports the relationship between the target variable and each independent variable.
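The variable scan described above can be approximated in base R. This is a minimal sketch, not SmartEDA's own code: it assumes a column qualifies when it is numeric/integer and has more than nlim unique values, where nlim is a hypothetical stand-in for the Nlim argument.

```r
# Base R only; this does not call SmartEDA.
# nlim is a hypothetical stand-in for the Nlim argument.
nlim <- 10

# Flag columns that are numeric/integer and have more than nlim unique values
is_candidate <- vapply(
  mtcars,
  function(col) is.numeric(col) && length(unique(col)) > nlim,
  logical(1)
)

selected <- names(mtcars)[is_candidate]
print(selected)
```

Low-cardinality numeric columns such as cyl or gear fail this check, which is why they are natural choices for a grouping variable rather than for summarising.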

Usage

ExpNumStat(
  data,
  by = "A",
  gp = NULL,
  Qnt = NULL,
  Nlim = 10,
  MesofShape = 2,
  Outlier = FALSE,
  round = 3,
  weight = NULL,
  dcast = FALSE,
  val = NULL
)

Value

summary statistics for numeric independent variables

Summary by:

  • Only overall level

  • Only group level

  • Both overall and group level

Arguments

data

dataframe or matrix

by

grouping level: "A" (summary statistics for all observations), "G" (summary statistics by group), "GA" (summary statistics by group and overall)

gp

target variable if any, default NULL

Qnt

quantile probabilities, default NULL. For example, c(0.25, 0.75) returns the 25th and 75th percentiles

Nlim

numeric variable limit (default value is 10, which means only variables of type numeric/integer with more than 10 unique values are considered)

MesofShape

Measures of shapes (Skewness and kurtosis).

Outlier

Calculate the lower hinge, upper hinge and number of outliers

round

number of decimal places to round the results to

weight

a vector of weights; its length must match the length of data

dcast

fast dcast from data.table

val

Name of the column whose values will be filled to cast (see the Details section for the list of column names)
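To make the Qnt and Outlier arguments more concrete, here is a minimal base R sketch for a single variable. It does not call SmartEDA, and the outlier rule used (fences at 1.5 times the interquartile range beyond the quartiles) is an assumption for illustration; the package may define its hinges and outlier count differently.

```r
# Base R only; this does not call SmartEDA.
x <- mtcars$hp

# Quantiles for probabilities like those passed through Qnt
qnt <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)

# Assumed outlier rule: fences at 1.5 * IQR beyond the quartiles
iqr <- unname(qnt[2] - qnt[1])
lower_fence <- unname(qnt[1]) - 1.5 * iqr
upper_fence <- unname(qnt[2]) + 1.5 * iqr

# Count observations falling outside the fences
n_outlier <- sum(x < lower_fence | x > upper_fence, na.rm = TRUE)

print(qnt)
print(c(lower = lower_fence, upper = upper_fence, n_outlier = n_outlier))
```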

Details

column descriptions

  • Vname is Variable name

  • Group is Target variable

  • TN is Total sample (including NA observations)

  • nNeg is Total negative observations

  • nPos is Total positive observations

  • nZero is Total zero observations

  • NegInf is Negative infinite count

  • PosInf is Positive infinite count

  • NA_value is count of missing (NA) values

  • Per_of_Missing is Percentage of missing

  • Min is minimum value

  • Max is maximum value

  • Mean is average value

  • Median is median value

  • SD is Standard deviation

  • CV is coefficient of variation, (SD/mean)*100

  • IQR is Inter quartile range

  • Qnt is quantile values

  • MesofShape is Skewness and Kurtosis

  • Outlier is Number of outliers

  • Cor is Correlation between target and independent variables
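Several of the columns listed above can be reproduced for a single vector with base R. The helper below, num_summary, is hypothetical and is not part of SmartEDA; it follows the column definitions given here (for example CV as (SD/mean)*100) and assumes that location and spread statistics are computed on finite values only, which may not match the package's handling of NA and infinite values.

```r
# Hypothetical helper, base R only; not SmartEDA's implementation.
num_summary <- function(x, digits = 3) {
  # Assumption: moments and ranges use finite values only (no NA, NaN, Inf)
  ok <- x[is.finite(x)]
  out <- c(
    TN = length(x),                        # total sample, including NA
    nNeg = sum(x < 0, na.rm = TRUE),       # note: also counts -Inf
    nPos = sum(x > 0, na.rm = TRUE),       # note: also counts Inf
    nZero = sum(x == 0, na.rm = TRUE),
    NegInf = sum(x == -Inf, na.rm = TRUE),
    PosInf = sum(x == Inf, na.rm = TRUE),
    NA_value = sum(is.na(x)),              # is.na() is TRUE for NaN as well
    Per_of_Missing = 100 * mean(is.na(x)),
    Min = min(ok),
    Max = max(ok),
    Mean = mean(ok),
    Median = median(ok),
    SD = sd(ok),
    CV = 100 * sd(ok) / mean(ok),          # (SD/mean)*100 as defined above
    IQR = IQR(ok)
  )
  round(out, digits)
}

print(num_summary(mtcars$mpg))
```

Comparing this output with the corresponding row from ExpNumStat is a quick way to check how a particular column is defined in the installed version.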

See Also

describe.by

Examples

# Descriptive summary of numeric variables: summary by target variable
ExpNumStat(mtcars,by="G",gp="gear",Qnt=c(0.1,0.2),MesofShape=2,
           Outlier=TRUE,round=3)
# Descriptive summary of numeric variables: overall summary
ExpNumStat(mtcars,by="A",gp="gear",Qnt=c(0.1,0.2),MesofShape=2,
           Outlier=TRUE,round=3)
# Descriptive summary of numeric variables: summary by group and overall
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=seq(0,1,.1),MesofShape=1,
           Outlier=TRUE,round=2)
# Summary by specific statistics for all numeric variables
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=c(0.1,0.2),MesofShape=2,
           Outlier=FALSE,round=2,dcast = TRUE,val = "IQR")
# Weighted summary statistics
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=c(0.1,0.2),MesofShape=2,
           Outlier=FALSE,round=2,dcast = TRUE,val = "IQR", weight = "wt")
