SmartEDA (version 0.3.2)

ExpNumStat: Summary statistics for numerical variables

Description

This function provides summary statistics for all numerical variables. It automatically scans each variable and selects only numeric/integer variables. If a target variable is specified, the function also reports the relationship between the target variable and each independent variable.
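The variable-selection rule described above (numeric/integer type, more than Nlim unique values) can be sketched in base R as follows. The helper select_numeric_vars() is hypothetical and not part of SmartEDA. It assumes that NA counts as one unique value, which may differ from the package's own rule.

```r
# Hypothetical helper (not part of SmartEDA): keeps numeric/integer columns
# that have more than `Nlim` unique values, mirroring the documented rule.
select_numeric_vars <- function(data, Nlim = 10) {
  data <- as.data.frame(data)            # accept a data frame or a matrix
  keep <- vapply(data, function(x) {
    is.numeric(x) && length(unique(x)) > Nlim
  }, logical(1))
  names(data)[keep]
}

# In mtcars, low-cardinality columns such as cyl, vs, am, gear and carb are dropped
select_numeric_vars(mtcars)
```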

Usage

ExpNumStat(data,by=NULL,gp=NULL,Qnt=NULL,Nlim=10,MesofShape=2,
Outlier=FALSE,round=3,dcast=FALSE,val=NULL)

Arguments

data

A data frame or matrix.

by

Grouping option: "A" (summary statistics overall), "G" (summary statistics by group of the target variable) or "GA" (summary statistics by group and overall).

gp

Target variable, if any (default NULL).

Qnt

Quantiles to compute (default NULL). For example, c(0.25, 0.75) returns the 25th and 75th percentiles.

Nlim

Numeric variable limit (default 10). Only variables of type numeric/integer with more than Nlim unique values are considered.

MesofShape

Measures of shape (skewness and kurtosis).

Outlier

Logical; if TRUE, calculates the lower hinge, the upper hinge and the number of outliers (default FALSE).

round

Number of decimal places used to round the results (default 3).

dcast

Logical; if TRUE, reshapes the output using the fast dcast from data.table (default FALSE).

val

Name of the column whose values will be filled in when casting (see the Details section for the list of column names).
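The Outlier argument does not define the hinges. A common convention, assumed here and not confirmed by this page, is Tukey's rule: lower hinge = Q1 - 1.5 * IQR and upper hinge = Q3 + 1.5 * IQR, with values outside these fences counted as outliers. The helper outlier_summary() and its output names LB, UB and nOutliers are hypothetical.

```r
# Hypothetical sketch (not SmartEDA's source) of Tukey-style fences.
# Assumes lower hinge = Q1 - k * IQR and upper hinge = Q3 + k * IQR, k = 1.5.
outlier_summary <- function(x, k = 1.5) {
  x <- x[is.finite(x)]                                   # drop NA, NaN and +/-Inf
  q <- stats::quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3 (type 7)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  c(LB = lower, UB = upper, nOutliers = sum(x < lower | x > upper))
}

# The value 100 lies far above the upper fence and is counted as an outlier
outlier_summary(c(1:10, 100))
```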

Value

Summary statistics for the numeric independent variables.

Details

Summary overall (by = "A")

Summary by group of the target variable (by = "G")

Summary overall and by group of the target variable (by = "GA")
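The three by modes differ only in how the rows are grouped. The hypothetical helper summarise_by() below illustrates this in base R, using the mean of a single variable as a stand-in for the full set of statistics; it is not how SmartEDA computes its output.

```r
# Hypothetical illustration (not SmartEDA code) of by = "A", "G" and "GA".
summarise_by <- function(x, group, by = c("A", "G", "GA")) {
  by <- match.arg(by)
  overall <- c(All = mean(x, na.rm = TRUE))            # one value for all rows
  grouped <- vapply(split(x, group), mean, numeric(1),
                    na.rm = TRUE)                      # one value per group level
  switch(by,
         A  = overall,
         G  = grouped,
         GA = c(grouped, overall))
}

# Mean mpg for each number of gears, followed by the overall mean
summarise_by(mtcars$mpg, mtcars$gear, by = "GA")
```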

Column descriptions

• Vname: Variable name

• Group: Target variable

• TN: Total number of observations (including NA observations)

• nNeg: Number of negative observations

• nZero: Number of zero observations

• nPos: Number of positive observations

• NegInf: Count of negative infinite values

• PosInf: Count of positive infinite values

• NA_value: Count of missing (NA) values

• Per_of_Missing: Percentage of missing values

• Min: Minimum value

• Max: Maximum value

• Mean: Mean (average) value

• Median: Median value

• SD: Standard deviation

• CV: Coefficient of variation, (SD/Mean)*100

• IQR: Interquartile range

• Qnt: Specified quantiles

• MesofShape: Skewness and kurtosis

• Outlier: Number of outliers

• Cor: Correlation between the target variable and each independent variable
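Several of the columns above can be reproduced for a single vector in base R, which makes the definitions concrete (for example, CV = SD/Mean * 100). The helper num_summary() is hypothetical. In particular, it assumes that Min, Max, Mean, Median, SD and IQR are computed on finite values only and that nNeg/nPos include infinite values; SmartEDA may handle these cases differently.

```r
# Hypothetical helper (not SmartEDA's implementation) computing several of the
# documented columns for one numeric vector.
num_summary <- function(x, digits = 3) {
  fin <- x[is.finite(x)]                        # finite values only (assumption)
  out <- c(
    TN             = length(x),                 # includes NA observations
    nNeg           = sum(x < 0, na.rm = TRUE),  # counts -Inf as negative
    nZero          = sum(x == 0, na.rm = TRUE),
    nPos           = sum(x > 0, na.rm = TRUE),  # counts Inf as positive
    NegInf         = sum(x == -Inf, na.rm = TRUE),
    PosInf         = sum(x == Inf, na.rm = TRUE),
    NA_value       = sum(is.na(x)),
    Per_of_Missing = 100 * mean(is.na(x)),
    Min            = min(fin),
    Max            = max(fin),
    Mean           = mean(fin),
    Median         = stats::median(fin),
    SD             = stats::sd(fin),
    IQR            = stats::IQR(fin)
  )
  out["CV"] <- out[["SD"]] / out[["Mean"]] * 100  # coefficient of variation in %
  round(out, digits)
}

num_summary(c(-2, 0, 0, 3, 5, NA, Inf))
```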

See Also

describe.by

Examples

# NOT RUN {
## Descriptive summary of numeric variables - Summary by Target variables
ExpNumStat(mtcars,by="G",gp="gear",Qnt=c(0.1,0.2),MesofShape=2,
Outlier=TRUE,round=3)
## Descriptive summary of numeric variables - Summary by Overall
ExpNumStat(mtcars,by="A",gp="gear",Qnt=c(0.1,0.2),MesofShape=2,
Outlier=TRUE,round=3)
## Descriptive summary of numeric variables - Summary by Overall and Group
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=seq(0,1,.1),MesofShape=1,
Outlier=TRUE,round=2)
## Cast a single statistic (IQR) for all numeric variables
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=c(0.1,0.2),MesofShape=2,
Outlier=FALSE,round=2,dcast = TRUE,val = "IQR")
# }
