explore

Simplifies Exploratory Data Analysis.

Why this package?

Faster insights with less code for experienced R users. Exploring a fresh new dataset is exciting. Instead of searching for syntax at Stackoverflow, use all your attention searching for interesting patterns in your data, using just a handful easy to remember functions. Your code is easy to understand - even for non R users.
Instant success for new R users. It is said that R has a steep learning curve, especially if you come from a GUI for your statistical analysis. Instead of learning a lot of R syntax before you can explore data, the explore package enables you to have instant success. You can start with just one function - explore() - and learn other R syntax later step by step.

How to use it

There are three ways to use the package:

Interactive data exploration (univariat, bivariat, multivariat). A target can be defined (binary / categorical / numerical).
Generate an Automated Report with one line of code. The target can be binary, categorical or numeric.
Manual exploration using a easy to remember set of tidy functions. There are basically four "verbs" to remember:
- explore - if you want to explore a table, a variable or the relationship between a variable and a target (binary, categorical or numeric). The output of these functions is a plot.
- describe - if you want to describe a dataset or a variable (number of na, unique values, ...) The output of these functions is a text.
- explain - to create a simple model that explains a target. explain_tree() for a decision tree, explain_logreg() for a logistic regression.
- report - to generate an automated report of all variables. A target can be defined (binary, categorical or numeric)

The explore package automatically checks if an attribute is categorial or numerical, chooses the best plot-type and handles outliers (autosacling).

You can use {explore} with tidy data (each row is an observation) or with count data (each row is a group of observations with same attributes, one variable stores the number of observations). To use count data, you need to add the n parameter (variable containing the number of observations). Not all functions support count data.

Installation

CRAN

install.packages("explore")

DEV version (github)

# install from github
if (!require(devtools)) install.packages("devtools")
devtools::install_github("rolkra/explore")

if you are behind a firewall, you may want to:

Download and unzip the explore package
Then install it with devtools::install_local

# install local
if (!require(devtools)) install.packages("devtools")
devtools::install_local(path = <path of local package>, force = TRUE)

Examples

Interactive data exploration

Example how to use the explore package to explore the iris dataset

# load package
library(explore)

# explore interactive
explore(iris)

Explore variables

Explore variables with target

Explain target (Decision Tree)

Automated Report

Create a report by clicking the "report all" button or use the report() function. If no target is defined, the report shows all variables. If a target is defined, the report shows the relation between all variables and the target.

Report of all variables

iris |> report(output_dir = tempdir())

To create a report that shows all variables in relation to a target, just add the target parameter

iris |> report(output_dir = tempdir(), target = Species)

To create a report with a binary target you can use the parameter targetpct = TRUE (or split = FALSE)

# define a target (is Species versicolor?)
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
iris$Species <- NULL

# create report
iris |> report(output_dir = tempdir(),
                target = is_versicolor,
                targetpct = TRUE)

Manual exploration

Example how to use the functions of the explore package to explore tidy data (each row is an observation) like the iris dataset:

# load packages
library(explore)

# use iris dataset
data(iris)

# explore Species
iris |> explore(Species)

# explore Sepal.Length
iris |> explore(Sepal.Length)

# define a target (is Species versicolor?)
iris$is_versicolor <- ifelse(iris$Species == "versicolor", 1, 0)

# explore relationship between Sepal.Length and the target
iris |> explore(Sepal.Length, target = is_versicolor)

# explore relationship between all variables and the target
iris |> explore_all(target = is_versicolor)

# explore correlation between Sepal.Length and Petal.Length
iris |> explore(Sepal.Length, Petal.Length)

# explore correlation between Sepal.Length, Petal.Length and a target
iris |> explore(Sepal.Length, Petal.Length, target = is_versicolor)

# describe dataset
describe(iris)

# describe Species
iris |> describe(Species)

# explain target using a decision tree
iris$Species <- NULL
iris |> explain_tree(target = is_versicolor)

# explain target using a logistic regression
iris |> explain_logreg(target = is_versicolor)

Example how to use the functions of the explore package to explore count-data (each row is a group of observations):

# load packages
library(tibble)
library(explore)

# use titanic dataset
# n = number of observations
titanic <- as_tibble(Titanic)

# describe data
describe(titanic)

# describe Class
titanic |> describe(Class, n = n)

# explore Class
titanic |> explore(Class, n = n)

# explore relationship between Class and the target
titanic |> explore(Class, n = n, target = Survived)

# explore relationship between all variables and the target
titanic |> explore_all(n = n, target = Survived)

# explain target using a decision tree
titanic |> explain_tree(n = n, target = Survived)

Some other useful functions:

# create dataset and explore it
data <- create_data_app(obs = 1000)
explore(data)

data <- create_data_buy(obs = 1000)
explore(data)

data <- create_data_churn(obs = 1000)
explore(data)

data <- create_data_person(obs = 1000)
explore(data)

data <- create_data_unfair(obs = 1000)
explore(data)

# create random dataset with 100 observarions and 5 random variables
# and explore it
data <- create_data_random(obs = 100, vars = 5)
explore(data)

# create your own random dataset and explore it
data <- create_data_empty(obs = 1000) |> 
  add_var_random_01("target") |> 
  add_var_random_dbl("age", min_val = 18, max_val = 80) |> 
  add_var_random_cat("gender", 
                     cat = c("male", "female", "other"), 
                     prob = c(0.4, 0.4, 0.2)) |> 
  add_var_random_starsign() |> 
  add_var_random_moon()
  
explore(data)

# create an RMarkdown template to explore your own data
# set output_dir (existing file may be overwritten)
create_notebook_explore(
  output_dir = tempdir(),
  output_file = "notebook-explore.Rmd")

explore

Why this package?

How to use it

Installation

CRAN

DEV version (github)

Examples

Interactive data exploration

Explore variables

Explore variables with target

Explain target (Decision Tree)

Automated Report

Manual exploration

Copy Link

Version

Install

Monthly Downloads

Version

License

Issues

Pull Requests

Stars

Forks

Repository

Maintainer

Last Published

Functions in explore (1.0.1)