Quickly Explore Your Data Using 'ggplot2' and 'table1' Summary Tables
Quickly and easily perform exploratory data analysis by uploading your
data as a 'csv' file. Start generating insights using 'ggplot2' plots and
'table1' tables with descriptive stats, all using an easy-to-use point and click interface.
Installation and Running information
# Install from CRAN:
install.packages("ggquickeda")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("smouksassi/ggquickeda")
To launch the application, use
run_ggquickeda() then navigate to your csv file (or
run_ggquickeda(data) to launch the app with a specific dataset already loaded).
ggquickeda is an R Shiny app/package that serves as a handy interface to ggplot2 and table1. It enables you to quickly explore your data and detect trends on the fly. You can create scatter plots, dotplots, boxplots, barplots, histograms, densities and summary statistics of one or more variables, split by column(s). For a quick overview using an older version of the app, head to this YouTube tutorial.
The Export Plots and Plot Code tabs were contributed by Dean Attali. Once a plot is saved in the X/Y Plot tab by providing a name and hitting the Save plot star button, it becomes available for exporting. You can export in portrait or landscape, and with multiple plots per page.
Plot Code lets you look at the source code that generated the plot with the various options. This is a helpful way to get to know ggplot2 code.
Quick summary statistics tables are generated using Benjamin Rich's table1 package.
The best way to learn is to load data you are familiar with and start experimenting. Try to reproduce the steps below using the included sample_df.csv. This will give you an idea of the kind of outputs that can be generated.
The package also has two vignettes.
Here is an overview of some of the things that can be done with the various menus:
Choose csv file to upload or use sample data: This executes the code to load your csv file or the internal sample_data.csv:
read.csv("youruploadeddata.csv",na.strings = c("NA","."))
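As a small illustration of what the na.strings argument does in the call above, the sketch below reads a made-up csv (the column names and values are hypothetical, not taken from the package's sample data) in which missing concentrations are coded either as NA or as a single dot.

```r
# Hypothetical csv content; "." and "NA" both mark missing values.
csv_text <- "ID,Time,Conc
1,0,.
1,1,5.2
2,0,NA
2,1,4.8"

# Same na.strings setting as in the call shown above.
df <- read.csv(text = csv_text, na.strings = c("NA", "."))

# Because "." is read as missing, Conc stays numeric instead of
# being imported as text.
str(df)
n_missing <- sum(is.na(df$Conc))
print(n_missing)
```

Without "." in na.strings, the Conc column would be read as character and would then need manual cleaning before plotting.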
Once your data is uploaded, the first column is selected for y variable(s): and the second column for x variable:, and a simple scatter plot of y versus x is shown. ggquickeda can handle one or more y variable(s) selections but only one x variable, and the x variable should be different from those selected for y variable(s). Whether you select one or more y variable(s), the y data is automatically stacked (gathered) into two columns named yvalues (the values) and yvars (an identifier for the variable each value comes from), and a scatter plot of yvalues versus x, faceted by yvars, is shown. Mixing categorical and continuous variables causes all yvalues to be treated as character. The order of the selected y variable(s) matters and can be changed via drag and drop. Selections can be removed by clicking on the small x. When no y variable is selected, a histogram (if the x variable is continuous) or a barplot (if the x variable is categorical) is shown.
After selecting your y variable(s), if any, and your x variable, you can proceed directly to data manipulation within the Inputs tab using the following subtabs. Note that subtab execution is sequential, i.e. each subtab's actions are executed in the order they appear. If the user changes an upstream action, this will reset the subsequent ones.
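The stacking step can be pictured with the base R sketch below. It is only an approximation of the idea under stated assumptions: the helper stack_y and the example columns are hypothetical, and the app's own implementation may differ (for instance, it may rely on tidyr).

```r
# Hypothetical data: one x variable (Time), two numeric y candidates,
# and one categorical column.
df <- data.frame(
  Time   = c(0, 1, 2),
  Conc   = c(1.5, 3.2, 2.1),
  Effect = c(10, 12, 11),
  Sex    = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack the selected y columns into
# yvars (source variable) and yvalues (value), repeating x alongside.
stack_y <- function(data, x, y) {
  out <- data.frame(
    x       = rep(data[[x]], times = length(y)),
    yvars   = factor(rep(y, each = nrow(data)), levels = y),
    yvalues = unlist(data[y], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(out)[1] <- x
  out
}

# Two continuous y variables: yvalues stays numeric.
stacked <- stack_y(df, x = "Time", y = c("Conc", "Effect"))
print(stacked)

# Mixing continuous and categorical y variables: yvalues becomes character.
mixed <- stack_y(df, x = "Time", y = c("Conc", "Sex"))
print(class(mixed$yvalues))
```

Setting the factor levels of yvars from the selection order is what makes the order of the selected y variables carry through to the facets.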
- Recode into Binned Categories: Recode one or more continuous variable(s) into 2 to 10 categories as chosen with the N of Cut Breaks Slider.
- Treat as Categories: Treat a continuous/numeric variable as a factor.
- Custom cuts of this variable: (defaults to min, median, max before any applied filtering) Cut a continuous/numeric variable into a set of bins delimited by user-selected values. By default the min, median and max are filled into the varname Cuts field and a two-level factor is generated: [min,median] (median,max]. The user can instead enter a comma-separated list, for example min,value1,value2,max, and the following bins will be generated: [min,value1] (value1,value2] (value2,max]. A checkbox to treat the generated levels as continuous (0, 1, ...) is provided to ease some plotting operations down the line.
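The custom cuts described above behave much like base R's cut() with include.lowest = TRUE. The sketch below uses a made-up vector and is not the app's internal code, which may be implemented differently.

```r
# Hypothetical continuous variable.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Default cuts: min, median, max give a two-level factor.
default_breaks <- c(min(x), median(x), max(x))
two_bins <- cut(x, breaks = default_breaks, include.lowest = TRUE)
print(levels(two_bins))

# User-supplied cuts: min, value1, value2, max give three bins.
custom_breaks <- c(1, 3, 6, 9)
three_bins <- cut(x, breaks = custom_breaks, include.lowest = TRUE)
print(table(three_bins))

# "Treat as continuous": recode the levels as 0, 1, ...
three_bins_numeric <- as.numeric(three_bins) - 1
print(three_bins_numeric)
```

The include.lowest = TRUE argument is what closes the first interval on the left, so the minimum value is not dropped into NA.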
Recode/Reorder Categories: This subtab is dynamic in the sense that the user can add/remove variables. Once a non-numeric variable is selected, another field with the current variable levels is generated. The user can reorder the levels using drag and drop and/or edit a level by hitting Backspace and typing in a new character string. Note that the order chosen here might not be reflected in the yvalues; a separate subtab, Reorder Facets or axis Levels, is provided for this after stacking.
Combine Two Variables: This enables the user to select two categorical variables, Var1 with levels (V1L1, V1L2) and Var2 with levels (V2L1, V2L2), to generate a new variable named Var1_Var2 with levels V1L1_V2L1, V1L1_V2L2, V1L2_V2L1, V1L2_V2L2 and so on.
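In plain R terms, reordering and renaming levels amounts to operations like the ones below. The variable and level names are hypothetical and the snippet is only a sketch of the idea, not the app's code.

```r
# Hypothetical categorical variable; default levels are alphabetical.
dose <- factor(c("low", "high", "medium", "low"))
print(levels(dose))

# Reorder the levels (the drag and drop step).
dose <- factor(dose, levels = c("low", "medium", "high"))

# Rename one level (the edit step); the underlying data follows.
levels(dose)[levels(dose) == "medium"] <- "mid"
print(levels(dose))
print(as.character(dose))
```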
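A minimal sketch of the same combination in base R, assuming a simple paste with an underscore separator (the app may construct the variable differently):

```r
# Hypothetical data with two categorical variables.
df <- data.frame(
  Var1 = c("V1L1", "V1L1", "V1L2", "V1L2"),
  Var2 = c("V2L1", "V2L2", "V2L1", "V2L2"),
  stringsAsFactors = FALSE
)

# New variable named after both inputs, with one level per combination
# that occurs in the data.
df$Var1_Var2 <- factor(paste(df$Var1, df$Var2, sep = "_"))
print(levels(df$Var1_Var2))
```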
Filters: Up to six sequential filters are available: three for any type of variable (Filter variable (1), Filter variable (2) and Filter variable (3)) and three for continuous variables (Filter continuous (1), Filter continuous (2) and Filter continuous (3)).
- One Row by ID(s): Filter the data down to distinct values (one row) of the selected variable(s), which are usually identifiers for subjects, occasions, arms, etc. In long data format, time-invariant variables are repeated on every row, and this option helps remove the repetitions. For example, the user might want the first row of each subject, or the first row of each subject/occasion combination.
- Simple Rounding: Round a numerical variable to a specified number of digits. This can help produce a crude binning.
- Reorder Facets or axis Levels: Enables the user to reorder the yvalues using a statistical function (Median, Mean, Minimum or Maximum of another variable) with a checkbox to quickly reverse the order, if desired. The user can also manually drag and drop an order and change the name of the levels, where \n is recognized as a line break.
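The sequence of filtering, keeping one row per ID, rounding and reordering can be sketched in base R as follows. The data frame and its columns are invented for illustration, and the app's own code may use different functions.

```r
# Hypothetical long-format data: Weight and Arm are time invariant.
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Time   = rep(c(0, 1, 2), times = 3),
  Weight = c(70.26, 70.26, 70.26, 82.91, 82.91, 82.91, 55.48, 55.48, 55.48),
  Arm    = rep(c("Placebo", "High", "Low"), each = 3),
  Conc   = c(0, 0.1, 0.2, 0, 9.5, 7.1, 0, 4.2, 3.3),
  stringsAsFactors = FALSE
)

# Filter continuous: keep rows within a chosen range of Time.
filtered <- df[df$Time >= 1 & df$Time <= 2, ]

# One Row by ID(s): keep the first remaining row of each subject.
one_row <- filtered[!duplicated(filtered$ID), ]

# Simple Rounding: zero digits gives a crude binning of Weight.
one_row$Weight <- round(one_row$Weight, digits = 0)
print(one_row)

# Reorder levels by the median of another variable (Conc here).
arm_ordered <- reorder(factor(df$Arm), df$Conc, FUN = median)
print(levels(arm_ordered))

# Reversing the order corresponds to the reverse checkbox.
arm_reversed <- factor(arm_ordered, levels = rev(levels(arm_ordered)))
print(levels(arm_reversed))
```

Because the steps run in order, changing the Time filter changes which row survives as the first row of each subject, which mirrors the sequential behaviour of the subtabs.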
Various options to tweak the plot:
- Controlling y and x axis labels, legends and other commonly used theme options.
- Adding a title, subtitle and a caption
A shorter version of this walk-through is available within the app.
The main plot is output here, with the various options to generate the plot listed below it. The possibilities include:
- Plot types, Points, Lines (?) Selecting scatter plot points and/or lines with control over transparency and more.
- Color/Group/Split/Size/Fill Mappings (?) ggplot2 built-in functionality for group, color, size and fill mappings, as well as up to two variables for column and row splits (faceting).
- Quantile Regression (?)
- Smooth/Linear/Logistic Regressions (?)
- Mean CI (?) Mean Confidence Intervals
- Median PIs (?) Median Prediction Intervals
- Kaplan-Meier (?) Survival K-M curves
- Correlation Coefficient (?) add a text label with the correlation coefficient
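To give a feel for how a few of these options translate into ggplot2, here is a minimal sketch combining points, lines, a color mapping, a column split and a linear regression. The data are simulated and all variable names are hypothetical; the code the Plot Code tab actually produces will differ.

```r
library(ggplot2)

# Simulated data: four subjects, two dose groups.
set.seed(1)
df <- data.frame(
  Time = rep(c(0.5, 1, 2, 4, 8), times = 4),
  ID   = factor(rep(1:4, each = 5)),
  Dose = factor(rep(c("Low", "High"), each = 10))
)
df$Conc <- ifelse(df$Dose == "High", 10, 5) * exp(-0.3 * df$Time) +
  rnorm(nrow(df), sd = 0.2)

p <- ggplot(df, aes(x = Time, y = Conc, colour = Dose, group = ID)) +
  geom_point(alpha = 0.6) +                      # Points with transparency
  geom_line(alpha = 0.6) +                       # Lines per subject
  geom_smooth(aes(group = Dose), method = "lm",  # Linear regression per dose
              formula = y ~ x, se = FALSE) +
  facet_grid(. ~ Dose) +                         # Column split
  labs(title = "Concentration over time",
       x = "Time", y = "Concentration") +
  theme_bw()

print(p)
```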
Installing the package should handle the installation of all dependencies. They are listed here in case you are curious:
The app can also be directly launched using this command
shiny::runGitHub('ggquickeda', 'smouksassi', subdir = 'inst/shinyapp')
Functions in ggquickeda
| Name | Description |
|---|---|
| GeomKmticks | Display tick marks on a Kaplan Meier curve |
| StatKmticks | Compute locations for tick marks |
| GeomKm | Display Kaplan Meier Curve |
| run_ggquickeda | Run the ggquickeda application |
| sample_data | Simulated Pharmacokinetic Concentration Data |
| sourceable | Make a 'ggplot2' object sourceable |
| StatKm | Adds a Kaplan Meier Estimate of Survival |
| GeomKmband | Display Kaplan Meier Curve |
| attach_source_dep | Attach dependencies to the source code (any input variables are automatically attached) |
| get_source_code | Retrieve the source code of a "sourceable" 'ggplot2' |
| + | Add 'ggplot2' layer to a sourceable 'ggplot2' |
| StatKmband | Adds confidence bands to a Kaplan Meier Estimate of Survival |
| Field | Value |
|---|---|
| License | MIT + file LICENSE |
| SystemRequirements | pandoc with https support |
| Packaged | 2019-04-10 11:43:44 UTC; smouksas |
| Date/Publication | 2019-04-10 12:02:41 UTC |
| Imports | colourpicker, dplyr, DT, Formula, GGally, ggplot2, ggpmisc, ggpubr, ggrepel (>= 0.7.0), ggstance, grDevices, grid, gridExtra, Hmisc, lazyeval, markdown, methods, plotly, quantreg, rlang, scales, shiny (>= 1.0.4), shinyjs, stats, stringr, survival, survminer, table1 (>= 1.1), tidyr, utils |
| Suggests | knitr, rmarkdown |
| Depends | R (>= 3.1.0) |
| Contributors | Dean Attali, Michael Sachs, Benjamin Rich |