Analyzing Data with sparklyr

library(htmltools) thumbnail <- function(title, img, href, caption = TRUE) { div(class = "col-sm-6", a(class = "thumbnail", title = title, href = href, img(src = img), div(class = ifelse(caption, "caption", ""), ifelse(caption, title, ""), align="center")))}

The examples below showcase R applications and analysis performed using Sparklyr. You can see how to use dplyr and machine learning functions within R Notebooks, and create interactive dashboards with Spark connections using flexdashboards and Shiny Applications. Source code for these examples is available under github.com/rstudio/sparkDemos.

div(class="row", div(class="span3", br()), div(class = "span6", includeScript("https://fast.wistia.com/embed/medias/0edmtnvb9d.jsonp" ), includeScript("https://fast.wistia.com/assets/external/E-v1.js"), div(class="wistia_responsive_padding"), div(class="wistia_embed wistia_async_0edmtnvb9d videoFoam=true")), div(class="span3", br()) ) div(class = "col-sm-12", br()) div(class = "col-sm-12", br())

Notebooks

div(class = "col-sm-12", br())

###One billion NYC taxi trips

div(class = "col-sm-6","You can use Spark and R to analyze data at scale. This document describes how to use sparklyr to access and understand understand your data. Use the following tools in the toolchain. This example compares models using Spark ML, H2O and R") thumbnail("Analyzing a billion NYC taxi trips in Spark", "images/examples/h2o.png", "gallery-taxi-trips.nb.html")

###Total US births

div(class = "col-sm-6","Use dplyr syntax to write Apache Spark SQL queries. Use select, where, group by, joins, and window functions in Apache Spark SQL.") thumbnail("Total US births", "images/examples/babynames.png", "https://beta.rstudioconnect.com/content/1813/")

ML classifiers

div(class = "col-sm-6","You can use sparklyr to fit a wide variety of machine learning algorithms in Apache Spark. This analysis compares the performance of six classification models in Apache Spark on the Titanic data set. For the Titanic data, decision trees and random forests performed the best and had comparatively fast run times. See results for a detailed comparison.") thumbnail("ML classifiers", "images/examples/ml_classification_titanic.png", "https://beta.rstudioconnect.com/content/1518/")

Dashboards

div(class = "col-sm-12", br())

Time Gained in Flight

div(class = "col-sm-6","This example based on the NYC airports data analyzed in the Notebooks section") thumbnail("Time Gained in Flight", "images/examples/nycflights13-dash-spark.png", "http://sparkdemo.rstudio.com/dashboards/nycflights13-dash-spark")

Diamonds explorer

div(class = "col-sm-6","This familiar example reads data into Spark using the parquet format. You can sample and filter the data in Spark then collect the results to be visualized.") thumbnail("Diamonds Explorer", "images/examples/diamonds-explorer.png", "http://sparkdemo.rstudio.com/dashboards/diamonds-explorer")

Shiny Applications

div(class = "col-sm-12", br())

ML Titanic Classification

div(class = "col-sm-6","You can use sparklyr to fit a wide variety of machine learning algorithms in Apache Spark. This analysis compares the performance of six classification models in Apache Spark on the Titanic data set. For the Titanic data, decision trees and random forests performed the best and had comparatively fast run times. See results for a detailed comparison.") thumbnail("ML Titanic Classification", "images/examples/titanic-app.png", "http://sparkdemo.rstudio.com/apps/titanic-classification/")

Iris K-means clustering

div(class = "col-sm-6","This familiar examples read data into Spark using the parquet format. You can cluster the data in Spark then collect the results to be visualized.") thumbnail("Iris K-means clustering", "images/examples/iris-k-means.png", "http://sparkdemo.rstudio.com/apps/iris-k-means")

Time gained in flight app

div(class = "col-sm-6","You can connect a Shiny app to a live spark context. This example uses Spark SQL and ML to create a look up table. You can use Shiny to filter the look up table and visualize the results.") thumbnail("Time gained in flight", "images/examples/nycflights13-app-spark.png", "http://sparkdemo.rstudio.com/apps/nycflights13-app-spark")