%>%

magrittr forward-pipe operator

Pipe an object forward into a function or call expression.

Usage
lhs %>% rhs
Arguments
lhs
A value or the magrittr placeholder.
rhs
A function call using the magrittr semantics.
Details

Using %>% with unary function calls
When functions require only one argument, x %>% f is equivalent to f(x) (not exactly equivalent; see the technical notes below).

Placing lhs as the first argument in rhs call
When the rhs call requires multiple arguments, the default behavior of %>% is to place lhs as the first argument, i.e. x %>% f(y) is equivalent to f(x, y).
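
As a minimal sketch of the two rules above, the snippet below compares piped and direct calls. The helper scale_by is hypothetical and defined only for this illustration; it is not part of magrittr.

```r
library(magrittr)

# Hypothetical helper, defined only for this illustration.
scale_by <- function(x, factor) x * factor

# Unary case: x %>% f behaves like f(x).
piped_sum  <- 1:3 %>% sum
direct_sum <- sum(1:3)

# Multiple arguments: lhs is placed as the first argument of the rhs call.
piped_scaled  <- 1:3 %>% scale_by(10)
direct_scaled <- scale_by(1:3, 10)

identical(piped_sum, direct_sum)
identical(piped_scaled, direct_scaled)
```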

Placing lhs elsewhere in rhs call
Often you will want lhs to be placed in the rhs call at a position other than the first. For this purpose you can use the dot (.) as a placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).
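
A short sketch of the placeholder in a positional and in a named argument, using only base R functions (rep and seq) chosen here for illustration:

```r
library(magrittr)

# Dot as the second positional argument: same as rep("ab", 2).
repeated <- 2 %>% rep("ab", .)

# Dot as a named argument: same as seq(1, 9, by = 3).
steps <- 3 %>% seq(1, 9, by = .)

repeated
steps
```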

Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument. The reason for this is that in most use cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
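
The nested-dot rule and the brace override can be sketched as follows. The data frame df is made up for this illustration.

```r
library(magrittr)

# Small made-up data frame, used only for this illustration.
df <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))

# The dot appears only inside nrow(.), so df is also passed as the
# first argument of subset().
even_rows <- df %>% subset(1:nrow(.) %% 2 == 0)
explicit  <- df %>% subset(., 1:nrow(.) %% 2 == 0)
identical(even_rows, explicit)

# Braces switch the first-argument rule off: the dot is used only
# where it is written.
range_vec <- 1:10 %>% {c(min(.), max(.))}
range_vec
```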

Using %>% with call- or function-producing rhs
It is possible to force evaluation of rhs before the piping of lhs takes place. This is useful when rhs produces the relevant call or function. To evaluate rhs first, enclose it in parentheses, e.g. a %>% (function(x) x^2) and 1:10 %>% (call("sum")). Another example where this is relevant is reference class methods, which are accessed using the $ operator: one would write x %>% (rc$f), and not x %>% rc$f.
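
A brief sketch of parenthesized right-hand sides. The list tools is a hypothetical stand-in for any object whose functions are reached with $; it is not a reference class.

```r
library(magrittr)

# An anonymous function must be wrapped in parentheses so that it is
# evaluated first and then applied to lhs.
squared <- 1:4 %>% (function(x) x^2)

# Hypothetical list holding a function, standing in for an object
# whose members are accessed with `$`.
tools <- list(double = function(x) x * 2)

# The parentheses make `tools$double` evaluate to the function before piping.
doubled <- 1:4 %>% (tools$double)

squared
doubled
```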

Using lambda expressions with %>%
Each rhs is essentially a one-expression body of a unary function. Therefore defining lambdas in magrittr is very natural, and works as in the definitions of regular functions: if more than a single expression is needed, one encloses the body in a pair of braces, { rhs }. However, note that within braces there is no "first-argument rule": it will be exactly like writing a unary function where the argument name is "." (the dot).
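
As an illustration of a multi-expression lambda, the sketch below computes two summary values from a made-up numeric vector. Inside the braces the dot is the only way the input is referenced.

```r
library(magrittr)

# Made-up input vector, used only for this illustration.
summary_vec <- c(3, 1, 4, 1, 5) %>% {
  centered <- . - mean(.)   # intermediate value inside the lambda
  c(n = length(.), spread = max(centered) - min(centered))
}

summary_vec
```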

Using the dot placeholder as lhs
When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input. See the examples.
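
A small sketch of a functional sequence built from base R string functions. The name clean_names is an assumption made for this example.

```r
library(magrittr)

# Starting the chain with the dot yields a function rather than a value.
# clean_names is a hypothetical name chosen for this illustration.
clean_names <- . %>% trimws %>% tolower %>% gsub(" ", "_", .)

# The result can be called directly or used as a step in another pipe.
single <- clean_names("  Order Hour ")
several <- c(" A b", "C D ") %>% clean_names

single
several
```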

Technical notes

The magrittr pipe operators use non-standard evaluation. They capture their inputs and examine them to figure out how to proceed. First a function is produced from all of the individual right-hand side expressions, and then the result is obtained by applying this function to the left-hand side. For most purposes, one can disregard the subtle aspects of magrittr's evaluation, but some functions may capture their calling environment, and thus using the operators will not be exactly equivalent to the "standard call" without pipe operators. In addition, special attention is advised when using non-magrittr operators in a pipe chain (+, -, $, etc.), as operator precedence will affect how the chain is evaluated. In general it is advised to use the aliases provided by magrittr.
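
The precedence caveat can be sketched with the magrittr aliases add and multiply_by. In R, %>% binds more tightly than *, so mixing the two changes the grouping.

```r
library(magrittr)

x <- 1:3

# `%>%` binds tighter than `*`, so this is x * (2 %>% add(1)), i.e. x * 3.
mixed <- x * 2 %>% add(1)

# With aliases, every step is part of the chain: (x * 2) + 1.
aliased <- x %>% multiply_by(2) %>% add(1)

mixed
aliased
```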

See Also

%<>%, %T>%, %$%

Aliases
  • %>%
Examples
library(magrittr)

# Basic use:
iris %>% head

# Use with lhs as first argument
iris %>% head(10)

# Using the dot placeholder
"Ceci n'est pas une pipe" %>% gsub("une", "un", .)

# When dot is nested, lhs is still placed first:
sample(1:10) %>% paste0(LETTERS[.])

# This can be avoided:
rnorm(100) %>% {c(min(.), mean(.), max(.))} %>% floor

# Lambda expressions:
iris %>%
{
  size <- sample(1:10, size = 1)
  rbind(head(., size), tail(., size))
}

# renaming in lambdas:
iris %>%
{
  my_data <- .
  size <- sample(1:10, size = 1)
  rbind(head(my_data, size), tail(my_data, size))
}

# Building unary functions with %>%
trig_fest <- . %>% tan %>% cos %>% sin

1:10 %>% trig_fest
trig_fest(1:10)
Documentation reproduced from package magrittr, version 1.5, License: MIT + file LICENSE

Community examples

luciabertova.lb@gmail.com at Mar 6, 2019 magrittr v1.5

---
title: "Exploratory Analysis - Instacart"
author: "Philipp Spachtholz"
output:
  html_document:
    fig_height: 4
    fig_width: 7
    theme: cosmo
---

### Welcome and good luck to you all at Instacart Market Basket Competition!

Here is a first exploratory analysis of the competition dataset. On its website Instacart has a recommendation feature that suggests items the user may buy again. Our task is to predict which items will be reordered on the next order.

The dataset consists of information about 3.4 million grocery orders, distributed across 6 csv files.

### Read in the data

```{r message=FALSE, warning=FALSE, results='hide'}
library(data.table)
library(dplyr)
library(ggplot2)
library(knitr)
library(stringr)
library(DT)

orders <- fread('../input/orders.csv')
products <- fread('../input/products.csv')
order_products <- fread('../input/order_products__train.csv')
order_products_prior <- fread('../input/order_products__prior.csv')
aisles <- fread('../input/aisles.csv')
departments <- fread('../input/departments.csv')
```

```{r include=FALSE}
options(tibble.width = Inf)
```

Let's first have a look at these files:

### Peek at the dataset {.tabset}

#### orders

This file gives a list of all orders we have in the dataset, 1 row per order. For example, we can see that user 1 has 11 orders, 1 of which is in the train set, and 10 of which are prior orders. The orders.csv doesn't tell us which products were ordered. This is contained in the order_products.csv.

```{r, result='asis'}
kable(head(orders,12))
glimpse(orders)
```

#### order_products_train

This file gives us information about which products (product_id) were ordered. It also contains information on the order (add_to_cart_order) in which the products were put into the cart and on whether this product is a re-order (1) or not (0).

For example, we see below that order_id 1 had 8 products, 4 of which are reorders.

Still we don't know what these products are.
This information is in the products.csv.

```{r}
kable(head(order_products,10))
glimpse(order_products)
```

#### products

This file contains the names of the products with their corresponding product_id. Furthermore the aisle and department are included.

```{r}
kable(head(products,10))
glimpse(products)
```

#### order_products_prior

This file is structurally the same as the order_products_train.csv.

```{r, result='asis'}
kable(head(order_products_prior,10))
glimpse(order_products_prior)
```

#### aisles

This file contains the different aisles.

```{r, result='asis'}
kable(head(aisles,10))
glimpse(aisles)
```

#### departments

```{r, result='asis'}
kable(head(departments,10))
glimpse(departments)
```

### Recode variables

We should do some recoding and convert character variables to factors.

```{r message=FALSE, warning=FALSE}
orders <- orders %>% mutate(order_hour_of_day = as.numeric(order_hour_of_day), eval_set = as.factor(eval_set))
products <- products %>% mutate(product_name = as.factor(product_name))
aisles <- aisles %>% mutate(aisle = as.factor(aisle))
departments <- departments %>% mutate(department = as.factor(department))
```

### When do people order?

Let's have a look when people buy groceries online.

#### Hour of Day

There is a clear effect of hour of day on order volume. Most orders are between 8.00-18.00.

```{r warning=FALSE}
orders %>%
  ggplot(aes(x=order_hour_of_day)) +
  geom_histogram(stat="count",fill="red")
```

#### Day of Week

There is a clear effect of day of the week. Most orders are on days 0 and 1. Unfortunately there is no info regarding which values represent which day, but one would assume that this is the weekend.

```{r warning=FALSE}
orders %>%
  ggplot(aes(x=order_dow)) +
  geom_histogram(stat="count",fill="red")
```

### When do they order again?

People seem to order more often after exactly 1 week.

```{r warning=FALSE}
orders %>%
  ggplot(aes(x=days_since_prior_order)) +
  geom_histogram(stat="count",fill="red")
```

### How many prior orders are there?
We can see that there are always at least 3 prior orders.

```{r}
orders %>% filter(eval_set=="prior") %>% count(order_number) %>% ggplot(aes(order_number,n)) + geom_line(color="red", size=1)+geom_point(size=2, color="red")
```

### How many items do people buy? {.tabset}

Let's have a look how many items are in the orders. We can see that people most often order around 5 items. The distributions are comparable between the train and prior order set.

#### Train set

```{r warning=FALSE}
order_products %>%
  group_by(order_id) %>%
  summarize(n_items = last(add_to_cart_order)) %>%
  ggplot(aes(x=n_items))+
  geom_histogram(stat="count",fill="red") +
  geom_rug()+
  coord_cartesian(xlim=c(0,80))
```

#### Prior orders set

```{r warning=FALSE}
order_products_prior %>%
  group_by(order_id) %>%
  summarize(n_items = last(add_to_cart_order)) %>%
  ggplot(aes(x=n_items))+
  geom_histogram(stat="count",fill="red") +
  geom_rug() +
  coord_cartesian(xlim=c(0,80))
```

### Bestsellers

Let's have a look which products are sold most often (top10). And the clear winner is: **Bananas**

```{r fig.height=5.5}
tmp <- order_products %>%
  group_by(product_id) %>%
  summarize(count = n()) %>%
  top_n(10, wt = count) %>%
  left_join(select(products,product_id,product_name),by="product_id") %>%
  arrange(desc(count))
kable(tmp)

tmp %>%
  ggplot(aes(x=reorder(product_name,-count), y=count))+
  geom_bar(stat="identity",fill="red")+
  theme(axis.text.x=element_text(angle=90, hjust=1),axis.title.x = element_blank())
```

### How often do people order the same items again?

59% of the ordered items are reorders.

```{r warning=FALSE, fig.width=4}
tmp <- order_products %>%
  group_by(reordered) %>%
  summarize(count = n()) %>%
  mutate(reordered = as.factor(reordered)) %>%
  mutate(proportion = count/sum(count))
kable(tmp)

tmp %>%
  ggplot(aes(x=reordered,y=count,fill=reordered))+
  geom_bar(stat="identity")
```

### Most often reordered

Now here it becomes really interesting. These 10 products have the highest probability of being reordered.
```{r warning=FALSE, fig.height=5.5}
tmp <-order_products %>%
  group_by(product_id) %>%
  summarize(proportion_reordered = mean(reordered), n=n()) %>%
  filter(n>40) %>%
  top_n(10,wt=proportion_reordered) %>%
  arrange(desc(proportion_reordered)) %>%
  left_join(products,by="product_id")

kable(tmp)

tmp %>%
  ggplot(aes(x=reorder(product_name,-proportion_reordered), y=proportion_reordered))+
  geom_bar(stat="identity",fill="red")+
  theme(axis.text.x=element_text(angle=90, hjust=1),axis.title.x = element_blank())+coord_cartesian(ylim=c(0.85,0.95))
```

### Which item do people put into the cart first?

People seem to be quite certain about Multifold Towels: if they buy them, they put them into their cart first 66% of the time.

```{r message=FALSE, fig.height=5.5}
tmp <- order_products %>%
  group_by(product_id, add_to_cart_order) %>%
  summarize(count = n()) %>% mutate(pct=count/sum(count)) %>%
  filter(add_to_cart_order == 1, count>10) %>%
  arrange(desc(pct)) %>%
  left_join(products,by="product_id") %>%
  select(product_name, pct, count) %>%
  ungroup() %>%
  top_n(10, wt=pct)

kable(tmp)

tmp %>%
  ggplot(aes(x=reorder(product_name,-pct), y=pct))+
  geom_bar(stat="identity",fill="red")+
  theme(axis.text.x=element_text(angle=90, hjust=1),axis.title.x = element_blank())+coord_cartesian(ylim=c(0.4,0.7))
```

### Association between time of last order and probability of reorder

This is interesting: we can see that if people order again on the same day, they order the same product more often, whereas when 30 days have passed, they tend to try out new things in their order.

```{r}
order_products %>%
  left_join(orders,by="order_id") %>%
  group_by(days_since_prior_order) %>%
  summarize(mean_reorder = mean(reordered)) %>%
  ggplot(aes(x=days_since_prior_order,y=mean_reorder))+
  geom_bar(stat="identity",fill="red")
```

### Association between number of orders and probability of reordering

Products with a high number of orders are naturally more likely to be reordered. However, there seems to be a ceiling effect.
```{r message=FALSE}
order_products %>%
  group_by(product_id) %>%
  summarize(proportion_reordered = mean(reordered), n=n()) %>%
  ggplot(aes(x=n,y=proportion_reordered))+
  geom_point()+
  geom_smooth(color="red")+
  coord_cartesian(xlim=c(0,2000))
```

### Organic vs Non-organic

What is the percentage of orders that are organic vs. not organic?

```{r fig.width=4}
products <- products %>%
    mutate(organic=ifelse(str_detect(str_to_lower(products$product_name),'organic'),"organic","not organic"), organic= as.factor(organic))

tmp <- order_products %>%
  left_join(products, by="product_id") %>%
  group_by(organic) %>%
  summarize(count = n()) %>%
  mutate(proportion = count/sum(count))
kable(tmp)

tmp %>%
  ggplot(aes(x=organic,y=count, fill=organic))+
  geom_bar(stat="identity")
```

### Reordering Organic vs Non-Organic

People more often reorder organic products vs non-organic products.

```{r fig.width=4}
tmp <- order_products %>% left_join(products,by="product_id") %>% group_by(organic) %>% summarize(mean_reordered = mean(reordered))
kable(tmp)

tmp %>%
  ggplot(aes(x=organic,fill=organic,y=mean_reordered))+geom_bar(stat="identity")
```

### Visualizing the Product Portfolio

Here I use the treemap package to visualize the structure of Instacart's product portfolio. In total there are 21 departments containing 134 aisles.

```{r}
library(treemap)

tmp <- products %>% group_by(department_id, aisle_id) %>% summarize(n=n())
tmp <- tmp %>% left_join(departments,by="department_id")
tmp <- tmp %>% left_join(aisles,by="aisle_id")

tmp2<-order_products %>%
  group_by(product_id) %>%
  summarize(count=n()) %>%
  left_join(products,by="product_id") %>%
  ungroup() %>%
  group_by(department_id,aisle_id) %>%
  summarize(sumcount = sum(count)) %>%
  left_join(tmp, by = c("department_id", "aisle_id")) %>%
  mutate(onesize = 1)
```

#### How are aisles organized within departments?
```{r, fig.width=9, fig.height=6}
treemap(tmp2,index=c("department","aisle"),vSize="onesize",vColor="department",palette="Set3",title="",sortID="-sumcount", border.col="#FFFFFF",type="categorical", fontsize.legend = 0,bg.labels = "#FFFFFF")
```

#### How many unique products are offered in each department/aisle?

The size of the boxes shows the number of products in each category.

```{r, fig.width=9, fig.height=6}
treemap(tmp,index=c("department","aisle"),vSize="n",title="",palette="Set3",border.col="#FFFFFF")
```

#### How often are products from the department/aisle sold?

The size of the boxes shows the number of sales.

```{r, fig.width=9, fig.height=6}
treemap(tmp2,index=c("department","aisle"),vSize="sumcount",title="",palette="Set3",border.col="#FFFFFF")
```

### Exploring Customer Habits

Here I look for customers who just reorder the same products again all the time. To find them I look at all orders (excluding the first order) where the percentage of reordered items is exactly 1 (this can easily be adapted to look at more lenient thresholds). We can see there are in fact **3,487** customers who always just reorder products.

#### Customers reordering only

```{r}
tmp <- order_products_prior %>%
  group_by(order_id) %>%
  summarize(m = mean(reordered),n=n()) %>%
  right_join(filter(orders,order_number>2), by="order_id")

tmp2 <- tmp %>%
  filter(eval_set =="prior") %>%
  group_by(user_id) %>%
  summarize(n_equal = sum(m==1,na.rm=TRUE), percent_equal = n_equal/n()) %>%
  filter(percent_equal == 1) %>%
  arrange(desc(n_equal))

datatable(tmp2, class="table-condensed", style="bootstrap", options = list(dom = 'tp'))
```

#### The customer with the strongest habit

The coolest customer is id #99753, having 97 orders with only reordered items. That's what I call a strong habit.
She/he seems to like Organic Milk :-)

```{r warning=FALSE}
uniqueorders <- filter(tmp, user_id == 99753)$order_id
tmp <- order_products_prior %>%
  filter(order_id %in% uniqueorders) %>%
  left_join(products, by="product_id")

datatable(select(tmp,-aisle_id,-department_id,-organic), style="bootstrap", class="table-condensed", options = list(dom = 'tp'))
```

<br>

Let's look at this customer's order in the train set. One would assume that she/he would buy "Organic Whole Milk" and "Organic Reduced Fat Milk":

```{r warning=FALSE}
tmp <- orders %>% filter(user_id==99753, eval_set == "train")
tmp2 <- order_products %>%
  filter(order_id == tmp$order_id) %>%
  left_join(products, by="product_id")

datatable(select(tmp2,-aisle_id,-department_id,-organic), style="bootstrap", class="table-condensed", options = list(dom = 't'))
```

**Tadaaaa. Prediction 100% correct.**

<br><br>

**Thank you all for the nice comments and upvotes. You are great.**