A dataset containing session-level information from an e-commerce website, including page visit counts, time spent in different page categories, Google Analytics metrics, visitor characteristics, and a binary outcome indicating whether the session ended in a purchase. The dataset can be used to illustrate binary classification, exploratory data analysis, model comparison, and supervised learning methods in R.
data(purchase_intention)
A data frame with 12330 observations and 18 variables:
Number of administrative pages visited during the session.
Total time spent on administrative pages during the session.
Number of informational pages visited during the session.
Total time spent on informational pages during the session.
Number of product-related pages visited during the session.
Total time spent on product-related pages during the session.
Average bounce rate associated with the visited pages.
Average exit rate associated with the visited pages.
Average page value for pages visited before a completed transaction.
Closeness of the session date to a special shopping day, scaled between 0 and 1.
Month of the session.
Visitor operating system, recorded as a categorical factor.
Visitor browser, recorded as a categorical factor.
Visitor region, recorded as a categorical factor.
Traffic source type, recorded as a categorical factor.
Visitor type: "New_Visitor", "Returning_Visitor", or "Other".
Whether the session occurred on a weekend: "no" or "yes".
Whether the session ended in a purchase: "no" or "yes".
This dataset was obtained from the UCI Machine Learning Repository and renamed
purchase_intention for inclusion in the liver package. It contains
session-level records from an online shopping website and is well suited for
illustrating binary classification problems in which the goal is to
predict whether a browsing session will end in a purchase.
The predictors combine behavioral measures such as page visit counts and time
spent on different types of pages with summary metrics such as
bounce_rates, exit_rates, and page_values, as well as
visitor and session characteristics including month,
visitor_type, traffic_type, and weekend. The outcome
variable revenue indicates whether the session resulted in a completed
transaction.
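As a quick look at the variables named above, the following sketch uses base R to check the class balance of revenue and to compare purchase rates across visitor and session characteristics. It assumes the liver package is installed and that revenue is coded "no"/"yes" as documented; which summaries to compute is an illustrative choice, not part of the package documentation.

```r
library(liver)  # provides the purchase_intention data
data(purchase_intention)

# Class balance of the outcome: share of sessions with and without a purchase
class_balance <- prop.table(table(purchase_intention$revenue))
print(class_balance)

# Logical indicator for sessions that ended in a purchase
purchased <- purchase_intention$revenue == "yes"

# Purchase rate within each visitor type and for weekend vs. weekday sessions
rate_by_visitor <- tapply(purchased, purchase_intention$visitor_type, mean)
rate_by_weekend <- tapply(purchased, purchase_intention$weekend, mean)
print(rate_by_visitor)
print(rate_by_weekend)

# Median page value among purchasing and non-purchasing sessions
aggregate(page_values ~ revenue, data = purchase_intention, FUN = median)
```

An imbalanced outcome is worth noticing before modeling, because accuracy alone can then look good for a classifier that rarely predicts a purchase.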
The dataset is particularly useful for demonstrating classification workflows such as partitioning data into training and test sets, fitting logistic regression, Naive Bayes, k-nearest neighbors, and tree-based models, and evaluating predictive performance using confusion matrices, ROC curves, and AUC.
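The workflow described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions rather than a recommended pipeline: the 80/20 split, the seed, the 0.5 cutoff, and the three predictors are arbitrary choices made for the example, and the AUC is computed with the rank-sum (Mann-Whitney) identity instead of a dedicated package.

```r
library(liver)  # provides the purchase_intention data
data(purchase_intention)

set.seed(42)  # arbitrary seed, used only to make the split reproducible
n <- nrow(purchase_intention)
train_idx <- sample(n, size = round(0.8 * n))  # assumed 80/20 split
train <- purchase_intention[train_idx, ]
test  <- purchase_intention[-train_idx, ]

# Logistic regression for the probability that a session ends in a purchase.
# The predictors are an illustrative subset of the documented variables.
fit <- glm(I(revenue == "yes") ~ page_values + exit_rates + weekend,
           data = train, family = binomial)

# Predicted purchase probabilities on the held-out sessions
prob <- predict(fit, newdata = test, type = "response")

# Confusion matrix at an assumed 0.5 cutoff
pred <- factor(ifelse(prob > 0.5, "yes", "no"), levels = c("no", "yes"))
conf_mat <- table(actual = test$revenue, predicted = pred)
print(conf_mat)

# AUC via the rank-sum identity: the probability that a randomly chosen
# purchasing session receives a higher score than a non-purchasing one
is_pos <- test$revenue == "yes"
n_pos <- as.numeric(sum(is_pos))
n_neg <- as.numeric(sum(!is_pos))
auc <- (sum(rank(prob)[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)
```

The same train and test objects can be reused to fit the other classifiers mentioned above, so that all models are compared on identical held-out sessions.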
Sakar, C. O., Polat, S. O., Katircioglu, M., and Kastro, Y. (2019). Real-time prediction of online shoppers' purchasing intention using multilayer perceptron and LSTM recurrent neural networks. Neural Computing and Applications, 31, 6893-6908. doi:10.1007/s00521-018-3523-0
Reza Mohammadi (2025). Data Science Foundations and Machine Learning with R: From Data to Decisions. https://book-data-science-r.netlify.app
mortgage,
bank,
churn_mlc,
churn,
churn_tel,
adult,
cereal,
advertising,
marketing,
drug,
house,
house_price,
red_wines,
white_wines,
insurance,
caravan,
loan
data(purchase_intention)     # load the data set
str(purchase_intention)      # variable types and first few values
summary(purchase_intention)  # per-variable summary statistics