A dataset containing credit card transactions for illustrating fraud detection and class imbalance in binary classification. The data include anonymized predictors derived from a principal component analysis, together with transaction time, transaction amount, and a binary fraud indicator.
data(creditcard_fraud)
A data frame with 10000 observations and 31 variables:
Seconds elapsed between each transaction and the first transaction in the dataset.
28 anonymized predictors, each obtained from a PCA transformation of the original variables.
Transaction amount.
Fraud indicator: 0 for non-fraudulent transactions and 1 for fraudulent transactions.
This dataset is a teaching subset derived from the original Credit Card Fraud Detection dataset available on Kaggle. The original dataset is highly imbalanced. For inclusion in the liver package, we created a smaller subset with 10000 observations that retains all fraud cases and a random sample of non-fraud cases. This version is intended for illustrating class imbalance, resampling strategies, and model evaluation in binary classification.
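The subsetting idea described above (keep every fraud case, sample the non-fraud cases) can be sketched in base R. This is only an illustration on a toy data frame: the object names `toy`, `toy_subset`, and `subset_size`, the sizes, and the simulated amounts are all hypothetical, and this is not necessarily the procedure used to build the packaged data.

```r
# Toy stand-in for a highly imbalanced transaction table
# (hypothetical values, not the actual creditcard_fraud data).
set.seed(42)
toy <- data.frame(
  Amount = round(rexp(5000, rate = 1 / 80), 2),
  Class  = rep(c(1, 0), times = c(50, 4950))   # 50 fraud, 4950 non-fraud
)

# Keep every fraud row and draw a random sample of non-fraud rows
# so that the subset reaches a fixed, smaller size.
subset_size    <- 1000
fraud_rows     <- which(toy$Class == 1)
non_fraud_rows <- which(toy$Class == 0)
keep_non_fraud <- sample(non_fraud_rows, size = subset_size - length(fraud_rows))

toy_subset <- toy[sort(c(fraud_rows, keep_non_fraud)), ]

# Fraud share before and after subsetting
mean(toy$Class)         # 50 / 5000
mean(toy_subset$Class)  # 50 / 1000
```

Because all fraud rows are retained while non-fraud rows are thinned, the fraud share in the subset is higher than in the full table, although the classes remain imbalanced.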
Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson, and Gianluca Bontempi (2015). Calibrating Probability with Undersampling for Unbalanced Classification. In 2015 IEEE Symposium Series on Computational Intelligence.
Reza Mohammadi (2025). Data Science Foundations and Machine Learning with R: From Data to Decisions. https://book-data-science-r.netlify.app.
mortgage,
bank,
churn_mlc,
churn,
churn_tel,
adult,
cereal,
advertising,
marketing,
drug,
house,
house_price,
red_wines,
white_wines,
insurance,
caravan,
loan
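library(liver)                  # the dataset ships with the liver package
data(creditcard_fraud)
str(creditcard_fraud)           # structure and variable types
table(creditcard_fraud$Class)   # counts of non-fraud (0) and fraud (1) transactions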
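As a possible follow-up to the class counts, the sketch below shows why plain accuracy can mislead on imbalanced labels. It uses a hypothetical label vector, `class_labels`, as a stand-in for `creditcard_fraud$Class`; the 950/50 split is an assumption chosen for illustration, not the actual class distribution.

```r
# Hypothetical label vector standing in for creditcard_fraud$Class
class_labels <- rep(c(0, 1), times = c(950, 50))

counts <- table(class_labels)
shares <- prop.table(counts)   # class proportions instead of raw counts
shares

# Accuracy of a rule that always predicts the majority class (0)
baseline_accuracy <- max(shares)

# Recall on the fraud class for that same rule: it never flags fraud
always_zero  <- rep(0, length(class_labels))
fraud_recall <- sum(always_zero == 1 & class_labels == 1) / sum(class_labels == 1)

baseline_accuracy
fraud_recall
```

A rule that never predicts fraud scores high accuracy here while detecting no fraud at all, which is why class-specific measures such as recall are usually reported alongside accuracy for data of this kind.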