regclass (version 1.5)

JUNK: Junk-mail dataset

Description

Building a junk mail classifier based on word and character frequencies

Usage

data("JUNK")

Format

A data frame with 4601 observations on 58 variables.
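
The stated dimensions can be checked directly after loading the data. The following is a minimal sketch, assuming the regclass package is installed; the `requireNamespace` guard is only there so the snippet does nothing (rather than failing) when it is not.

```r
# Assumption: regclass is installed; otherwise this block is skipped.
if (requireNamespace("regclass", quietly = TRUE)) {
  data("JUNK", package = "regclass")  # loads the data frame JUNK into the workspace
  print(dim(JUNK))                    # number of observations and variables
  str(JUNK, list.len = 5)             # peek at the first few variables only
}
```

The two numbers printed by `dim(JUNK)` should correspond to the observation and variable counts given above.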

Source

Adapted from the Spambase Data Set at the UCI data repository https://archive.ics.uci.edu/ml/datasets/Spambase. Creators: Mark Hopkins, Erik Reeber, George Forman, Jaap Suermondt; Hewlett-Packard Labs, 1501 Page Mill Rd., Palo Alto, CA 94304. Donor: George Forman (gforman at nospam hpl.hp.com)

Details

The junk emails came from the postmaster and from individuals who classified the email as junk. The safe emails came from work and personal email. Note that most of the variables are percentages and can range from 0 to 100, though most values are much less than 1 (that is, 1%).
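
To see how the numeric variables are actually distributed, one option is to tabulate each column's minimum, maximum, and the share of values below 1. The sketch below uses a hypothetical helper, `percent_range`, which is not part of regclass, and it treats every numeric column alike, so any variable that is not a percentage may show a much larger maximum.

```r
# Hypothetical helper (not part of regclass): per-column range summary
# for every numeric column of a data frame.
percent_range <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  data.frame(
    variable      = names(num),
    min           = vapply(num, min, numeric(1), na.rm = TRUE),
    max           = vapply(num, max, numeric(1), na.rm = TRUE),
    # proportion of values strictly below 1 (i.e. below 1%)
    share_below_1 = vapply(num, function(x) mean(x < 1, na.rm = TRUE), numeric(1)),
    row.names     = NULL
  )
}

# Assumption: regclass is installed; otherwise this block is skipped.
if (requireNamespace("regclass", quietly = TRUE)) {
  data("JUNK", package = "regclass")
  print(head(percent_range(JUNK)))
}
```

Columns whose `share_below_1` is close to 1 are the ones where nearly all values fall under 1%.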