text2emotion (version 0.1.0)

preprocess_text: Preprocess Text with Slang Handling

Description

This function performs multi-stage text preprocessing, including lowercasing, HTML cleaning, punctuation normalization, contraction expansion, internet slang replacement, emoticon replacement, and final standardization.

Usage

preprocess_text(text, use_textclean = TRUE, custom_slang = NULL)

Value

A character vector of cleaned and normalized text.

Arguments

text

A character vector of input texts.

use_textclean

Logical. Whether to use textclean for internet slang and emoticon replacement. Default is TRUE.

custom_slang

A named character vector providing user-defined slang mappings. Optional.
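As an illustration of what a slang mapping of this shape could look like, the sketch below builds a named character vector and applies it with base R. It assumes the names are the slang tokens and the values are their replacements; the documentation does not state the direction, so treat that as an assumption. The helper replace_slang and the vector my_slang are hypothetical names and are not part of text2emotion.

```r
# Hypothetical slang map: names are slang tokens, values are replacements.
# (The direction of the mapping is an assumption, not documented behaviour.)
my_slang <- c(rn = "right now", lit = "exciting")

# Hypothetical helper, not part of text2emotion: replace whole-word matches
# of each name in `slang` with its value. Names are used as regex patterns,
# so tokens containing regex metacharacters would need escaping first.
replace_slang <- function(text, slang) {
  for (term in names(slang)) {
    pattern <- paste0("\\b", term, "\\b")
    text <- gsub(pattern, slang[[term]], text, perl = TRUE)
  }
  text
}

replace_slang("i am feeling lit rn", my_slang)
# "i am feeling exciting right now"
```

A vector built this way can be passed through the documented argument, for example preprocess_text("I'm feeling lit rn!!!", custom_slang = my_slang).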

Details

The preprocessing pipeline includes:

  • Lowercasing the text.

  • Replacing HTML entities and non-ASCII characters.

  • Expanding common English contractions (e.g., "I'm" -> "I am").

  • Replacing internet slang and emoticons if use_textclean is TRUE.

  • Applying additional slang mappings supplied through custom_slang.

  • Normalizing repeated punctuation and whitespace.
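The stages listed above can be approximated in base R. The sketch below is a stand-in for illustration only, not the package's implementation: the function name sketch_preprocess, the three HTML entities, and the three contractions are all assumptions chosen to keep the example short, and the slang and emoticon stages are left out.

```r
# Hypothetical approximation of the documented stages; not text2emotion code.
sketch_preprocess <- function(text) {
  # 1. Lowercase the text.
  text <- tolower(text)

  # 2. Decode a few HTML entities (decoding "&amp;" last avoids double
  #    decoding), then drop any remaining non-ASCII characters.
  text <- gsub("&lt;", "<", text, fixed = TRUE)
  text <- gsub("&gt;", ">", text, fixed = TRUE)
  text <- gsub("&amp;", "&", text, fixed = TRUE)
  text <- iconv(text, from = "UTF-8", to = "ASCII", sub = "")

  # 3. Expand a small, illustrative set of contractions.
  contractions <- c("i'm" = "i am", "can't" = "cannot", "won't" = "will not")
  for (key in names(contractions)) {
    text <- gsub(key, contractions[[key]], text, fixed = TRUE)
  }

  # 4. Collapse runs of the same punctuation mark and runs of whitespace.
  text <- gsub("([[:punct:]])\\1+", "\\1", text, perl = TRUE)
  text <- gsub("\\s+", " ", text)
  trimws(text)
}

sketch_preprocess("I'm   feeling GREAT!!!")
```

Lowercasing first means the contraction table only needs lowercase keys, which is one reason a pipeline like this might order the stages as listed.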

Examples

preprocess_text("I'm feeling lit rn!!!")
preprocess_text("I can't believe it... lol :)", use_textclean = TRUE)
