quickSentiment (version 0.1.0)

pipeline: Run a Full Text Classification Pipeline on Preprocessed Text

Description

This function takes a data frame with pre-cleaned text and handles the data splitting, vectorization, model training, and evaluation.

Usage

pipeline(
  vect_method,
  model_name,
  df,
  text_column_name,
  sentiment_column_name,
  n_gram = 1,
  parallel = FALSE,
  stratify = TRUE
)

Value

A list containing the trained model object, the document-feature matrix (DFM) template, the class levels, and an evaluation report.

Arguments

vect_method

A string specifying the vectorization method. Must be one of "bow", "binary", "tf", or "tfidf".
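To make the four options concrete, here is a minimal base-R sketch of how such weightings are commonly defined. It is an illustration under assumptions, not quickSentiment's internal code: the reading of "bow" as raw counts, "binary" as presence/absence, "tf" as length-normalized counts, and the particular TF-IDF formula (natural log, no smoothing) are conventional choices that may differ from what the package computes.

```r
# Illustrative only: common definitions of the four weightings in base R.
# The exact formulas used by the package may differ.
docs <- c("good good product", "bad product")

# Tokenize on whitespace and build a document-by-term count matrix.
tokens <- strsplit(docs, "\\s+")
vocab  <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))
colnames(counts) <- vocab

bow    <- counts                                   # raw term counts
binary <- (counts > 0) * 1                         # 1 if the term occurs, else 0
tf     <- counts / rowSums(counts)                 # counts scaled by document length
idf    <- log(nrow(counts) / colSums(counts > 0))  # terms in fewer documents weigh more
tfidf  <- sweep(tf, 2, idf, `*`)                   # scale each tf column by its idf

print(bow)
print(round(tfidf, 3))
```

In this toy corpus "product" appears in every document, so its idf is zero and it drops out of the TF-IDF representation, while it still contributes under the other three weightings.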

model_name

A string specifying the model to train. Must be one of "logit", "rf", or "xgb".

df

The input data frame.

text_column_name

The name of the column containing the **preprocessed** text.

sentiment_column_name

The name of the column containing the original target labels (e.g., ratings).

n_gram

The n-gram size to use for BoW/TF-IDF. Defaults to 1.
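As a rough illustration of what the n-gram size controls, the sketch below builds word n-grams in base R. `make_ngrams` is a hypothetical helper written for this example and is not part of quickSentiment. Whether the package keeps lower-order n-grams alongside the requested size is not stated here, so the sketch returns only n-grams of exactly size `n`.

```r
# Hypothetical helper (not a quickSentiment function): word n-grams of size n.
make_ngrams <- function(text, n = 1) {
  words <- strsplit(text, "\\s+")[[1]]
  # A text shorter than n words has no n-grams of that size.
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = "_"),
    character(1)
  )
}

make_ngrams("not good at all", n = 1)  # "not" "good" "at" "all"
make_ngrams("not good at all", n = 2)  # "not_good" "good_at" "at_all"
```

With unigrams, "not good" and "good" share the feature "good". With bigrams, "not_good" becomes its own feature, which can matter for negation in sentiment data.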

parallel

If TRUE, runs model training in parallel. Default FALSE.

stratify

If TRUE, splits the data using stratified sampling on the sentiment column, so each class is represented in every split. Default TRUE.
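The idea behind a stratified split can be sketched in base R as follows. This is only an illustration: the 80/20 ratio and the seed are assumptions made for the example, and the package's own split proportions and implementation are not documented on this page.

```r
# Illustrative stratified split in base R; the 80/20 ratio is an assumption.
set.seed(42)
labels <- c("P", "P", "P", "P", "N", "N", "N", "N", "N", "N", "P", "P")

# Group row indices by class, then sample 80% of the rows within each class.
idx_by_class <- split(seq_along(labels), labels)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  # Index through sample.int() so a class with a single row is handled safely.
  idx[sample.int(length(idx), size = floor(0.8 * length(idx)))]
}), use.names = FALSE)
test_idx <- setdiff(seq_along(labels), train_idx)

table(labels[train_idx])  # 4 rows per class
table(labels[test_idx])   # 2 rows per class
```

Sampling within each class keeps the class balance the same in both parts. A purely random split of a small data set could leave one class missing from the test rows.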

Examples

df <- data.frame(
  text = c("good product", "excellent", "loved it", "great quality",
           "bad service", "terrible", "hated it", "awful experience",
           "not good", "very bad", "fantastic", "wonderful"),
  y = c("P", "P", "P", "P", "N", "N", "N", "N", "N", "N", "P", "P")
)
# Note: We use a small dataset here for demonstration.
# In real use cases, ensure you have more observations per class.
out <- pipeline("bow", "logit", df, "text", "y")
# Inspect the top-level elements of the returned list
str(out, max.level = 1)

