topic_classification: Topic Classification for Narrative Data

Description

This function classifies sentences from narrative data into topics, using the results of Latent Dirichlet Allocation (LDA) topic modeling. It can also draw on sentiment analysis, annotated data, and seeded topics to refine the classification.

Usage

topic_classification(
  data,
  narratives,
  sentences,
  sentences_lda,
  sentence_polarity,
  data_annotated,
  use_beta,
  use_seeds,
  nr_topics,
  seeded_topics,
  competencies
)

Value

A data.table containing the classified sentences with the following columns:

sentenceid: Unique identifier for each sentence.
sentence: The sentence text.
polarity: The sentiment polarity score of the sentence.
max_probability: The highest topic probability for each sentence.
Additional columns corresponding to each topic, representing the probability of the sentence belonging to that topic.
Metadata fields such as document, submissionid, competencyid, feedbacktype, score, and comment.
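
To make the shape of the return value concrete, the following sketch builds a small table by hand with the documented columns. It uses a base R data frame as a stand-in for the data.table. The topic column names (topic_1, topic_2, topic_3), the extra best_topic column, and all values are assumptions made for illustration; the real column names depend on the function's output.

```r
# Hypothetical result with three topics; all values are made up.
result <- data.frame(
  sentenceid = c(1L, 2L),
  sentence   = c("Explains findings clearly to patients.",
                 "Should ask for help sooner."),
  polarity   = c(0.6, -0.2),
  topic_1    = c(0.70, 0.10),
  topic_2    = c(0.20, 0.15),
  topic_3    = c(0.10, 0.75),
  stringsAsFactors = FALSE
)

# Assumed names of the per-topic probability columns.
topic_cols <- c("topic_1", "topic_2", "topic_3")

# max_probability is documented as the highest topic probability per sentence.
result$max_probability <- do.call(pmax, unname(as.list(result[topic_cols])))

# Not part of the documented output: the column that holds the maximum.
result$best_topic <- topic_cols[
  max.col(as.matrix(result[topic_cols]), ties.method = "first")
]
```

Filtering on max_probability (for example, keeping only rows above a chosen threshold) is one plausible way to select confidently classified sentences, although the documentation does not prescribe a threshold.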

Arguments

data: A data frame or data.table containing metadata and other relevant data for topic classification.
narratives: A data frame or data.table containing the narrative data, including comments and feedback.
sentences: A data frame or data.table containing the sentences to be classified, including the text and metadata.
sentences_lda: The fitted result object from LDA topic modeling, which contains the term distribution for each topic.
sentence_polarity: A data frame or data.table containing sentence-level polarity scores (sentiment analysis results).
data_annotated: A data frame or data.table containing the lemmatized text and other annotations.
use_beta: A logical value indicating whether to use the beta values from the LDA model for topic classification.
use_seeds: A logical value indicating whether to incorporate seeded topics for topic classification.
nr_topics: An integer specifying the number of topics to classify.
seeded_topics: A list of character vectors representing the topics with predefined seed words.
competencies: A list of names of the competencies per id, as used in the data set.
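
As an illustration of the argument shapes described above, the sketch below builds hypothetical values for seeded_topics, competencies, and the scalar options. The seed words, the competency names, and the use of list names to carry the competency id are assumptions; the data set in use determines the real values.

```r
# Hypothetical seed words: one character vector per seeded topic.
seeded_topics <- list(
  c("communication", "listen", "explain"),
  c("collaboration", "team", "colleague")
)

# Hypothetical competency names, keyed by the id used in the data set.
competencies <- list(
  "1" = "Communication",
  "2" = "Collaboration"
)

use_beta  <- TRUE  # logical: use the beta values from the LDA model
use_seeds <- TRUE  # logical: incorporate the seeded topics
nr_topics <- 3L    # integer: number of topics

# Basic shape checks that mirror the argument descriptions.
stopifnot(
  is.list(seeded_topics),
  all(vapply(seeded_topics, is.character, logical(1))),
  is.list(competencies),
  is.logical(use_beta), is.logical(use_seeds),
  is.integer(nr_topics)
)
```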

Details

The topic_classification function integrates several sources of data to classify sentences into topics:

It uses LDA topic modeling results to assign each sentence a probability for each topic.
The function can incorporate seeded topics, which enhance the classification by matching predefined keywords to the topics.
Sentiment analysis is used to add polarity scores to the sentence data.
The output includes additional metadata for each sentence, such as feedback type, score, and associated comments.
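
One way to picture the seeded-topic step is as a keyword count over lemmatized sentences. The sketch below is an assumption about how such matching could work, not the function's actual implementation. The helpers seed_scores and to_probabilities, the simple tokenization, the add-one smoothing, and the example sentences are all hypothetical.

```r
# Hypothetical helper: count how many seed words of each topic occur
# in each lemmatized sentence. Returns a sentences-by-topics matrix.
seed_scores <- function(lemmas, seeded_topics) {
  # Split on anything that is not a lowercase letter (a crude tokenizer).
  tokens <- strsplit(tolower(lemmas), "[^a-z]+")
  counts <- sapply(seeded_topics, function(seeds) {
    vapply(tokens, function(tok) sum(tok %in% seeds), integer(1))
  })
  # Force a matrix even when there is only one sentence.
  counts <- matrix(counts, nrow = length(tokens))
  colnames(counts) <- paste0("seed_topic_", seq_along(seeded_topics))
  counts
}

# Hypothetical helper: turn counts into row-wise proportions.
# Add-one smoothing keeps sentences without any seed word well defined.
to_probabilities <- function(counts, smoothing = 1) {
  smoothed <- counts + smoothing
  smoothed / rowSums(smoothed)
}

seeded_topics <- list(
  c("communication", "listen", "explain"),
  c("collaboration", "team", "colleague")
)

lemmas <- c(
  "she listen and explain the plan",
  "work well with the team",
  "arrive on time"
)

counts <- seed_scores(lemmas, seeded_topics)
probs  <- to_probabilities(counts)
```

In this sketch a sentence with no seed words receives equal weight for every seeded topic, which leaves the decision to whatever other evidence is available, such as the LDA probabilities.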