malaytextr (version 0.1.3)

stem_malay: Stemming Malay words

Description

Stems Malay words to their root words, using a dictionary lookup followed by affix removal.

Usage

stem_malay(word,
  dictionary,
  col_feature1,
  col_dict1,
  col_dict2,
  Word)

Value

Returns a data frame with the following properties:

  • Col Word: The input supplied in word, renamed to Col Word.

  • Root Word: An additional column which contains the word(s) after being stemmed.

Format

An object of class function of length 1.

Arguments

word

A data frame, or a character vector

dictionary

A data frame with a column of words to be stemmed and a column of root words

col_feature1

Column in word that contains the words to be stemmed. Required when word is a data frame.

col_dict1

Column in dictionary that is matched against col_feature1 from word

col_dict2

Column in dictionary that contains the root words

Word

Deprecated. Please use word instead.

Details

stem_malay() first looks up the Malay words in a dictionary. It then removes the "extra suffix" described by Khan et al. (2017), followed by the prefix and, lastly, the suffix.
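The order of operations described above can be sketched in base R. This is only an illustration of the general idea and not the malaytextr implementation: the function toy_stem, the data frame toy_dict, and the three short affix patterns are all hypothetical, and a real stemmer would need much fuller affix lists.

```r
# Illustrative sketch only: NOT the malaytextr implementation.
# toy_dict, toy_stem and the affix patterns are hypothetical samples.
toy_dict <- data.frame(
  word = c("banyaknya"),
  root = c("banyak"),
  stringsAsFactors = FALSE
)

extra_suffix <- "(lah|kah|nya)$"   # assumed sample of "extra suffix" forms
prefix       <- "^(ber|ter|di)"    # assumed sample of prefixes
suffix       <- "(kan|an)$"        # assumed sample of suffixes

toy_stem <- function(words, dict) {
  # Step 1: look each word up in the dictionary
  hit <- match(words, dict$word)

  # Steps 2 to 4: strip the extra suffix, then the prefix, then the suffix
  stripped <- sub(extra_suffix, "", words)
  stripped <- sub(prefix, "", stripped)
  stripped <- sub(suffix, "", stripped)

  # Prefer the dictionary root; fall back to the stripped form
  root <- ifelse(is.na(hit), stripped, dict$root[hit])

  data.frame(`Col Word` = words, `Root Word` = root,
             check.names = FALSE, stringsAsFactors = FALSE)
}

res <- toy_stem(c("banyaknya", "terkedu", "bacalah", "minuman"), toy_dict)
print(res)
# Root Word column: "banyak", "kedu", "baca", "minum"
```

In this sketch the dictionary match takes priority, so a word found in toy_dict is never altered by the regular expressions. Whether stem_malay() resolves the two paths in exactly this way is an assumption.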

References

Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi. 2017. "Malay Language Stemmer."

Examples

library(malaytextr)

# Stem a character vector, using the malayrootwords
# dictionary from the malaytextr package
stem_malay(word = "banyaknya", dictionary = malayrootwords)

# Stem a data frame with the same dictionary.
# With a data frame, you need to specify the column to be stemmed.
x <- data.frame(text = c("banyaknya", "sangat", "terkedu", "pengetahuan"))

stem_malay(word = x, dictionary = malayrootwords, col_feature1 = "text")
