phm (version 2.1.2)

Phrase Mining

Description

Functions to extract and handle commonly occurring principal phrases obtained from collections of texts. This version brings major speed improvements: core functions have been rewritten in C++ for faster phrase-document parsing, clustering, and text distance computations. Based on Small, E., & Cabrera, J. (2025). Principal phrase mining, an automated method for extracting meaningful phrases from text. International Journal of Computers and Applications, 47(1), 84–92.

Install

install.packages('phm')
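
Once installed, the package can be loaded and its help index browsed from the R console. The following is a minimal sketch that uses only base R calls; it assumes a working CRAN mirror and does not call any phm function directly.

```r
# Install from CRAN only if the package is not already present
if (!requireNamespace("phm", quietly = TRUE)) {
  install.packages("phm")
}

# Attach the package and confirm which version was loaded
library(phm)
print(packageVersion("phm"))

# List the objects the package exports
print(ls("package:phm"))

# Open the package help index (interactive sessions)
help(package = "phm")
```

Individual help pages can then be opened by name, for example `?phraseDoc` or `?textCluster`.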

Monthly Downloads

262

Version

2.1.2

License

GPL-3

Maintainer

Ellie Small

Last Published

September 30th, 2025

Functions in phm (2.1.2)

stopPhrases

Phrases that are not Principal Phrases

showCluster

Show Cluster Contents

textDistMatrix

Calculate a Text Distance Matrix

textDist

Calculate Text Distance (dense version)

print.phraseDoc

Print a phraseDoc Object

textDist_sparse

Calculate Text Distance (sparse version)

bestDocs

Find Informative Documents in a Corpus

getPubMed

Create a data table from a text file in PubMed format

getPhrases

Display Frequency Matrix for Documents

getDocs

Display Frequency Matrix for Phrases

DFSource

Create a DFSource object from a data frame

getElem.DFSource

Obtain the current row of the content of a DFSource

canberra

Calculate Canberra Distance

as.matrix.phraseDoc

Convert a phraseDoc Object to a Matrix

removePhrases

Remove Phrases from phraseDoc Object

freqPhrases

Display Frequent Principal Phrases

distMatrix

Calculate a Distance Matrix

phraseDoc

phraseDoc Creation

stopEndWords

Words that Principal Phrases do not End with

readDF

Create a PlainTextDocument from a row in a data frame

print.textCluster

Print a textCluster Object

stopStartWords

Words that Principal Phrases do not Start with

textCluster

Cluster a Term-Document Matrix
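
For readers unfamiliar with the metric behind the canberra entry, the sketch below computes the textbook Canberra distance in base R. It is an illustration only: `canberra_sketch` is a hypothetical helper written for this example, the input vectors are made-up data, and the package's own `canberra` function may take different arguments or treat zero entries differently.

```r
# Two small non-negative frequency vectors (made-up illustration data)
x <- c(1, 2, 3)
y <- c(2, 2, 5)

# Textbook Canberra distance: sum of |a_i - b_i| / (|a_i| + |b_i|),
# skipping coordinates where both values are zero.
# canberra_sketch is a hypothetical helper, not part of phm.
canberra_sketch <- function(a, b) {
  num <- abs(a - b)
  den <- abs(a) + abs(b)
  keep <- den > 0
  sum(num[keep] / den[keep])
}

d_manual <- canberra_sketch(x, y)

# Base R's stats::dist() offers the same metric, used here as a cross-check
d_base <- as.numeric(dist(rbind(x, y), method = "canberra"))

print(c(manual = d_manual, base = d_base))
```

For this data the three terms are 1/3, 0, and 2/8, so both values come to 7/12 (about 0.583).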