costring: Sentence Comparison

Description

Computes cosine values between sentences and/or documents

Usage

costring(x,y,tvectors=tvectors,split=" ",remove.punctuation=TRUE,breakdown=FALSE)

Arguments

x

a character vector

y

a character vector

tvectors

the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)

split

a character vector defining the character used to split the documents into words (white space by default)

remove.punctuation

removes punctuation from x and y; TRUE by default

breakdown

if TRUE, the function breakdown is applied to the input

Value

A numeric giving the cosine between the input documents/sentences

Details

In the traditional LSA approach, the vector D for a document (or a sentence) consisting of the words (t1, ..., tn) is computed as the sum of its word vectors: $$D = \sum\limits_{i=1}^n t_i$$
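The summation above, followed by a cosine, can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's actual implementation: tvectors_toy is a made-up two-dimensional space, doc_vector() and cosine_sim() are hypothetical helper names, and silently skipping words that are missing from the space is an assumption of this sketch.

```r
# Toy semantic space: three made-up 2-dimensional word vectors (illustrative only).
tvectors_toy <- rbind(
  alice  = c(1, 0),
  rabbit = c(0, 1),
  white  = c(1, 1)
)

# Hypothetical helper: document vector as the sum of its word vectors.
# Words not found in the space are skipped (an assumption of this sketch).
doc_vector <- function(words, tvectors) {
  known <- words[words %in% rownames(tvectors)]
  colSums(tvectors[known, , drop = FALSE])
}

# Hypothetical helper: cosine between two numeric vectors.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

d1 <- doc_vector(c("alice", "white"), tvectors_toy)   # (1,0) + (1,1) = (2,1)
d2 <- doc_vector(c("white", "rabbit"), tvectors_toy)  # (1,1) + (0,1) = (1,2)
cosine_sim(d1, d2)                                    # 4 / 5 = 0.8
```

With a real semantic space, the same two steps (sum the word vectors, then take the cosine) are what costring() is described as doing for x and y.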

This function computes the cosine between two documents (or sentences), or the cosine between a single word and a document (or sentence).

The format of x (or y) can be a single string, as in x <- "word1 word2 word3", or a character vector, as in x <- c("word1", "word2", "word3"). This allows text to be pasted in directly, and also allows character vectors to be used, e.g. the output of neighbors().

To import a document Document.txt from a directory for comparisons, set your working directory to this directory using setwd(). Then use the following command lines:

fileName1 <- "Alice_in_Wonderland.txt"

x <- readChar(fileName1, file.info(fileName1)$size)
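The import step and the two input formats can be tried without an existing file. The sketch below uses only base R. The temporary file and its one-line content are stand-ins invented for illustration, and the manual strsplit() call only shows how the single-string form relates to the character-vector form; it is not a claim about how costring() splits its input internally.

```r
# Write a small stand-in document to a temporary file (hypothetical content).
path <- tempfile(fileext = ".txt")
writeLines("Alice was beginning to get very tired", path)

# Read the whole file into one string, as in the readChar() call above.
x <- readChar(path, file.info(path)$size)

# Trim the trailing line break and split on white space to obtain
# the character-vector form of the same document.
words <- strsplit(trimws(x), " ", fixed = TRUE)[[1]]
words
```

Either x or words could then be passed as an argument to costring() together with a semantic space.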

References

Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.

Dennis, S. (2007). How to use the LSA Web Site. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 35-56). Mahwah, NJ: Erlbaum.

http://lsa.colorado.edu/

Examples

# NOT RUN {
data(wonderland)
costring("Alice was beginning to get very tired.",
      "A white rabbit with a clock ran close to her.",
      tvectors=wonderland)
# }
