
2D or 3D-Plot of mutual word similarities to a given list of sentences/documents
plot_doclist(x,connect.lines="all",method="PCA",dims=3,
axes=F,box=F,cex=1,chars=10,legend=T, size = c(800,800),
alpha="graded",alpha.grade=1,col="rainbow",
tvectors=tvectors,breakdown=FALSE,...)
x: a character vector of length(x) > 1 that contains multiple sentences/documents
dims: the dimensionality of the plot; set either dims = 2 or dims = 3
method: the method to be applied; either a Principal Component Analysis (method="PCA") or a Multidimensional Scaling (method="MDS")
connect.lines: (3d plot only) the number of closest associate words each word is connected to by a line. Setting connect.lines="all" (default) will draw all connecting lines and will automatically apply alpha="graded"
axes: (3d plot only) whether axes shall be included in the plot
box: (3d plot only) whether a box shall be drawn around the plot
cex: (2d plot only) a numerical value giving the amount by which plotting text should be magnified relative to the default
chars: an integer specifying how many letters (starting from the first) of each sentence/document are to be printed in the plot
size: (3d plot only) a numeric vector with two elements, the first specifying the width and the second specifying the height of the plot device
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
breakdown: if TRUE, the function breakdown is applied to the input
alpha: (3d plot only) a numeric vector specifying the luminance of the connect.lines. By setting alpha="graded", the luminance of every line will be adjusted to the cosine between the two words it connects.
alpha.grade: (3d plot only) only relevant if alpha="graded". Specify a numeric value for alpha.grade to scale the luminance of all connect.lines up (alpha.grade > 1) or down (alpha.grade < 1) by that factor.
col: (3d plot only) a vector specifying the color of the connect.lines. Setting col="rainbow" (default) adjusts the color of every line to the cosine between the two words it connects, according to the rainbow palette. Other available color palettes for this purpose are heat.colors, terrain.colors, topo.colors, and cm.colors (see rainbow). Additionally, you can customize any color scale of your choice by providing an input specifying more than one color (for example col = c("black","blue","red")).
...: additional arguments which will be passed to plot3d (in a three-dimensional plot only)
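The graded color and luminance settings described above can be illustrated with a small base R sketch. This is not the code plot_doclist uses internally; the helper map_to_palette, the number of palette steps, and the cosine values are all assumptions made for illustration. It only shows one plausible way to turn cosines in [0, 1] into palette colors and scaled alpha values.

```r
# Made-up cosine values for three connecting lines (illustration only)
cosines <- c(0.15, 0.50, 0.95)

# Hypothetical helper: map values in [0, 1] onto n colors of a palette function
map_to_palette <- function(values, palette_fun, n = 100) {
  cols <- palette_fun(n)
  # Clamp the index so that 0 maps to the first and 1 to the last color
  idx <- pmin(pmax(ceiling(values * n), 1), n)
  cols[idx]
}

# Built-in palette, as with col = "rainbow"
line_cols <- map_to_palette(cosines, rainbow)

# Custom scale from several colors, as with col = c("black", "blue", "red")
custom_cols <- map_to_palette(cosines,
                              colorRampPalette(c("black", "blue", "red")))

# Graded luminance: scale the cosines by alpha.grade and cap the result at 1
alpha.grade <- 1
line_alpha <- pmin(cosines * alpha.grade, 1)
```

With alpha.grade = 1 the alpha values equal the cosines; a value above 1 makes all lines more opaque, a value below 1 makes them fainter.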
see plot3d: this function is called for the side effect of drawing the plot; a vector of object IDs is returned.
plot_doclist further prints a list with two elements:
  the coordinate vectors of the sentences/documents in the plot, as a data frame
  a legend for the sentence/document labels in the plot and in the coordinates
Computes all pairwise similarities within a given list of sentences/documents. A Principal Component Analysis (PCA) or a Multidimensional Scaling (MDS) is then applied to this similarity matrix to obtain the two- or three-dimensional solution that best captures the similarity structure. This solution is then plotted.
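The pipeline described above (pairwise similarities, then dimensionality reduction, then plotting) can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual implementation: the document vectors are made up, the cosines are converted to chord distances sqrt(2 * (1 - cosine)) for the MDS step (one of several possible conversions), and prcomp stands in for the PCA step.

```r
# Made-up document vectors, one row per document (illustration only)
doc_vecs <- rbind(
  doc1 = c(1.6, 0.4, 0.1),
  doc2 = c(1.4, 0.6, 0.2),
  doc3 = c(0.1, 1.5, 0.6),
  doc4 = c(0.2, 1.3, 0.9)
)

# Pairwise cosine similarities between all rows
norms <- sqrt(rowSums(doc_vecs^2))
sim <- (doc_vecs %*% t(doc_vecs)) / (norms %o% norms)

# MDS: turn similarities into distances (assumed conversion), then scale to 2D
dissim <- sqrt(2 * pmax(1 - sim, 0))
coords_mds <- cmdscale(as.dist(dissim), k = 2)

# PCA alternative: first two principal components of the similarity matrix
coords_pca <- prcomp(sim)$x[, 1:2]

# Plot the 2D MDS solution with the document labels
plot(coords_mds, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(coords_mds, labels = rownames(doc_vecs))
```

In this toy example doc1 and doc2 point in nearly the same direction, so they end up close together in the plot, while doc3 and doc4 form a second group.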
In the traditional LSA approach, the vector D for a document (or a sentence) consisting of the words $(t_1, \ldots, t_n)$ is computed as the sum of its word vectors, $D = \sum_{i=1}^{n} t_i$.
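As a rough illustration of this additive composition, the following base R sketch builds document vectors by summing word vectors and compares two documents by their cosine. The toy semantic space, the helper names doc_vector and cosine_sim, and the simple tokenization are assumptions made for this example; they are not taken from the package.

```r
# Toy semantic space: every row is a word vector (values are made up)
tvectors_toy <- rbind(
  alice  = c(0.9, 0.1, 0.0),
  queen  = c(0.7, 0.3, 0.1),
  hatter = c(0.1, 0.8, 0.2),
  tea    = c(0.0, 0.7, 0.4)
)

# Hypothetical helper: sum the vectors of all words found in the space
doc_vector <- function(doc, space) {
  words <- strsplit(tolower(doc), "[^a-z]+")[[1]]
  # Words without a vector in the space are dropped
  words <- words[words %in% rownames(space)]
  colSums(space[words, , drop = FALSE])
}

# Hypothetical helper: cosine between two vectors
cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

d1 <- doc_vector("The queen greeted Alice", tvectors_toy)
d2 <- doc_vector("The hatter had tea", tvectors_toy)

cosine_sim(d1, d2)
```

Here d1 is the sum of the vectors for "queen" and "alice", and d2 is the sum of the vectors for "hatter" and "tea"; words missing from the toy space contribute nothing.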
x should be of the kind x <- c("this is the first text","here is another text")
To create plots that best show the similarity structure within the list, set connect.lines="all" and col="rainbow"
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate Analysis, London: Academic Press.
cosine, multidocs, plot_neighbors, plot_wordlist, plot3d, princomp, rainbow
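# NOT RUN {
# plot_doclist and the wonderland semantic space come from the LSAfun package
library(LSAfun)
data(wonderland)

## Standard Plot
docs <- c("Alice was beginning to get very tired.",
          "The red queen greeted Alice.",
          "The mad hatter and the march hare are having a party.",
          "The hatter sliced the cup of tea in half.")

# Two-dimensional MDS solution of the pairwise document similarities
plot_doclist(docs, tvectors = wonderland, method = "MDS", dims = 2)
# }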