Usage
newsflow.compare(dtm, meta, id.var = "document_id", date.var = "date", hour.window = c(-24, 24), measure = "cosine", min.similarity = 0, n.topsim = NULL, only.from = NULL, only.to = NULL, return.zeros = FALSE, only.complete.window = TRUE)
Arguments
meta
A data.frame where rows are documents and columns are document meta information.
It should contain at least 2 columns: the document name/id and the date.
The name/id column should match the document names/ids of the dtm (its rownames), and its label is specified in the `id.var` argument.
The date column should be interpretable with as.POSIXct, and its label is specified in the `date.var` argument.
id.var
The label for the document name/id column in the `meta` data.frame. Default is "document_id".
date.var
The label for the document date column in the `meta` data.frame. Default is "date".
hour.window
A vector of length 2, in which the first and second value determine the left and right side of the window, respectively. For example, c(-10, 36) will compare each document to all documents between the previous 10 and the next 36 hours.
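To make the window semantics concrete, here is a minimal base-R sketch that lists which document pairs fall inside a window such as c(-10, 36). The helper `window_pairs` is hypothetical (not part of the package), and treating both window bounds as inclusive is an assumption of this sketch.

```r
# Hypothetical helper (not part of the package): list the pairs (x, y) where
# document y falls within hour.window of document x. Bounds are treated as
# inclusive here; that is an assumption of this sketch.
window_pairs <- function(meta, hour.window = c(-24, 24),
                         id.var = "document_id", date.var = "date") {
  dates <- as.POSIXct(meta[[date.var]], tz = "UTC")
  ids <- meta[[id.var]]
  pairs <- expand.grid(x = seq_along(ids), y = seq_along(ids))
  pairs <- pairs[pairs$x != pairs$y, ]  # skip self-comparisons
  # Hour difference of y relative to x (positive = y is later than x)
  hourdiff <- as.numeric(difftime(dates[pairs$y], dates[pairs$x], units = "hours"))
  keep <- hourdiff >= hour.window[1] & hourdiff <= hour.window[2]
  data.frame(x = ids[pairs$x[keep]], y = ids[pairs$y[keep]],
             hourdiff = hourdiff[keep], stringsAsFactors = FALSE)
}

meta <- data.frame(
  document_id = c("a", "b", "c"),
  date = c("2024-01-01 00:00:00", "2024-01-01 08:00:00", "2024-01-03 00:00:00"),
  stringsAsFactors = FALSE
)
res <- window_pairs(meta, hour.window = c(-10, 36))
res
```

Here only a and b (8 hours apart) are compared with each other; c is 40 or more hours from both, so it falls outside the window in either direction.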
measure
The measure used to calculate similarity/distance/adjacency. Currently supports the symmetrical measure "cosine" (cosine similarity; the default) and the asymmetrical measure "overlap_pct" (the percentage of term scores in the document that also occur in the other document).
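The difference between a symmetrical and an asymmetrical measure can be illustrated with two term-score vectors. This base-R sketch computes cosine similarity and one plausible reading of "overlap_pct" (the share of x's term scores on terms that also occur in y). The function names are hypothetical, and the exact formula and scale (proportion versus percentage) used by the package are assumptions here.

```r
# Cosine similarity between two term-score vectors (symmetric)
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# One reading of "overlap_pct" (an assumption, not the package's code): the
# share of x's term scores that fall on terms also occurring (score > 0) in y.
# Asymmetric: swapping x and y can change the result. Returned in [0, 1].
overlap_share <- function(x, y) {
  sum(x[y > 0]) / sum(x)
}

x <- c(2, 1, 0, 1)  # term scores for document x
y <- c(1, 0, 3, 1)  # term scores for document y

cosine_sim(x, y)     # equals cosine_sim(y, x): 3 / sqrt(66)
overlap_share(x, y)  # 3 / 4 = 0.75
overlap_share(y, x)  # 2 / 5 = 0.4
```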
min.similarity
A threshold for similarity: scores below this value are removed. Default is 0.
n.topsim
An alternative or additional threshold for similarity: for each document x, only the [n.topsim] highest similarity scores are kept. More than [n.topsim] scores can be returned when there are ties. Default is NULL (no limit).
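The two thresholds can be combined as in the base-R sketch below, applied to a toy edgelist of similarity scores. The helper `apply_thresholds` and the edgelist columns (x, y, similarity) are hypothetical; keeping scores exactly equal to min.similarity, and handling ties by giving them a shared rank, are assumptions of this sketch.

```r
# Hypothetical edgelist of similarity scores (x = source document, y = compared document)
edges <- data.frame(
  x = c("a", "a", "a", "a", "b", "b"),
  y = c("b", "c", "d", "e", "a", "c"),
  similarity = c(0.9, 0.5, 0.5, 0.05, 0.9, 0.2),
  stringsAsFactors = FALSE
)

apply_thresholds <- function(edges, min.similarity = 0, n.topsim = NULL) {
  # Drop scores below the threshold (scores equal to it are kept here)
  edges <- edges[edges$similarity >= min.similarity, ]
  if (!is.null(n.topsim)) {
    # Rank scores within each x, highest first; tied scores share the lowest
    # rank, so more than n.topsim rows can survive for a given x
    rnk <- ave(-edges$similarity, edges$x,
               FUN = function(s) rank(s, ties.method = "min"))
    edges <- edges[rnk <= n.topsim, ]
  }
  edges
}

res <- apply_thresholds(edges, min.similarity = 0.1, n.topsim = 2)
res  # (a, e) is dropped by the threshold; a keeps 3 rows because c and d tie
```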
only.from
A vector with names/ids of documents (dtm rownames), or a logical vector that matches the rows of the dtm. Use to compare only these documents to other documents.
only.to
A vector with names/ids of documents (dtm rownames), or a logical vector that matches the rows of the dtm. Use to compare other documents to only these documents.
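Both arguments accept either names/ids or a logical vector. This base-R sketch shows one way to normalize them into a logical row index and use it to restrict a toy similarity matrix (rows as x, columns as y). The helper `as_doc_index` and the matrix layout are hypothetical illustrations, not the package's internals; note that the toy does not drop self-pairs such as (c, c).

```r
# Hypothetical helper: turn only.from / only.to (names/ids or a logical vector)
# into a logical index over the dtm rows; NULL selects every document.
as_doc_index <- function(sel, doc_ids) {
  if (is.null(sel)) return(rep(TRUE, length(doc_ids)))
  if (is.logical(sel)) {
    stopifnot(length(sel) == length(doc_ids))
    return(sel)
  }
  doc_ids %in% sel
}

doc_ids <- c("a", "b", "c", "d")
from <- as_doc_index(c("a", "c"), doc_ids)                # compare only a and c ...
to   <- as_doc_index(c(FALSE, TRUE, TRUE, TRUE), doc_ids) # ... to b, c and d

# Toy similarity matrix: rows are x (from), columns are y (to)
sim <- matrix(seq_len(16) / 16, nrow = 4, dimnames = list(doc_ids, doc_ids))
kept <- sim[from, to, drop = FALSE]  # 2 x 3 block of comparisons that are kept
kept
```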
return.zeros
If TRUE, all comparison results are returned, including those with zero similarity (rarely useful, and problematic with large data).
only.complete.window
If TRUE, only compare articles (x) for which a full window of reference articles (y) is available. Thus, articles published within the first abs(hour.window[1]) hours or the last hour.window[2] hours of the data will have no results as x.
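Which documents have a complete window can be approximated by checking whether the whole window around each document lies within the time span covered by the data. The helper below is a hypothetical base-R sketch of that check, not the package's implementation.

```r
# Hypothetical helper: flag documents whose full comparison window lies within
# the period covered by the data (an approximation of "a full window of
# reference articles is available").
complete_window <- function(dates, hour.window = c(-24, 24)) {
  dates <- as.POSIXct(dates, tz = "UTC")
  hours <- as.numeric(difftime(dates, min(dates), units = "hours"))
  span <- max(hours)
  # Left edge must not precede the first date; right edge must not pass the last
  hours + hour.window[1] >= 0 & hours + hour.window[2] <= span
}

dates <- c("2024-01-01 00:00:00", "2024-01-02 00:00:00",
           "2024-01-03 00:00:00", "2024-01-04 00:00:00")
flags <- complete_window(dates, hour.window = c(-24, 24))
flags  # returns c(FALSE, TRUE, TRUE, FALSE)
```

With a window of c(-24, 24), the first day lacks a left side and the last day lacks a right side, so only the two middle documents qualify.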