This function calculates a refined similarity measure of coupling links from a direct citation data frame.
It is inspired by shen2019biblionetwork. To a certain extent, it mixes the coupling_strength() function with the cosine measure of the biblio_coupling() function.
coupling_similarity(
  dt,
  source,
  ref,
  weight_threshold = 1,
  output_in_character = TRUE
)
dt
The table with citing and cited documents.
source
The column name of the source identifiers, that is, the citing documents. In bibliographic coupling, these documents are the nodes of the network.
ref
The column name of the references that are cited.
weight_threshold
Corresponds to the value of the non-normalized weights of edges. The function keeps only the edges whose non-normalized weight is at least weight_threshold. In other words, if you set the parameter to 2, the function keeps only the edges between nodes that share at least two references in their bibliographies. In a large bibliographic coupling network, you may consider, for instance, that sharing a single reference is not sufficient for two articles to be linked. This parameter can also be raised to avoid creating intractable networks with too many edges.
output_in_character
If TRUE, the function ends by converting the from and to columns to character, to make the creation of a tidygraph network easier.
A data.table with the article identifiers in the from and to columns, and the similarity measure in another column. It also keeps a copy of from and to in the Source and Target columns. This is useful if you then use the tidygraph package, where the from and to values are modified when creating a graph.
The function uses the following formalisation:
$$\frac{R_{S}(A) \bullet R_{S}(B)}{\sqrt{R_{S}(A) \cdot R_{S}(B)}}$$
with $$R_{S}(A) \bullet R_{S}(B) = \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j})}\right)}$$ which is a measure similar to the coupling strength measure;
and $$R_{S}(A) \cdot R_{S}(B) = \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j}(A))}\right)} \cdot \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j}(B))}\right)}$$ which is the product of the sums, computed separately for each article, of the normalized value of a citation. It is the cosine measure of documents A and B, adapted to the spirit of the coupling strength.
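To make the formalisation above more concrete, here is a minimal base R sketch that evaluates it by hand on a toy citation table. It is not the package's implementation. It assumes that N is the number of distinct citing documents and that freq(R_j) is the number of documents citing reference j, neither of which is spelled out on this page. The toy data and the helper `similarity()` are hypothetical and exist only for illustration.

```r
# Toy direct-citation table (hypothetical data): one row per
# (citing document, cited reference) pair, without duplicates.
dt <- data.frame(
  source = c("A", "A", "A", "B", "B", "B", "C", "C"),
  ref    = c("r1", "r2", "r3", "r2", "r3", "r4", "r3", "r5"),
  stringsAsFactors = FALSE
)

# Assumption: N is the number of distinct citing documents.
N <- length(unique(dt$source))

# Assumption: freq(R_j) is the number of documents citing reference j.
freq_tab <- table(dt$ref)
freq <- setNames(as.numeric(freq_tab), names(freq_tab))

# Normalized value of one citation: sqrt(log(N / freq(R_j))).
w <- sqrt(log(N / freq))

# Hypothetical helper: similarity between documents a and b.
similarity <- function(a, b) {
  refs_a <- dt$ref[dt$source == a]
  refs_b <- dt$ref[dt$source == b]
  shared <- intersect(refs_a, refs_b)
  # Numerator: sum over the shared references.
  # Denominator: square root of the product of each document's own sum.
  sum(w[shared]) / sqrt(sum(w[refs_a]) * sum(w[refs_b]))
}

similarity("A", "B")  # about 0.378
similarity("A", "C")  # 0: the only shared reference is cited by every document
```

In this toy corpus, A and C share only r3, which every document cites, so its weight is sqrt(log(1)) = 0 and the pair gets a similarity of zero even though a plain count of shared references would give 1.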
# NOT RUN {
library(biblionetwork)
coupling_similarity(Ref_stagflation,
                    source = "Citing_ItemID_Ref",
                    ref = "ItemID_Ref")
# }
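As a variation on the call above, the following sketch compares the default threshold with a stricter one. It uses only the arguments documented on this page and the Ref_stagflation data set from the example. The object names `edges_default` and `edges_strict` are illustrative, and the row counts are not shown because they depend on the data.

```r
library(biblionetwork)

# Default threshold: keep every pair of documents sharing at least one reference.
edges_default <- coupling_similarity(Ref_stagflation,
                                     source = "Citing_ItemID_Ref",
                                     ref = "ItemID_Ref")

# Stricter threshold: keep only pairs sharing at least two references.
edges_strict <- coupling_similarity(Ref_stagflation,
                                    source = "Citing_ItemID_Ref",
                                    ref = "ItemID_Ref",
                                    weight_threshold = 2)

# The stricter edge list should never have more rows than the default one.
nrow(edges_default)
nrow(edges_strict)
```

Comparing the two row counts is a quick way to judge how much a higher weight_threshold thins out the network before building a graph from it.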