This function calculates the number of references that pairs of articles share, as well as the coupling angle value of edges in a bibliographic coupling network (sen1983biblionetwork), from a direct citation data frame. This is a standard way to build a bibliographic coupling network using Salton's cosine measure: it divides the number of references that two articles share by the square root of the product of the two articles' bibliography lengths. This normalization avoids giving too much importance to articles with a large bibliography.
biblio_coupling(
  dt,
  source,
  ref,
  normalized_weight_only = TRUE,
  weight_threshold = 1,
  output_in_character = TRUE
)
For bibliographic coupling (or co-citation), the data frame with citing and cited documents. It can also be used for a title co-occurrence network, with source being the articles and ref being the list of words in article titles; or for a co-authorship network, with source being the authors and ref the list of articles.
The column name of the source identifiers, that is, the citing documents. In a coupling network, these documents are the nodes of the network.
The column name of the cited reference identifiers.
If normalized_weight_only is set to FALSE, the function returns the number of shared references in addition to the weights normalized by the cosine measure.
Corresponds to the value of the non-normalized weights of edges. The function keeps only the edges whose non-normalized weight is greater than or equal to weight_threshold. In other words, if you set the parameter to 2, the function keeps only the edges between nodes that share at least two references in their bibliographies. In a large bibliographic coupling network, you may consider, for instance, that sharing only one reference is not sufficient for two articles to be linked together. This parameter can also be raised to avoid creating intractable networks with too many edges.
If TRUE, the function ends by converting the from and to columns to character, to make the creation of a tidygraph network easier.
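The effect of the weight_threshold argument described above can be illustrated with a minimal base R sketch. It does not call biblio_coupling() and is not the package's implementation: the toy data frame, the column names (from, to, nb_shared_references) and the use of a "greater than or equal to" filter are assumptions made for illustration, following the "at least two references" wording.

```r
# Toy citation data (hypothetical): A cites r1-r3, B cites r1-r2, C cites r1.
dt <- data.frame(
  source = c("A", "A", "A", "B", "B", "C"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r1"),
  stringsAsFactors = FALSE
)

# Self-join on the reference: one row per (ref, source.x, source.y) triple.
pairs <- merge(dt, dt, by = "ref")

# Keep each unordered pair of distinct sources once.
pairs <- pairs[pairs$source.x < pairs$source.y, ]

# Count the shared references of each pair (the non-normalized weight).
edges <- aggregate(ref ~ source.x + source.y, data = pairs, FUN = length)
names(edges) <- c("from", "to", "nb_shared_references")

# Assumed threshold semantics: keep edges with weight >= threshold.
edges_min1 <- edges[edges$nb_shared_references >= 1, ]
edges_min2 <- edges[edges$nb_shared_references >= 2, ]

print(edges_min1)
print(edges_min2)
```

With a threshold of 1 all three pairs are kept; with a threshold of 2 only the A-B edge remains, since A and B are the only pair sharing two references.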
A data.table with the article (or author) identifiers in the from and to columns, with one or two additional columns (the coupling angle measure and the number of shared references). It also keeps a copy of from and to in the Source and Target columns. This is useful if you use the tidygraph package afterwards, where the from and to values are modified when creating a graph.
This function implements the following weight measure:
$$\frac{R(A) \bullet R(B)}{\sqrt{L(A) \cdot L(B)}}$$
with \(R(A)\) and \(R(B)\) the references of document A and document B, \(R(A) \bullet R(B)\) the number of references shared by A and B, and \(L(A)\) and \(L(B)\) the lengths of the bibliographies of document A and document B.
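As a worked illustration of the weight measure above, the following base R sketch computes the coupling angle for two hypothetical documents. The reference identifiers and variable names are invented for the example, and the code does not reproduce the internals of the function.

```r
# Hypothetical bibliographies: A has 4 references, B has 9.
refs_A <- c("r1", "r2", "r3", "r4")
refs_B <- c("r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10")

# Number of shared references (r2, r3 and r4).
shared <- length(intersect(refs_A, refs_B))

# Coupling angle: shared references divided by the square root of the
# product of the two bibliography lengths, here 3 / sqrt(4 * 9).
weight <- shared / sqrt(length(refs_A) * length(refs_B))

print(shared)
print(weight)
```

Here the two documents share 3 references, so the weight is 3 / sqrt(36) = 0.5. If B had only 4 references with the same 3 shared, the weight would rise to 0.75, which shows how the measure discounts long bibliographies.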
This function uses the data.table package, which allows it to compute the coupling angle quickly even on a very large data frame.
This function is relatively general and can also be used:
for co-citation networks (simply by swapping the source and ref columns); to avoid confusion, use the biblio_cocitation() function instead;
for title co-occurrence networks (the coupling angle measure accounts for the length of the titles);
for co-authorship networks (accounting for the number of co-authors an author has collaborated with over a period); for co-authorship, use the coauth_network() function instead.
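Why swapping the source and ref columns turns coupling into co-citation can be seen with a small incidence-matrix sketch in base R. This is an illustration under assumptions, not the package's data.table implementation: the toy data are hypothetical and each (source, ref) pair is assumed to appear only once.

```r
# Toy citation data (hypothetical): A cites r1-r3, B cites r1-r2, C cites r1.
dt <- data.frame(
  source = c("A", "A", "A", "B", "B", "C"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r1")
)

# Incidence matrix: citing documents in rows, cited references in columns.
m <- unclass(table(dt$source, dt$ref))

# Bibliographic coupling: number of references shared by two sources.
coupling <- tcrossprod(m)

# Co-citation: number of sources that cite two references together.
cocitation <- crossprod(m)

# Swapping the roles of the two columns and computing "coupling" again
# gives the same counts as the co-citation matrix.
swapped <- tcrossprod(unclass(table(dt$ref, dt$source)))

# Cosine normalization of the coupling counts; the diagonal of `coupling`
# holds the bibliography length of each source.
lengths <- diag(coupling)
coupling_angle <- coupling / sqrt(outer(lengths, lengths))

print(coupling)
print(cocitation)
print(round(coupling_angle, 3))
```

In this toy data, A and B share two references, while r1 and r2 are cited together by two documents, so the same computation serves both networks once the columns are exchanged.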
# NOT RUN {
library(biblionetwork)

# Keep only the edges between articles that share at least three references
biblio_coupling(Ref_stagflation,
                source = "Citing_ItemID_Ref",
                ref = "ItemID_Ref",
                weight_threshold = 3)
# }