RNewsflow: Tools for analyzing content homogeneity and news diffusion using computational text analysis

Given the sheer number of news sources in the digital age (e.g., newspapers, blogs, social media), it has become difficult to determine where news is first introduced and how it diffuses across sources. RNewsflow provides tools for analyzing content homogeneity and diffusion patterns using computational text analysis. The content of news messages is compared using techniques from the field of information retrieval, similar to plagiarism detection. Because a sliding window restricts comparisons to messages published within a given time distance of each other, many sources can be compared over long periods of time. Furthermore, the package introduces an approach for analyzing the news similarity data as a network, and includes various functions to analyze and visualize this network.
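To make the sliding window idea concrete, here is a minimal base R sketch. It is not the package's implementation: it uses a small dense matrix and plain cosine similarity, and the function name window_cosine, its arguments hour_window and min_similarity, the toy terms, and the publication times are all chosen for this illustration.

```r
# Toy document-term matrix: rows are documents, columns are terms (made-up data).
dtm <- matrix(
  c(2, 1, 0, 0,
    1, 1, 0, 0,
    0, 0, 3, 1,
    2, 1, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("d1", "d2", "d3", "d4"),
                  c("tax", "vote", "storm", "rain"))
)

# Hypothetical publication times, one per document.
date <- as.POSIXct(c("2024-01-01 08:00:00", "2024-01-01 12:00:00",
                     "2024-01-01 13:00:00", "2024-01-04 09:00:00"),
                   tz = "UTC")

# Illustrative helper (not part of RNewsflow): cosine similarity between
# documents, keeping only pairs whose time difference falls in hour_window.
window_cosine <- function(dtm, date, hour_window = c(0, 36), min_similarity = 0) {
  # Cosine similarity for every pair of rows.
  norms <- sqrt(rowSums(dtm^2))
  sim <- tcrossprod(dtm) / outer(norms, norms)

  # hourdiff[i, j] is the number of hours from document i to document j.
  secs <- as.numeric(date)
  hourdiff <- outer(secs, secs, function(x, y) (y - x) / 3600)

  # Keep pairs inside the window, above the threshold, and not self-matches.
  keep <- hourdiff >= hour_window[1] & hourdiff <= hour_window[2] &
    sim >= min_similarity & row(sim) != col(sim)
  idx <- which(keep, arr.ind = TRUE)

  data.frame(from = rownames(dtm)[idx[, "row"]],
             to = rownames(dtm)[idx[, "col"]],
             hourdiff = hourdiff[idx],
             similarity = sim[idx],
             stringsAsFactors = FALSE)
}

edges <- window_cosine(dtm, date, hour_window = c(0, 36), min_similarity = 0.5)
print(edges)
```

With this toy data only the pair d1 to d2 survives: d4 shares most of its terms with d1 and d2, but it was published about three days later and so falls outside the 36-hour window. A real analysis would compute similarities only for pairs inside the window rather than filtering a full similarity matrix afterwards, which is what makes long time spans tractable.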

Installation

You can install the development version of RNewsflow directly from GitHub:

library(devtools)
install_github("kasperwelbers/RNewsflow")

Vignette

The vignette, a step-by-step tutorial for using RNewsflow, can be opened from within R:

library(RNewsflow)
vignette('RNewsflow')

Install

install.packages('RNewsflow')

Monthly Downloads

1,346

Version

1.2.8

License

GPL-3

Maintainer

Kasper Welbers

Last Published

April 3rd, 2024

Functions in RNewsflow (1.2.8)

rnewsflow_dfm

quanteda dfm for RNewsflow vignette demo
term_union

Combine terms in a dtm
term_intersect

Combine terms in a dtm
term_innovation

Experimental: Convert dtm scores to a term innovation score, based on changes in term use over time
docnet

Document similarity network for one news agency, and the print and online editions of two newspapers
as_document_network

Create a document similarity network
document_network_plot

Visualize (a subcomponent of) the document similarity network
delete_duplicates

Delete duplicate (or similar) documents from a document term matrix
directed_network_plot

A wrapper for plot.igraph for visualizing directed networks.
compare_documents

Compare the documents in a dtm
filter_window

Filter edges from the document similarity network based on hour difference
create_document_network

Create a document similarity network
get_doc_terms

View term scores for a given document
create_queries

Automatically infer queries from combinations of terms in a dtm
network_aggregate

Aggregate the edges of a network by vertex attributes
show_window

Show time window of document pairs
only_first_match

Transform document network so that each document only matches the earliest dated matching document
get_overlap_terms

View overlapping terms for a given pair of documents
term_day_dist

Calculate statistics for term occurrence across days
term_char_sim

Find terms with similar spelling
tcrossprod_sparse

tcrossprod with benefits, for people that like parameters
hourdiff_range_thresholds

Inspect effects of thresholds on matches over time
newsflow_compare

Create a network of document similarities over time