wikkitidy

Tidy analysis of Wikipedia in R

What’s in a name?

wiki: There are many wikis, but one dominates the Wikiverse. Wikipedia is the largest repository of facts ever assembled by human hands. Scholars the world over are turning to Wikipedia to understand how twenty-first century society understands itself.

quiddity: The ‘whatness’ of a thing. The kind of thing it is. What is Wikipedia? Is it merely another encyclopaedia? Is it news presented as history? Is it the consensus of a global village, or the battleground of an ideological war?

tidy: The best kind of data. R programmers are lucky to have access to the tidyverse, a collection of packages that make it easy to analyse, visualise and publish data. This package embodies tidy data principles by returning results from Wikipedia’s APIs as tibbles or simple vectors, and by providing a number of vectorised analysis functions that can be applied reliably and without fuss to the data you retrieve.

Thus wikkitidy’s aim: to help you work out what Wikipedia is with minimal data wrangling and cleaning.

Getting to 1.0

| Version | Feature | Done? |
|---------|---------|-------|
| 0.1 | Basic request objects | :white_check_mark: |
| 0.2 | Calls and response objects for Core and Wikimedia REST APIs | :white_large_square: |
| 0.3 | Calls and response objects for MediaWiki Action API Query Modules | :white_large_square: |
| 0.4 | Interface to Wikipedia XML dumps | :white_large_square: |
| 0.5 | Implementation of Wikiblame | :white_large_square: |
| 0.6 | Calls and response objects for the XTools and WikiMedia APIs | :white_large_square: |

Installation

You can install wikkitidy from CRAN with:

install.packages("wikkitidy")

You can install the development version from GitHub with:

devtools::install_github("wikihistories/wikkitidy")

Code of Conduct

Please note that the wikkitidy project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads: 124

Version: 0.1.12

License: MIT + file LICENSE

Maintainer: Michael Falk

Last Published: February 9th, 2024

Functions in wikkitidy (0.1.12)

new_prop_query: Constructor for the property query type

process_timestamps: Convert passed objects into ISO8601 strings for API requests

wikipedia_rest_apis: Build a REST request to one of Wikipedia's specific REST APIs

wikimedia_rest_apis: Build a REST request to one of the Wikimedia Foundation's central APIs

wiki_action_request

tidyeval: Tidy eval helpers

verify_xml_integrity: Check that a Wikimedia XML file has not been corrupted

query_tbl: Representation of Wikipedia data returned from an Action API Query module as a tibble, with request metadata stored as attributes

query_page_properties: Choose properties to return for pages from the Action API

wikkitidy-package: wikkitidy: Tidy Analysis of Wikipedia

query_list_pages: List pages that meet certain criteria

xtools_page

wikkitidy_example: Get path to wikkitidy example

query_generate_pages: Generate pages that meet certain criteria, or which are related to a set of known pages by certain properties

get_query_results

new_generator_query: Constructor for generator query type

check_namespace: Ensure namespace arguments are valid

get_history_count: Count how many times Wikipedia articles have been edited

get_diff: Search for insertions, deletions or relocations of text between two versions of a Wikipedia page

get_rest_resource

query_by_: Query the MediaWiki Action API using a vector of Wikipedia pages

query_category_members: Explore Wikipedia's category system

page_vector_functions: Get data about pages from their titles

%>%: Pipe operator

new_list_query

perform_query: Perform a single request to the Action API

prefix_params: Add required prefix to URL parameters for MediaWiki Action API request

parse_response.wikidiff2: Convert a response from a Wikipedia API into a convenient format

check_limit: Ensure that the limit is correct for the endpoint. Raise an error if not.

id_or_title: Determine if a page parameter comprises titles or pageids, and prefix accordingly

append_query_result: Combine new results for a query with previously downloaded results

continue_query: Query the Action API continually until a continuation condition no longer holds