textpress (version 1.0.0)

extract_date: Extract Date from HTML Content

Description

This function attempts to extract a publication date from the HTML content of a web page using various methods such as JSON-LD, OpenGraph meta tags, standard meta tags, and common HTML elements.
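The fallback order described above can be illustrated with a minimal sketch. This is not the textpress implementation: `sketch_extract_date` is a hypothetical helper, and the specific XPath expressions, the `datePublished` key, and the `source` labels are assumptions chosen for illustration. It relies only on xml2.

```r
library(xml2)

# Hypothetical helper illustrating a JSON-LD -> OpenGraph -> meta -> <time>
# fallback chain. The selectors and labels are assumptions, not the package's.
sketch_extract_date <- function(site) {
  found <- function(date, source) {
    data.frame(date = date, source = source, stringsAsFactors = FALSE)
  }

  # 1. JSON-LD: search the first ld+json script for a "datePublished" key.
  ld <- xml_text(xml_find_first(site, "//script[@type='application/ld+json']"))
  if (!is.na(ld)) {
    pattern <- '"datePublished"[[:space:]]*:[[:space:]]*"([^"]+)"'
    m <- regmatches(ld, regexec(pattern, ld))[[1]]
    if (length(m) == 2) return(found(m[2], "JSON-LD"))
  }

  # 2. OpenGraph-style meta tag.
  og <- xml_attr(
    xml_find_first(site, "//meta[@property='article:published_time']"),
    "content"
  )
  if (!is.na(og)) return(found(og, "OpenGraph"))

  # 3. Standard meta tag.
  meta <- xml_attr(xml_find_first(site, "//meta[@name='date']"), "content")
  if (!is.na(meta)) return(found(meta, "meta"))

  # 4. Common HTML element: <time datetime="...">.
  tm <- xml_attr(xml_find_first(site, "//time[@datetime]"), "datetime")
  if (!is.na(tm)) return(found(tm, "time"))

  # Nothing matched: NA in both columns.
  found(NA_character_, NA_character_)
}
```

Each step returns as soon as it finds a value, so earlier sources take priority over later ones. The actual function may check more tags or use a different order.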

Usage

extract_date(site)

Value

A data.frame with two columns: `date`, the extracted date, and `source`, the method that produced it (e.g., JSON-LD, OpenGraph, etc.). If no date is found, both columns are NA.

Arguments

site

A parsed HTML document, such as the object returned by read_html() in xml2 or rvest, from which to extract the date.
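
Examples

A possible call is sketched below, assuming textpress is installed. The inline HTML string is made up for illustration. Whether `extract_date` is exported is not stated here, so the sketch looks it up with `utils::getFromNamespace()`, which works for exported and internal functions alike.

```r
library(xml2)

# Made-up page carrying a publication date in an OpenGraph-style meta tag.
html <- paste0(
  '<html><head>',
  '<meta property="article:published_time" content="2024-03-15T08:00:00Z">',
  '</head><body><p>Example article</p></body></html>'
)
site <- read_html(html)

# Only run the call if the package is available.
if (requireNamespace("textpress", quietly = TRUE)) {
  extract_date <- utils::getFromNamespace("extract_date", "textpress")
  result <- extract_date(site)
  print(result)  # a data.frame with `date` and `source` columns
}
```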