extract_date {textpress} | R Documentation
Extract Date from HTML Content
Description
Attempts to extract a publication date from the parsed HTML of a web page. It checks several possible sources, including JSON-LD, OpenGraph meta tags, standard meta tags, and common HTML elements.
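The lookup strategy described above can be sketched roughly as follows. This is an illustration only, not the package's implementation: the helper name sketch_extract_date, the XPath expressions, the source labels, and the order of the checks are all assumptions made for the example.

```r
library(xml2)

# Hypothetical helper, NOT the textpress implementation: checks a few
# common date locations in turn and returns the first one found.
sketch_extract_date <- function(site) {
  found <- function(date, source) {
    data.frame(date = date, source = source, stringsAsFactors = FALSE)
  }

  # 1. JSON-LD: search ld+json script blocks for a "datePublished" key.
  ld <- xml_text(xml_find_all(site, "//script[@type='application/ld+json']"))
  pattern <- '"datePublished"[[:space:]]*:[[:space:]]*"([^"]+)"'
  hits <- regmatches(ld, regexpr(pattern, ld))
  if (length(hits) > 0) {
    return(found(sub(pattern, "\\1", hits[1]), "JSON-LD"))
  }

  # 2-4. Attribute lookups (the XPath choices and labels are assumptions).
  checks <- list(
    list("//meta[@property='article:published_time']", "content", "OpenGraph"),
    list("//meta[@name='date']", "content", "meta tag"),
    list("//time[@datetime]", "datetime", "time element")
  )
  for (chk in checks) {
    value <- xml_attr(xml_find_first(site, chk[[1]]), chk[[2]])
    if (!is.na(value)) {
      return(found(value, chk[[3]]))
    }
  }

  # Nothing matched: NA in both columns, as the Value section describes.
  found(NA_character_, NA_character_)
}

# A made-up page carrying only an OpenGraph publication time.
page <- read_html('<html><head>
  <meta property="article:published_time" content="2024-03-15T08:00:00Z">
</head><body><p>Example article</p></body></html>')

result <- sketch_extract_date(page)
```

Trying the structured sources first and falling back to looser HTML elements is one plausible ordering; the order used by extract_date itself is not stated on this page.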
Usage
extract_date(site)
Arguments
site
    An HTML document (as parsed by xml2 or rvest) from which to extract the date.
Value
A data.frame with two columns: 'date', the extracted date, and 'source', the source from which it was extracted (e.g., JSON-LD, OpenGraph). If no date is found, both columns are NA.
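Examples
A minimal usage sketch, assuming the installed textpress matches the version documented here. The HTML snippet is made up for illustration. This page does not say whether extract_date is exported, so the sketch looks the function up in the package namespace with utils::getFromNamespace, which works in either case, and skips the call if the package or the function is unavailable.

```r
library(xml2)

# A made-up page with an OpenGraph publication time.
page <- read_html('<html><head>
  <meta property="article:published_time" content="2024-03-15T08:00:00Z">
</head><body><p>Example article</p></body></html>')

# Only call the function if textpress is installed and provides it.
has_fn <- requireNamespace("textpress", quietly = TRUE) &&
  exists("extract_date", envir = asNamespace("textpress"), inherits = FALSE)

if (has_fn) {
  extract_date <- utils::getFromNamespace("extract_date", "textpress")
  res <- extract_date(page)
  print(res)  # per the Value section: a data.frame with 'date' and 'source'
}
```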
[Package textpress version 1.0.0 Index]