This function is used to scrape text paragraphs from a website.
paragraphs_scrap(link, contain = NULL, case_sensitive = FALSE, collapse = FALSE, askRobot = FALSE)
link: the link of the web page to scrape.
contain: a character string used to filter the paragraphs; only paragraphs containing it are returned. Defaults to NULL (no filtering).
case_sensitive: logical. Should the contain argument be case sensitive? Defaults to FALSE.
collapse: logical. If TRUE, the paragraphs are collapsed into one element and the contain argument is ignored. Defaults to FALSE.
askRobot: logical. Should the function check robots.txt to see whether scraping the web page is allowed? Defaults to FALSE.
Returns a character vector.
# NOT RUN {
# Extracting the paragraphs displayed on the health page of the New York Times
link <- "https://www.nytimes.com/section/health"
paragraphs_scrap(link)
# }
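The effect of contain, case_sensitive, and collapse can be illustrated offline with base R. This is only a sketch of the documented behaviour, not the package's implementation: filter_paragraphs is a hypothetical helper, the sample vector is made up, and both matching contain as a literal substring and joining collapsed paragraphs with a single space are assumptions.

```r
# Hypothetical helper mimicking the documented argument semantics.
# Assumptions: `contain` is matched as a literal substring, and
# collapsed paragraphs are joined with a single space.
filter_paragraphs <- function(paragraphs, contain = NULL,
                              case_sensitive = FALSE, collapse = FALSE) {
  if (collapse) {
    # `contain` is ignored when collapsing, as the documentation states
    return(paste(paragraphs, collapse = " "))
  }
  if (is.null(contain)) {
    return(paragraphs)
  }
  if (case_sensitive) {
    keep <- grepl(contain, paragraphs, fixed = TRUE)
  } else {
    # Lower-case both sides to get a case-insensitive literal match
    keep <- grepl(tolower(contain), tolower(paragraphs), fixed = TRUE)
  }
  paragraphs[keep]
}

# Made-up paragraphs standing in for scraped content
sample_paragraphs <- c("Health news today.", "A study on sleep.", "HEALTH tips.")

# Case-insensitive filter keeps the first and third paragraphs
filter_paragraphs(sample_paragraphs, contain = "health")

# Case-sensitive filter finds no lower-case "health" here
filter_paragraphs(sample_paragraphs, contain = "health", case_sensitive = TRUE)

# Collapsing yields a single element and ignores `contain`
filter_paragraphs(sample_paragraphs, contain = "health", collapse = TRUE)
```

With the real function, the same arguments would be passed alongside link, for example paragraphs_scrap(link, contain = "health"); the exact matching rules are those of the package, which may differ from this sketch.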