Extracts content and metadata from local documents or websites. Supports PDF, DOCX, PPTX, TXT, and HTML files, and crawls websites breadth-first (BFS) up to the specified depth.
fetch_data(local_paths = NULL, website_urls = NULL, crawl_depth = NULL)
Returns a data frame with the following columns: source, title, author, publishedDate, description, content, url, source_type.
local_paths: A character vector of file paths or directories to scan for documents.
website_urls: A character vector of website URLs to crawl and extract text from.
crawl_depth: Integer giving the maximum BFS crawl depth; set to NULL to crawl without a depth limit.
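To illustrate what a depth-limited breadth-first crawl means, here is a minimal sketch in R that walks a small in-memory link graph instead of real web pages. The `links` list and the `bfs_crawl()` helper are hypothetical and are not part of this package; the sketch also assumes that the start page sits at depth 0 and that each round of link-following adds one level, which may differ from how `fetch_data()` counts depth internally.

```r
# Hypothetical in-memory "site": each page name maps to the pages it links to.
links <- list(
  home  = c("about", "blog"),
  about = c("team"),
  blog  = c("post1", "home"),
  team  = character(0),
  post1 = c("post2"),
  post2 = character(0)
)

# Breadth-first traversal from `start`.
# `max_depth = NULL` means no depth limit (assumed to mirror crawl_depth = NULL).
bfs_crawl <- function(links, start, max_depth = NULL) {
  visited  <- start   # pages seen so far, in discovery order
  frontier <- start   # pages whose links are followed in the current round
  depth    <- 0L
  while (length(frontier) > 0 && (is.null(max_depth) || depth < max_depth)) {
    next_frontier <- character(0)
    for (page in frontier) {
      for (target in links[[page]]) {
        if (!(target %in% visited)) {
          visited       <- c(visited, target)
          next_frontier <- c(next_frontier, target)
        }
      }
    }
    frontier <- next_frontier
    depth    <- depth + 1L
  }
  visited
}

bfs_crawl(links, "home", max_depth = 1)
# "home" "about" "blog"

bfs_crawl(links, "home")
# "home" "about" "blog" "team" "post1" "post2"
```

With no depth limit the traversal stops only when no unvisited links remain, so on a large or densely linked site an explicit integer depth keeps the crawl bounded.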