RAGFlowChainR (version 0.1.1)

fetch_data: Fetch Data from Local Files and Websites

Description

Extracts content and metadata from local documents or websites. Supports PDF, DOCX, PPTX, TXT, and HTML files, and crawls websites breadth-first (BFS) up to the specified depth.

Usage

fetch_data(local_paths = NULL, website_urls = NULL, crawl_depth = NULL)
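As a minimal sketch of a call, the example below writes a small text file to a temporary directory and passes it as local_paths. The directory name fetch_data_demo and the file notes.txt are made up for illustration, and the call is guarded so that it only runs when RAGFlowChainR is installed. The website call is left commented out because it needs network access.

```r
# Create a throwaway text document (hypothetical name and content).
doc_dir <- file.path(tempdir(), "fetch_data_demo")
dir.create(doc_dir, showWarnings = FALSE)
doc_path <- file.path(doc_dir, "notes.txt")
writeLines(c("RAGFlowChainR demo", "A short plain-text document."), doc_path)

# Only call the package if it is available on this machine.
if (requireNamespace("RAGFlowChainR", quietly = TRUE)) {
  docs <- RAGFlowChainR::fetch_data(local_paths = doc_path)
  str(docs)

  # Crawling a website requires network access, so it is left commented out:
  # web_docs <- RAGFlowChainR::fetch_data(
  #   website_urls = "https://www.r-project.org",
  #   crawl_depth  = 1
  # )
}
```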

Value

A data frame with the following columns: source, title, author, publishedDate, description, content, url, source_type.
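Downstream code may want to confirm that a result has the documented columns before using it. The helper below is a hypothetical sketch and not part of RAGFlowChainR. It is exercised on a placeholder data frame whose values are all NA, since the actual contents depend on the documents fetched.

```r
# Column names as listed in the Value section.
expected_cols <- c("source", "title", "author", "publishedDate",
                   "description", "content", "url", "source_type")

# Hypothetical helper: TRUE when df is a data frame containing every expected column.
has_expected_columns <- function(df) {
  is.data.frame(df) && length(setdiff(expected_cols, names(df))) == 0
}

# Placeholder one-row data frame; every value is invented for illustration.
mock <- as.data.frame(
  setNames(as.list(rep(NA_character_, length(expected_cols))), expected_cols),
  stringsAsFactors = FALSE
)

has_expected_columns(mock)          # all documented columns present
has_expected_columns(mock[, 1:3])   # some columns missing
```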

Arguments

local_paths

A character vector of file paths or directories to scan for documents.

website_urls

A character vector of website URLs to crawl and extract text from.

crawl_depth

Integer giving the maximum BFS crawl depth; set to NULL (the default) to crawl with no depth limit.
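To illustrate how a depth limit bounds a breadth-first crawl, here is a small sketch over an in-memory link graph. It is not the package's implementation: the function bfs_crawl, the links list, and the convention that depth 0 means "the start page only" are all assumptions made for this example, and the package may count depth differently.

```r
# Hypothetical link graph standing in for real pages: name -> outgoing links.
links <- list(
  a = c("b", "c"),
  b = c("d"),
  c = c("d"),
  d = c("e"),
  e = character(0)
)

# Breadth-first traversal; max_depth = NULL means no depth limit.
bfs_crawl <- function(start, links, max_depth = NULL) {
  visited  <- start
  frontier <- start
  depth    <- 0
  while (length(frontier) > 0 && (is.null(max_depth) || depth < max_depth)) {
    next_frontier <- character(0)
    for (page in frontier) {
      for (target in links[[page]]) {
        # Skip pages already seen so each page is visited once.
        if (!(target %in% visited)) {
          visited       <- c(visited, target)
          next_frontier <- c(next_frontier, target)
        }
      }
    }
    frontier <- next_frontier
    depth    <- depth + 1
  }
  visited
}

bfs_crawl("a", links, max_depth = 1)     # "a" "b" "c"
bfs_crawl("a", links, max_depth = NULL)  # "a" "b" "c" "d" "e"
```

With no limit, the traversal stops only when no unvisited links remain, which on a real website can mean a very long crawl.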