FireCrawl Document Loader
The FireCrawl Document Loader is a node for loading web content using the FireCrawl API. It allows users to crawl or scrape web pages and convert the content into document objects that can be used in language models and other AI applications.
Node Details
- Name: FireCrawl
- Type: Document
- Category: Document Loaders
- Version: 1.0
Parameters
- Text Splitter (optional)
  - Type: TextSplitter
  - Description: A text splitter to process the loaded documents
- URLs
  - Type: string
  - Description: URL to be crawled/scraped
- Crawler Type
  - Type: options
  - Options:
    - Crawl: Crawl a URL and all accessible subpages
    - Scrape: Scrape a URL and get its content
  - Default: Crawl
- Max Crawl Pages (implied from the code)
  - Type: string
  - Description: Maximum number of pages to crawl
- Generate Image Alt Text (implied from the code)
  - Type: boolean
  - Description: Whether to generate alternative text for images
- Return Only URLs (implied from the code)
  - Type: boolean
  - Description: Whether to return only URLs without content
- Only Main Content (implied from the code)
  - Type: boolean
  - Description: Whether to extract only the main content of the page
- URL Patterns Excludes (implied from the code)
  - Type: string
  - Description: Comma-separated list of URL patterns to exclude from crawling
- URL Patterns Includes (implied from the code)
  - Type: string
  - Description: Comma-separated list of URL patterns to include in crawling
- Metadata (optional, implied from the code)
  - Type: string or object
  - Description: Additional metadata to add to the documents
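The include/exclude parameters take comma-separated URL patterns. A minimal sketch of how such filtering might behave, treating the patterns as shell-style globs (the helper name `filter_urls` and the glob semantics are assumptions for illustration, not FireCrawl's actual matching rules):

```python
from fnmatch import fnmatch

def filter_urls(urls, includes="", excludes=""):
    """Hypothetical sketch: keep URLs matching any include pattern
    (or all URLs when no includes are given), then drop any URL
    matching an exclude pattern. Patterns are comma-separated globs."""
    inc = [p.strip() for p in includes.split(",") if p.strip()]
    exc = [p.strip() for p in excludes.split(",") if p.strip()]
    kept = []
    for url in urls:
        if inc and not any(fnmatch(url, p) for p in inc):
            continue  # does not match any include pattern
        if any(fnmatch(url, p) for p in exc):
            continue  # matches an exclude pattern
        kept.append(url)
    return kept
```

For example, `filter_urls(["https://example.com/blog/a", "https://example.com/admin/x"], excludes="*/admin/*")` keeps only the blog URL.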
Credentials
- FireCrawl API
  - Type: credential
  - Credential Names: fireCrawlApi
Input
The node takes various configuration parameters as input, including the URL to crawl/scrape, crawler options, and API credentials.
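As a sketch, a crawl-mode input configuration might look like the following; the field names here are illustrative assumptions, not the node's exact schema:

```python
# Hypothetical input configuration for the FireCrawl node.
# Field names are illustrative, not the node's exact schema.
crawl_config = {
    "urls": "https://example.com",
    "crawlerType": "crawl",        # "crawl" or "scrape"
    "maxCrawlPages": "10",
    "generateImgAltText": False,
    "returnOnlyUrls": False,
    "onlyMainContent": True,
    "urlPatternsExcludes": "*/admin/*,*/login/*",
    "urlPatternsIncludes": "*/blog/*",
    "metadata": {"source_tag": "docs-crawl"},
}
```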
Output
The node outputs an array of Document objects. Each Document contains:
- pageContent: The content of the web page (in Markdown format)
- metadata: Associated metadata for the document
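The output shape can be sketched with a simplified stand-in class (not the actual Document class the node uses):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Simplified stand-in for the Document objects the node emits."""
    pageContent: str                           # page content, in Markdown
    metadata: dict = field(default_factory=dict)  # e.g. source URL, title

# Example of what one output document might look like.
doc = Document(
    pageContent="# Example Domain\n\nThis domain is for use in examples.",
    metadata={"source": "https://example.com", "title": "Example Domain"},
)
```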
Functionality
- The node initializes a FireCrawlLoader with the provided parameters.
- It then uses the loader to either crawl or scrape the specified URL(s).
- The resulting data is converted into Document objects.
- If a text splitter is provided, the documents are split accordingly.
- Additional metadata can be added to the documents if specified.
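The steps above can be sketched end to end. The loader is stubbed out (a real implementation would call the FireCrawl API in crawl or scrape mode) and the splitter is a naive paragraph splitter, so everything here is illustrative rather than the node's actual implementation:

```python
def load_documents(urls, crawler_type="crawl", text_splitter=None, extra_metadata=None):
    """Illustrative sketch of the node's flow: load raw pages (stubbed
    here instead of calling the FireCrawl API), wrap them as documents,
    optionally split them, and merge in extra metadata."""
    # 1. Stubbed loader: stands in for a FireCrawl crawl/scrape call.
    raw_pages = [{"content": "Intro paragraph.\n\nDetails paragraph.",
                  "url": url} for url in urls]

    # 2. Convert raw results into document dicts.
    docs = [{"pageContent": p["content"], "metadata": {"source": p["url"]}}
            for p in raw_pages]

    # 3. Optionally split each document's content into chunks.
    if text_splitter is not None:
        docs = [{"pageContent": chunk, "metadata": dict(d["metadata"])}
                for d in docs for chunk in text_splitter(d["pageContent"])]

    # 4. Merge any user-supplied metadata into every document.
    for d in docs:
        d["metadata"].update(extra_metadata or {})
    return docs

def split_paragraphs(text):
    """Naive splitter: break on blank lines."""
    return [chunk for chunk in text.split("\n\n") if chunk]

docs = load_documents(["https://example.com"],
                      text_splitter=split_paragraphs,
                      extra_metadata={"project": "demo"})
```

With the stubbed page above, this yields two documents, each carrying the source URL plus the merged `project` metadata.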
Use Cases
- Web scraping for content analysis
- Building training datasets from web content
- Creating knowledge bases from websites
- Automating data collection for research or business intelligence
Notes
- The FireCrawl API key is required and should be set up in the credentials.
- The node supports both crawling (multiple pages) and scraping (single page) modes.
- Various options allow for customization of the crawling/scraping process, such as limiting the number of pages, including/excluding URL patterns, and focusing on main content.