
Node Details
- Name: FireCrawl
- Type: Document
- Category: Document Loaders
- Version: 1.0
Parameters
- Text Splitter (optional)
  - Type: TextSplitter
  - Description: A text splitter to process the loaded documents
- URLs
  - Type: string
  - Description: URL to be crawled/scraped
- Crawler type
  - Type: options
  - Options:
    - Crawl: Crawl a URL and all accessible subpages
    - Scrape: Scrape a URL and get its content
  - Default: Crawl
- Max Crawl Pages (implied from the code)
  - Type: string
  - Description: Maximum number of pages to crawl
- Generate Image Alt Text (implied from the code)
  - Type: boolean
  - Description: Whether to generate alternative text for images
- Return Only URLs (implied from the code)
  - Type: boolean
  - Description: Whether to return only URLs without content
- Only Main Content (implied from the code)
  - Type: boolean
  - Description: Whether to extract only the main content of the page
- URL Patterns Excludes (implied from the code)
  - Type: string
  - Description: Comma-separated list of URL patterns to exclude from crawling
- URL Patterns Includes (implied from the code)
  - Type: string
  - Description: Comma-separated list of URL patterns to include in crawling
- Metadata (optional, implied from the code)
  - Type: string or object
  - Description: Additional metadata to add to the documents
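As a rough illustration, the crawl-mode options above might translate into a FireCrawl request object along these lines. This is a sketch only: the field names (`limit`, `includes`, `excludes`, `generateImgAltText`, `returnOnlyUrls`, `onlyMainContent`) are assumptions based on FireCrawl's documented crawler options, since these parameters are only implied from the code.

```typescript
// Hypothetical mapping of the node's parameters onto FireCrawl crawl params.
const params = {
  crawlerOptions: {
    limit: 50,                       // Max Crawl Pages
    includes: ["blog/*", "docs/*"],  // URL Patterns Includes (comma-separated in the UI)
    excludes: ["admin/*"],           // URL Patterns Excludes
    generateImgAltText: true,        // Generate Image Alt Text
    returnOnlyUrls: false,           // Return Only URLs
  },
  pageOptions: {
    onlyMainContent: true,           // Only Main Content
  },
};
```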
Credentials
- FireCrawl API
  - Type: credential
  - Credential Names: fireCrawlApi
Input
The node takes various configuration parameters as input, including the URL to crawl/scrape, crawler options, and API credentials.
Output
The node outputs an array of Document objects. Each Document contains:
- pageContent: The content of the web page (in Markdown format)
- metadata: Associated metadata for the document
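Each output item follows LangChain's Document shape; a minimal illustration (the metadata keys shown are examples, not a guaranteed set):

```typescript
import { Document } from "@langchain/core/documents";

// Illustrative output item; metadata keys vary by page and crawl settings.
const doc = new Document({
  pageContent: "# Example Page\n\nScraped content rendered as Markdown...",
  metadata: { source: "https://example.com/page", title: "Example Page" },
});
```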
Functionality
- The node initializes a FireCrawlLoader with the provided parameters.
- It then uses the loader to either crawl or scrape the specified URL(s).
- The resulting data is converted into Document objects.
- If a text splitter is provided, the documents are split accordingly.
- Additional metadata can be added to the documents if specified (see the sketch below).
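A minimal sketch of that flow, assuming LangChain's `FireCrawlLoader` and `RecursiveCharacterTextSplitter`; the URL, API key handling, splitter settings, and the extra-metadata merge are illustrative, not the node's exact code:

```typescript
import { FireCrawlLoader } from "@langchain/community/document_loaders/web/firecrawl";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// 1. Initialize the loader from the node's parameters.
const loader = new FireCrawlLoader({
  url: "https://example.com",             // "URLs" parameter
  apiKey: process.env.FIRECRAWL_API_KEY,  // from the fireCrawlApi credential
  mode: "crawl",                          // "Crawler type": "crawl" or "scrape"
});

// 2. Crawl/scrape and convert the results into Document objects.
let docs = await loader.load();

// 3. If a text splitter is configured, split the documents accordingly.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 100,
});
docs = await splitter.splitDocuments(docs);

// 4. Merge any user-supplied metadata into each document (assumed behavior).
const extraMetadata = { project: "demo" };
for (const doc of docs) Object.assign(doc.metadata, extraMetadata);
```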
Use Cases
- Web scraping for content analysis
- Building training datasets from web content
- Creating knowledge bases from websites
- Automating data collection for research or business intelligence
Notes
- The FireCrawl API key is required and should be set up in the credentials.
- The node supports both crawling (multiple pages) and scraping (single page) modes.
- Various options allow for customization of the crawling/scraping process, such as limiting the number of pages, including/excluding URL patterns, and focusing on main content.