Node Details

  • Name: FireCrawl
  • Type: Document
  • Category: Document Loaders
  • Version: 1.0

Parameters

  1. Text Splitter (optional)

    • Type: TextSplitter
    • Description: A text splitter to process the loaded documents
  2. URLs

    • Type: string
    • Description: URL(s) to be crawled or scraped
  3. Crawler Type

    • Type: options
    • Options:
      • Crawl: Crawl a URL and all accessible subpages
      • Scrape: Scrape a URL and get its content
    • Default: Crawl
  4. Max Crawl Pages (implied from the code)

    • Type: string
    • Description: Maximum number of pages to crawl
  5. Generate Image Alt Text (implied from the code)

    • Type: boolean
    • Description: Whether to generate alternative text for images
  6. Return Only URLs (implied from the code)

    • Type: boolean
    • Description: Whether to return only URLs without content
  7. Only Main Content (implied from the code)

    • Type: boolean
    • Description: Whether to extract only the main content of the page
  8. URL Patterns Excludes (implied from the code)

    • Type: string
    • Description: Comma-separated list of URL patterns to exclude from crawling
  9. URL Patterns Includes (implied from the code)

    • Type: string
    • Description: Comma-separated list of URL patterns to include in crawling
  10. Metadata (optional, implied from the code)

    • Type: string or object
    • Description: Additional metadata to add to the documents
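The parameters above can be pictured as one options object handed to the loader. The sketch below is illustrative only: the interface name `CrawlOptions`, the helper `buildCrawlOptions`, and the raw parameter keys are assumptions, not the node's actual implementation; it mainly shows how the comma-separated URL pattern lists might be normalized into arrays.

```typescript
// Hypothetical sketch of assembling crawl options from the node's raw
// parameter values (all names here are illustrative assumptions).
interface CrawlOptions {
  url: string;
  mode: "crawl" | "scrape";     // Crawler Type
  maxCrawlPages?: number;       // Max Crawl Pages
  excludes?: string[];          // URL Patterns Excludes
  includes?: string[];          // URL Patterns Includes
  // Boolean flags (Generate Image Alt Text, Return Only URLs,
  // Only Main Content) would follow the same optional pattern.
}

// Split a comma-separated pattern list into a trimmed, non-empty array.
function parsePatterns(raw?: string): string[] | undefined {
  if (!raw) return undefined;
  const parts = raw
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  return parts.length > 0 ? parts : undefined;
}

function buildCrawlOptions(params: Record<string, string>): CrawlOptions {
  return {
    url: params.url,
    mode: params.crawlerType === "scrape" ? "scrape" : "crawl",
    maxCrawlPages: params.maxCrawlPages
      ? Number(params.maxCrawlPages)
      : undefined,
    excludes: parsePatterns(params.urlPatternsExcludes),
    includes: parsePatterns(params.urlPatternsIncludes),
  };
}
```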

Credentials

  • FireCrawl API
    • Type: credential
    • Credential Names: fireCrawlApi

Input

The node takes various configuration parameters as input, including the URL to crawl/scrape, crawler options, and API credentials.

Output

The node outputs an array of Document objects. Each Document contains:

  • pageContent: The content of the web page (in Markdown format)
  • metadata: Associated metadata for the document
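The output shape described above can be sketched as a minimal interface. The field names follow LangChain's Document convention; the metadata keys in the example (`source`, `title`) are illustrative, not a guaranteed schema.

```typescript
// Illustrative shape of one output document.
interface Document {
  pageContent: string;                // page content, as Markdown
  metadata: Record<string, unknown>;  // associated metadata (keys vary)
}

// Example document (values are made up for illustration).
const doc: Document = {
  pageContent: "# Example Domain\n\nThis domain is for use in examples.",
  metadata: { source: "https://example.com", title: "Example Domain" },
};
```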

Functionality

  1. The node initializes a FireCrawlLoader with the provided parameters.
  2. It then uses the loader to either crawl or scrape the specified URL(s).
  3. The resulting data is converted into Document objects.
  4. If a text splitter is provided, the documents are split accordingly.
  5. Additional metadata can be added to the documents if specified.
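Steps 3–5 above can be sketched as a small post-processing pipeline. This is a hedged sketch under stated assumptions: the toy fixed-size character splitter stands in for a real TextSplitter, and `finalizeDocs` is a hypothetical helper, not the node's actual code.

```typescript
// Minimal document shape (see the Output section).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// A splitter turns one document into one or more smaller documents.
type TextSplitter = (doc: Doc) => Doc[];

// Toy fixed-size character splitter standing in for a real TextSplitter.
const charSplitter =
  (chunkSize: number): TextSplitter =>
  (doc) => {
    const chunks: Doc[] = [];
    for (let i = 0; i < doc.pageContent.length; i += chunkSize) {
      chunks.push({
        pageContent: doc.pageContent.slice(i, i + chunkSize),
        metadata: { ...doc.metadata },
      });
    }
    return chunks;
  };

// Hypothetical helper: optionally split the raw documents, then merge
// any user-supplied metadata into each resulting document.
function finalizeDocs(
  raw: Doc[],
  splitter?: TextSplitter,
  extraMetadata?: Record<string, unknown>
): Doc[] {
  const split = splitter ? raw.flatMap(splitter) : raw;
  return split.map((d) => ({
    ...d,
    metadata: { ...d.metadata, ...(extraMetadata ?? {}) },
  }));
}
```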

Use Cases

  • Web scraping for content analysis
  • Building training datasets from web content
  • Creating knowledge bases from websites
  • Automating data collection for research or business intelligence

Notes

  • A FireCrawl API key is required and must be configured in the node's credentials (fireCrawlApi).
  • The node supports both crawling (multiple pages) and scraping (single page) modes.
  • Options such as Max Crawl Pages, the URL pattern include/exclude lists, and Only Main Content let you tailor the crawling/scraping process.