Node Details

  • Name: Gitbook_DocumentLoaders
  • Type: Document
  • Category: Document Loaders
  • Version: 1.0

Input Parameters

  1. Web Path (required)

    • Type: string
    • Description: The URL of the GitBook page to load. Serves as the root URL when Should Load All Paths is enabled.
  2. Should Load All Paths (optional)

    • Type: boolean
    • Description: When set to true, the loader will recursively load all pages linked from the provided root URL.
  3. Text Splitter (optional)

    • Type: TextSplitter
    • Description: A text splitter instance to chunk the loaded text into smaller documents.
  4. Additional Metadata (optional)

    • Type: JSON
    • Description: Additional metadata to be added to all extracted documents.
  5. Omit Metadata Keys (optional)

    • Type: string
    • Description: A comma-separated list of metadata keys to omit from the final documents. Use '*' to omit all default metadata keys.
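
The Omit Metadata Keys input is a plain comma-separated string, so it must be split and trimmed before use. A minimal sketch of that parsing step (`parseOmitKeys` is an illustrative helper name, not part of the actual node API):

```typescript
// Parse the "Omit Metadata Keys" input into a clean list of key names.
// An empty or missing input yields an empty list; "*" passes through as-is.
function parseOmitKeys(input?: string): string[] {
  if (!input) return [];
  return input
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}
```

Trimming each entry lets users write either `source,loc` or `source, loc` and get the same result.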

Output

The node outputs an array of IDocument objects. Each document contains:

  • Text content extracted from the GitBook page(s)
  • Metadata, which may include:
    • Default metadata from the GitBook loader
    • Additional metadata specified in the input
    • Modified metadata based on the omit keys setting
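
The output shape can be illustrated with a minimal sketch. The field names `pageContent` and `metadata` follow LangChain's Document convention; the specific metadata keys shown are assumptions for illustration:

```typescript
// Illustrative shape of one returned document; the metadata keys below
// (source, projectId) are hypothetical examples, not guaranteed defaults.
interface IDocument {
  pageContent: string;               // text extracted from the GitBook page
  metadata: Record<string, unknown>; // default + additional metadata
}

const example: IDocument = {
  pageContent: "Welcome to our documentation...",
  metadata: {
    source: "https://docs.example.com/intro", // default loader metadata (assumed key)
    projectId: "alpha",                       // additional metadata supplied via the input
  },
};
```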

Functionality

  1. Initializes a GitbookLoader with the provided web path and settings.
  2. Loads documents from the specified GitBook page(s).
  3. If a text splitter is provided, splits the loaded documents.
  4. Processes metadata for each document:
    • Adds any additional metadata specified in the input.
    • Omits metadata keys as specified in the input.
  5. Returns the processed array of documents.
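
Steps 4 and 5 above can be sketched as a pure function. This is a hedged illustration, not the node's actual implementation: `processMetadata` is a hypothetical name, and the assumption that '*' drops default keys while keeping user-supplied additional metadata reflects the parameter descriptions, not verified behavior.

```typescript
// Sketch of the metadata-processing stage: merge additional metadata,
// then drop omitted keys. Doc mirrors LangChain's Document shape.
type Doc = { pageContent: string; metadata: Record<string, unknown> };

function processMetadata(
  docs: Doc[],
  additional: Record<string, unknown>,
  omitKeys: string[],
): Doc[] {
  return docs.map((doc) => {
    const merged = { ...doc.metadata, ...additional };
    const metadata = omitKeys.includes("*")
      ? { ...additional } // "*" removes default keys but keeps additional ones (assumed)
      : Object.fromEntries(
          Object.entries(merged).filter(([key]) => !omitKeys.includes(key)),
        );
    return { ...doc, metadata };
  });
}
```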

Use Cases

  • Extracting content from GitBook documentation for further processing or analysis.
  • Creating a searchable knowledge base from GitBook content.
  • Preparing GitBook content for use in language models or other AI applications.

Notes

  • The node uses the @langchain/community library for the GitbookLoader implementation.
  • It supports flexible metadata handling, allowing users to customize the metadata associated with each extracted document.
  • The text splitting feature breaks large documents into more manageable chunks, which is useful for NLP tasks and for models with input size limits.
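
To make the chunking behavior concrete, here is a minimal character-based splitter with overlap. This is a simplified sketch for illustration only; in practice the node accepts any LangChain TextSplitter instance (e.g. a recursive character splitter), which splits more intelligently at separator boundaries.

```typescript
// Naive fixed-size splitter with overlapping windows; each chunk starts
// (chunkSize - overlap) characters after the previous one.
function splitText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```

Overlap preserves a small amount of shared context between adjacent chunks, which helps downstream retrieval and embedding quality.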