Gitbook Document Loader
The Gitbook Document Loader is a component designed to load and process content from GitBook documentation websites. It allows users to extract text content from GitBook pages, optionally split the text into smaller chunks, and add or modify metadata associated with the extracted documents.
Node Details
- Name: Gitbook_DocumentLoaders
- Type: Document
- Category: Document Loaders
- Version: 1.0
Input Parameters
- Web Path (required)
  - Type: string
  - Description: The URL of the GitBook page to load. If loading all paths, provide the root URL of the GitBook site.
  - Example: https://docs.gitbook.com/product-tour/navigation
- Should Load All Paths (optional)
  - Type: boolean
  - Description: When set to true, the loader will recursively load all pages linked from the provided root URL.
- Text Splitter (optional)
  - Type: TextSplitter
  - Description: A text splitter instance to chunk the loaded text into smaller documents.
- Additional Metadata (optional)
  - Type: JSON
  - Description: Additional metadata to be added to all extracted documents.
- Omit Metadata Keys (optional)
  - Type: string
  - Description: A comma-separated list of metadata keys to omit from the final documents. Use '*' to omit all default metadata keys.
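The Omit Metadata Keys value arrives as a plain string, so it has to be interpreted before it can be applied. The following is a minimal TypeScript sketch of one way to do that. The function name `parseOmitKeys` and its return shape are hypothetical and are not taken from the node's source; trimming whitespace and ignoring empty entries are assumptions of this sketch.

```typescript
// Hypothetical helper: interprets the "Omit Metadata Keys" input string.
// Returns "*" when all default metadata keys should be dropped,
// otherwise the list of individual keys to remove.
function parseOmitKeys(raw: string | undefined): string[] | "*" {
  const value = (raw ?? "").trim();
  if (value === "*") {
    return "*";
  }
  return value
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Example inputs: a key list, the wildcard, and an unset parameter.
const someKeys = parseOmitKeys("source, title");
const allKeys = parseOmitKeys("*");
const noKeys = parseOmitKeys(undefined);
```

Returning either a list or the literal "*" keeps the wildcard case explicit, so later code cannot mistake it for a metadata key that happens to be named "*".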
Output
The node outputs an array of IDocument objects. Each document contains:
- Text content extracted from the GitBook page(s)
- Metadata, which may include:
  - Default metadata from the GitBook loader
  - Additional metadata specified in the input
  - Metadata with keys removed according to the Omit Metadata Keys setting
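The way these metadata sources combine can be sketched in TypeScript. This is an illustration under stated assumptions, not the node's actual code: the `pageContent`/`metadata` document shape, the function name `buildMetadata`, and the sample `source` and `title` keys are all assumed here. The sketch also assumes that '*' drops only the loader's default keys while additional metadata is kept, and that explicitly listed keys are removed from the merged result.

```typescript
// Assumed document shape: text plus a free-form metadata object.
interface GitbookDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: combines default and additional metadata,
// then removes the keys selected by the omit setting.
function buildMetadata(
  defaults: Record<string, unknown>,
  additional: Record<string, unknown>,
  omit: string[] | "*"
): Record<string, unknown> {
  if (omit === "*") {
    // Assumption: the wildcard discards defaults but keeps additional metadata.
    return { ...additional };
  }
  const merged: Record<string, unknown> = { ...defaults, ...additional };
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(merged)) {
    if (omit.indexOf(key) === -1) {
      result[key] = merged[key];
    }
  }
  return result;
}

// Illustrative document; the default keys shown are examples only.
const exampleDoc: GitbookDocument = {
  pageContent: "Navigation overview...",
  metadata: buildMetadata(
    { source: "https://docs.gitbook.com/product-tour/navigation", title: "Navigation" },
    { team: "docs" },
    ["title"]
  ),
};
```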
Functionality
- Initializes a GitbookLoader with the provided web path and settings.
- Loads documents from the specified GitBook page(s).
- If a text splitter is provided, splits the loaded documents.
- Processes metadata for each document:
  - Adds any additional metadata specified in the input.
  - Omits metadata keys as specified in the input.
- Returns the processed array of documents.
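The steps above can be sketched as a small TypeScript pipeline. To keep the example self-contained it uses stand-ins rather than the real library: `stubLoader`, `fixedWidthSplitter`, and `loadAndProcess` are hypothetical names, the stub returns canned pages instead of fetching a GitBook site, and the metadata step is passed in as a callback. The `load()` and `splitDocuments()` method names follow the LangChain conventions as an assumption about how the real loader and splitter are called.

```typescript
// Minimal stand-ins for the LangChain document, loader, and splitter types.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}
interface Loader {
  load(): Promise<Doc[]>;
}
interface Splitter {
  splitDocuments(docs: Doc[]): Promise<Doc[]>;
}
type MetadataStep = (metadata: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical stub loader: returns copies of canned pages.
// In the node itself this role is played by GitbookLoader from @langchain/community.
function stubLoader(pages: Doc[]): Loader {
  return {
    load: async () => pages.map((page) => ({ ...page, metadata: { ...page.metadata } })),
  };
}

// Hypothetical splitter that cuts text into fixed-width chunks, for illustration only.
function fixedWidthSplitter(chunkSize: number): Splitter {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  return {
    splitDocuments: async (docs) => {
      const chunks: Doc[] = [];
      for (const doc of docs) {
        for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
          chunks.push({
            pageContent: doc.pageContent.slice(start, start + chunkSize),
            metadata: { ...doc.metadata },
          });
        }
      }
      return chunks;
    },
  };
}

// Load, optionally split, then apply the metadata step to every document.
async function loadAndProcess(
  loader: Loader,
  processMetadata: MetadataStep,
  splitter?: Splitter
): Promise<Doc[]> {
  let docs = await loader.load();
  if (splitter) {
    docs = await splitter.splitDocuments(docs);
  }
  return docs.map((doc) => ({ ...doc, metadata: processMetadata(doc.metadata) }));
}
```

Passing the metadata step as a callback keeps the load and split flow independent of how additional metadata and omitted keys are resolved, so either part can be changed without touching the other.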
Use Cases
- Extracting content from GitBook documentation for further processing or analysis.
- Creating a searchable knowledge base from GitBook content.
- Preparing GitBook content for use in language models or other AI applications.
Notes
- The node uses the @langchain/community library for the GitbookLoader implementation.
- It supports flexible metadata handling, allowing users to customize the metadata associated with each extracted document.
- The text splitting feature enables users to break down large documents into more manageable chunks, which can be useful for certain NLP tasks or when working with models that have input size limitations.