Node Details

  • Name: Github_DocumentLoaders
  • Type: Document
  • Version: 2.0
  • Category: Document Loaders

Parameters

Credential (Optional)

  • Label: Connect Credential
  • Name: credential
  • Type: credential
  • Description: Only needed when accessing a private repo
  • Credential Names: githubApi

Inputs

  1. Repo Link (required)

  2. Branch (required)

    • Type: string
    • Default: "main"
    • Description: The branch of the repository to load from
  3. Recursive (optional)

    • Type: boolean
    • Description: Whether to recursively traverse the repository
  4. Max Concurrency (optional)

    • Type: number
    • Description: Maximum number of concurrent operations
  5. Ignore Paths (optional)

    • Type: string (JSON array)
    • Description: An array of paths to be ignored
    • Example: ["*.md"]
  6. Max Retries (optional)

    • Type: number
    • Description: Maximum number of retries for a single call
    • Default: 2
  7. Text Splitter (optional)

    • Type: TextSplitter
    • Description: A text splitter to apply to the loaded documents
  8. Additional Metadata (optional)

    • Type: JSON
    • Description: Additional metadata to be added to the extracted documents
  9. Omit Metadata Keys (optional)

    • Type: string
    • Description: Comma-separated list of metadata keys to omit from the documents
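
As an illustration of how the inputs above might be normalized before a loader is created, here is a minimal sketch. The names RawInputs, LoaderOptions, and buildLoaderOptions are hypothetical, and the option names are assumed to mirror the parameters of LangChain's GithubRepoLoader; the node's actual implementation may differ. The defaults for Branch ("main") and Max Retries (2) come from the list above, while the false default for Recursive is an assumption.

```typescript
// Hypothetical shape of the raw values entered in the node's input fields.
interface RawInputs {
  repoLink: string;
  branch?: string;
  recursive?: boolean;
  maxConcurrency?: number;
  ignorePaths?: string; // JSON array supplied as text, e.g. '["*.md"]'
  maxRetries?: number;
}

// Assumed option names, modeled on LangChain's GithubRepoLoader parameters.
interface LoaderOptions {
  branch: string;
  recursive: boolean;
  maxConcurrency?: number;
  ignorePaths?: string[];
  maxRetries: number;
}

function buildLoaderOptions(raw: RawInputs): LoaderOptions {
  const branch = raw.branch !== undefined ? raw.branch.trim() : "";
  const options: LoaderOptions = {
    branch: branch !== "" ? branch : "main", // documented default
    recursive: raw.recursive ?? false,       // assumed default
    maxRetries: raw.maxRetries ?? 2,         // documented default
  };
  if (raw.maxConcurrency !== undefined) {
    options.maxConcurrency = raw.maxConcurrency;
  }
  if (raw.ignorePaths !== undefined && raw.ignorePaths.trim() !== "") {
    // Ignore Paths arrives as a string and must parse to an array of strings.
    const parsed: unknown = JSON.parse(raw.ignorePaths);
    if (!Array.isArray(parsed) || !parsed.every((p) => typeof p === "string")) {
      throw new Error("Ignore Paths must be a JSON array of strings");
    }
    options.ignorePaths = parsed as string[];
  }
  return options;
}
```

Validating Ignore Paths up front means a malformed value fails with a clear message before any request is sent to GitHub.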

Functionality

  1. Initializes a GithubRepoLoader with the provided repository link and options.
  2. Loads documents from the specified GitHub repository.
  3. Applies text splitting if a TextSplitter is provided.
  4. Adds custom metadata to the documents if specified.
  5. Omits specified metadata keys from the documents.
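
Steps 4 and 5 above can be sketched as a small post-processing pass over the loaded documents. This is a minimal sketch under stated assumptions, not the node's actual code: Doc, parseOmitKeys, and applyMetadata are hypothetical names, the document shape (pageContent plus metadata) is assumed, and omission is assumed to run after the additional metadata is merged, following the order of the list.

```typescript
// Assumed document shape: text content plus a free-form metadata object.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Turn the comma-separated "Omit Metadata Keys" string into a clean key list.
function parseOmitKeys(csv?: string): string[] {
  return (csv ?? "")
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Merge additional metadata into each document, then drop the omitted keys.
// Returns new objects; the input documents are left unchanged.
function applyMetadata(
  docs: Doc[],
  additional: Record<string, unknown> = {},
  omitCsv?: string
): Doc[] {
  const omit = new Set(parseOmitKeys(omitCsv));
  return docs.map((doc) => {
    const merged: Record<string, unknown> = { ...doc.metadata, ...additional };
    const kept: Record<string, unknown> = {};
    for (const key of Object.keys(merged)) {
      if (!omit.has(key)) {
        kept[key] = merged[key];
      }
    }
    return { pageContent: doc.pageContent, metadata: kept };
  });
}
```

Because omission runs last in this sketch, a key listed in Omit Metadata Keys is removed even if Additional Metadata also supplies it.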

Output

An array of IDocument objects representing the loaded and processed documents from the GitHub repository.

Use Cases

  • Loading source code for analysis or processing
  • Fetching documentation or README files from GitHub projects
  • Integrating GitHub-hosted content into document processing workflows
  • Comparing content across branches of a repository by loading each branch separately

Notes

  • When accessing private repositories, make sure to provide the appropriate GitHub API credentials.
  • The node supports various options for customizing the loading process, including recursive traversal, concurrency control, and retry logic.
  • Use the text splitter option to break down large documents into smaller chunks if needed.
  • The additional metadata and metadata key omission features allow for fine-grained control over the document metadata.