Github Document Loader
The Github Document Loader is a node that loads files from a GitHub repository as documents. It works with both public and private repositories (private ones require a credential), so GitHub content can feed directly into your document processing pipeline.
Node Details
- Name: Github_DocumentLoaders
- Type: Document
- Version: 2.0
- Category: Document Loaders
Parameters
Credential (optional)
- Label: Connect Credential
- Name: credential
- Type: credential
- Description: Required only when accessing a private repository
- Credential Names: githubApi
Inputs
Repo Link (required)
- Type: string
- Description: The URL of the GitHub repository
- Example: https://github.com/Ardor_Cerebrum/Ardor
Branch (required)
- Type: string
- Default: "main"
- Description: The branch of the repository to load from
Recursive (optional)
- Type: boolean
- Description: Whether to recursively traverse the repository
Max Concurrency (optional)
- Type: number
- Description: Maximum number of concurrent operations
Ignore Paths (optional)
- Type: string (JSON array)
- Description: An array of paths to be ignored
- Example: ["*.md"]
Max Retries (optional)
- Type: number
- Description: Maximum number of retries for a single call
- Default: 2
Text Splitter (optional)
- Type: TextSplitter
- Description: A text splitter to apply to the loaded documents
Additional Metadata (optional)
- Type: JSON
- Description: Additional metadata to be added to the extracted documents
Omit Metadata Keys (optional)
- Type: string
- Description: Comma-separated list of metadata keys to omit from the documents
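Two of the inputs above are plain strings that carry structured values: Ignore Paths is a JSON array and Omit Metadata Keys is a comma-separated list. The sketch below shows one way such strings could be parsed, and how an ignore list might be applied to file paths. The helper names (`parse_ignore_paths`, `parse_omit_keys`, `is_ignored`) are hypothetical, and shell-style glob matching through Python's `fnmatch` module is an assumption; the node's actual matcher may follow different rules.

```python
import json
from fnmatch import fnmatchcase


def parse_ignore_paths(raw: str) -> list[str]:
    """Parse the Ignore Paths input, a JSON array of patterns such as '["*.md"]'."""
    if not raw.strip():
        return []
    # json.loads raises json.JSONDecodeError on malformed input,
    # for example when curly quotes are pasted instead of straight ones.
    patterns = json.loads(raw)
    if not isinstance(patterns, list) or not all(isinstance(p, str) for p in patterns):
        raise ValueError("Ignore Paths must be a JSON array of strings")
    return patterns


def parse_omit_keys(raw: str) -> list[str]:
    """Split the comma-separated Omit Metadata Keys input, dropping blanks."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Assumption: shell-style globs; the real loader may match differently."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = parse_ignore_paths('["*.md"]')
paths = ["README.md", "src/index.ts", "docs/guide.md"]
kept = [p for p in paths if not is_ignored(p, patterns)]
print(kept)
print(parse_omit_keys("source, branch"))
```

Validating the JSON before running the node is a cheap way to catch quoting mistakes in the Ignore Paths field.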
Functionality
- Initializes a GithubRepoLoader with the provided repository link and options.
- Loads documents from the specified GitHub repository.
- Applies text splitting if a TextSplitter is provided.
- Adds custom metadata to the documents if specified.
- Omits specified metadata keys from the documents.
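The last two steps can be sketched in a few lines. This is an illustration of the described behavior, not the node's source: the `Document` class with `page_content` and `metadata` fields is a stand-in for the loaded documents, `post_process` is a hypothetical helper, and the order (merge first, then omit) is assumed from the order of the list above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Stand-in for a loaded document; the field names are assumptions."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def post_process(docs, additional_metadata=None, omit_keys=()):
    """Merge extra metadata into each document, then drop the omitted keys."""
    extra = additional_metadata or {}
    omit = set(omit_keys)
    result = []
    for doc in docs:
        # Assumed: additional values win when a key already exists.
        merged = {**doc.metadata, **extra}
        cleaned = {k: v for k, v in merged.items() if k not in omit}
        result.append(Document(doc.page_content, cleaned))
    return result


docs = [Document("print('hi')", {"source": "src/app.py", "branch": "main"})]
processed = post_process(docs, {"project": "demo"}, omit_keys=["branch"])
print(processed[0].metadata)
```

The helper returns new objects instead of editing the inputs, so the original metadata stays available for comparison.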
Output
An array of IDocument objects representing the loaded and processed documents from the GitHub repository.
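When a Text Splitter is connected, one loaded file can become several entries in this array. The sketch below illustrates that effect with a naive fixed-size splitter. It is not the algorithm of any particular TextSplitter node: `split_text` and `split_documents` are hypothetical helpers, and the `pageContent` and `metadata` keys follow the common LangChain document shape, which is an assumption about IDocument.

```python
def split_text(text: str, chunk_size: int = 10, overlap: int = 2) -> list[str]:
    """Cut text into chunks of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final chunk is never just a repeated overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def split_documents(docs, chunk_size=10, overlap=2):
    """Expand each document into one document per chunk, copying its metadata."""
    out = []
    for doc in docs:
        for chunk in split_text(doc["pageContent"], chunk_size, overlap):
            out.append({"pageContent": chunk, "metadata": dict(doc["metadata"])})
    return out


docs = [{"pageContent": "abcdefghijklmnopqrstuvwxyz", "metadata": {"source": "a.txt"}}]
chunks = split_documents(docs)
for chunk in chunks:
    print(chunk["pageContent"], chunk["metadata"])
```

Each chunk carries a copy of the parent document's metadata, so downstream steps can still trace a chunk back to its file.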
Use Cases
- Loading source code for analysis or processing
- Fetching documentation or README files from GitHub projects
- Integrating GitHub-hosted content into document processing workflows
- Comparing content across branches by loading each branch of a repository separately
Notes
- When accessing private repositories, make sure to provide the appropriate GitHub API credentials.
- The node supports various options for customizing the loading process, including recursive traversal, concurrency control, and retry logic.
- Use the text splitter option to break down large documents into smaller chunks if needed.
- The additional metadata and metadata key omission features allow for fine-grained control over the document metadata.
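To make the concurrency and retry options concrete, here is a minimal sketch of how a cap on concurrent calls and a per-call retry limit typically interact. It does not reproduce the node's internals: `call_with_retries`, `fetch_all`, and `flaky_fetch` are hypothetical names, the backoff policy and the concurrency default of 2 are assumptions, and only the retry default of 2 comes from the parameter list.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_retries(fn, max_retries=2, base_delay=0.0):
    """Run fn(); on failure, retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff (assumed)


def fetch_all(paths, fetch, max_concurrency=2, max_retries=2):
    """Fetch every path with at most max_concurrency calls in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(
            pool.map(lambda p: call_with_retries(lambda: fetch(p), max_retries), paths)
        )


attempts = {}


def flaky_fetch(path):
    """Simulated fetch that fails on the first attempt for each path."""
    attempts[path] = attempts.get(path, 0) + 1
    if attempts[path] == 1:
        raise ConnectionError("simulated transient failure")
    return f"contents of {path}"


results = fetch_all(["a.py", "b.py"], flaky_fetch)
print(results)
```

With these semantics, Max Retries of 2 means a single call is attempted at most three times before its error is reported.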