
Node Details
- Name: File_DocumentLoaders
- Type: Document
- Category: Document Loaders
- Version: 1.0
Input Parameters
- File (required)
  - Type: file
  - Description: The file(s) to be loaded. Can accept multiple file types.
- Text Splitter (optional)
  - Type: TextSplitter
  - Description: A text splitter to chunk the loaded documents.
- Pdf Usage (optional)
  - Type: options
  - Description: Specifies how to process PDF files.
  - Options:
    - One document per page
    - One document per file
  - Default: One document per page
- JSONL Pointer Extraction (optional)
  - Type: string
  - Description: Specifies the pointer for extracting data from JSONL files.
- Additional Metadata (optional)
  - Type: json
  - Description: Additional metadata to be added to the extracted documents.
- Omit Metadata Keys (optional)
  - Type: string
  - Description: Comma-separated list of metadata keys to omit from the default set. Use '*' to omit all default metadata.
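The interaction between Additional Metadata and Omit Metadata Keys can be sketched as follows. This is a minimal illustration, not the node's actual code: `parse_omit_keys` and `build_metadata` are hypothetical helpers, and the ordering (omit from the defaults first, then layer the additional metadata on top) is an assumption drawn from the parameter descriptions above.

```python
from __future__ import annotations


def parse_omit_keys(raw: str) -> list[str]:
    """Split the comma-separated 'Omit Metadata Keys' string into trimmed key names."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def build_metadata(default: dict, additional: dict | None = None, omit: str = "") -> dict:
    """Hypothetical helper: drop omitted default keys, then add the extra metadata."""
    if omit.strip() == "*":
        kept = {}  # '*' discards every default key
    else:
        omitted = set(parse_omit_keys(omit))
        kept = {k: v for k, v in default.items() if k not in omitted}
    # Assumption: additional metadata is applied after omission and wins on key clashes.
    return {**kept, **(additional or {})}


default_meta = {"source": "report.pdf", "page": 3, "author": "n/a"}
print(build_metadata(default_meta, {"project": "demo"}, "page, author"))
print(build_metadata(default_meta, {"project": "demo"}, "*"))
```

Under these assumptions, the first call keeps only `source` plus the added `project` key, and the second keeps only `project`.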
Supported File Types
- Text (.txt)
- JSON (.json)
- JSON Lines (.jsonl)
- CSV (.csv)
- Excel (.xls, .xlsx)
- Word (.docx, .doc)
- PDF (.pdf)
- YAML (.yaml)
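A rough sketch of how a file extension could be mapped to one of the supported types listed above. The table keys come from that list; the labels on the right and the `pick_loader` function are illustrative names only, and raising an error for an unlisted extension is an assumption (the node itself may behave differently).

```python
from pathlib import Path

# Extension-to-loader table built from the supported file types.
# The labels are placeholders, not real loader class names.
LOADER_BY_EXTENSION = {
    ".txt": "text",
    ".json": "json",
    ".jsonl": "jsonl",
    ".csv": "csv",
    ".xls": "excel",
    ".xlsx": "excel",
    ".docx": "word",
    ".doc": "word",
    ".pdf": "pdf",
    ".yaml": "yaml",
}


def pick_loader(filename: str) -> str:
    """Return the loader label for a filename, matching the extension case-insensitively."""
    ext = Path(filename).suffix.lower()
    try:
        return LOADER_BY_EXTENSION[ext]
    except KeyError:
        # Assumption: unknown extensions are rejected rather than loaded as plain text.
        raise ValueError(f"Unsupported file type: {ext or '(no extension)'}") from None


print(pick_loader("Report.PDF"))  # pdf
print(pick_loader("events.jsonl"))  # jsonl
```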
Functionality
- The node accepts file inputs either as base64-encoded strings or references to files in storage.
- It determines the file type and uses the appropriate loader for each file.
- If a text splitter is provided, it splits the loaded documents.
- Additional metadata is added to each document if specified.
- Metadata keys listed in Omit Metadata Keys are removed from each document.
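The first step above, accepting either a base64-encoded string or a reference to a stored file, might look roughly like this. The `storage:` prefix, the `read_input` function, and the dictionary standing in for a storage backend are all hypothetical; the real node's reference format is not specified here.

```python
import base64

STORAGE_PREFIX = "storage:"  # hypothetical marker for a stored-file reference


def read_input(value: str, storage: dict) -> bytes:
    """Return raw file bytes from either a storage reference or a base64 string."""
    if value.startswith(STORAGE_PREFIX):
        # Look the file up by name in a stand-in storage backend (a plain dict here).
        return storage[value[len(STORAGE_PREFIX):]]
    # Accept an optional data-URI header such as "data:text/plain;base64,".
    payload = value.split("base64,", 1)[-1]
    return base64.b64decode(payload)


storage = {"notes.txt": b"stored text"}
encoded = "data:text/plain;base64," + base64.b64encode(b"hello").decode("ascii")
print(read_input(encoded, storage))
print(read_input("storage:notes.txt", storage))
```

Once the bytes are available, the remaining steps (choosing a loader, splitting, and adjusting metadata) operate on them regardless of where the file came from.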
Output
The node outputs an array of Document objects, each containing:
- The content of the loaded file (or a chunk of it if split)
- Metadata associated with the document
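To make the output shape concrete, here is a small sketch of a document object and of what "a chunk of it if split" can mean. The `Document` dataclass and its field names are illustrative, and `split_document` is a naive fixed-width splitter standing in for whichever text splitter is actually connected.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_document(doc: Document, chunk_size: int) -> list:
    """Cut the content into fixed-width chunks, copying the metadata onto each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    text = doc.page_content
    return [
        Document(text[start:start + chunk_size], dict(doc.metadata))
        for start in range(0, len(text), chunk_size)
    ]


doc = Document("abcdefghij", {"source": "notes.txt"})
chunks = split_document(doc, 4)
print([c.page_content for c in chunks])  # ['abcd', 'efgh', 'ij']
```

Each chunk carries its own copy of the metadata, so changing one chunk's metadata later does not affect the others.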
Use Cases
- Loading and preprocessing documents for language models
- Preparing data for text analysis or NLP tasks
- Batch processing of multiple files of various formats
- Customizing document metadata for specific applications
Notes
- The node uses various loaders from the LangChain library to handle different file types.
- It includes special handling for PDF files, allowing for per-page or per-file document creation.
- The node supports both local file uploads and files stored in a file storage system.
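The two PDF modes can be illustrated with a short sketch. It assumes the page texts have already been extracted; `pdf_to_documents`, the `page` metadata key, and the blank-line separator used when joining pages are assumptions for illustration rather than the node's documented behavior.

```python
def pdf_to_documents(pages: list, source: str, per_page: bool = True) -> list:
    """Build document dicts from extracted page texts in either PDF mode."""
    if per_page:
        # One document per page, numbered from 1.
        return [
            {"page_content": text, "metadata": {"source": source, "page": number}}
            for number, text in enumerate(pages, start=1)
        ]
    # One document per file: join all pages into a single body of text.
    return [{"page_content": "\n\n".join(pages), "metadata": {"source": source}}]


pages = ["first page", "second page"]
print(len(pdf_to_documents(pages, "report.pdf")))  # 2
print(len(pdf_to_documents(pages, "report.pdf", per_page=False)))  # 1
```

Per-page documents keep page-level provenance, which helps when citing sources, while a single per-file document suits cases where a downstream text splitter controls the chunk boundaries.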