Node Details

  • Name: File_DocumentLoaders
  • Type: Document
  • Category: Document Loaders
  • Version: 1.0

Input Parameters

  1. File (required)

    • Type: file
    • Description: The file or files to load. Files of different supported types can be loaded together.
  2. Text Splitter (optional)

    • Type: TextSplitter
    • Description: A text splitter to chunk the loaded documents.
  3. Pdf Usage (optional)

    • Type: options
    • Description: Specifies how to process PDF files.
    • Options:
      • One document per page
      • One document per file
    • Default: One document per page
  4. JSONL Pointer Extraction (optional)

    • Type: string
    • Description: The pointer that selects which field of each JSONL record is extracted as document content.
  5. Additional Metadata (optional)

    • Type: json
    • Description: Additional metadata to be added to the extracted documents.
  6. Omit Metadata Keys (optional)

    • Type: string
    • Description: Comma-separated list of metadata keys to omit from the default set. Use '*' to omit all default metadata.
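The JSONL Pointer Extraction parameter picks one field out of every JSONL record. The sketch below is a hypothetical helper, not the node's or LangChain's API. It assumes the value is a full RFC 6901 JSON pointer such as /text (whether the node expects the leading slash is not stated above), and it keeps only records whose selected value is a string.

```typescript
// Hypothetical sketch: select one field from every JSONL record via an RFC 6901 JSON pointer.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function resolvePointer(value: Json, pointer: string): Json | undefined {
  if (pointer === "") return value; // the empty pointer selects the whole record
  if (!pointer.startsWith("/")) throw new Error(`invalid JSON pointer: ${pointer}`);
  let current: Json | undefined = value;
  for (const raw of pointer.slice(1).split("/")) {
    // RFC 6901 unescaping: "~1" becomes "/", then "~0" becomes "~"
    const key = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (Array.isArray(current)) {
      // Array segments must be non-negative integers without leading zeros
      current = /^(0|[1-9]\d*)$/.test(key) ? current[Number(key)] : undefined;
    } else if (current !== null && typeof current === "object") {
      current = Object.prototype.hasOwnProperty.call(current, key) ? current[key] : undefined;
    } else {
      return undefined; // cannot descend into a primitive value
    }
    if (current === undefined) return undefined;
  }
  return current;
}

function extractJsonl(text: string, pointer: string): string[] {
  const values: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // tolerate blank lines, e.g. a trailing newline
    const picked = resolvePointer(JSON.parse(line) as Json, pointer);
    if (typeof picked === "string") values.push(picked); // non-string values are skipped here
  }
  return values;
}

// Usage: each record's "text" field becomes one piece of document content.
const jsonl = '{"text":"first","id":1}\n{"text":"second","id":2}\n';
console.log(extractJsonl(jsonl, "/text")); // [ 'first', 'second' ]
```

Skipping non-string values is a choice of this sketch; a real loader might instead stringify them.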

Supported File Types

  • Text (.txt)
  • JSON (.json)
  • JSON Lines (.jsonl)
  • CSV (.csv)
  • Excel (.xls, .xlsx)
  • Word (.docx, .doc)
  • PDF (.pdf)
  • YAML (.yaml)
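Accepting an input and choosing a loader by file type could be sketched as follows. The input encodings here are assumptions for illustration only: an inline file as a base64 data URI followed by ",filename:<name>", and a storage reference as a FILE-STORAGE:: prefix followed by a JSON array of file names. The extension table mirrors the list above; the kind labels and the helpers fileKindOf and parseFileInput are hypothetical.

```typescript
// Hypothetical sketch: decode a File input value, then map its extension to a file kind.
type FileKind = "text" | "json" | "jsonl" | "csv" | "excel" | "word" | "pdf" | "yaml";

// One entry per extension in the supported-types list.
const KIND_BY_EXTENSION = new Map<string, FileKind>([
  ["txt", "text"], ["json", "json"], ["jsonl", "jsonl"], ["csv", "csv"],
  ["xls", "excel"], ["xlsx", "excel"], ["docx", "word"], ["doc", "word"],
  ["pdf", "pdf"], ["yaml", "yaml"],
]);

function fileKindOf(fileName: string): FileKind | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0 || dot === fileName.length - 1) return undefined; // no usable extension
  return KIND_BY_EXTENSION.get(fileName.slice(dot + 1).toLowerCase());
}

// Assumed encodings (illustrative only):
//   inline:  "data:<mime>;base64,<payload>,filename:<name>"
//   storage: "FILE-STORAGE::" followed by a JSON array of file names
type FileInput =
  | { source: "inline"; fileName: string; mimeType: string; bytes: Uint8Array }
  | { source: "storage"; fileNames: string[] };

const STORAGE_PREFIX = "FILE-STORAGE::";

function parseFileInput(value: string): FileInput {
  if (value.startsWith(STORAGE_PREFIX)) {
    const names: unknown = JSON.parse(value.slice(STORAGE_PREFIX.length));
    if (!Array.isArray(names)) throw new Error("storage reference must be a JSON array");
    return { source: "storage", fileNames: names.map(String) };
  }
  const match = /^data:([^;,]*);base64,([A-Za-z0-9+/=]*),filename:(.+)$/.exec(value);
  if (!match) throw new Error("unrecognized file input");
  const binary = atob(match[2]); // base64 to a string with one char per byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return { source: "inline", mimeType: match[1], bytes, fileName: match[3] };
}

// Usage: "SGVsbG8=" is base64 for "Hello".
const input = parseFileInput("data:text/plain;base64,SGVsbG8=,filename:hello.txt");
if (input.source === "inline") {
  console.log(input.fileName, fileKindOf(input.fileName), new TextDecoder().decode(input.bytes));
  // hello.txt text Hello
}
```

A Map is used rather than a plain object so that names like "file.toString" cannot accidentally match an inherited property.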

Functionality

  1. The node accepts file inputs either as base64-encoded strings or as references to files in storage.
  2. It determines the file type and uses the appropriate loader for each file.
  3. If a text splitter is provided, it splits the loaded documents into chunks.
  4. If Additional Metadata is specified, it is added to each document's metadata.
  5. Keys listed in Omit Metadata Keys are removed from each document's metadata; '*' removes all default metadata.
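Steps 3 to 5 can be sketched as plain functions over documents. Assumptions: Doc mirrors the pageContent and metadata fields of a LangChain Document; splitIntoChunks is a hypothetical fixed-size splitter standing in for whatever TextSplitter is connected; and, following the Omit Metadata Keys description, omission is applied to the default metadata before Additional Metadata is merged in. The metadata keys in the usage lines are illustrative.

```typescript
// Hypothetical sketch of steps 3-5 (split, add metadata, omit keys).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Stand-in splitter: fixed-size character windows that overlap by `overlap` chars.
function splitIntoChunks(docs: Doc[], chunkSize: number, overlap: number): Doc[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: Doc[] = [];
  for (const doc of docs) {
    const text = doc.pageContent;
    for (let start = 0; start < text.length; start += chunkSize - overlap) {
      // Each chunk inherits a copy of its parent's metadata
      chunks.push({ pageContent: text.slice(start, start + chunkSize), metadata: { ...doc.metadata } });
      if (start + chunkSize >= text.length) break; // last window reached the end
    }
  }
  return chunks;
}

// omitKeys follows the parameter's format: comma-separated keys, or "*" for all default keys.
function applyMetadata(docs: Doc[], additional: Record<string, unknown>, omitKeys: string): Doc[] {
  const omitAll = omitKeys.trim() === "*";
  const omit = new Set(omitKeys.split(",").map((k) => k.trim()).filter((k) => k !== ""));
  return docs.map((doc) => {
    const kept: Record<string, unknown> = {};
    if (!omitAll) {
      for (const [key, value] of Object.entries(doc.metadata)) {
        if (!omit.has(key)) kept[key] = value;
      }
    }
    // Additional metadata is merged last, so omitKeys never drops it in this sketch
    return { pageContent: doc.pageContent, metadata: { ...kept, ...additional } };
  });
}

// Usage
const loaded: Doc[] = [{ pageContent: "abcdefghij", metadata: { source: "a.txt", mimeType: "text/plain" } }];
const processed = applyMetadata(splitIntoChunks(loaded, 4, 1), { team: "docs" }, "mimeType");
console.log(processed.map((d) => d.pageContent)); // [ 'abcd', 'defg', 'ghij' ]
```

Splitting before attaching metadata means every chunk carries the same added keys as its parent file.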

Output

The node outputs an array of Document objects, each containing:

  • The content of the loaded file (or a chunk of it if split)
  • Metadata associated with the document
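A consumer of this output might validate its shape before use, for example after receiving it as JSON. This is a hypothetical sketch: the field names pageContent and metadata follow LangChain's Document class, and isOutputDocument and isDocumentArray are illustrative helpers, not part of the node.

```typescript
// Hypothetical sketch: the output shape, plus a runtime check for untyped data.
interface OutputDocument {
  pageContent: string; // file content, or one chunk of it when a splitter is attached
  metadata: Record<string, unknown>; // default loader metadata plus any added keys
}

function isOutputDocument(value: unknown): value is OutputDocument {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { pageContent?: unknown; metadata?: unknown };
  return (
    typeof candidate.pageContent === "string" &&
    typeof candidate.metadata === "object" &&
    candidate.metadata !== null &&
    !Array.isArray(candidate.metadata)
  );
}

function isDocumentArray(value: unknown): value is OutputDocument[] {
  return Array.isArray(value) && value.every(isOutputDocument);
}

// Usage: check parsed JSON before treating it as documents.
const received: unknown = JSON.parse('[{"pageContent":"Hello","metadata":{"source":"hello.txt"}}]');
console.log(isDocumentArray(received)); // true
```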

Use Cases

  • Loading and preprocessing documents for language models
  • Preparing data for text analysis or NLP tasks
  • Batch processing of multiple files of various formats
  • Customizing document metadata for specific applications

Notes

  • The node delegates each file type to a corresponding document loader from the LangChain library.
  • PDF files get special handling: depending on Pdf Usage, the node creates one document per page or one document per file.
  • The node supports both local file uploads and files stored in a file storage system.
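The two Pdf Usage options can be illustrated on text that has already been extracted page by page. This sketch is hypothetical: it does not parse PDFs, and the blank-line separator between joined pages as well as the totalPages and loc.pageNumber metadata keys are assumptions, not documented behavior of the node.

```typescript
// Hypothetical sketch of the two Pdf Usage modes over pre-extracted page texts.
type PdfUsage = "perPage" | "perFile";

interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

function pdfPagesToDocs(pages: string[], source: string, usage: PdfUsage): Doc[] {
  if (usage === "perFile") {
    // One document: all pages joined, separated by a blank line (assumed separator)
    return [{ pageContent: pages.join("\n\n"), metadata: { source, totalPages: pages.length } }];
  }
  // One document per page, with a 1-based page number in the metadata
  return pages.map((text, index) => ({
    pageContent: text,
    metadata: { source, totalPages: pages.length, loc: { pageNumber: index + 1 } },
  }));
}

// Usage
const pages = ["Intro", "Body", "Summary"];
console.log(pdfPagesToDocs(pages, "report.pdf", "perPage").length); // 3
console.log(pdfPagesToDocs(pages, "report.pdf", "perFile").length); // 1
```

Per-page documents keep page boundaries available for citation; per-file documents leave all chunking to the text splitter.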