Node Details

  • Name: folderFiles
  • Type: Document
  • Category: Document Loaders
  • Version: 3.0

Parameters

  1. Folder Path

    • Type: string
    • Description: The path to the folder containing the files to be processed.
  2. Recursive

    • Type: boolean
    • Description: If set to true, the loader will search for files in subdirectories as well.
  3. Text Splitter

    • Type: TextSplitter
    • Optional: Yes
    • Description: A text splitter used to break the loaded documents into smaller chunks.
  4. Pdf Usage

    • Type: options
    • Options:
      • One document per page
      • One document per file
    • Default: One document per page
    • Description: Determines whether each page of a PDF becomes its own document or the whole PDF file becomes a single document.
  5. JSONL Pointer Extraction

    • Type: string
    • Optional: Yes
    • Description: A JSON pointer (for example, /text) identifying the field to extract from each line of a JSONL file. Only relevant when the folder contains .jsonl files.
  6. Additional Metadata

    • Type: json
    • Optional: Yes
    • Description: A JSON object of key-value pairs added to the metadata of every extracted document.
  7. Omit Metadata Keys

    • Type: string
    • Optional: Yes
    • Description: Comma-separated list of the loader's default metadata keys to omit from the extracted documents. Use * to omit all default metadata keys and keep only those specified in Additional Metadata.
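The two Pdf Usage options can be illustrated with a minimal TypeScript sketch. It assumes the PDF text has already been extracted page by page; the document shape is modeled on LangChain's Document (pageContent plus metadata), while the option values "perPage"/"perFile", the metadata keys source and page, and the "\n\n" page separator are hypothetical choices for illustration, not the node's actual implementation.

```typescript
// Document shape modeled on LangChain's Document: text plus metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical internal values for the two Pdf Usage options.
type PdfUsage = "perPage" | "perFile";

// Turn already-extracted page texts into documents according to Pdf Usage.
// The `source`/`page` metadata keys and the "\n\n" separator are assumptions.
function pdfToDocs(source: string, pages: string[], usage: PdfUsage): Doc[] {
  if (usage === "perPage") {
    // One document per page, each tagged with its 1-based page number.
    return pages.map((text, i) => ({
      pageContent: text,
      metadata: { source, page: i + 1 },
    }));
  }
  // One document per file: all pages concatenated into a single document.
  return [{ pageContent: pages.join("\n\n"), metadata: { source } }];
}

const pages = ["Intro", "Methods", "Results"];
const perPage = pdfToDocs("report.pdf", pages, "perPage"); // 3 documents
const perFile = pdfToDocs("report.pdf", pages, "perFile"); // 1 document
```

Per-page output keeps page boundaries (useful for citing a page), while per-file output gives a downstream text splitter the full text to chunk across page breaks.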

Supported File Formats

  • JSON (.json)
  • JSONL (.jsonl)
  • Text (.txt)
  • CSV and Excel spreadsheets (.csv, .xls, .xlsx)
  • Word Documents (.doc, .docx)
  • PDF (.pdf)
  • ASP (.aspx, .asp)
  • C++ (.cpp, .h)
  • C (.c)
  • C# (.cs)
  • CSS (.css)
  • Go (.go)
  • Kotlin (.kt)
  • Java (.java)
  • JavaScript (.js)
  • Less (.less)
  • TypeScript (.ts)
  • PHP (.php)
  • Protocol Buffers (.proto)
  • Python (.python, .py)
  • reStructuredText (.rst)
  • Ruby (.ruby, .rb)
  • Rust (.rs)
  • Scala (.scala, .sc)
  • Sass (.scss)
  • Solidity (.sol)
  • SQL (.sql)
  • Swift (.swift)
  • Markdown (.markdown, .md)
  • LaTeX (.tex, .ltx)
  • HTML (.html)
  • Visual Basic (.vb)
  • XML (.xml)
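A minimal TypeScript sketch of how a folder loader can pick a loader by file extension while honoring the Recursive flag. It walks an in-memory tree instead of the real file system, and the loader kinds ("json", "text", "csv", and so on) are hypothetical labels standing in for the concrete loaders the node wires up. Mapping code and markup files such as .py and .md to plain text is an assumption of this sketch, and only a subset of the extensions listed above is included.

```typescript
// In-memory stand-in for a folder; a real loader reads the file system.
interface FileEntry { kind: "file"; content: string }
interface DirEntry { kind: "dir"; children: { [name: string]: Entry } }
type Entry = FileEntry | DirEntry;

// Hypothetical loader categories, not the node's actual loader classes.
type LoaderKind = "json" | "jsonl" | "text" | "csv" | "docx" | "pdf";

// Subset of the extension table above (assumption: code/markup read as text).
const loaders: { [ext: string]: LoaderKind | undefined } = {
  ".json": "json",
  ".jsonl": "jsonl",
  ".txt": "text",
  ".csv": "csv",
  ".docx": "docx",
  ".pdf": "pdf",
  ".py": "text",
  ".md": "text",
};

// Extension including the leading dot, or "" when there is none.
function extname(name: string): string {
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot) : "";
}

// Pair each supported file path with its loader kind. Subdirectories are
// entered only when `recursive` is true; unmapped extensions are skipped.
function collect(
  dir: { [name: string]: Entry },
  recursive: boolean,
  prefix = ""
): Array<[string, LoaderKind]> {
  const out: Array<[string, LoaderKind]> = [];
  for (const name of Object.keys(dir)) {
    const entry = dir[name];
    const path = prefix + name;
    if (entry.kind === "dir") {
      if (recursive) out.push(...collect(entry.children, recursive, path + "/"));
    } else {
      const kind = loaders[extname(name)];
      if (kind !== undefined) out.push([path, kind]);
    }
  }
  return out;
}

const tree: { [name: string]: Entry } = {
  "notes.txt": { kind: "file", content: "hello" },
  "image.png": { kind: "file", content: "" },
  sub: { kind: "dir", children: { "data.csv": { kind: "file", content: "a,b" } } },
};
const topLevel = collect(tree, false); // [["notes.txt", "text"]]
const everything = collect(tree, true); // adds ["sub/data.csv", "csv"]
```

Skipping unknown extensions (here, image.png) is a choice made for the sketch; how the node treats unsupported files is not specified in this section.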

Input

  • The folder path and the configuration options described under Parameters.

Output

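  • An array of document objects, each containing text content and its associated metadata. Depending on the settings, a document may correspond to a whole file, a single PDF page, or a chunk produced by the text splitter.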
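To see why one file can yield several output documents once a Text Splitter is attached, here is a simplified TypeScript sketch. The fixed-size character splitter with overlap and the function name splitDocs are hypothetical stand-ins; the splitters that can be connected to the node are more sophisticated. The point it shows is that every chunk carries a copy of its parent document's metadata.

```typescript
// Document shape modeled on LangChain's Document: text plus metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical fixed-size character splitter. Consecutive chunks share
// `overlap` characters; each chunk copies its parent's metadata.
function splitDocs(docs: Doc[], chunkSize: number, overlap: number): Doc[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const out: Doc[] = [];
  for (const doc of docs) {
    const text = doc.pageContent;
    for (let start = 0; start < text.length; start += chunkSize - overlap) {
      out.push({ pageContent: text.slice(start, start + chunkSize), metadata: { ...doc.metadata } });
      if (start + chunkSize >= text.length) break; // this chunk reached the end of the text
    }
  }
  return out;
}

const loaded: Doc[] = [
  { pageContent: "abcdefghij", metadata: { source: "a.txt" } },
  { pageContent: "xyz", metadata: { source: "b.txt" } },
];
// Two loaded files become four chunks: "abcd", "defg", "ghij", "xyz".
const chunks = splitDocs(loaded, 4, 1);
```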

Functionality

  1. The node creates a DirectoryLoader that maps each supported file extension to a specific loader.
  2. It loads documents from the specified folder, optionally searching recursively.
  3. If a text splitter is provided, the loaded documents are split into chunks.
  4. Additional metadata is added to each document if specified.
  5. Metadata keys are omitted based on the “Omit Metadata Keys” parameter.
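The metadata handling in steps 4 and 5 can be sketched in TypeScript as a single function applied to each document's metadata. This is a minimal sketch under stated assumptions, not the node's code: additional metadata overrides default keys of the same name, keys in the comma-separated list are trimmed, omission runs after merging, and * keeps only the additional metadata. The default keys shown (source, pdf, page) are illustrative.

```typescript
type Metadata = Record<string, unknown>;

// Merge Additional Metadata into a document's default metadata, then drop
// the keys listed in Omit Metadata Keys. Merge order and trimming are
// assumptions of this sketch.
function applyMetadata(defaults: Metadata, additional: Metadata, omitKeys: string): Metadata {
  // "*" drops every default key and keeps only the additional metadata.
  if (omitKeys.trim() === "*") return { ...additional };
  const omit = omitKeys
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  const merged: Metadata = { ...defaults, ...additional };
  for (const key of omit) delete merged[key];
  return merged;
}

// Illustrative default metadata, as a PDF loader might produce.
const defaults: Metadata = { source: "/docs/a.pdf", pdf: { version: "1.7" }, page: 1 };
const extra: Metadata = { project: "handbook" };

const trimmed = applyMetadata(defaults, extra, "pdf, page"); // { source: "/docs/a.pdf", project: "handbook" }
const onlyExtra = applyMetadata(defaults, extra, "*"); // { project: "handbook" }
```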

Use Cases

  • Bulk loading of documents from a file system for processing or analysis.
  • Preparing diverse document sets for use with language models or other NLP tasks.
  • Extracting and organizing content from multiple file types in a structured manner.