Node Details

  • Name: s3Directory
  • Type: Document
  • Category: Document Loaders
  • Version: 3.0

Parameters

Credential (Optional)

  • Type: credential
  • Credential Names: awsApi
  • Description: AWS API credentials for accessing the S3 bucket

Inputs

  1. Text Splitter (Optional)

    • Type: TextSplitter
    • Description: Text splitter used to break the loaded documents into smaller chunks
  2. Bucket

    • Type: string
    • Description: The name of the S3 bucket to load files from
  3. Region

    • Type: asyncOptions
    • Default: “us-east-1”
    • Description: AWS region where the S3 bucket is located
  4. Server URL (Optional)

    • Type: string
    • Description: Custom endpoint URL for S3-compatible services
  5. Prefix (Optional)

    • Type: string
    • Description: Limits the response to keys that begin with the specified prefix
  6. Pdf Usage (Optional)

    • Type: options
    • Options:
      • One document per page
      • One document per file
    • Default: “One document per page”
    • Description: Determines whether each page of a PDF becomes its own document or the whole file becomes a single document
  7. Additional Metadata (Optional)

    • Type: json
    • Description: Additional metadata to be added to the extracted documents
  8. Omit Metadata Keys (Optional)

    • Type: string
    • Description: Comma-separated list of metadata keys to omit from the output
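
The two metadata inputs can be illustrated with a short sketch. This is not the node's own code: the function names `parse_omit_keys` and `apply_metadata` are hypothetical, and the order of operations (merge the additional metadata first, then drop the omitted keys) is an assumption made for the example.

```python
import json
from typing import Any, Dict, Optional, Set, Union


def parse_omit_keys(raw: str) -> Set[str]:
    """Split a comma-separated string into a set of trimmed, non-empty keys."""
    return {key.strip() for key in raw.split(",") if key.strip()}


def apply_metadata(
    metadata: Dict[str, Any],
    additional: Optional[Union[str, Dict[str, Any]]] = None,
    omit_keys: str = "",
) -> Dict[str, Any]:
    """Merge additional metadata into a document's metadata, then drop omitted keys.

    Assumption: omission is applied after merging, so an omitted key is
    removed even if it came from the additional metadata.
    """
    if isinstance(additional, str):
        # The Additional Metadata input is JSON; an empty string means "none".
        additional = json.loads(additional) if additional.strip() else {}
    merged = {**metadata, **(additional or {})}
    omit = parse_omit_keys(omit_keys)
    return {key: value for key, value in merged.items() if key not in omit}
```

For instance, calling `apply_metadata({"source": "a.pdf", "page": 1}, '{"team": "docs"}', "page")` keeps `source`, adds `team`, and removes `page`.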

Functionality

  1. Connects to the specified S3 bucket using provided credentials
  2. Lists and downloads all files from the bucket (or within the specified prefix)
  3. Processes each file based on its extension using appropriate loaders
  4. Applies text splitting if a text splitter is provided
  5. Adds any additional metadata to each document and removes the keys listed in Omit Metadata Keys
  6. Returns an array of processed documents
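
The listing step above can be sketched as follows. This is a minimal illustration rather than the node's implementation: `iter_object_keys` is a hypothetical helper, and it assumes a client object that exposes boto3's `list_objects_v2` method. Skipping keys that end in `/` (folder placeholder objects) is also an assumption of the sketch.

```python
from typing import Any, Dict, Iterator, Optional


def iter_object_keys(
    client: Any, bucket: str, prefix: Optional[str] = None
) -> Iterator[str]:
    """Yield every object key in the bucket, optionally limited to a prefix."""
    kwargs: Dict[str, Any] = {"Bucket": bucket}
    if prefix:
        kwargs["Prefix"] = prefix
    while True:
        response = client.list_objects_v2(**kwargs)
        for item in response.get("Contents", []):
            key = item["Key"]
            # Assumption: keys ending in "/" are folder placeholders, not files.
            if not key.endswith("/"):
                yield key
        # Each response holds a limited number of keys; follow the
        # continuation token until the listing is complete.
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

With boto3 installed, a client for this helper could be created with `boto3.client("s3", region_name="us-east-1")`, adding `endpoint_url=...` for an S3-compatible service, which corresponds to the Region and Server URL inputs.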

Use Cases

  • Loading large datasets stored in S3 for natural language processing tasks
  • Preprocessing documents from S3 for search indexing or analysis
  • Integrating S3-stored documents into AI/ML pipelines

Supported File Formats

JSON, TXT, CSV, DOCX, PDF, ASPX, ASP, CPP, C, CS, CSS, GO, H, KT, JAVA, JS, LESS, TS, PHP, PROTO, PYTHON, PY, RST, RUBY, RB, RS, SCALA, SC, SCSS, SOL, SQL, SWIFT, MARKDOWN, MD, TEX, LTX, HTML, VB, XML
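
Since files are processed by extension, a loader would first need to decide whether a given key is supported. A minimal sketch of that check, assuming a case-insensitive match on the final extension (the helper names `extension_of` and `is_supported` are hypothetical):

```python
from pathlib import PurePosixPath

# The formats listed above, lowercased for comparison.
SUPPORTED_EXTENSIONS = {
    "json", "txt", "csv", "docx", "pdf", "aspx", "asp", "cpp", "c", "cs",
    "css", "go", "h", "kt", "java", "js", "less", "ts", "php", "proto",
    "python", "py", "rst", "ruby", "rb", "rs", "scala", "sc", "scss", "sol",
    "sql", "swift", "markdown", "md", "tex", "ltx", "html", "vb", "xml",
}


def extension_of(key: str) -> str:
    """Return the lowercased final extension of an S3 key, without the dot."""
    return PurePosixPath(key).suffix.lstrip(".").lower()


def is_supported(key: str) -> bool:
    """Check whether a key's extension is in the supported list."""
    return extension_of(key) in SUPPORTED_EXTENSIONS
```

Under these assumptions, `reports/2023/summary.PDF` is accepted, while `images/logo.png` and a key with no extension are rejected.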

Notes

  • Temporary files are created locally and cleaned up after processing
  • Handles nested directory structures within the S3 bucket
  • Metadata can be customized through Additional Metadata and Omit Metadata Keys, and PDF handling through Pdf Usage
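
The first two notes can be illustrated together. The sketch below writes downloaded objects into a temporary directory that mirrors the nested key structure and removes everything once processing finishes. It is only an illustration under stated assumptions: `process_objects` and its `handler` callback are hypothetical, and the object bodies are passed in as bytes rather than downloaded from S3.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Any, Callable, Dict, List


def process_objects(
    objects: Dict[str, bytes], handler: Callable[[Path], Any]
) -> List[Any]:
    """Write each object to a temporary directory, process it, then clean up."""
    results = []
    # The directory and everything in it are deleted when the block exits,
    # even if the handler raises.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for key, body in objects.items():
            # Recreate the key's nested "folders" under the temporary root.
            local_path = root.joinpath(*PurePosixPath(key.lstrip("/")).parts)
            local_path.parent.mkdir(parents=True, exist_ok=True)
            local_path.write_bytes(body)
            results.append(handler(local_path))
    return results
```

The handler must finish reading each file before returning, because the paths it receives no longer exist after `process_objects` completes.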