Node Details

  • Name: jsonlinesFile
  • Type: Document
  • Category: Document Loaders
  • Version: 2.0

Parameters

  1. Jsonlines File (required)

    • Type: file
    • File Type: .jsonl
    • Description: The JSON Lines file to be processed.
  2. Text Splitter (optional)

    • Type: TextSplitter
    • Description: A text splitter to break down large documents into smaller chunks.
  3. Pointer Extraction (required)

    • Type: string
    • Description: The key to extract from each JSON object as the main content.
    • Example: For { "key": "value" }, setting this to “key” will extract “value” as the page content.
  4. Additional Metadata (optional)

    • Type: json
    • Description: Additional metadata to add to each extracted document. Values can also be extracted dynamically from each JSON object.
    • Example: { "source": "/source" } will extract the value of the “source” key from each JSON object and add it to the metadata.
  5. Omit Metadata Keys (optional)

    • Type: string
    • Description: A comma-separated list of metadata keys to omit from the final output. Use “*” to omit all default metadata keys.
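
The way Pointer Extraction, Additional Metadata, and Omit Metadata Keys interact can be illustrated with a minimal Python sketch. This is not the node's actual implementation: the function name load_jsonl, the single default metadata key "line", and the rule that a value starting with "/" refers to a top-level key of the JSON object are assumptions inferred from the examples above.

```python
import json


def load_jsonl(text, pointer, additional_metadata=None, omit_keys=""):
    """Hypothetical sketch: turn JSON Lines text into document dicts."""
    additional_metadata = additional_metadata or {}
    omit_all_defaults = omit_keys.strip() == "*"
    omitted = {key.strip() for key in omit_keys.split(",") if key.strip()}

    documents = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)

        # Assumption: the only default metadata key is the line number.
        metadata = {} if omit_all_defaults else {"line": line_number}

        for key, value in additional_metadata.items():
            if isinstance(value, str) and value.startswith("/"):
                # Assumption: "/source" means "read record['source']".
                metadata[key] = record.get(value[1:])
            else:
                metadata[key] = value  # static value, copied as-is

        if not omit_all_defaults:
            for key in omitted:
                metadata.pop(key, None)

        documents.append({"pageContent": record[pointer], "metadata": metadata})
    return documents
```

Calling load_jsonl(text, "text", {"source": "/source"}, "line") would, under these assumptions, use the "text" field of each line as page content, copy each line's "source" value into the metadata, and drop the default "line" key.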

Input

The node accepts a JSON Lines file, either as a base64-encoded string or a reference to a file in storage.
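
As a rough illustration of the base64 case, the sketch below decodes an encoded payload and parses it line by line. The helper name decode_jsonl is hypothetical, and the payload is assumed to be plain base64 of UTF-8 text; any wrapper or prefix the node may place around the encoded string is not covered here.

```python
import base64
import json


def decode_jsonl(encoded):
    """Hypothetical helper: base64 string -> list of parsed JSON objects."""
    raw_text = base64.b64decode(encoded).decode("utf-8")
    # One JSON object per non-empty line.
    return [json.loads(line) for line in raw_text.splitlines() if line.strip()]
```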

Output

The node outputs an array of IDocument objects, each containing:

  • pageContent: The extracted content based on the Pointer Extraction parameter.
  • metadata: A combination of default metadata, extracted metadata, and additional metadata specified in the parameters.

Usage

This node is useful for processing large datasets stored in JSON Lines format. It extracts a chosen field from each record as the document content and gives control over which metadata is attached, which suits preparing data for language model training or querying.

Special Features

  • Supports both local files and files from storage systems.
  • Can process multiple files in a single operation.
  • Allows dynamic metadata extraction from the document content.
  • Supports text splitting for large documents.
  • Provides fine-grained control over metadata inclusion/exclusion.
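
The text splitting feature can be pictured with a simple fixed-size character splitter. This is only a sketch: the node delegates splitting to whichever Text Splitter is connected, and the function name split_document, the chunk_size and chunk_overlap parameters, and the choice to copy metadata onto every chunk are assumptions made for illustration.

```python
def split_document(document, chunk_size=1000, chunk_overlap=200):
    """Hypothetical sketch: split one document into overlapping chunks."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    text = document["pageContent"]
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({
            "pageContent": text[start:start + chunk_size],
            # Assumption: every chunk carries a copy of the parent metadata.
            "metadata": dict(document["metadata"]),
        })
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```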

Example

For a JSONL file containing: