Node Details

  • Name: jsonFile
  • Type: Document
  • Category: Document Loaders
  • Version: 1.0

Parameters

  1. Json File (Required)

    • Type: file
    • File Type: .json
    • Description: The JSON file(s) to be loaded and processed.
  2. Text Splitter (Optional)

    • Type: TextSplitter
    • Description: A text splitter to break down large documents into smaller chunks.
  3. Pointers Extraction (Optional)

    • Type: string
    • Description: Comma-separated list of pointers for extracting specific data from the JSON structure.
    • Example: “data.text,data.metadata”
  4. Additional Metadata (Optional)

    • Type: json
    • Description: Additional metadata to be added to the extracted documents.
  5. Omit Metadata Keys (Optional)

    • Type: string
    • Description: Comma-separated list of metadata keys to be omitted from the final documents.
    • Special Value: Use “*” to omit all metadata keys except those specified in Additional Metadata.
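
The comma-separated parameters above can be illustrated with a small sketch. It assumes that each pointer is a dot-separated path through nested objects, as the example value suggests; the node itself delegates extraction to the LangChain JSONLoader, whose matching rules may differ. The helper names `parse_comma_list` and `resolve_path` are hypothetical and not part of the node.

```python
from typing import Any, List


def parse_comma_list(raw: str) -> List[str]:
    """Split a comma-separated parameter value into trimmed, non-empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def resolve_path(data: Any, path: str) -> Any:
    """Walk a dot-separated path (assumed syntax) through nested dicts.

    Returns None when any key along the path is missing.
    """
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Illustrative input, not taken from the node's documentation.
sample = {"data": {"text": "Hello world", "metadata": {"lang": "en"}, "id": 7}}

pointers = parse_comma_list("data.text, data.metadata")
extracted = {pointer: resolve_path(sample, pointer) for pointer in pointers}
```

Trimming each item means that a value such as "data.text, data.metadata" (with a space after the comma) is treated the same as the compact form.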

Functionality

  1. File Loading:

    • Supports loading single or multiple JSON files.
    • Can load files from base64-encoded strings or from file storage.
  2. JSON Parsing:

    • Uses the JSONLoader from LangChain to parse JSON data.
    • Supports extraction of specific data using JSON pointers.
  3. Text Splitting:

    • If a text splitter is provided, it splits the loaded documents into smaller chunks.
  4. Metadata Handling:

    • Allows adding custom metadata to all documents.
    • Provides options to omit specific or all default metadata keys.
  5. Document Processing:

    • Converts loaded JSON data into IDocument objects.
    • Applies metadata modifications as specified.
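
The steps above can be sketched end to end using only the Python standard library. This is a simplified illustration, not the node's implementation: the data-URI format, the fixed-width `split_text` stand-in for a real text splitter, the `pageContent`/`metadata` dictionary shape, and the default `source` metadata value are all assumptions made for the example.

```python
import base64
import json
from typing import Any, Dict, List, Optional


def decode_data_uri(data_uri: str) -> str:
    """Decode a 'data:application/json;base64,<payload>' string (assumed format)."""
    _, _, payload = data_uri.partition("base64,")
    return base64.b64decode(payload).decode("utf-8")


def collect_strings(value: Any) -> List[str]:
    """Recursively gather every string value in a parsed JSON structure."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [s for v in value.values() for s in collect_strings(v)]
    if isinstance(value, list):
        return [s for v in value for s in collect_strings(v)]
    return []


def split_text(text: str, chunk_size: int) -> List[str]:
    """Naive fixed-width splitter standing in for a real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def load_documents(
    data_uri: str,
    chunk_size: Optional[int] = None,
    extra_metadata: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Decode, parse, optionally split, and attach metadata to each document."""
    parsed = json.loads(decode_data_uri(data_uri))
    docs = []
    for text in collect_strings(parsed):
        pieces = split_text(text, chunk_size) if chunk_size else [text]
        for piece in pieces:
            # "source" is a hypothetical default key; extra metadata is merged on top.
            metadata = {"source": "blob", **(extra_metadata or {})}
            docs.append({"pageContent": piece, "metadata": metadata})
    return docs


# Usage: build a base64 payload from a small JSON object and load it.
raw = json.dumps({"title": "Intro", "body": "abcdefghij"})
uri = "data:application/json;base64," + base64.b64encode(raw.encode("utf-8")).decode("ascii")
docs = load_documents(uri, chunk_size=4, extra_metadata={"project": "demo"})
```

When no chunk size is given, each extracted string becomes a single document, mirroring the optional nature of the Text Splitter parameter.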

Input/Output

  • Input: JSON file(s), optional text splitter, and metadata configuration.
  • Output: An array of IDocument objects, each representing a portion of the loaded JSON data with associated metadata.
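
As a rough illustration of the output shape, the sketch below models a document as a dictionary with `pageContent` and `metadata` keys (an assumed representation of IDocument) and shows one way Additional Metadata and Omit Metadata Keys could combine. It assumes that additional metadata is merged first and that listed keys are then dropped from the merged result, while "*" keeps only the additional metadata. The function name `apply_metadata_rules` is hypothetical.

```python
from typing import Any, Dict


def apply_metadata_rules(
    doc: Dict[str, Any],
    additional: Dict[str, Any],
    omit_keys: str,
) -> Dict[str, Any]:
    """Return a new document with additional metadata merged and keys omitted."""
    keys = [k.strip() for k in omit_keys.split(",") if k.strip()]
    if keys == ["*"]:
        # Special value: discard all default metadata, keep only the additions.
        metadata = dict(additional)
    else:
        merged = {**doc["metadata"], **additional}
        metadata = {k: v for k, v in merged.items() if k not in keys}
    return {"pageContent": doc["pageContent"], "metadata": metadata}


# Illustrative document; the metadata keys here are made up for the example.
doc = {"pageContent": "Hello", "metadata": {"source": "a.json", "line": 1}}

omit_one = apply_metadata_rules(doc, {"project": "demo"}, "line")
omit_all = apply_metadata_rules(doc, {"project": "demo"}, "*")
unchanged = apply_metadata_rules(doc, {}, "")
```

The function returns a new dictionary rather than mutating its input, so the same loaded document can be reused with different metadata settings.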

Use Cases

  • Loading and processing large JSON datasets.
  • Extracting specific information from complex JSON structures.
  • Preparing JSON data for further processing in NLP or machine learning pipelines.
  • Customizing metadata for document management systems.

Notes

  • The node accepts a single JSON file or several files at once.
  • Attach a text splitter when the loaded documents are too large to process whole; without one, documents are returned unsplit.
  • The pointer extraction feature allows for targeted data retrieval from nested JSON structures.
  • Additional Metadata and Omit Metadata Keys together control which metadata each output document carries.