JSON File Document Loader

Node Details

Json File (Required)
- Type: file
- File Type: .json
- Description: The JSON file(s) to be loaded and processed.
Text Splitter (Optional)
- Type: TextSplitter
- Description: A text splitter to break down large documents into smaller chunks.
Pointers Extraction (Optional)
- Type: string
- Description: Comma-separated list of pointers for extracting specific data from the JSON structure.
- Example: "data.text,data.metadata"
Additional Metadata (Optional)
- Type: json
- Description: Additional metadata to be added to the extracted documents.
Omit Metadata Keys (Optional)
- Type: string
- Description: Comma-separated list of metadata keys to be omitted from the final documents.
- Special Value: Use "*" to omit all metadata keys except those specified in Additional Metadata.

File Loading:
- Supports loading single or multiple JSON files.
- Can load files from base64-encoded strings or from file storage.
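As a rough illustration of the base64 path, the sketch below decodes an encoded string into parsed JSON. The `data:application/json;base64,` prefix and the function name `load_json_from_base64` are assumptions made for this example; the format the node actually stores is not described here.

```python
import base64
import json
from typing import Any


def load_json_from_base64(encoded: str) -> Any:
    """Decode a base64 string (optionally a data URI) and parse it as JSON."""
    # Assumption: the payload may arrive as "data:<mime>;base64,<payload>".
    # If so, keep only the part after the first comma.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    raw_bytes = base64.b64decode(encoded)
    return json.loads(raw_bytes.decode("utf-8"))


# Usage: round-trip a small JSON document through base64.
payload = base64.b64encode(b'{"title": "example"}').decode("ascii")
document = load_json_from_base64("data:application/json;base64," + payload)
```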
JSON Parsing:
- Uses the JSONLoader from LangChain to parse JSON data.
- Supports extraction of specific data using JSON pointers.
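To make pointer extraction concrete, here is a minimal sketch that resolves a comma-separated pointer string against a parsed JSON document. The dotted-path syntax is inferred from the example value `data.text,data.metadata`; the underlying JSONLoader may resolve pointers differently, and the function names `resolve_pointer` and `extract_by_pointers` are hypothetical.

```python
import json
from typing import Any


def resolve_pointer(data: Any, pointer: str) -> Any:
    """Walk nested dicts along a dotted path; return None if any key is missing."""
    current = data
    for key in pointer.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


def extract_by_pointers(raw_json: str, pointers: str) -> list:
    """Return one text value per pointer that resolves to something."""
    data = json.loads(raw_json)
    values = []
    for pointer in (p.strip() for p in pointers.split(",")):
        if not pointer:
            continue
        value = resolve_pointer(data, pointer)
        if value is None:
            continue  # missing or null targets are skipped
        # Strings pass through; other values are serialized back to JSON text.
        values.append(value if isinstance(value, str) else json.dumps(value))
    return values


raw = '{"data": {"text": "hello", "metadata": {"lang": "en"}}, "other": 1}'
texts = extract_by_pointers(raw, "data.text,data.metadata")
```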
Text Splitting:
- If a text splitter is provided, it splits the loaded documents into smaller chunks.
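The splitting step can be pictured with a simple fixed-size character splitter. This is only a sketch of the general idea: the node accepts any connected TextSplitter, and the `chunk_size` and `chunk_overlap` parameters below are illustrative names, not the node's own settings.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Split text into chunks of at most chunk_size characters with overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")
    # Each chunk starts (chunk_size - chunk_overlap) characters after the last.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

Overlap keeps a little shared context between neighbouring chunks, which can help downstream retrieval when a sentence straddles a chunk boundary.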
Metadata Handling:
- Allows adding custom metadata to all documents.
- Provides options to omit specific or all default metadata keys.
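The metadata rules can be sketched as a small merge-then-filter function. This is an illustration under assumptions rather than the node's actual code: the name `apply_metadata` is hypothetical, and it assumes that keys listed for omission are removed after the additional metadata has been merged in.

```python
def apply_metadata(default_metadata: dict, additional_metadata: dict, omit_keys: str = "") -> dict:
    """Merge additional metadata into a document's metadata, then drop omitted keys."""
    # "*" drops every default key and keeps only the additional metadata.
    if omit_keys.strip() == "*":
        return dict(additional_metadata)
    omitted = {key.strip() for key in omit_keys.split(",") if key.strip()}
    # Assumption: additional metadata overrides default keys of the same name.
    merged = {**default_metadata, **additional_metadata}
    return {key: value for key, value in merged.items() if key not in omitted}


default = {"source": "a.json", "line": 3}
kept = apply_metadata(default, {"team": "docs"}, "line")
only_custom = apply_metadata(default, {"team": "docs"}, "*")
```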
Document Processing:
- Converts loaded JSON data into IDocument objects.
- Applies metadata modifications as specified.

Input: JSON file(s), optional text splitter, and metadata configuration.
Output: An array of IDocument objects, each representing a portion of the loaded JSON data with associated metadata.

Use Cases:
- Loading and processing large JSON datasets.
- Extracting specific information from complex JSON structures.
- Preparing JSON data for further processing in NLP or machine learning pipelines.
- Customizing metadata for document management systems.

Notes:
- The node is flexible in handling both single and multiple JSON files.
- It integrates well with text splitting operations for handling large documents.
- The pointer extraction feature allows for targeted data retrieval from nested JSON structures.
- Metadata handling capabilities make it suitable for various document processing workflows.
