JSON File Document Loader
The JSON File Document Loader is a node that loads JSON files and turns their contents into documents. It belongs to the Document Loaders category, extracts information from JSON-formatted data, and offers optional text splitting and metadata customization.
Node Details
- Name: jsonFile
- Type: Document
- Category: Document Loaders
- Version: 1.0
Parameters
- Json File (Required)
  - Type: file
  - File Type: .json
  - Description: The JSON file(s) to be loaded and processed.
- Text Splitter (Optional)
  - Type: TextSplitter
  - Description: A text splitter to break down large documents into smaller chunks.
- Pointers Extraction (Optional)
  - Type: string
  - Description: Comma-separated list of pointers for extracting specific data from the JSON structure.
  - Example: "data.text,data.metadata"
- Additional Metadata (Optional)
  - Type: json
  - Description: Additional metadata to be added to the extracted documents.
- Omit Metadata Keys (Optional)
  - Type: string
  - Description: Comma-separated list of metadata keys to be omitted from the final documents.
  - Special Value: Use "*" to omit all metadata keys except those specified in Additional Metadata.
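The interplay between Additional Metadata and Omit Metadata Keys can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the node's source: the `IDocument` shape (`pageContent` plus `metadata`) is modeled on LangChain's Document, `parseOmitKeys` and `applyMetadata` are hypothetical helpers, and the order of operations (merge the additional metadata first, then remove the omitted keys, with `"*"` keeping only the additional metadata) is assumed.

```typescript
// Assumed document shape (modeled on LangChain's Document), not the node's actual type.
interface IDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: turn "a, b,c" into ["a", "b", "c"], ignoring empty entries.
function parseOmitKeys(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Hypothetical helper. Assumption: Additional Metadata is merged over each
// document's own metadata first, then the omitted keys are removed; "*" keeps
// only the Additional Metadata. Input documents are not mutated.
function applyMetadata(
  docs: IDocument[],
  additionalMetadata: Record<string, unknown> = {},
  omitMetadataKeys?: string
): IDocument[] {
  const omitAll = omitMetadataKeys !== undefined && omitMetadataKeys.trim() === "*";
  const omitKeys = parseOmitKeys(omitMetadataKeys);
  return docs.map((doc) => {
    if (omitAll) {
      return { ...doc, metadata: { ...additionalMetadata } };
    }
    const merged: Record<string, unknown> = { ...doc.metadata, ...additionalMetadata };
    for (const key of omitKeys) {
      delete merged[key];
    }
    return { ...doc, metadata: merged };
  });
}

// Usage: drop the "line" key and add a "team" key.
const example = applyMetadata(
  [{ pageContent: "hello", metadata: { source: "data.json", line: 1 } }],
  { team: "docs" },
  "line"
);
console.log(JSON.stringify(example[0].metadata)); // {"source":"data.json","team":"docs"}
```

Because omission runs after the merge in this sketch, listing a key that also appears in Additional Metadata removes it as well; the actual node may order these steps differently.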
Functionality
- File Loading:
  - Supports loading single or multiple JSON files.
  - Can load files from base64-encoded strings or from file storage.
- JSON Parsing:
  - Uses LangChain's JSONLoader to parse JSON data.
  - Supports extraction of specific data using JSON pointers.
- Text Splitting:
  - If a text splitter is provided, it splits the loaded documents into smaller chunks.
- Metadata Handling:
  - Allows adding custom metadata to all documents.
  - Provides options to omit specific or all default metadata keys.
- Document Processing:
  - Converts loaded JSON data into IDocument objects.
  - Applies the Additional Metadata and Omit Metadata Keys settings to each document.
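The JSON Parsing and Document Processing steps can be pictured with a small standalone sketch. It does not use LangChain: it hand-rolls a resolver for RFC 6901 JSON pointers (for example `/data/text`) and emits one document per string value found, either under each pointer or, when no pointers are given, anywhere in the file. The helpers `resolvePointer`, `collectStrings`, and `jsonToDocuments`, the `IDocument` shape, and the `source`/`line` metadata keys are all hypothetical. LangChain's JSONLoader may match pointers differently (this sketch resolves every pointer from the document root), and the sketch does not show how the node turns a Pointers Extraction value such as "data.text" into pointers.

```typescript
// Assumed document shape, for illustration only.
interface IDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Resolve one RFC 6901 JSON pointer against a parsed value.
// Returns undefined when any segment is missing.
function resolvePointer(root: unknown, pointer: string): unknown {
  if (pointer === "") return root; // "" refers to the whole document
  if (pointer.charAt(0) !== "/") throw new Error(`Invalid JSON pointer: ${pointer}`);
  const tokens = pointer
    .slice(1)
    .split("/")
    .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~")); // RFC 6901 unescaping order
  let current: unknown = root;
  for (const token of tokens) {
    if (Array.isArray(current)) {
      // Array indices must be "0" or digits without a leading zero.
      if (!/^(0|[1-9][0-9]*)$/.test(token)) return undefined;
      current = current[Number(token)];
    } else if (current !== null && typeof current === "object") {
      const obj = current as Record<string, unknown>;
      current = Object.prototype.hasOwnProperty.call(obj, token) ? obj[token] : undefined;
    } else {
      return undefined;
    }
    if (current === undefined) return undefined;
  }
  return current;
}

// Collect every string value under a node, depth-first, in document order.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) collectStrings(obj[key], out);
  }
  return out;
}

// Hypothetical loader: one document per extracted string.
function jsonToDocuments(raw: string, pointers: string[], source: string): IDocument[] {
  const json: unknown = JSON.parse(raw);
  const strings: string[] = [];
  if (pointers.length === 0) {
    collectStrings(json, strings); // no pointers: take every string in the file
  } else {
    for (const pointer of pointers) collectStrings(resolvePointer(json, pointer), strings);
  }
  // "source" and "line" are illustrative metadata keys, not confirmed node output.
  return strings.map((text, i) => ({ pageContent: text, metadata: { source, line: i + 1 } }));
}

// Usage: extract only the "text" field.
const sample = '{"data":{"text":"Hello","metadata":{"lang":"en"}}}';
const docs = jsonToDocuments(sample, ["/data/text"], "sample.json");
console.log(docs.length, docs[0].pageContent); // 1 Hello
```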
Input/Output
- Input: JSON file(s), optional text splitter, and metadata configuration.
- Output: An array of IDocument objects, each representing a portion of the loaded JSON data with associated metadata.
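To illustrate how one extracted document can become several output documents when a text splitter is connected, here is a deliberately simple fixed-size character splitter with overlap. It is a stand-in, not one of LangChain's text splitters (which, like `RecursiveCharacterTextSplitter`, prefer to split on separators such as blank lines and newlines). The `IDocument` shape, the `splitIntoChunks` helper, and the `chunk` metadata key are hypothetical.

```typescript
// Assumed document shape, for illustration only.
interface IDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical splitter: windows of at most chunkSize characters, where
// consecutive windows share chunkOverlap characters. Each chunk gets a copy
// of its parent's metadata plus a "chunk" index.
function splitIntoChunks(docs: IDocument[], chunkSize: number, chunkOverlap: number): IDocument[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const out: IDocument[] = [];
  for (const doc of docs) {
    const text = doc.pageContent;
    let chunk = 0;
    for (let start = 0; start < text.length; start += step) {
      out.push({
        pageContent: text.slice(start, start + chunkSize),
        metadata: { ...doc.metadata, chunk },
      });
      chunk++;
      if (start + chunkSize >= text.length) break; // this window reached the end
    }
  }
  return out;
}

// Usage: a 10-character document, chunk size 4, overlap 1.
const chunks = splitIntoChunks([{ pageContent: "abcdefghij", metadata: { source: "a.json" } }], 4, 1);
console.log(chunks.map((c) => c.pageContent).join(",")); // abcd,defg,ghij
```

Copying the parent metadata into every chunk keeps each portion traceable to its source file after splitting.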
Use Cases
- Loading and processing large JSON datasets.
- Extracting specific information from complex JSON structures.
- Preparing JSON data for further processing in NLP or machine learning pipelines.
- Customizing metadata for document management systems.
Notes
- The node accepts either a single JSON file or multiple JSON files.
- A connected text splitter breaks large extracted documents into smaller chunks.
- Pointer extraction retrieves data from specific locations in nested JSON structures.
- Additional Metadata and Omit Metadata Keys let you tailor document metadata to different document processing workflows.