JSON Lines File Node
The JSON Lines File node is a document loader that processes JSON Lines (JSONL) files. It extracts specified data from each line of the file and converts it into document format for further processing in a language model pipeline.
Node Details
- Name: jsonlinesFile
- Type: Document
- Category: Document Loaders
- Version: 2.0
Parameters
- Jsonlines File (required)
  - Type: file
  - File Type: .jsonl
  - Description: The JSON Lines file to be processed.
- Text Splitter (optional)
  - Type: TextSplitter
  - Description: A text splitter to break down large documents into smaller chunks.
- Pointer Extraction (required)
  - Type: string
  - Description: The key whose value is extracted from each JSON object as the main content.
  - Example: For { "key": "value" }, setting this to "key" extracts "value" as the page content.
- Additional Metadata (optional)
  - Type: json
  - Description: Additional metadata to add to each extracted document. Values given as JSON pointers (for example "/source") are resolved dynamically against each JSON object.
  - Example: { "source": "/source" } extracts the value of the "source" key from each JSON object and adds it to the metadata under "source".
- Omit Metadata Keys (optional)
  - Type: string
  - Description: A comma-separated list of metadata keys to omit from the final output. Use "*" to omit all default metadata keys.
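The following Python sketch illustrates how Pointer Extraction and Additional Metadata could be applied to each line of a JSONL file. It is not the node's implementation: the function names `resolve_pointer` and `load_jsonl_lines` are hypothetical, and it assumes that the bare Pointer Extraction key is treated as a top-level JSON pointer, that metadata values starting with "/" are RFC 6901 pointers, and that non-string content is serialized back to JSON. A missing key simply raises an error here.

```python
import json
from typing import Any


def resolve_pointer(obj: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON pointer such as "/source" or "/a/b" against obj."""
    if pointer == "":
        return obj  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    current = obj
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" becomes "/", then "~0" becomes "~"
        token = raw.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def load_jsonl_lines(text: str, pointer_key: str, additional_metadata: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical loader: builds one document per non-blank JSONL line."""
    docs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: a bare key such as "text" is treated as the pointer "/text"
        content = resolve_pointer(record, "/" + pointer_key.strip())
        if not isinstance(content, str):
            content = json.dumps(content)  # assumption: non-strings are re-serialized
        metadata = {}
        for name, value in additional_metadata.items():
            # Assumption: string values starting with "/" are JSON pointers
            if isinstance(value, str) and value.startswith("/"):
                metadata[name] = resolve_pointer(record, value)
            else:
                metadata[name] = value  # static values are copied as-is
        docs.append({"pageContent": content, "metadata": metadata})
    return docs
```

For a line such as {"text": "hello", "source": "a.txt"}, calling load_jsonl_lines(data, "text", {"source": "/source"}) yields a document whose page content is "hello" and whose metadata is {"source": "a.txt"}.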
Input
The node accepts a JSON Lines file, either as a base64-encoded string or a reference to a file in storage.
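Because the file can arrive as a base64-encoded string, a loader has to decode it before splitting it into lines. The sketch below assumes the payload is either a bare base64 string or a `data:` URL with a `;base64,` section; the helper name `decode_jsonl_payload` is hypothetical, and the exact envelope the node receives from uploads or storage is not described here.

```python
import base64


def decode_jsonl_payload(payload: str) -> str:
    """Decode a base64-encoded JSONL payload into UTF-8 text.

    Assumption: the payload is a bare base64 string or a data URL of the form
    "data:<mime>;base64,<data>". Any extra fields a real upload envelope
    might carry are not handled.
    """
    if payload.startswith("data:"):
        # Keep only the part after the first comma, which holds the base64 data
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload).decode("utf-8")
```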
Output
The node outputs an array of IDocument objects, each containing:
- pageContent: The extracted content based on the Pointer Extraction parameter.
- metadata: The default metadata combined with the entries from Additional Metadata (including values extracted dynamically from each JSON object), minus any keys listed in Omit Metadata Keys.
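The metadata rules above can be sketched as a small merge function. This Python helper (`build_metadata`, a hypothetical name) assumes that Additional Metadata overrides default keys with the same name, that keys named in Omit Metadata Keys are removed from the merged result, and that "*" discards every default key while keeping Additional Metadata.

```python
from typing import Any


def build_metadata(default: dict[str, Any], additional: dict[str, Any], omit_keys: str) -> dict[str, Any]:
    """Combine default and additional metadata, honoring Omit Metadata Keys.

    omit_keys is the raw comma-separated parameter value, e.g. "source, line".
    """
    keys = [k.strip() for k in omit_keys.split(",") if k.strip()]
    if "*" in keys:
        # Assumption: "*" removes every default key; only Additional Metadata survives
        return dict(additional)
    # Assumption: Additional Metadata is applied last, so it wins on name clashes
    merged = {**default, **additional}
    # Named keys are removed from the final, merged metadata
    return {k: v for k, v in merged.items() if k not in keys}
```

Applying Additional Metadata after the defaults means an explicitly configured entry takes precedence over a default key of the same name.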
Usage
This node is useful for processing large datasets stored in JSON Lines format, where each line is a self-contained JSON object. It extracts one specific field per line as content and handles metadata flexibly, which makes it well suited to preparing data for language model training or querying.
Special Features
- Supports both local files and files from storage systems.
- Can process multiple files in a single operation.
- Allows dynamic metadata extraction from the document content.
- Supports text splitting for large documents.
- Provides fine-grained control over metadata inclusion/exclusion.
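The Text Splitter parameter accepts whichever splitter is connected to the node. As a stand-in, the sketch below uses a naive fixed-size character window with overlap (not the algorithm of any particular splitter) and copies each parent document's metadata onto its chunks. The name `split_documents` and the dict-based document shape are assumptions.

```python
from typing import Any


def split_documents(docs: list[dict[str, Any]], chunk_size: int, chunk_overlap: int) -> list[dict[str, Any]]:
    """Split each document's pageContent into overlapping fixed-size chunks."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for doc in docs:
        text = doc["pageContent"]
        if not text:
            continue  # nothing to split
        # Stop once the remaining tail is already covered by the previous overlap
        for start in range(0, max(len(text) - chunk_overlap, 1), step):
            chunks.append({
                "pageContent": text[start:start + chunk_size],
                "metadata": dict(doc["metadata"]),  # each chunk gets its own copy
            })
    return chunks
```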
Example
For a JSONL file containing: