Recursive Character Text Splitter
The RecursiveCharacterTextSplitter node is a text-splitting component that divides large text documents into smaller chunks. It is particularly useful for long documents that must be broken into manageable pieces for analysis, summarization, or other natural language processing tasks.
Node Details
- Name: recursiveCharacterTextSplitter
- Type: RecursiveCharacterTextSplitter
- Version: 2.0
- Category: Text Splitters
Parameters
- Chunk Size
  - Label: Chunk Size
  - Name: chunkSize
  - Type: number
  - Description: Number of characters in each chunk
  - Default: 1000
  - Optional: Yes
- Chunk Overlap
  - Label: Chunk Overlap
  - Name: chunkOverlap
  - Type: number
  - Description: Number of characters to overlap between chunks
  - Default: 200
  - Optional: Yes
- Custom Separators
  - Label: Custom Separators
  - Name: separators
  - Type: string
  - Description: Array of custom separators that determine where to split the text; overrides the default separators
  - Placeholder: ["|", "##", ">", "-"]
  - Optional: Yes
  - Additional: This is an advanced parameter
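To illustrate how the chunk size and the separators interact, here is a minimal TypeScript sketch of recursive character splitting. It is a simplified illustration under stated assumptions, not the LangChain implementation: the function name `splitRecursive` is hypothetical, chunk overlap is left out for brevity, and the separator list in the usage example is chosen only for the demonstration.

```typescript
// Simplified sketch of recursive character splitting (hypothetical helper,
// not the library's code). Chunk overlap is intentionally omitted.
function splitRecursive(text: string, chunkSize: number, separators: string[]): string[] {
  // Text that already fits is returned as a single chunk.
  if (text.length <= chunkSize) return text.length > 0 ? [text] : [];

  // Pick the first (coarsest) separator that actually occurs in the text.
  const idx = separators.findIndex((s) => s !== "" && text.includes(s));
  if (idx === -1) {
    // No usable separator left: fall back to hard cuts of chunkSize characters.
    const cuts: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      cuts.push(text.slice(i, i + chunkSize));
    }
    return cuts;
  }

  const sep = separators[idx];
  const finer = separators.slice(idx + 1);
  const chunks: string[] = [];
  let current = "";

  for (const piece of text.split(sep)) {
    if (piece === "") continue;
    // Try to merge the piece into the chunk being built.
    const candidate = current === "" ? piece : current + sep + piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
      continue;
    }
    if (current !== "") chunks.push(current);
    if (piece.length > chunkSize) {
      // The piece alone is too long: split it again with finer separators.
      chunks.push(...splitRecursive(piece, chunkSize, finer));
      current = "";
    } else {
      current = piece;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Usage: paragraph breaks are tried first, then line breaks, then spaces.
const chunks = splitRecursive("aaaa bbbb\n\ncccc dddd eeee", 10, ["\n\n", "\n", " "]);
console.log(chunks); // ["aaaa bbbb", "cccc dddd", "eeee"]
```

Coarser separators are tried first, so paragraph boundaries are preferred over word boundaries, and a piece is split further only when it still exceeds the chunk size.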
Input
The node takes the following inputs:
- Text document(s) to be split (handled internally by the system)
- Configuration parameters as described above
Output
The node outputs a RecursiveCharacterTextSplitter object, which can be used to split text documents into chunks based on the specified parameters.
Usage
This node is typically used in document processing pipelines where large texts must be broken into smaller pieces. Common uses include:
- Preparing text for large language models with context limitations
- Breaking down documents for summarization tasks
- Splitting text for parallel processing
- Creating more granular sections for information retrieval or question-answering systems
Implementation Details
- The node uses the RecursiveCharacterTextSplitter class from the ‘langchain/text_splitter’ library.
- It supports dynamic configuration of chunk size, overlap, and custom separators.
- The init method creates and returns a configured RecursiveCharacterTextSplitter object based on the input parameters.
- Custom separators, if provided, are parsed from a JSON string to an array.
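The steps above can be sketched as a small function that turns the raw node inputs into a splitter configuration. This is a minimal sketch, not the node's actual source: the names `SplitterNodeInputs`, `SplitterOptions`, `toNumber`, and `buildSplitterOptions` are hypothetical, and treating the inputs as strings or numbers is an assumption. The defaults (1000 and 200) come from the parameter list above.

```typescript
// Hypothetical shape of the raw values the node receives (an assumption).
interface SplitterNodeInputs {
  chunkSize?: string | number;
  chunkOverlap?: string | number;
  separators?: string; // JSON array string, e.g. '["|", "##"]'
}

// Options object with the fields described in the Parameters section.
interface SplitterOptions {
  chunkSize: number;
  chunkOverlap: number;
  separators?: string[];
}

// Convert a raw input to a number, falling back to the documented default.
function toNumber(value: string | number | undefined, fallback: number): number {
  if (value === undefined || value === "") return fallback;
  const n = typeof value === "number" ? value : parseInt(value, 10);
  return Number.isNaN(n) ? fallback : n;
}

function buildSplitterOptions(inputs: SplitterNodeInputs): SplitterOptions {
  const options: SplitterOptions = {
    chunkSize: toNumber(inputs.chunkSize, 1000),
    chunkOverlap: toNumber(inputs.chunkOverlap, 200),
  };
  // Custom separators arrive as a JSON string and override the defaults.
  if (inputs.separators) {
    options.separators = JSON.parse(inputs.separators) as string[];
  }
  return options;
}

// The resulting object could then be passed to the splitter class, e.g.
//   new RecursiveCharacterTextSplitter(buildSplitterOptions(inputs))
const exampleOptions = buildSplitterOptions({ chunkSize: "500", separators: '["|", "##"]' });
console.log(exampleOptions);
```

Leaving `separators` unset when no custom value is given keeps the library's default separators in effect.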
Error Handling
The node includes error handling for parsing custom separators. If the separators string cannot be parsed as valid JSON, it will throw an error.
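A minimal sketch of this kind of check is shown below. The function name `parseSeparators` and the error messages are hypothetical, and the extra check that the parsed value is an array of strings is an assumption added for illustration; the text above only states that invalid JSON causes an error.

```typescript
// Parse the Custom Separators input (a JSON string) into a string array.
function parseSeparators(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    // Invalid JSON: surface a descriptive error instead of continuing.
    throw new Error(`Custom separators must be valid JSON: ${(e as Error).message}`);
  }
  // Assumption: also reject valid JSON that is not an array of strings.
  if (!Array.isArray(parsed) || !parsed.every((s) => typeof s === "string")) {
    throw new Error("Custom separators must be a JSON array of strings");
  }
  return parsed as string[];
}

// Usage with the placeholder value from the parameter description.
const separators = parseSeparators('["|", "##", ">", "-"]');
console.log(separators.length); // 4
```

A value such as `[|, ##]` is not valid JSON because the strings are unquoted, so it would be rejected by the first check.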
Base Classes
The node inherits from the following base classes:
- RecursiveCharacterTextSplitter
- Additional base classes that the RecursiveCharacterTextSplitter class itself extends
This design allows for easy integration with other components that expect these base class types.