Character Text Splitter
The Character Text Splitter is a node that splits text into smaller chunks at a specified character separator. It is particularly useful for breaking large text documents into manageable pieces for downstream analysis.
Node Details
- Name: characterTextSplitter
- Type: CharacterTextSplitter
- Category: Text Splitters
- Version: 1.0
Parameters
- Chunk Size
  - Label: Chunk Size
  - Name: chunkSize
  - Type: number
  - Description: Number of characters in each chunk
  - Default: 1000
  - Optional: Yes
- Chunk Overlap
  - Label: Chunk Overlap
  - Name: chunkOverlap
  - Type: number
  - Description: Number of characters to overlap between chunks
  - Default: 200
  - Optional: Yes
- Custom Separator
  - Label: Custom Separator
  - Name: separator
  - Type: string
  - Description: Custom separator to determine when to split the text (overrides the default separator)
  - Placeholder: " " (space)
  - Optional: Yes
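To make the interaction between the three parameters concrete, here is a minimal sketch of a character-based splitter. It is a simplified illustration written for this page, not the library's actual implementation: the function name `splitByCharacter` and the `SplitOptions` interface are hypothetical, and the real class may differ in its edge-case handling.

```typescript
// Simplified illustration only; not the library's actual implementation.
interface SplitOptions {
  chunkSize: number;    // maximum number of characters per chunk
  chunkOverlap: number; // maximum characters carried over into the next chunk
  separator: string;    // the only place where the text may be cut
}

function splitByCharacter(text: string, opts: SplitOptions): string[] {
  const { chunkSize, chunkOverlap, separator } = opts;
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }

  // Cut the text at every separator and drop empty pieces.
  const pieces = text.split(separator).filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current: string[] = []; // pieces that make up the chunk being built
  let length = 0;             // length of `current` joined by the separator

  for (const piece of pieces) {
    // Length the current chunk would have if this piece were appended.
    const projected = () =>
      length + piece.length + (current.length > 0 ? separator.length : 0);

    if (current.length > 0 && projected() > chunkSize) {
      chunks.push(current.join(separator));
      // Keep only a tail that fits the overlap budget and leaves room
      // for the incoming piece.
      while (length > chunkOverlap || (length > 0 && projected() > chunkSize)) {
        const removed = current.shift()!;
        length -= removed.length + (current.length > 0 ? separator.length : 0);
      }
    }
    // A single piece longer than chunkSize is kept whole in this sketch.
    length = projected();
    current.push(piece);
  }

  if (current.length > 0) chunks.push(current.join(separator));
  return chunks;
}

const example = splitByCharacter("aa bb cc dd ee", {
  chunkSize: 8,
  chunkOverlap: 3,
  separator: " ",
});
console.log(example); // [ 'aa bb cc', 'cc dd ee' ]
```

In this example the piece `cc` appears in both chunks because it fits within the 3-character overlap budget. With `chunkOverlap: 0`, the second chunk would start at `dd` instead.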
Input
The node expects text input that needs to be split into smaller chunks.
Output
The node outputs an instance of CharacterTextSplitter configured with the specified parameters, which can be used to split input text into chunks.
Usage
This node is typically used in text processing pipelines where large documents need to be broken down into smaller pieces. It’s particularly useful in scenarios such as:
- Preparing text for embedding or semantic analysis
- Breaking down large documents for summarization
- Splitting text for parallel processing
- Preparing input for language models with token limits
The ability to customize chunk size, overlap, and separator makes this node versatile for various text processing needs.
Implementation Details
The node uses the CharacterTextSplitter class from the ‘langchain/text_splitter’ package. It initializes the splitter with the provided parameters (chunk size, chunk overlap, and custom separator) and returns the configured splitter instance for use in the workflow.
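The initialization step might look roughly like the sketch below. It is an assumption-laden illustration rather than the node's actual source: the `RawInputs` shape, the `buildSplitterOptions` helper, and the idea that UI values may arrive as strings are all hypothetical. Only the defaults (1000 and 200) and the parameter names come from the table above.

```typescript
// Hypothetical shape of the raw node inputs; form fields may arrive as strings.
interface RawInputs {
  chunkSize?: string | number;
  chunkOverlap?: string | number;
  separator?: string;
}

// Options object in the shape the splitter constructor is assumed to accept.
interface SplitterOptions {
  chunkSize: number;
  chunkOverlap: number;
  separator?: string;
}

function buildSplitterOptions(inputs: RawInputs): SplitterOptions {
  // Parse a numeric field, falling back to the documented default.
  const toInt = (value: string | number | undefined, fallback: number): number => {
    if (value === undefined || value === "") return fallback;
    const parsed = typeof value === "number" ? value : parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  const options: SplitterOptions = {
    chunkSize: toInt(inputs.chunkSize, 1000),
    chunkOverlap: toInt(inputs.chunkOverlap, 200),
  };

  // Set the separator only when one was supplied, so the library's
  // default separator applies otherwise.
  if (inputs.separator) options.separator = inputs.separator;
  return options;
}

// The result would then be handed to the real class, for example:
//   new CharacterTextSplitter(buildSplitterOptions(inputs))
console.log(buildSplitterOptions({ chunkSize: "500", separator: " " }));
```

Leaving `separator` unset when the field is empty keeps the override behavior described in the parameter table: the custom value replaces the default only when the user provides one.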