Node Details

  • Name: recursiveCharacterTextSplitter

  • Type: RecursiveCharacterTextSplitter

  • Version: 2.0

  • Category: Text Splitters

Parameters

  1. Chunk Size

    • Label: Chunk Size

    • Name: chunkSize

    • Type: number

    • Description: Maximum number of characters in each chunk

    • Default: 1000

    • Optional: Yes

  2. Chunk Overlap

    • Label: Chunk Overlap

    • Name: chunkOverlap

    • Type: number

    • Description: Number of characters to overlap between chunks

    • Default: 200

    • Optional: Yes

  3. Custom Separators

    • Label: Custom Separators

    • Name: separators

    • Type: string

    • Description: JSON array of custom separators, supplied as a string, that determine where the text is split; when provided, it overrides the default separators

    • Placeholder: ["|", "##", ">", "-"]

    • Optional: Yes

    • Additional: This is an advanced parameter
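To make the relationship between Chunk Size and Chunk Overlap concrete, here is a minimal sketch of a plain fixed-window splitter. It is not the recursive algorithm the node uses (that one prefers to cut at separators); it only illustrates how consecutive chunks share characters. The function name slidingWindowChunks and the validation rule are assumptions made for this illustration.

```typescript
// Illustrative only: a fixed-window splitter, NOT the node's recursive algorithm.
// Defaults mirror the parameter defaults listed above (1000 and 200).
function slidingWindowChunks(
  text: string,
  chunkSize: number = 1000,
  chunkOverlap: number = 200
): string[] {
  // Assumed sanity check: the overlap must leave room for forward progress.
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be >= 0 and smaller than chunkSize");
  }

  // Each new chunk starts (chunkSize - chunkOverlap) characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// With chunkSize 4 and chunkOverlap 2, each chunk repeats the last
// two characters of the chunk before it.
console.log(slidingWindowChunks("abcdefghij", 4, 2));
// → [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

With the defaults, a new window would begin every 800 characters in this simplified model; the real splitter's chunk boundaries depend on where separators occur.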

Input

The node takes the following inputs:

  • Text document(s) to be split (handled internally by the system)

  • Configuration parameters as described above

Output

The node outputs a RecursiveCharacterTextSplitter object, which can be used to split text documents into chunks based on the specified parameters.

Usage

This node is typically used in document processing pipelines where large texts need to be broken down into smaller, more manageable pieces. It’s particularly useful for:

  1. Preparing text for large language models with context limitations

  2. Breaking down documents for summarization tasks

  3. Splitting text for parallel processing

  4. Creating more granular sections for information retrieval or question-answering systems

Implementation Details

  • The node uses the RecursiveCharacterTextSplitter class from the ‘langchain/text_splitter’ module.

  • It supports dynamic configuration of chunk size, overlap, and custom separators.

  • The init method creates and returns a configured RecursiveCharacterTextSplitter object based on the input parameters.

  • Custom separators, if provided, are parsed from a JSON string to an array.
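The recursive strategy behind the class can be sketched in a few lines. The version below is a simplified, self-contained illustration and not LangChain's implementation: it ignores chunk overlap, drops the separator at chunk boundaries, and the function name recursiveSplit is hypothetical. The default separator list used here follows LangChain's documented defaults (paragraph break, line break, space, then individual characters).

```typescript
// Simplified sketch of recursive character splitting (no overlap handling).
const DEFAULT_SEPARATORS: string[] = ["\n\n", "\n", " ", ""];

function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = DEFAULT_SEPARATORS
): string[] {
  // Text that already fits is returned as a single chunk.
  if (text.length <= chunkSize) {
    return text.length > 0 ? [text] : [];
  }

  // Use the first separator that occurs in the text ("" always matches).
  const index = separators.findIndex((s) => s === "" || text.includes(s));
  if (index === -1) {
    // No separator applies (possible with custom lists): fall back to hard cuts.
    const hardCuts: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      hardCuts.push(text.slice(i, i + chunkSize));
    }
    return hardCuts;
  }

  const separator = separators[index];
  const remaining = separators.slice(index + 1);
  const pieces = text.split(separator);

  const chunks: string[] = [];
  let current = "";
  const flush = (): void => {
    if (current.length > 0) {
      chunks.push(current);
      current = "";
    }
  };

  for (const piece of pieces) {
    if (piece.length === 0) continue;
    if (piece.length > chunkSize) {
      // The piece is still too large: retry it with the finer separators.
      flush();
      chunks.push(...recursiveSplit(piece, chunkSize, remaining));
      continue;
    }
    // Greedily merge small pieces back together while they fit.
    const candidate = current.length === 0 ? piece : current + separator + piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
    } else {
      flush();
      current = piece;
    }
  }
  flush();
  return chunks;
}

console.log(recursiveSplit("aaa bbb ccc", 7));
// → [ 'aaa bbb', 'ccc' ]
```

The design point this illustrates is that coarse separators are tried first, so paragraphs and lines stay intact whenever they fit, and finer separators are used only for pieces that exceed the chunk size.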

Error Handling

The node handles errors when parsing custom separators: if the separators string cannot be parsed as valid JSON, the node throws an error.
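A minimal sketch of this kind of parsing is shown below. The helper name parseSeparators, the error messages, and the extra check that the parsed value is an array of strings are assumptions for illustration; the node's actual code may differ.

```typescript
// Hypothetical helper: turns the separators string into an array, or throws.
function parseSeparators(raw: string | undefined): string[] | undefined {
  // The parameter is optional: no value means "use the default separators".
  if (raw === undefined || raw.trim() === "") {
    return undefined;
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    // Surface invalid JSON as an error instead of silently ignoring it.
    throw new Error(`Invalid custom separators: ${(e as Error).message}`);
  }

  // Assumed extra validation: the JSON value must be an array of strings.
  if (!Array.isArray(parsed) || !parsed.every((s) => typeof s === "string")) {
    throw new Error("Custom separators must be a JSON array of strings");
  }
  return parsed as string[];
}

console.log(parseSeparators('["|", "##", ">", "-"]'));
// → [ '|', '##', '>', '-' ]
```

Note that the input must be valid JSON, so each separator needs double quotes; a value such as [|, ##] would be rejected.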

Base Classes

The node inherits from the following base classes:

  • RecursiveCharacterTextSplitter

  • Any further base classes in the inheritance chain of the RecursiveCharacterTextSplitter class

This design allows for easy integration with other components that expect these base class types.