Text File Document Loader
The Text File Document Loader is a node that loads and processes text-based documents in a range of file formats. It is one stage of a document processing pipeline and can handle single or multiple files, split text into chunks, and manage metadata.
Node Details
- Name: Text_DocumentLoaders
- Label: Text File
- Version: 3.0
- Type: Document
- Category: Document Loaders
Input Parameters
- Txt File (required)
  - Type: file
  - Supported formats: .txt, .html, .aspx, .asp, .cpp, .c, .cs, .css, .go, .h, .java, .js, .less, .ts, .php, .proto, .python, .py, .rst, .ruby, .rb, .rs, .scala, .sc, .scss, .sol, .sql, .swift, .markdown, .md, .tex, .ltx, .vb, .xml
- Text Splitter (optional)
  - Type: TextSplitter
  - Purpose: Splits the loaded text into smaller chunks
- Additional Metadata (optional)
  - Type: JSON
  - Description: Extra metadata to be added to the extracted documents
- Omit Metadata Keys (optional)
  - Type: string
  - Description: Comma-separated list of metadata keys to omit from the default set. Use * to omit all keys except those specified in Additional Metadata.
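The interaction between Additional Metadata and Omit Metadata Keys can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name `apply_metadata_options` and its arguments are hypothetical, and it assumes that omission applies only to the loader's default keys and that user-supplied metadata wins when a key appears in both sets.

```python
from typing import Any, Dict, Optional


def apply_metadata_options(
    default_metadata: Dict[str, Any],
    additional_metadata: Optional[Dict[str, Any]] = None,
    omit_metadata_keys: str = "",
) -> Dict[str, Any]:
    """Hypothetical helper: combine default metadata with the user's options."""
    additional = additional_metadata or {}
    omit_spec = omit_metadata_keys.strip()

    if omit_spec == "*":
        # "*" drops every default key; only Additional Metadata remains.
        return dict(additional)

    # Parse the comma-separated list, ignoring blanks and stray whitespace.
    omit = {key.strip() for key in omit_spec.split(",") if key.strip()}
    kept = {k: v for k, v in default_metadata.items() if k not in omit}

    # Assumption: Additional Metadata overrides defaults on key collisions.
    return {**kept, **additional}
```

For example, with default metadata `{"source": "notes.txt", "loc": 1}`, Additional Metadata `{"team": "docs"}`, and Omit Metadata Keys set to `loc`, this sketch returns `{"source": "notes.txt", "team": "docs"}`. With `*` it returns only `{"team": "docs"}`.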
Outputs
- Document
  - Description: Array of document objects containing metadata and pageContent
  - Base Classes: Document, json
- Text
  - Description: Concatenated string from pageContent of documents
  - Base Classes: string, json
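The relationship between the two outputs can be illustrated with a short sketch. The `Document` class below is a minimal stand-in rather than the LangChain class, `page_content` is the Python-style spelling of pageContent, and the newline separator is an assumption, since this page does not state how the contents are joined.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in for a document object (not the LangChain class)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_text_output(documents: List[Document], separator: str = "\n") -> str:
    """Build the Text output by joining each document's content in order."""
    # Assumption: a newline separates the content of consecutive documents.
    return separator.join(doc.page_content for doc in documents)
```

The Document output corresponds to the list itself, with metadata intact; the Text output discards metadata and keeps only the joined content.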
Functionality
- File Loading:
  - Supports loading from local storage or base64-encoded file data
  - Can handle single files or multiple files (passed as a JSON array)
- Text Processing:
  - Uses TextLoader from LangChain to load text content
  - Optionally splits text using the provided Text Splitter
- Metadata Management:
  - Adds user-provided additional metadata
  - Can omit specific or all default metadata keys
  - Merges existing and new metadata
- Output Formatting:
  - Can output either Document objects or concatenated text
  - Handles escape characters in text output
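The file loading step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the node's code: the helper names `normalize_file_input` and `decode_base64_text` are hypothetical, base64 data is assumed to arrive as a data URI of the form `data:<mime>;base64,<payload>`, and the content is assumed to be UTF-8.

```python
import base64
import json
from typing import List


def normalize_file_input(raw: str) -> List[str]:
    """Return a list of file entries from a single value or a JSON array string."""
    stripped = raw.strip()
    if stripped.startswith("[") and stripped.endswith("]"):
        # Multiple files are passed as a JSON array of strings.
        return json.loads(stripped)
    return [raw]


def decode_base64_text(data_uri: str, encoding: str = "utf-8") -> str:
    """Decode base64 file data, with or without a data URI header."""
    header, sep, payload = data_uri.partition(",")
    if not sep:
        # No comma found: treat the whole string as the base64 payload.
        payload = header
    return base64.b64decode(payload).decode(encoding)
```

Normalizing the input to a list first means the rest of the pipeline can loop over files without caring whether one file or several were supplied.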
Use Cases
- Loading and processing text-based documents from various sources
- Preparing text data for further NLP or machine learning tasks
- Extracting and managing metadata from text documents
- Splitting large text documents into manageable chunks
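To make the chunking use case concrete, here is a naive fixed-size character splitter with overlap. It is not the Text Splitter node or any LangChain splitter; the function name `split_text` and the parameters `chunk_size` and `chunk_overlap` are chosen for illustration only.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # This chunk already reaches the end of the text.
            break
        start += step
    return chunks
```

Real splitters usually prefer to break on separators such as paragraphs or sentences; this sketch cuts purely by character count to keep the idea visible.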
Notes
- The node is flexible in handling file inputs, supporting both direct file uploads and references to files in storage
- It integrates well with text splitting operations, allowing for easy segmentation of large documents
- The metadata management features provide fine-grained control over what information is attached to each document