
Node Details
- Name: docxFile
- Type: Document
- Version: 1.0
- Category: Document Loaders
Parameters
- Docx File (required)
  - Label: Docx File
  - Name: docxFile
  - Type: file
  - File Type: .docx
  - Description: The DOCX file(s) to be loaded and processed.
- Text Splitter (optional)
  - Label: Text Splitter
  - Name: textSplitter
  - Type: TextSplitter
  - Description: An optional text splitter to break the document into smaller chunks.
- Additional Metadata (optional)
  - Label: Additional Metadata
  - Name: metadata
  - Type: json
  - Description: Additional metadata to be added to the extracted documents.
- Omit Metadata Keys (optional)
  - Label: Omit Metadata Keys
  - Name: omitMetadataKeys
  - Type: string
  - Description: A comma-separated list of metadata keys to omit from the default set. Use * to omit all metadata keys except those specified in the Additional Metadata field.
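The interaction between Additional Metadata and Omit Metadata Keys can be illustrated with a minimal sketch. The helper names (parseOmitKeys, buildMetadata) and the sample keys are hypothetical and not taken from the node's source. The sketch also assumes that omitted keys apply only to the default metadata and that Additional Metadata entries are always kept; the real node may resolve that case differently.

```typescript
// Hypothetical helpers for illustration; not the node's actual implementation.
type Metadata = Record<string, unknown>;

// Turn "source, author" into ["source", "author"], ignoring empty entries.
function parseOmitKeys(raw?: string): string[] {
  return (raw ?? "")
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Combine default metadata with Additional Metadata, honoring Omit Metadata Keys.
function buildMetadata(
  defaults: Metadata,
  additional: Metadata,
  omitMetadataKeys?: string
): Metadata {
  const omit = new Set(parseOmitKeys(omitMetadataKeys));

  // "*" drops every default key and keeps only the additional metadata.
  if (omit.has("*")) {
    return { ...additional };
  }

  // Otherwise keep each default key that is not listed for omission.
  const kept: Metadata = {};
  for (const key of Object.keys(defaults)) {
    if (!omit.has(key)) {
      kept[key] = defaults[key];
    }
  }

  // Assumption: additional metadata is merged last and never omitted.
  return { ...kept, ...additional };
}

// Sample values are made up for the example.
const selective = buildMetadata(
  { source: "report.docx", author: "unknown" },
  { project: "demo" },
  "author"
);
const onlyAdditional = buildMetadata(
  { source: "report.docx", author: "unknown" },
  { project: "demo" },
  "*"
);
```

Here selective keeps source and project, while onlyAdditional contains project alone.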
Input
The node accepts one or more DOCX files, either as base64-encoded strings or as references to files stored in the system’s file storage.
Output
The node outputs an array of IDocument objects, each representing a chunk or the entire content of the processed DOCX file(s), along with associated metadata.
Functionality
- Loads DOCX file(s) from either base64-encoded strings or the system’s file storage.
- Optionally splits the document content using the provided text splitter.
- Applies additional metadata if provided.
- Omits specified metadata keys if requested.
- Returns an array of processed document objects.
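The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the node's implementation: DOCX parsing is skipped (the sketch starts from already extracted text), the DocumentLike shape with pageContent and metadata is an assumed stand-in for IDocument, and fixedSizeSplitter is a hypothetical substitute for a real text splitter. The key-omission step is left out for brevity.

```typescript
// Assumed document shape; the node's real IDocument type may differ.
interface DocumentLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Stand-in for a text splitter: takes text, returns chunks.
type Splitter = (text: string) => string[];

// Hypothetical splitter: fixed-size character chunks with no overlap.
function fixedSizeSplitter(chunkSize: number): Splitter {
  return (text) => {
    const chunks: string[] = [];
    for (let start = 0; start < text.length; start += chunkSize) {
      chunks.push(text.slice(start, start + chunkSize));
    }
    return chunks;
  };
}

// One entry per DOCX file whose text has already been extracted.
interface ExtractedFile {
  source: string;
  text: string;
}

function processFiles(
  files: ExtractedFile[],
  splitter?: Splitter,
  additionalMetadata: Record<string, unknown> = {}
): DocumentLike[] {
  const docs: DocumentLike[] = [];
  for (const file of files) {
    // Without a splitter, the whole file becomes a single document.
    const pieces = splitter ? splitter(file.text) : [file.text];
    for (const piece of pieces) {
      docs.push({
        pageContent: piece,
        metadata: { source: file.source, ...additionalMetadata },
      });
    }
  }
  return docs;
}

const sampleFiles: ExtractedFile[] = [
  { source: "a.docx", text: "abcdefghij" },
  { source: "b.docx", text: "xyz" },
];

// No splitter: one document per file.
const wholeDocs = processFiles(sampleFiles);

// With a splitter: one document per chunk, each carrying the extra metadata.
const chunkedDocs = processFiles(sampleFiles, fixedSizeSplitter(4), { project: "demo" });
```

With a chunk size of 4, the first sample file yields three chunks and the second yields one, so chunkedDocs holds four documents while wholeDocs holds two.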
Use Cases
- Extracting text content from DOCX files for further processing or analysis.
- Preparing DOCX content for use in language models or other NLP tasks.
- Integrating DOCX file content into document-based workflows or chatbots.
Notes
- The node supports processing multiple files in a single operation.
- It can handle both locally uploaded files and files stored in the system’s file storage.
- The text splitter option allows for breaking large documents into more manageable chunks.
- Custom metadata can be added, and existing metadata can be selectively omitted for fine-grained control over the output.
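To make the two input forms concrete, here is a hedged sketch of how an input string might be sorted into stored-file references and base64 payloads. Everything specific in it is an assumption for illustration: the FILE-STORAGE:: marker, the JSON-array form for multiple files, and the data URI layout are not confirmed by this page, and classifyInput is a hypothetical helper.

```typescript
// Assumed marker for stored-file references; not confirmed by this page.
const STORAGE_PREFIX = "FILE-STORAGE::";

type FileInput =
  | { kind: "storage"; fileName: string }
  | { kind: "base64"; payload: string };

// Accept either a single value or a JSON array of values packed in one string.
function toList(raw: string): string[] {
  const trimmed = raw.trim();
  if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
    return JSON.parse(trimmed) as string[];
  }
  return [trimmed];
}

function classifyInput(raw: string): FileInput[] {
  if (raw.startsWith(STORAGE_PREFIX)) {
    // Everything after the marker names one or more stored files.
    return toList(raw.slice(STORAGE_PREFIX.length)).map(
      (fileName): FileInput => ({ kind: "storage", fileName })
    );
  }
  return toList(raw).map((value): FileInput => {
    // For a data URI, keep only the part after the first comma.
    const comma = value.indexOf(",");
    const isDataUri = value.startsWith("data:") && comma !== -1;
    const payload = isDataUri ? value.slice(comma + 1) : value;
    return { kind: "base64", payload };
  });
}

const stored = classifyInput('FILE-STORAGE::["a.docx","b.docx"]');
const uploaded = classifyInput("data:application/octet-stream;base64,SGVsbG8=");
```

In this sketch, stored contains two storage references and uploaded contains a single base64 payload; decoding the payload and parsing the DOCX content would happen in later steps not shown here.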