Node Details

  • Name: docxFile
  • Type: Document
  • Version: 1.0
  • Category: Document Loaders

Parameters

  1. Docx File (required)

    • Label: Docx File
    • Name: docxFile
    • Type: file
    • File Type: .docx
    • Description: The DOCX file(s) to be loaded and processed.
  2. Text Splitter (optional)

    • Label: Text Splitter
    • Name: textSplitter
    • Type: TextSplitter
    • Description: An optional text splitter to break the document into smaller chunks.
  3. Additional Metadata (optional)

    • Label: Additional Metadata
    • Name: metadata
    • Type: json
    • Description: A JSON object of key-value pairs added to the metadata of every extracted document.
  4. Omit Metadata Keys (optional)

    • Label: Omit Metadata Keys
    • Name: omitMetadataKeys
    • Type: string
    • Description: A comma-separated list of default metadata keys to omit. Use * to omit all default metadata keys, keeping only the keys specified in the Additional Metadata field.
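The way Omit Metadata Keys and Additional Metadata interact can be sketched as follows. This is a minimal TypeScript illustration, not the node's source: the `Doc` shape and the function names are hypothetical. It assumes that additional metadata is merged after the omitted keys are removed, so an additional key overrides a default key with the same name.

```typescript
// Hypothetical document shape used only in this sketch.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Turn the comma-separated omitMetadataKeys value into trimmed, non-empty keys.
function parseOmitKeys(omitMetadataKeys: string): string[] {
  return omitMetadataKeys
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Apply the Omit Metadata Keys and Additional Metadata settings to one document.
function applyMetadataOptions(
  doc: Doc,
  additionalMetadata: Record<string, unknown>,
  omitMetadataKeys: string,
): Doc {
  // "*" discards every default key; only the additional metadata survives.
  if (omitMetadataKeys.trim() === "*") {
    return { ...doc, metadata: { ...additionalMetadata } };
  }
  const omit = new Set(parseOmitKeys(omitMetadataKeys));
  const kept: Record<string, unknown> = {};
  for (const key of Object.keys(doc.metadata)) {
    if (!omit.has(key)) kept[key] = doc.metadata[key];
  }
  // Additional metadata is merged last (assumption), so it wins on key collisions.
  return { ...doc, metadata: { ...kept, ...additionalMetadata } };
}
```

For a document whose metadata is `{ source: "a.docx", loc: 1 }`, omitting `loc` and adding `{ project: "alpha" }` leaves `{ source: "a.docx", project: "alpha" }`, while `*` leaves only `{ project: "alpha" }`.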

Input

The node accepts one or more DOCX files, either as base64-encoded strings or as references to files stored in the system’s file storage.
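A sketch of telling the two input forms apart. The string formats are assumptions for illustration only: an uploaded file as `data:<mime>;base64,<payload>,filename:<name>` (several uploads as a JSON array of such strings), and stored files as a `FILE-STORAGE::` prefix followed by a JSON array of file names. The helper names are hypothetical, and the code only classifies inputs; it neither decodes base64 nor reads from storage.

```typescript
// Hypothetical description of where one DOCX input comes from.
type DocxInput =
  | { kind: "base64"; fileName: string; mimeType: string; base64Data: string }
  | { kind: "storage"; fileName: string };

// Assumed prefix marking references to files in the system's file storage.
const STORAGE_PREFIX = "FILE-STORAGE::";

// Split a raw input value into individual DOCX inputs.
function parseDocxInputs(raw: string): DocxInput[] {
  if (raw.startsWith(STORAGE_PREFIX)) {
    // Stored references: a JSON array of file names (or a single name) after the prefix.
    const payload = raw.slice(STORAGE_PREFIX.length);
    const names: string[] = payload.startsWith("[") ? JSON.parse(payload) : [payload];
    return names.map((fileName): DocxInput => ({ kind: "storage", fileName }));
  }
  // Uploaded files: a single data URI, or a JSON array of data URIs.
  const entries: string[] = raw.startsWith("[") ? JSON.parse(raw) : [raw];
  return entries.map(parseDataUri);
}

// Parse "data:<mime>;base64,<payload>,filename:<name>" into its parts.
function parseDataUri(entry: string): DocxInput {
  const match = /^data:([^;]+);base64,([^,]*),filename:(.+)$/.exec(entry);
  if (!match) {
    throw new Error(`Unrecognised DOCX input: ${entry.slice(0, 40)}`);
  }
  return { kind: "base64", mimeType: match[1], base64Data: match[2], fileName: match[3] };
}
```

Keeping the classification separate from decoding makes it easy to route each input to the matching reader (an in-memory buffer for uploads, a storage lookup for references).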

Output

The node outputs an array of IDocument objects. Each object holds either one chunk of text (when a text splitter is connected) or the full content of a processed DOCX file, together with its metadata.

Functionality

  1. Loads DOCX file(s) from either base64-encoded strings or the system’s file storage.
  2. Optionally splits the document content using the provided text splitter.
  3. Applies additional metadata if provided.
  4. Omits specified metadata keys if requested.
  5. Returns an array of processed document objects.
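The five steps above can be sketched as a small pipeline. This is a hedged TypeScript illustration, not the node's implementation: `extractText`, `splitText`, and `transformMetadata` are hypothetical stand-ins for the real DOCX text extractor, the connected text splitter, and the metadata handling of steps 3 and 4. The default `source` key is also an assumption about what the loader records.

```typescript
// Minimal document shape, modelled on the pageContent/metadata pair of IDocument.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical stand-ins for the extractor, the splitter, and the metadata options.
type ExtractText = (fileName: string) => string;
type SplitText = (text: string) => string[];
type TransformMetadata = (metadata: Record<string, unknown>) => Record<string, unknown>;

function loadDocx(
  fileNames: string[],
  extractText: ExtractText,
  splitText?: SplitText,
  transformMetadata: TransformMetadata = (m) => m,
): Doc[] {
  const docs: Doc[] = [];
  for (const fileName of fileNames) {
    // Step 1: load one file and record where its text came from (assumed "source" key).
    const text = extractText(fileName);
    const baseMetadata = { source: fileName };
    // Step 2: split into chunks if a splitter is connected; otherwise keep the whole text.
    const pieces = splitText ? splitText(text) : [text];
    for (const piece of pieces) {
      // Steps 3 and 4: add or omit metadata keys on a fresh copy for each chunk.
      docs.push({ pageContent: piece, metadata: transformMetadata({ ...baseMetadata }) });
    }
  }
  // Step 5: a single flat array covering every input file.
  return docs;
}
```

With two files whose text is `"alpha beta gamma"` and `"delta"` and a splitter that breaks on spaces, the call returns four documents, each tagged with its own file name; without a splitter it returns one document per file.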

Use Cases

  • Extracting text content from DOCX files for further processing or analysis.
  • Preparing DOCX content for use in language models or other NLP tasks.
  • Integrating DOCX file content into document-based workflows or chatbots.

Notes

  • The node supports processing multiple files in a single operation.
  • It can handle both directly uploaded files (passed as base64-encoded strings) and files referenced from the system’s file storage.
  • The text splitter option allows for breaking large documents into more manageable chunks.
  • Custom metadata can be added, and existing metadata can be selectively omitted for fine-grained control over the output.