Folder with Files
The “Folder with Files” node is a document loader that can process multiple files from a specified folder. It supports various file formats and can recursively search through subdirectories.
Node Details
- Name: folderFiles
- Type: Document
- Category: Document Loaders
- Version: 3.0
Parameters
- Folder Path
  - Type: string
  - Description: The path to the folder containing the files to be processed.
- Recursive
  - Type: boolean
  - Description: If true, the loader also searches subdirectories for files.
- Text Splitter
  - Type: TextSplitter
  - Optional: Yes
  - Description: A text splitter applied to the loaded documents.
- Pdf Usage
  - Type: options
  - Options:
    - One document per page
    - One document per file
  - Default: One document per page
  - Description: Determines whether each page of a PDF or each whole PDF file becomes a separate document.
- JSONL Pointer Extraction
  - Type: string
  - Optional: Yes
  - Description: The pointer that selects the value to extract from each line of a JSONL file.
- Additional Metadata
  - Type: json
  - Optional: Yes
  - Description: Additional metadata to add to every extracted document.
- Omit Metadata Keys
  - Type: string
  - Optional: Yes
  - Description: Comma-separated list of metadata keys to omit from the extracted documents. Use * to omit all metadata keys except those specified in Additional Metadata.
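The JSONL Pointer Extraction parameter picks one value out of each line of a JSONL file. The Python sketch below illustrates the idea, assuming the pointer follows JSON Pointer syntax (RFC 6901), for example `/text`. It is not the node's implementation: the function names are hypothetical, and skipping blank lines and serializing non-string values with `json.dumps` are assumptions of this sketch.

```python
import json
from typing import Any


def resolve_pointer(obj: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON pointer such as "/text" against a parsed object."""
    if pointer == "":
        return obj  # the empty pointer refers to the whole value
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    for token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: "~1" -> "/", then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        obj = obj[int(token)] if isinstance(obj, list) else obj[token]
    return obj


def extract_jsonl(text: str, pointer: str) -> list[str]:
    """Hypothetical helper: return one extracted value per non-empty JSONL line."""
    values = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        value = resolve_pointer(json.loads(line), pointer)
        # Assumption: non-string values are serialized so each value is text.
        values.append(value if isinstance(value, str) else json.dumps(value))
    return values
```

For example, `extract_jsonl('{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}', "/text")` returns `['first', 'second']`, one value per line.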
Supported File Formats
- JSON (.json)
- JSONL (.jsonl)
- Text (.txt)
- CSV and Excel spreadsheets (.csv, .xls, .xlsx)
- Word Documents (.doc, .docx)
- PDF (.pdf)
- ASP (.aspx, .asp)
- C++ (.cpp, .h)
- C (.c)
- C# (.cs)
- CSS (.css)
- Go (.go)
- Kotlin (.kt)
- Java (.java)
- JavaScript (.js)
- Less (.less)
- TypeScript (.ts)
- PHP (.php)
- Protocol Buffers (.proto)
- Python (.python, .py)
- reStructuredText (.rst)
- Ruby (.ruby, .rb)
- Rust (.rs)
- Scala (.scala, .sc)
- Sass (.scss)
- Solidity (.sol)
- SQL (.sql)
- Swift (.swift)
- Markdown (.markdown, .md)
- LaTeX (.tex, .ltx)
- HTML (.html)
- Visual Basic (.vb)
- XML (.xml)
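To illustrate how a folder scan can route files to format-specific loaders by extension, here is a minimal Python sketch. The `LOADER_BY_EXTENSION` table and the loader names in it are hypothetical and abridged to a few of the formats listed above; treating source-code and markup files as plain text, and silently skipping unsupported extensions, are assumptions of this sketch. The `recursive` flag plays the role of the Recursive parameter: `rglob` descends into subdirectories, while `glob` stays in the top-level folder.

```python
from pathlib import Path, PurePath
from typing import Iterable

# Hypothetical loader identifiers, keyed by file extension (abridged list).
LOADER_BY_EXTENSION: dict[str, str] = {
    ".json": "json", ".jsonl": "jsonl", ".txt": "text",
    ".csv": "csv", ".xls": "csv", ".xlsx": "csv",
    ".doc": "docx", ".docx": "docx", ".pdf": "pdf",
}
# Assumption: source-code and markup files are read as plain text.
for _ext in (".py", ".ts", ".js", ".java", ".go", ".rs", ".md", ".html", ".xml", ".sql"):
    LOADER_BY_EXTENSION[_ext] = "text"


def list_candidates(folder: str, recursive: bool) -> list[Path]:
    """Return files directly inside `folder`, or inside all subdirectories too."""
    root = Path(folder)
    matches = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in matches if p.is_file())


def group_by_loader(paths: Iterable[PurePath]) -> dict[str, list[PurePath]]:
    """Assign each path to the loader registered for its extension."""
    grouped: dict[str, list[PurePath]] = {}
    for path in paths:
        loader = LOADER_BY_EXTENSION.get(path.suffix)
        if loader is None:
            continue  # unsupported extension: skipped in this sketch
        grouped.setdefault(loader, []).append(path)
    return grouped
```

A typical call would be `group_by_loader(list_candidates("./docs", recursive=True))`, where `./docs` is a placeholder path. Keeping discovery and routing in separate functions makes the extension mapping easy to test without touching the file system.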
Input
- Folder path and configuration options as specified in the parameters.
Output
- An array of document objects, each containing content from a file (a whole file, a single PDF page, or a text-splitter chunk, depending on the settings) and its associated metadata.
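The Pdf Usage option decides how many output documents a single PDF produces. The Python sketch below shows the two modes on text that has already been extracted page by page. The `Document` class is a minimal stand-in, and the metadata keys (`source`, `page`) and the `"\n\n"` separator used to join pages are assumptions of this sketch, not the node's actual values.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Minimal stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def pdf_to_documents(source: str, pages: list[str], per_page: bool = True) -> list[Document]:
    """Turn per-page PDF text into documents according to the Pdf Usage mode."""
    if per_page:
        # "One document per page": one Document per page, tagged with its page number.
        return [
            Document(text, {"source": source, "page": number})
            for number, text in enumerate(pages, start=1)
        ]
    # "One document per file": all pages joined into a single Document.
    return [Document("\n\n".join(pages), {"source": source})]
```

With `pages=["p1", "p2"]`, per-page mode yields two documents, while per-file mode yields one document whose content is `"p1\n\np2"`.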
Functionality
- The node creates a DirectoryLoader that maps each supported file extension to a format-specific loader.
- It loads documents from the specified folder, optionally searching recursively.
- If a text splitter is provided, it splits the loaded documents.
- Additional metadata is added to each document if specified.
- Metadata keys are omitted based on the “Omit Metadata Keys” parameter.
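The post-processing steps listed above (optional splitting, then adding metadata, then omitting keys) can be sketched in Python as follows. This is an illustration under stated assumptions, not the node's code: `fixed_size_splitter` is a hypothetical stand-in for a real Text Splitter, each chunk inherits its parent's metadata, and this sketch merges Additional Metadata before removing the omitted keys. As the Omit Metadata Keys description states, `*` keeps only the Additional Metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Document:
    """Minimal stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def fixed_size_splitter(size: int) -> Callable[[str], list[str]]:
    """Hypothetical stand-in for a Text Splitter: fixed-size character chunks."""
    return lambda text: [text[i:i + size] for i in range(0, len(text), size)]


def post_process(
    docs: list[Document],
    splitter: Optional[Callable[[str], list[str]]] = None,
    additional_metadata: Optional[dict[str, Any]] = None,
    omit_keys: str = "",
) -> list[Document]:
    # Step 1 (optional): split; every chunk copies its parent's metadata.
    if splitter is not None:
        docs = [
            Document(chunk, dict(doc.metadata))
            for doc in docs
            for chunk in splitter(doc.page_content)
        ]
    extra = additional_metadata or {}
    omit = omit_keys.strip()
    keys_to_omit = {key.strip() for key in omit.split(",") if key.strip()}
    result = []
    for doc in docs:
        if omit == "*":
            # "*" drops all loader-provided keys; only Additional Metadata remains.
            metadata = dict(extra)
        else:
            # Step 2: add Additional Metadata; step 3: drop the listed keys.
            merged = {**doc.metadata, **extra}
            metadata = {k: v for k, v in merged.items() if k not in keys_to_omit}
        result.append(Document(doc.page_content, metadata))
    return result
```

For instance, splitting `"abcdef"` with `fixed_size_splitter(4)` produces the chunks `"abcd"` and `"ef"`, both carrying the original document's metadata.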
Use Cases
- Bulk loading of documents from a file system for processing or analysis.
- Preparing diverse document sets for language-model applications and other NLP tasks.
- Extracting and organizing content from multiple file types in a structured manner.