Document Loaders
Document Loaders are components that extract data from a source and convert it into a format suitable for language models and other AI applications. This category includes loaders for a wide range of file types, data sources, and APIs.
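To make the idea of "converting data into a suitable format" concrete, here is a minimal sketch of what a loader might do. It assumes a document is represented as a text body plus metadata (a convention used by libraries such as LangChain). The names `LoadedDocument` and `loadCsvText` are hypothetical and are not the API of any component listed on this page.

```typescript
// Hypothetical document shape: the extracted text plus metadata about its origin.
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, string | number>;
}

// Hypothetical loader: turns raw CSV text into one document per data row.
// Assumption: cells contain no quoted commas; a real loader would use a CSV parser.
function loadCsvText(csv: string, source: string): LoadedDocument[] {
  // Split into lines and drop blank ones (for example, a trailing newline).
  const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];

  // The first line is treated as the header row.
  const headers = lines[0].split(",").map((h) => h.trim());

  return lines.slice(1).map((line, index) => {
    const cells = line.split(",").map((c) => c.trim());
    // Render each row as "header: value" lines so the text is self-describing.
    const pageContent = headers
      .map((header, i) => `${header}: ${cells[i] ?? ""}`)
      .join("\n");
    return { pageContent, metadata: { source, row: index + 1 } };
  });
}
```

Calling `loadCsvText("name,role\nAda,engineer\n", "team.csv")` would yield a single document whose text is `name: Ada` and `role: engineer` on separate lines, with metadata recording the source file and row number.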
Available Components
Airtable Document Loader
Extracts data from Airtable bases and tables
API Loader
Loads documents from REST API endpoints
Apify Website Content Crawler
Crawls websites using Apify’s web scraping platform
Cheerio Web Scraper
Extracts content from web pages using Cheerio
Confluence Document Loader
Loads content from Confluence pages and spaces
Custom Document Loader
Create custom loaders for specialized document types
CSV File Node
Processes CSV files into structured documents
Document Store Loader
Loads documents from document storage systems
Docx File Node
Extracts content from Microsoft Word documents
Figma Document Loader
Retrieves design content from Figma files
File Loader Node
Generic loader for various file types
FireCrawl Document Loader
Web crawler for content extraction
Folder with Files Node
Processes multiple files within a directory
Github Document Loader
Extracts content from GitHub repositories
Gitbook Document Loader
Loads content from Gitbook documentation
JSON File Document Loader
Processes JSON files into documents
JSON Lines File Node
Handles JSONL format files
Notion Database Document Loader
Extracts content from Notion databases
Notion Folder Document Loader
Processes multiple Notion pages in a folder
Notion Page Document Loader
Loads content from individual Notion pages
PDF Document Loader
Extracts text from PDF files
Plain Text Document Loader
Processes plain text files
S3 Directory Node
Loads documents from AWS S3 directories
S3 Document Loader
Processes individual files from AWS S3
SerpAPI For Web Search
Retrieves search results as documents
Spider Document Loaders
Crawls websites to extract content
Text File Document Loader
Processes text files into documents
Unstructured File Loader
Handles various unstructured file formats
Unstructured Folder Loader
Processes folders of unstructured files
VectorStore To Document
Converts vector store entries to documents
Use Cases
Document Loaders support a range of use cases, including:
- Data Extraction: Pulling content from diverse sources like web pages, APIs, databases, and file systems.
- Text Processing: Converting different file formats (PDF, DOCX, CSV, JSON) into processable text.
- Web Scraping: Extracting data from websites and web applications.
- Knowledge Base Creation: Building structured datasets from unstructured or semi-structured sources.
- Content Aggregation: Collecting and organizing information from multiple sources.
- Data Preprocessing: Preparing data for natural language processing tasks or machine learning models.
- Document Analysis: Extracting and structuring information from complex document formats.
- API Integration: Fetching and processing data from various third-party APIs.
- Cloud Storage Access: Retrieving and processing documents stored in cloud services like S3.
- Version Control Integration: Extracting content from version control systems like GitHub.
- Design Tool Integration: Accessing and processing design data from tools like Figma.
- Collaborative Tool Integration: Extracting data from collaborative platforms like Notion and Confluence.
Together, these Document Loaders provide a flexible foundation for ingesting data from many sources, which makes it easier to build comprehensive and diverse datasets for AI and machine learning applications.
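As an illustration of ingesting from several sources at once, the following sketch runs a set of loaders concurrently and merges their output into one list, tagging each document with the loader that produced it. The `NamedLoader` type and the `aggregate` function are hypothetical and serve only to show the pattern. They do not describe how the components above are implemented.

```typescript
// Hypothetical document shape: extracted text plus metadata.
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, string | number>;
}

// Hypothetical loader wrapper: a label plus an async function that returns documents.
interface NamedLoader {
  name: string;
  load: () => Promise<LoadedDocument[]>;
}

// Runs every loader concurrently and merges the results into a single list.
// Each document is tagged with the name of the loader that produced it.
async function aggregate(loaders: NamedLoader[]): Promise<LoadedDocument[]> {
  const results = await Promise.all(
    loaders.map(async ({ name, load }) => {
      const docs = await load();
      return docs.map((doc) => ({
        ...doc,
        metadata: { ...doc.metadata, loader: name },
      }));
    })
  );
  // Flatten the per-loader arrays while preserving loader order.
  return results.reduce<LoadedDocument[]>((all, docs) => all.concat(docs), []);
}
```

Because `Promise.all` rejects as soon as any loader fails, a pipeline that should tolerate partial failures would need `Promise.allSettled` or per-loader error handling instead.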