Cache Overview
Cache components provide efficient mechanisms for storing and retrieving frequently accessed data, particularly LLM responses and embeddings. These components help optimize performance, reduce latency, and minimize unnecessary API calls or computations.
Available Components
InMemory Cache: Simple in-memory caching of LLM responses within a single session
InMemory Embedding Cache: In-memory caching of embeddings within a single session
Momento Cache: Distributed, serverless caching of LLM responses using the Momento service
Redis Cache: Caching of LLM responses using Redis, suitable for multi-process or multi-server setups
Redis Embeddings Cache: Caching of embeddings using Redis, for improved efficiency in embedding-heavy applications
Upstash Redis Cache: Serverless Redis caching of LLM responses, ideal for edge computing and serverless environments
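At their core, the in-memory components above are a store keyed by the prompt: a repeated prompt is answered from the cache instead of triggering another LLM call. The sketch below illustrates that pattern only; the `InMemoryLLMCache` class and `call_llm` parameter are hypothetical names for this example, not part of any component's API.

```python
import hashlib
from typing import Callable, Dict, Optional

class InMemoryLLMCache:
    """Minimal session-scoped cache keyed by a hash of the prompt."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str) -> str:
        # Hash the prompt so keys stay fixed-size regardless of prompt length.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt: str, call_llm: Callable[[str], str]) -> str:
        key = self._key(prompt)
        cached: Optional[str] = self._store.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        response = call_llm(prompt)  # only invoked on a cache miss
        self._store[key] = response
        return response

# Usage: the second identical prompt is served from the cache.
cache = InMemoryLLMCache()
fake_llm = lambda p: f"answer to: {p}"
cache.get_or_compute("What is caching?", fake_llm)
cache.get_or_compute("What is caching?", fake_llm)
print(cache.hits, cache.misses)  # -> 1 1
```

The distributed variants (Momento, Redis, Upstash) follow the same lookup-or-compute pattern, but replace the in-process dictionary with a shared backend so multiple processes or serverless invocations see the same entries.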
Use Cases:
- Improving response times for frequently asked questions or similar queries
- Reducing API costs by minimizing redundant LLM calls or embedding generations
- Enhancing user experience in chatbots or AI assistants with quicker responses
- Optimizing performance in scenarios with repetitive queries or embedding requests
- Sharing cache across multiple processes, servers, or serverless function invocations
- Implementing efficient caching in distributed, edge, or serverless computing environments
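The embedding-related use cases above reduce to one rule: compute each unique text's vector once, then reuse it. A minimal sketch of that idea, with a toy `embed` function standing in for a real embedding model (the `EmbeddingCache` name is hypothetical, chosen for this example):

```python
import hashlib
from typing import Callable, Dict, List

class EmbeddingCache:
    """Session-scoped embedding cache: identical texts are embedded once."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._store: Dict[str, List[float]] = {}
        self.computed = 0  # number of actual embedding calls made

    def embed_many(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if key not in self._store:
                self.computed += 1
                self._store[key] = self._embed(text)  # miss: compute and store
            vectors.append(self._store[key])  # hit or freshly stored
        return vectors

# Usage: three texts, but only two distinct ones, so only two embedding calls.
toy_embed = lambda t: [float(len(t))]
cache = EmbeddingCache(toy_embed)
cache.embed_many(["hello", "world", "hello"])
print(cache.computed)  # -> 2
```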
Cache components are particularly beneficial in applications where quick response times are crucial, similar queries are likely to recur, or computational resources and API calls must be kept to a minimum.