Node Details

  • Name: inputModerationSimple
  • Type: Moderation
  • Version: 2.0
  • Category: Moderation

Base Classes

  • Moderation
  • (Additional base classes from the Moderation parent class)

Parameters

Inputs

  1. Deny List (Required)
    • Type: string
    • Description: List of denied phrases or instructions (one per line)
    • Example:
      ignore previous instructions
      do not follow the directions
      you must ignore all previous instructions
      
  2. Chat Model (Optional)
    • Type: BaseChatModel
    • Description: Language model to detect semantic similarities with denied phrases
  3. Error Message (Optional)
    • Type: string
    • Default: “Cannot Process! Input violates content moderation policies.”
    • Description: Custom error message to display when moderation fails

Functionality

The Simple Prompt Moderation node provides content filtering through the following features:
  1. Pattern Matching
    • Exact match detection against deny list
    • Case-insensitive comparison
    • Line-by-line analysis
  2. Semantic Analysis (when Chat Model is provided)
    • Similarity detection using LLM
    • Context-aware filtering
    • Flexible matching capabilities
  3. Customization Options
    • User-defined deny lists
    • Configurable error messages
    • Optional LLM integration

Use Cases

  1. Prompt Injection Prevention
    • Block attempts to override system instructions
    • Prevent prompt manipulation
    • Maintain system integrity
  2. Content Filtering
    • Filter specific keywords or phrases
    • Implement custom content policies
    • Control user input quality
  3. Safety Enforcement
    • Prevent harmful instructions
    • Block unwanted commands
    • Maintain usage boundaries

Integration Notes

  • Position the node early in your workflow to filter inputs
  • Consider combining with other moderation nodes for layered protection
  • Monitor and update deny lists regularly
  • Test thoroughly with various input patterns

Best Practices

  1. Deny List Management
    • Keep deny lists up to date
    • Use specific, clear patterns
    • Document denied phrases
    • Regular expression support for complex patterns
  2. Error Handling
    • Provide clear error messages
    • Log moderation events
    • Implement appropriate fallbacks
  3. Performance Optimization
    • Balance deny list size with performance
    • Consider caching for frequent patterns
    • Monitor LLM usage when enabled
  4. Maintenance
    • Regular deny list reviews
    • Update patterns based on new threats
    • Monitor false positive/negative rates
    • Adjust sensitivity as needed