Generate Caption
Generate AI captions for session media using OpenRouter/OpenAI with subreddit-based themes
POST
This endpoint generates AI-powered captions for images in JOIP sessions. It uses the OpenRouter API, falling back to OpenAI, to generate authentic, subreddit-themed captions that match the community’s voice and expectations.
Authentication
Requires user authentication via session.
Request
The URL of the image to caption. Supports both HTTP URLs and data URLs.
Note: Animated GIFs are not supported. Use static images (JPEG, PNG, WebP).
The Reddit post title for context. Used to inform caption generation.
The source subreddit (e.g., “joi”, “gonewild”). Used to apply subreddit-specific themes and voice.
Supported themes:
- Celebrity worship (Selena Gomez, Taylor Swift, etc.)
- Body part focused (ass, tits, feet)
- Kink-specific (femdom, cuckold, sissy, joi)
- Demographics (MILF, teen, ethnic)
Optional session ID for context validation. If provided, the user must have access to the session.
Request Example
Response
The generated caption text (50-150 characters for session playback).
Captions are:
- Short and punchy (50-150 chars) for 2-7 second display windows
- Subreddit-themed based on community voice
- Cleaned of markdown formatting for canvas rendering
- Variation-optimized to prevent repetitive outputs
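The cleanup and length properties listed above could be approximated as follows. This is a minimal sketch, not the service's actual code: the function names `clean_caption` and `fits_display_window` and the specific regular expressions are assumptions made for illustration.

```python
import re

# Length bounds quoted in the docs for session playback captions.
MIN_CHARS, MAX_CHARS = 50, 150


def clean_caption(raw: str) -> str:
    """Strip common markdown markup and collapse whitespace (hypothetical helper)."""
    text = re.sub(r"`([^`]*)`", r"\1", raw)                # inline code -> plain text
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # [label](url) -> label
    text = re.sub(r"(\*\*|__|\*|_|~~)", "", text)          # emphasis markers
    # Leading heading, blockquote, and bullet markers at the start of a line.
    text = re.sub(r"^\s{0,3}(#{1,6}|>|[-+])\s+", "", text, flags=re.MULTILINE)
    return re.sub(r"\s+", " ", text).strip()


def fits_display_window(caption: str) -> bool:
    """True when the caption length is within the documented 50-150 range."""
    return MIN_CHARS <= len(caption) <= MAX_CHARS
```

Note that this sketch removes every underscore and asterisk, which is acceptable for short display text but would mangle identifiers such as snake_case names.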
Success Response
Error Responses
Error message describing what went wrong.
Set to true if OPENROUTER_API_KEY is not configured.
Error code for programmatic handling:
- MODEL_IMAGE_UNSUPPORTED - Selected model doesn’t support images
- MEDIA_TYPE_NOT_SUPPORTED - Animated GIFs or unsupported formats
- FILE_TOO_LARGE - Image exceeds 20MB limit
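A client might branch on these codes along the following lines. This is only a sketch: the response field names `code` and `message` are hypothetical, since the field names are not shown in this section, and the hint texts are illustrative.

```python
from typing import Any, Mapping

# Error codes listed in the docs, mapped to an illustrative client-side hint.
ERROR_HINTS = {
    "MODEL_IMAGE_UNSUPPORTED": "The selected model cannot read images; choose a vision-capable model.",
    "MEDIA_TYPE_NOT_SUPPORTED": "Use a static JPEG, PNG, or WebP image instead.",
    "FILE_TOO_LARGE": "Resize or recompress the image to stay under 20MB.",
}


def describe_error(body: Mapping[str, Any]) -> str:
    """Turn an error response body into a user-facing hint.

    "code" and "message" are assumed field names, not confirmed by the docs.
    """
    code = body.get("code")
    if code in ERROR_HINTS:
        return ERROR_HINTS[code]
    return str(body.get("message") or "Caption generation failed.")
```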
Error Examples
Implementation Details
OpenRouter Integration
- Primary Provider: OpenRouter API with configurable model selection
- Model Selection: Set via OPENROUTER_MODEL_ID environment variable
- Fallback Logic: Gemini models use safety_settings to disable content filtering
- Retry Logic: 3 attempts with content policy rejection handling
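One way to picture the retry and fallback behaviour is the sketch below. It assumes that a content policy rejection surfaces as an exception, that each attempt is retried up to three times, and that a fallback provider is tried afterwards; `ContentPolicyRejection`, `generate_with_retry`, and both callables are hypothetical names, and the real service may handle rejections differently.

```python
from typing import Callable, Optional


class ContentPolicyRejection(Exception):
    """Hypothetical error raised when a provider refuses the request."""


def generate_with_retry(
    primary: Callable[[int], str],
    fallback: Optional[Callable[[], str]] = None,
    max_attempts: int = 3,
) -> str:
    """Call `primary` up to `max_attempts` times, then try `fallback` if given."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            # The attempt number lets the caller vary the prompt between tries.
            return primary(attempt)
        except ContentPolicyRejection as exc:
            last_error = exc
    if fallback is not None:
        return fallback()
    raise RuntimeError(f"no caption after {max_attempts} attempts") from last_error
```

Passing the attempt number to `primary` is a design choice of this sketch: it allows a caller to soften or rephrase the prompt after a rejection rather than resending the identical request.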
Caption Generation Strategy
Variation System:
- Random opening styles (imperative, rhetorical, conditional, etc.)
- Random structures (short_punchy, build_and_drop, interrupted, layered)
- Session-based phrase tracking to prevent repetition
- Global phrase tracking (max 1000 phrases) for uniqueness
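The variation system described above could look roughly like this. It is a sketch under assumptions: the style and structure names come from the list above (the docs indicate more opening styles exist), while `PhraseTracker`, `pick_variation`, and the oldest-first eviction policy are illustrative choices, not the documented implementation.

```python
import random
from collections import OrderedDict

# Names taken from the docs; the opening-style list is marked "etc." there.
OPENING_STYLES = ["imperative", "rhetorical", "conditional"]
STRUCTURES = ["short_punchy", "build_and_drop", "interrupted", "layered"]


class PhraseTracker:
    """Remembers recently used phrases, evicting the oldest beyond max_size."""

    def __init__(self, max_size: int = 1000) -> None:
        self.max_size = max_size
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def is_repeat(self, phrase: str) -> bool:
        return phrase.lower() in self._seen

    def remember(self, phrase: str) -> None:
        key = phrase.lower()
        self._seen.pop(key, None)       # re-inserting moves the phrase to the newest slot
        self._seen[key] = None
        while len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # drop the oldest phrase


def pick_variation(rng: random.Random) -> dict:
    """Choose a random opening style and structure for the next caption."""
    return {"opening": rng.choice(OPENING_STYLES), "structure": rng.choice(STRUCTURES)}
```

A per-session tracker and a global tracker could both be instances of the same class, with only the global one capped at 1000 phrases.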
Theme Mapping:
- Celebrity worship → “Celebrity worship and obsession content”
- Ass/booty subreddits → “Ass worship and body appreciation”
- Femdom/goddess → “Female domination and goddess worship”
- JOI/jerk → “Jerk off instruction and edging control”
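A keyword lookup is one plausible way to implement the mapping above. In this sketch the theme descriptions are quoted from the list, but the keyword tuples, the substring matching, and the function name `theme_for_subreddit` are assumptions; the actual matching rules are not documented here.

```python
from typing import Optional

# Descriptions quoted from the docs; the keyword tuples are illustrative guesses.
THEME_RULES = [
    (("celeb",), "Celebrity worship and obsession content"),
    (("ass", "booty"), "Ass worship and body appreciation"),
    (("femdom", "goddess"), "Female domination and goddess worship"),
    (("joi", "jerk"), "Jerk off instruction and edging control"),
]


def theme_for_subreddit(subreddit: str) -> Optional[str]:
    """Return the first theme whose keyword appears in the subreddit name."""
    name = subreddit.lower()
    if name.startswith("r/"):
        name = name[2:]
    for keywords, description in THEME_RULES:
        if any(keyword in name for keyword in keywords):
            return description
    return None  # no themed voice; the caller would fall back to a generic prompt
```

Plain substring matching is deliberately naive: a short keyword can match unrelated names, so a production version would likely need word boundaries or an explicit subreddit list.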
Content Filtering
Gemini Safety Settings:
Image Compatibility
Supported Formats:
- JPEG (image/jpeg)
- PNG (image/png)
- WebP (image/webp)
Size Limits:
- Maximum: 20MB per image
- Pre-check via HEAD request to validate size before processing
Unsupported:
- Animated GIFs (detected and rejected)
- Video formats (MP4, WebM)
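The compatibility rules above can be sketched as a HEAD-based pre-check. This is an illustration under assumptions: `head_metadata` and `precheck_image` are hypothetical names, "20MB" is read as 20 MiB, every GIF is rejected because only JPEG, PNG, and WebP are listed as supported, and data URLs are out of scope for the HEAD request.

```python
import urllib.request
from typing import Optional, Tuple

MAX_BYTES = 20 * 1024 * 1024  # assumes "20MB" means 20 MiB
SUPPORTED_TYPES = {"image/jpeg", "image/png", "image/webp"}


def head_metadata(url: str, timeout: float = 5.0) -> Tuple[Optional[str], Optional[int]]:
    """Fetch Content-Type and Content-Length with a HEAD request (HTTP URLs only)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
        size = int(length) if length else None
        return response.headers.get("Content-Type"), size


def precheck_image(content_type: Optional[str], content_length: Optional[int]) -> Optional[str]:
    """Return one of the documented error codes, or None if the image may proceed."""
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime not in SUPPORTED_TYPES:
        return "MEDIA_TYPE_NOT_SUPPORTED"
    if content_length is not None and content_length > MAX_BYTES:
        return "FILE_TOO_LARGE"
    return None
```

A caller would combine the two as `precheck_image(*head_metadata(url))`. Servers that omit Content-Length pass the size check here, so a real implementation would still need to enforce the limit while downloading.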
Caching
Client-Side:
- Memory cache for active session
- IndexedDB persistence (24-hour TTL)
- Keyed by: mediaUrl|theme|customPrompt
Prewarming:
- Use /api/captions/prewarm to queue background generation
- First 5 slides auto-warmed for instant playback
- 2 concurrent workers, 100-300ms jitter between tasks
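The client-side cache keying and expiry can be illustrated with a small in-memory model. The real client persists to IndexedDB in the browser; this Python sketch only mirrors the documented key format and 24-hour TTL, and `cache_key`, `CaptionCache`, and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the docs


def cache_key(media_url: str, theme: str, custom_prompt: str = "") -> str:
    """Build the documented mediaUrl|theme|customPrompt key."""
    return "|".join([media_url, theme, custom_prompt])


class CaptionCache:
    """In-memory stand-in for the client cache, with time-based expiry."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, caption: str) -> None:
        self._entries[key] = (self._clock(), caption)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, caption = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return caption
```

Because the key joins its parts with `|`, a media URL that itself contains that character could collide with another key; encoding the URL first would avoid this.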
Usage Notes
Rate Limiting: This endpoint does NOT deduct credits. Caption generation is free during session playback. Credit charges apply only to /api/manual/generate-ai-caption for manual session editing.
Related Endpoints
- POST /api/manual/generate-ai-caption - Contextual captions for manual sessions (credit-based)
- POST /api/captions/prewarm - Queue background caption generation
- GET /api/sessions/:id - Retrieve session media for captioning