See. Understand. Perceive.
A powerful Python CLI for video understanding using Google's Gemini API: native video processing, YouTube support, and cost estimation, all from your terminal.
Features
True Video Understanding
A CLI that understands video itself, not just individual frames.
Native Video Processing
Upload and analyze videos directly with Gemini's native multimodal understanding. No local frame extraction required.
YouTube Support
Analyze YouTube videos directly from a URL, with no download step, using Gemini's YouTube URL support (currently a preview feature).
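Gemini's API can take a public YouTube URL as a file reference instead of an uploaded file. Below is a minimal sketch that assumes the `google-genai` Python SDK. `is_youtube_url` and `analyze_youtube` are hypothetical helpers, not ponty's source. The host allow-list is an assumption: a CLI could use it to decide between uploading a local file and passing a URL.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of YouTube hosts; not an official or exhaustive list.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(source: str) -> bool:
    """Return True if `source` looks like a YouTube link rather than a local path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and parsed.netloc.lower() in YOUTUBE_HOSTS


def analyze_youtube(url: str, prompt: str, model: str = "gemini-2.5-flash") -> str:
    """Send a YouTube URL to Gemini as file data (needs google-genai and GEMINI_API_KEY)."""
    # Imported lazily so is_youtube_url works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model=model,
        contents=types.Content(
            parts=[
                types.Part(file_data=types.FileData(file_uri=url)),
                types.Part(text=prompt),
            ]
        ),
    )
    return response.text
```

With this approach, the video never touches your disk. Gemini fetches it from the URL, which is why only public videos are expected to work.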
Long Video Support
Process videos of two hours or more with Gemini's 2 million token context window. Perfect for lectures, meetings, and documentaries.
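The context window caps how long a video can be, so it helps to convert tokens into hours. Google's Gemini documentation gives roughly 300 tokens per second of video at default media resolution and roughly 100 at low resolution. Treat both rates as approximations that vary by model and settings. `max_video_hours` is a hypothetical helper for illustration.

```python
def max_video_hours(context_tokens: int, tokens_per_second: float,
                    reserve_tokens: int = 0) -> float:
    """Longest video (in hours) whose tokens fit in the context window.

    reserve_tokens leaves room for the text prompt and the model's answer.
    """
    usable = context_tokens - reserve_tokens
    if usable <= 0:
        return 0.0
    return usable / tokens_per_second / 3600


CONTEXT = 2_000_000  # context window size quoted above
# Approximate per-second rates; assumptions, not guarantees.
for label, rate in [("default resolution", 300), ("low resolution", 100)]:
    print(f"{label}: ~{max_video_hours(CONTEXT, rate):.1f} hours")
```

Under these assumptions, a 2M-token window holds close to two hours at default resolution and several hours at low resolution.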
Audio + Visual
Combined audio-visual analysis without separate transcription steps. Understand speech, music, and visuals together.
Cost Transparency
See token usage and estimated costs for every analysis. Budget-friendly at $0.11-0.32 per hour of video.
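The per-hour figure is simple arithmetic: tokens per hour of video times the price per million input tokens. The sketch below makes three assumptions: about 300 tokens per second, plus two illustrative input prices of $0.10 and $0.30 per million tokens. Together they show how a $0.11-0.32 range can arise. Check current Gemini pricing before relying on any number here. `estimate_cost` is a hypothetical helper, not ponty's implementation.

```python
TOKENS_PER_SECOND = 300  # assumed default-resolution video rate


def estimate_cost(prompt_tokens: int, response_tokens: int,
                  input_per_million: float, output_per_million: float) -> float:
    """Dollar cost of one request, given USD prices per million tokens."""
    return (prompt_tokens * input_per_million
            + response_tokens * output_per_million) / 1_000_000


tokens_per_hour = TOKENS_PER_SECOND * 3600  # 1,080,000 tokens
# Hypothetical input prices (USD per million tokens); output tokens ignored here.
for price in (0.10, 0.30):
    cost = estimate_cost(tokens_per_hour, 0, price, 0.0)
    print(f"${price:.2f}/M input -> ${cost:.2f} per hour of video")
```

Response tokens are usually priced higher than input tokens but are far fewer for a summary. For video analysis, the input side dominates the bill.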
Simple CLI
One command to analyze any video. Customize prompts, choose models, and get results in seconds.
Quick Start
Perception in Seconds
Install and analyze your first video in under a minute.
```console
# Install the package
$ pip install merleau

# Set your Gemini API key
$ export GEMINI_API_KEY="your-api-key"

# Analyze a video
$ ponty video.mp4
Uploading video: video.mp4
Upload complete. File URI: files/abc123
Waiting for file to be processed...
File state: ACTIVE
Analyzing video with gemini-2.5-flash...

--- Video Analysis ---
The video shows a product demonstration...

--- Usage Information ---
Prompt tokens: 45,231
Response tokens: 847
Total tokens: 46,078
Estimated cost: $0.007
```
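The log above reflects three API steps: upload the file, wait until it becomes ACTIVE, then generate a response. Below is a minimal sketch of that workflow, assuming the `google-genai` SDK (`pip install google-genai`). It is an illustration, not ponty's actual source, and `wait_until_active` and `analyze_video` are hypothetical names.

```python
import os
import time


def wait_until_active(client, file, poll_seconds: float = 5.0, timeout: float = 600.0):
    """Poll the Files API until an uploaded video finishes processing."""
    deadline = time.monotonic() + timeout
    while file.state.name == "PROCESSING":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{file.name} still processing after {timeout} s")
        time.sleep(poll_seconds)
        file = client.files.get(name=file.name)
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"{file.name} ended in state {file.state.name}")
    return file


def analyze_video(path: str, prompt: str, model: str = "gemini-2.5-flash"):
    """Upload a local video, wait for processing, then ask Gemini about it."""
    # Lazy import: wait_until_active above has no SDK dependency.
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    video = wait_until_active(client, client.files.upload(file=path))
    response = client.models.generate_content(model=model, contents=[video, prompt])
    usage = response.usage_metadata
    return response.text, usage.prompt_token_count, usage.candidates_token_count
```

`analyze_video("video.mp4", "Describe this video.")` returns the analysis text along with prompt and response token counts. These counts are comparable to those listed under "Usage Information".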
```console
# Summarize key points
$ ponty lecture.mp4 -p "Summarize the main topics covered"

# Extract action items from a meeting
$ ponty meeting.mp4 -p "List all action items and who is responsible"

# Analyze sports footage
$ ponty game.mp4 -p "Describe the key plays and turning points"

# Use a different model
$ ponty video.mp4 -m gemini-2.0-flash -p "What products are shown?"

# Hide cost information
$ ponty video.mp4 --no-cost
```
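These examples use three options: `-p` for a custom prompt, `-m` to choose the model, and `--no-cost` to hide the usage block. Below is a sketch of how such an interface could be declared with Python's standard `argparse`. The long names `--prompt` and `--model` and the default prompt are assumptions, not confirmed ponty flags. The default model matches the sample output in the Quick Start.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ponty", description="Analyze a video with Gemini.")
    parser.add_argument("video", help="local video file or YouTube URL")
    # Long option names and the default prompt are illustrative assumptions.
    parser.add_argument("-p", "--prompt", default="Describe this video in detail.",
                        help="question or instruction for the model")
    parser.add_argument("-m", "--model", default="gemini-2.5-flash",
                        help="Gemini model name")
    parser.add_argument("--no-cost", action="store_true",
                        help="hide token usage and cost estimate")
    return parser


args = build_parser().parse_args(
    ["video.mp4", "-m", "gemini-2.0-flash", "-p", "What products are shown?"]
)
print(args.video, args.model, args.prompt, args.no_cost)
```

`argparse` stores `--no-cost` as the attribute `no_cost`, so the flag reads naturally on the command line and in code.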
Why Gemini
The Only Choice for Native Video
Of the major AI providers compared below, Gemini is the only one whose API accepts video directly as input.
| Capability | Gemini | GPT-4o | Claude |
|---|---|---|---|
| Native Video Upload | ✓ Direct upload | ✗ Frame extraction | ✗ No support |
| Audio from Video | ✓ Combined analysis | ✗ Separate Whisper transcription | ✗ No support |
| Max Duration | 2+ hours | Minutes | N/A |
| YouTube URLs | ✓ Free preview | ✗ No | ✗ No |
| Cost per Hour of Video | $0.11-0.32 | ~$7.50 | N/A |
The world is not what I think, but what I live through.
CLI Reference
The ponty Command
Simple, powerful video analysis from your terminal.