verging.ai Agent Skills

A collection of AI-powered media processing and AI service skills for coding agents.

Demo

Face Swap

Face Swap Demo

Video Enhancement

Video Enhancement Demo

Features

Face Swap (faceswap)

AI-powered face swap: use verging.ai directly from the command line.

  • Supports local video files and images
  • Supports remote video URLs (YouTube, Bilibili, etc.)
  • Supports remote image URLs
  • Automatically downloads remote resources
  • Real-time progress tracking
  • Video trimming (specify start/end time)

Background Removal (background-remover)

One-click AI background removal that produces transparent PNG images.

  • Ideal for: e-commerce product photos, portrait background removal, design assets
  • Supports local images (JPG, PNG, WebP)
  • Supports remote image URLs
  • Automatically downloads remote resources
  • Real-time progress tracking
  • Maximum file size: 10 MB

Video Enhancement (video-enhancement)

AI video enhancement to upscale resolution, denoise, and sharpen videos.

  • Ideal for: old video restoration, content creation, improving surveillance footage
  • Supports 2x and 4x upscaling
  • Supports local video files (MP4, MOV, AVI, MKV, WebM)
  • Supports remote video URLs (YouTube, Bilibili, etc.)
  • Automatically downloads remote resources
  • Real-time progress tracking
  • Video trimming (specify start/end time)
  • Maximum video duration: 30 seconds

Chat (chat)

AI chat completion proxy: access GPT-4o and other LLMs.

  • Supports streaming and non-streaming modes
  • Multi-turn conversations
  • Billed after each request (post-deduct), based on token usage

Text-to-Speech (tts)

AI text-to-speech: convert text to natural-sounding speech audio.

  • 6 voices: alloy, echo, fable, onyx, nova, shimmer
  • Adjustable speed (0.25x to 4.0x)
  • Multiple formats: mp3, opus, aac, flac
  • Maximum text: 4096 characters

Speech-to-Text (stt)

AI speech-to-text: transcribe audio files to text.

  • Supports mp3, mp4, wav, webm, m4a, and more
  • Auto language detection
  • Maximum file size: 25 MB

Image Generation (imagegen)

AI image generation: generate images from text prompts.

  • Models: gpt-image-1, dall-e-3
  • Multiple sizes and quality levels
  • Batch generation of up to 4 images

Vision Analysis (vision)

AI vision analysis: analyze images and answer questions about them.

  • Supports image URLs and local file uploads
  • GPT-4o powered visual understanding
  • Maximum image size: 20 MB

Installation

# Install all skills
npx skills add verging-ai/agent-skills

# Or install individually
npx skills add verging-ai/agent-skills --skill faceswap
npx skills add verging-ai/agent-skills --skill background-remover
npx skills add verging-ai/agent-skills --skill video-enhancement
npx skills add verging-ai/agent-skills --skill chat
npx skills add verging-ai/agent-skills --skill tts
npx skills add verging-ai/agent-skills --skill stt
npx skills add verging-ai/agent-skills --skill imagegen
npx skills add verging-ai/agent-skills --skill vision

Usage

Face Swap

# Basic usage
/faceswap --video ./input.mp4 --face ./my-face.jpg

# Specify time range
/faceswap -v ./video.mp4 -f ./face.jpg --start 5 --end 30

# Use remote video
/faceswap -v "https://youtube.com/watch?v=xxx" -f ./face.jpg --hd

# Auto download result
/faceswap -v ./video.mp4 -f ./face.jpg --download

Face Swap Options

Option       Short  Description                                Default
--video      -v     Target video file or URL                   Required
--face       -f     Face image file or URL                     Required
--start      -s     Start time in seconds                      0
--end        -e     End time in seconds                        Video duration
--hd         -h     HD mode (3 credits/sec vs. 1 credit/sec)   false
--api-key    -k     Your API key                               VERGING_API_KEY env var
--output     -o     Output directory                           Current directory
--download   -d     Automatically download the result          false
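
The credit rates above make it easy to estimate the cost of a job before running it. The sketch below is a hypothetical helper (`estimate_face_swap_credits` is not part of the skill) and assumes that only the trimmed range from `--start` to `--end` is billed and that partial seconds round up; neither rule is stated in this README.

```python
import math
from typing import Optional


def estimate_face_swap_credits(duration: float, start: float = 0.0,
                               end: Optional[float] = None, hd: bool = False) -> int:
    """Estimate face swap credits for a clip (hypothetical helper).

    Assumes only the trimmed range [start, end] is billed and that partial
    seconds round up; neither rule is documented in the README.
    """
    if end is None:
        end = duration  # --end defaults to the full video duration
    if not 0 <= start < end <= duration:
        raise ValueError("expected 0 <= start < end <= duration")
    rate = 3 if hd else 1  # HD: 3 credits/sec, normal: 1 credit/sec
    return math.ceil(end - start) * rate
```

For the `--start 5 --end 30` example on a 60-second video, `estimate_face_swap_credits(60, start=5, end=30)` returns 25, and 75 with `hd=True`.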

Background Removal

# Basic usage
/background-removal --image ./photo.jpg

# Use remote image
/background-removal -i https://example.com/photo.jpg

# Auto download result
/background-removal -i ./photo.jpg --download

Background Removal Options

Option       Short  Description                         Default
--image      -i     Target image file or URL            Required
--api-key    -k     Your API key                        VERGING_API_KEY env var
--output     -o     Output directory                    Current directory
--download   -d     Automatically download the result   false
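
Because the service accepts only JPG, PNG, and WebP images up to 10 MB, a local pre-flight check can catch problems before upload. This is a minimal sketch with a hypothetical `check_image` helper; it assumes `.jpeg` is accepted alongside `.jpg` and reads "10 MB" as 10 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Limits from the Background Removal feature list; ".jpeg" is an assumed alias of JPG.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 10 * 1024 * 1024  # "10 MB" read as 10 MiB (assumption)


def check_image(path: str) -> None:
    """Raise ValueError if a local image is likely to be rejected (hypothetical helper)."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{p.name} is {size} bytes; the limit is 10 MB")
```

Remote URLs skip this check, since the skill downloads them itself.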

Video Enhancement

# Basic usage
/video-enhancement --video ./old-video.mp4

# HD mode
/video-enhancement -v ./video.mp4 --hd

# With time range
/video-enhancement -v ./video.mp4 --hd --start 5 --end 15

# Use remote video
/video-enhancement -v "https://youtube.com/watch?v=xxx" --download

# Auto download result
/video-enhancement -v ./video.mp4 --download

Video Enhancement Options

Option       Short  Description                                Default
--video      -v     Target video file or URL                   Required
--hd         -h     HD mode (3 credits/sec vs. 1 credit/sec)   false
--start      -ss    Start time in seconds                      0
--end        -e     End time in seconds                        Video duration
--api-key    -k     Your API key                               VERGING_API_KEY env var
--output     -o     Output directory                           Current directory
--download   -d     Automatically download the result          false
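
The 30-second limit from the feature list interacts with `--start` and `--end`: a longer source could still be enhanced if the selected range is short enough. The sketch assumes the limit applies to the trimmed range rather than to the whole source file, which this README does not state; `enhancement_window` is a hypothetical helper.

```python
from typing import Optional

MAX_SECONDS = 30  # "Maximum video duration: 30 seconds"


def enhancement_window(duration: float, start: float = 0.0,
                       end: Optional[float] = None) -> float:
    """Return the length of the range to enhance (hypothetical helper).

    Assumes the 30-second limit applies to the trimmed range (end - start),
    not to the whole source video.
    """
    if end is None:
        end = duration  # --end defaults to the video duration
    if not 0 <= start < end <= duration:
        raise ValueError("expected 0 <= start < end <= duration")
    length = end - start
    if length > MAX_SECONDS:
        raise ValueError(f"{length:g} s selected; use --start/--end to trim to "
                         f"{MAX_SECONDS} s or less")
    return length
```

`enhancement_window(120, start=5, end=15)` returns 10, matching the `--start 5 --end 15` example, while `enhancement_window(120)` raises because no trim is applied.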

Chat

# Simple question
/chat -m "Explain Docker in 3 sentences"

# Multi-turn with streaming
/chat --messages '[{"role":"system","content":"You are a Python expert"},{"role":"user","content":"How to read CSV?"}]' --stream

Chat Options

Option          Short  Description                  Default
--message       -m     Single user message          Required (or --messages)
--messages      -M     Full messages JSON array     Required (or --message)
--model                LLM model                    gpt-4o
--temperature   -t     Sampling temperature (0-2)   Provider default
--max-tokens           Maximum response tokens      Provider default
--stream        -s     Enable streaming             false
--api-key       -k     Your API key                 VERGING_API_KEY env var
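
Writing the `--messages` JSON by hand is error-prone, especially when a message contains an apostrophe that ends the single-quoted argument early. A minimal sketch that builds the command string with Python's `json` and `shlex` modules; `chat_command` is hypothetical, and the accepted roles (`system`, `user`, `assistant`) are an assumption based on the OpenAI-style message format the example uses.

```python
import json
import shlex

ROLES = {"system", "user", "assistant"}  # assumed OpenAI-style roles


def chat_command(messages: list[dict], stream: bool = False) -> str:
    """Build a /chat command with a safely quoted --messages argument.

    Hypothetical helper: it only formats text and does not call the skill.
    """
    for m in messages:
        if m.get("role") not in ROLES or "content" not in m:
            raise ValueError(f"invalid message: {m!r}")
    # Compact JSON, keeping non-ASCII text (e.g. Chinese) readable.
    payload = json.dumps(messages, ensure_ascii=False, separators=(",", ":"))
    parts = ["/chat", "--messages", shlex.quote(payload)]
    if stream:
        parts.append("--stream")
    return " ".join(parts)
```

Called with the two messages from the multi-turn example and `stream=True`, it produces that same command; `shlex.quote` also escapes apostrophes in message content.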

Text-to-Speech

# Basic usage
/tts -t "Hello, welcome to verging.ai"

# Choose a voice and save to a file ("你好世界" means "Hello, world")
/tts --text "你好世界" --voice nova --output ./speech.mp3

TTS Options

Option       Short  Description                                      Default
--text       -t     Text to convert (max 4096 characters)            Required
--voice      -v     Voice: alloy, echo, fable, onyx, nova, shimmer   alloy
--model             Model: tts-1, tts-1-hd                           tts-1-hd
--format     -f     Audio format: mp3, opus, aac, flac               mp3
--speed      -s     Speech speed: 0.25 to 4.0                        1.0
--output     -o     Save the audio file to this path                 Not set (URL only)
--api-key    -k     Your API key                                     VERGING_API_KEY env var
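
Text longer than 4096 characters has to be split across several `/tts` calls. The sketch below splits at whitespace (collapsing runs of whitespace, including newlines) and hard-splits any single token longer than the limit, which also covers text without spaces such as Chinese. `split_for_tts` is a hypothetical helper, and sending one request per chunk is an assumed workflow, not a documented feature.

```python
MAX_CHARS = 4096  # per-request limit for /tts --text


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters (hypothetical helper)."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-split tokens that can never fit in a single chunk.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate  # word fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks
```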

Speech-to-Text

# Basic usage
/stt -f ./meeting.mp3

# With language hint
/stt --file ./interview.wav --language zh

STT Options

Option       Short  Description                            Default
--file       -f     Audio file path                        Required
--model             STT model                              whisper-1
--language   -l     Language hint (ISO 639-1)              Auto-detect
--format            Output format: json, text, srt, vtt    json
--api-key    -k     Your API key                           VERGING_API_KEY env var
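
A local check of the 25 MB limit and of the `--language` value can save a failed upload. This sketch only verifies that the hint looks like a two-letter lowercase ISO 639-1 code, not that the code exists; `check_stt_input` is hypothetical, and reading 25 MB as 25 * 1024 * 1024 bytes is an assumption.

```python
import re
from pathlib import Path
from typing import Optional

MAX_BYTES = 25 * 1024 * 1024  # "25 MB" read as 25 MiB (assumption)
LANG_CODE = re.compile(r"[a-z]{2}")  # two lowercase letters; shape check only


def check_stt_input(path: str, language: Optional[str] = None) -> None:
    """Raise ValueError if /stt input is likely to be rejected (hypothetical helper)."""
    size = Path(path).stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{path} is {size} bytes; the limit is 25 MB")
    if language is not None and not LANG_CODE.fullmatch(language):
        raise ValueError(f"--language expects an ISO 639-1 code such as 'zh', got {language!r}")
```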

Image Generation

# Basic usage
/imagegen -p "a cat sitting on a rainbow"

# High quality, multiple images
/imagegen --prompt "product photo of red sneaker" --quality high --n 2 --output ./images/

Image Generation Options

Option       Short  Description                              Default
--prompt     -p     Text description                         Required
--model             Model: gpt-image-1, dall-e-3             gpt-image-1
--size       -s     Image size: 1024x1024, 1024x1536, etc.   1024x1024
--quality    -q     Quality: auto, low, medium, high         auto
--n          -n     Number of images (1-4)                   1
--output     -o     Save images to this directory            Not set (URL only)
--api-key    -k     Your API key                             VERGING_API_KEY env var
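
When scripting many generations, validating options against the documented values before invoking `/imagegen` avoids wasted calls. A minimal sketch with `imagegen_command` as a hypothetical helper: it checks `--model`, `--quality`, and `--n` against the table, passes `--size` through unchecked (the full list of sizes is not given here), and emits only non-default options.

```python
import shlex
from typing import Optional

MODELS = {"gpt-image-1", "dall-e-3"}
QUALITIES = {"auto", "low", "medium", "high"}


def imagegen_command(prompt: str, model: str = "gpt-image-1", size: Optional[str] = None,
                     quality: str = "auto", n: int = 1, output: Optional[str] = None) -> str:
    """Build a validated /imagegen command line (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("--prompt is required")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality: {quality}")
    if not 1 <= n <= 4:
        raise ValueError("--n must be between 1 and 4")
    args = ["/imagegen", "--prompt", prompt]
    # Only emit options that differ from the documented defaults.
    if model != "gpt-image-1":
        args += ["--model", model]
    if size is not None:
        args += ["--size", size]  # passed through unchecked
    if quality != "auto":
        args += ["--quality", quality]
    if n != 1:
        args += ["--n", str(n)]
    if output is not None:
        args += ["--output", output]
    return shlex.join(args)
```

For the high-quality example, `imagegen_command("product photo of red sneaker", quality="high", n=2, output="./images/")` yields the same command, with the prompt in single quotes instead of double quotes.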

Vision Analysis

# Analyze local image
/vision -i ./screenshot.png -p "What errors are shown?"

# Analyze remote image
/vision --image "https://example.com/chart.png" --prompt "Summarize this chart"

Vision Options

Option        Short  Description                Default
--image       -i     Image file path or URL     Required
--prompt      -p     Question about the image   Required
--model              Vision model               gpt-4o
--max-tokens         Maximum response tokens    1024
--api-key     -k     Your API key               VERGING_API_KEY env var
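
Since `--image` accepts either a path or a URL, a script may want to tell the two apart and check the 20 MB limit for local files before calling `/vision`. The sketch assumes that only `http` and `https` values count as remote and reads 20 MB as 20 * 1024 * 1024 bytes; `classify_image_source` is a hypothetical helper.

```python
from pathlib import Path
from urllib.parse import urlparse

MAX_BYTES = 20 * 1024 * 1024  # "20 MB" read as 20 MiB (assumption)


def classify_image_source(value: str) -> str:
    """Return "url" or "file" for a --image value (hypothetical helper)."""
    # Assumption: only http(s) values are treated as remote images.
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(value)
    if path.stat().st_size > MAX_BYTES:
        raise ValueError(f"{value} exceeds the 20 MB image limit")
    return "file"
```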

Environment Variables

# Set your API Key
export VERGING_API_KEY="your_api_key_here"

# Optional: Set API URL (default: https://verging.ai/api/v1)
export VERGING_API_URL="https://verging.ai/api/v1"
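
The option tables list `VERGING_API_KEY` as the default for `--api-key`, which implies an explicit flag overrides the environment. A minimal sketch of that resolution order plus the `VERGING_API_URL` fallback; `resolve_config` is hypothetical and not taken from the skills' source.

```python
import os
from typing import Optional

DEFAULT_API_URL = "https://verging.ai/api/v1"


def resolve_config(api_key_flag: Optional[str] = None) -> tuple[str, str]:
    """Return (api_key, api_url) (hypothetical helper, not the skills' code).

    An explicit --api-key value wins over VERGING_API_KEY; VERGING_API_URL
    falls back to the documented default.
    """
    api_key = api_key_flag or os.environ.get("VERGING_API_KEY")
    if not api_key:
        raise RuntimeError("set VERGING_API_KEY or pass --api-key")
    api_url = os.environ.get("VERGING_API_URL", DEFAULT_API_URL)
    return api_key, api_url
```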

Get an API Key

  1. Visit https://verging.ai
  2. Sign up or log in
  3. Click your username in the top right corner
  4. Select API Keys from the dropdown menu
  5. Create a new API Key

Credits

Face Swap

  • Normal mode: 1 credit/second
  • HD mode: 3 credits/second

Background Removal

  • 1 credit per image

Video Enhancement

  • Normal mode: 1 credit/second
  • HD mode: 3 credits/second

Chat

  • Billed after each request (post-deduct), based on token usage

Text-to-Speech

  • Billed upfront (pre-deduct), based on character count

Speech-to-Text

  • Billed upfront (pre-deduct), based on audio duration

Image Generation

  • Billed upfront (pre-deduct), based on model, size, quality, and number of images

Vision Analysis

  • Billed after each request (post-deduct): 2 base credits plus token cost

License

MIT