Production-grade AI solutions for your applications
Translate any video or audio content with natural sounding translations and voices.
sieve/dubbing
Smart, automatic cropping of a video to a given aspect ratio based on subject detection and speaker tracking.
autocrop
A comprehensive solution for video lipsyncing with a suite of model and enhancement options.
lipsync
State-of-the-art audio-visual active speaker detection based on new, efficient face and speaker detection models.
active_speaker_detection
Filters for removing background noise, enhancing speech, and more in audio files.
audio-enhance
Fast, high-quality speech transcription with many available backends, word-level timestamps, speaker diarization, and translation capabilities.
transcribe
High-quality background removal for images and videos.
background-removal
Moderate videos and images for harmful content.
visual-moderation
Detect and crop unwanted borders such as black bars from videos.
border-detection
Detect scene transitions in a video.
scene-detection
Production-grade video processing utilities for your applications
A YouTube downloader for fetching videos, audio, subtitles, and metadata at scale.
youtube-downloader
Optimized AI models for your applications
This is an optimized implementation of Segment Anything 2, a model that can dynamically segment objects in an image or video.
sam2
An active speaker detection model that identifies which people are speaking in a video.
talknet-asd
High-quality speech recognition with major improvements on top of Whisper.
whisper
Demucs is a state-of-the-art music source separation model, currently capable of separating drums, bass, and vocals from the rest of the accompaniment.
demucs
Resemble Enhance is an AI-powered tool that aims to improve the overall quality of speech by performing denoising and enhancement.
resemble-enhance
Diarize audio using pyannote-audio.
pyannote-diarization