YouTube’s auto-dubbing tool democratizes access to multilingual content creation. We discuss its current capabilities and limitations, and how businesses should strategize around it.
A comprehensive comparison of video OCR solutions using modern AI models like Gemini and Florence 2 versus traditional OCR approaches, with practical implementation guides and performance metrics.
A comprehensive guide to implementing robust ball tracking in sports videos using SAM 2, with practical solutions for handling scene changes, false positives, and dynamic camera movements.
We discuss various approaches to building a high-performance YouTube video summarizer. Some take visual elements into account, while others focus on audio.
We discuss a new pipeline for removing backgrounds from video that offers high-quality outputs on complex scenes as well as a fast option for simpler videos.
A practical guide to removing background noise from videos using traditional signal processing, advanced AI models for noise suppression, and intelligent source separation methods.
We discuss a new zero-shot lipsync pipeline, built with MuseTalk, LivePortrait, and CodeFormer, that is designed to preserve more realism than existing solutions.
Learn about Meta's SAM 2 (Segment Anything Model 2) and how Sieve's optimized implementation runs 2x faster. Explore use cases, benchmarks, and how to use SAM 2.
In this blog post, we dive into MuseTalk, a state-of-the-art zero-shot lipsyncing model. We cover how it works, its pros and cons, and how to run it on Sieve.
In this post, we discuss the commoditization of audio transcription and a new Sieve offering around it that is 5x cheaper than other providers while still maintaining speed and accuracy.
We compare current lipsyncing solutions, such as OpenRetalker’s Video Retalking and SieveSync, to arrive at a performant, production-ready option.
Learn how to use an AI audio enhancement app built with open-source tools for your vlogs and other media, rivaling the best APIs on the market. Try it for yourself!
In this blog post, we go through the process of generating video chapter titles with OpenAI's Whisper + GPT-3 models and an open-source text segmentation technique!
Learn about the specialized pipelines in the Sieve toolkit for creating realistic AI avatars, including Portrait Avatar, LivePortrait, and Lipsync. This blog provides a detailed discussion of strengths, limitations, and use cases.