Building the Fastest YouTube Video Summarizer in the World
We discuss various approaches to building a high-performance YouTube video summarizer. Some take visual elements into account, while others focus on audio.
by Akshara Soman

Quickly summarizing media content has become one of the most compelling use cases for LLMs over the past two years. Sieve's prebuilt functions let you build a fast, efficient video summarizer, so users can grasp a video's key ideas without watching all of it. In this post, we'll show how to use Sieve's tools to build a high-performance YouTube video summarizer with minimal effort and code.

To get started, create a Sieve account and install the Sieve Python package (pip install sievedata).

Downloading YouTube videos

You can download a YouTube video at a specified resolution using our prebuilt youtube_to_mp4 function. The example below uses the function's default settings.

import sieve

youtube_downloader = sieve.function.get("sieve/youtube_to_mp4")
video_url = "https://www.youtube.com/watch?v=g96eikTDXbw"
downloaded_video = youtube_downloader.run(video_url)

# print out the video path
print(downloaded_video.path)
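If your summarizer accepts URLs from users, you may want to validate them before calling youtube_to_mp4. The sketch below defines a hypothetical helper, extract_video_id, which is not part of Sieve and uses only the Python standard library. It assumes the two common URL forms (youtube.com/watch?v=<id> and youtu.be/<id>) and YouTube's usual 11-character video IDs.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters drawn from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if it is not recognized.

    Hypothetical helper (not part of Sieve). Handles two common URL forms:
      https://www.youtube.com/watch?v=<id>
      https://youtu.be/<id>
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse already lowercases the hostname
    video_id = None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # The ID lives in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [None])[0]
    elif host == "youtu.be":
        # The ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    if video_id and _VIDEO_ID.fullmatch(video_id):
        return video_id
    return None


print(extract_video_id("https://www.youtube.com/watch?v=g96eikTDXbw"))  # g96eikTDXbw
```

Rejecting unrecognized URLs up front avoids starting a download job that is bound to fail.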

Video Summarization

The Sieve toolkit currently offers two prebuilt functions that support video summarization: visual-qa and transcript-analysis. Each is designed for a different use case, and both support plain-text and structured JSON output, so they are easy to integrate into existing workflows.

Visual QA

The visual-qa function analyzes both the visual and audio components of video content and can generate summaries through tailored prompts.

visual_qa = sieve.function.get("sieve/visual-qa")
summary_prompt = "Provide a detailed summary of the video"
summary = visual_qa.run(
    downloaded_video,
    "gemini-1.5-flash",  # model backend
    summary_prompt,
    fps=1,  # frames analyzed per second of video
    audio_context=True,  # also use the audio track as context
)
print(summary)
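The fps argument helps explain why visual-qa's processing time grows with video length. The sketch below assumes that fps sets how many frames per second are sampled for analysis, which is our reading of the parameter name and not documented behavior. Under that assumption, a hypothetical helper can list the sampled timestamps. At fps=1, a 10-minute video yields 600 frames and a 60-minute lecture yields 3,600. The sketch only illustrates the arithmetic. It does not reproduce how visual-qa actually selects frames.

```python
import math


def sample_timestamps(duration_s: float, fps: float) -> list[float]:
    """Return the times (in seconds) at which frames would be sampled at a fixed rate.

    Hypothetical helper: assumes one frame every 1/fps seconds, starting at t=0
    and stopping before the end of the video.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    count = math.ceil(duration_s * fps)
    return [round(i / fps, 3) for i in range(count)]


for minutes in (10, 60):
    frames = sample_timestamps(minutes * 60, fps=1)
    print(f"{minutes}-minute video at fps=1: {len(frames)} frames")
```

If this assumption holds, a lower rate such as fps=0.5 would halve the frame count. That may trade some visual detail for speed.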

Transcript Analysis

The transcript-analysis function transcribes the video's audio and uses preset prompts to generate titles, chapters, tags, and summaries, with options to adjust the style and length of the output.

transcript_analysis = sieve.function.get("sieve/transcript-analysis")

transcription_backend = "groq-whisper"  # speech-to-text backend
llm_backend = "gpt-4o-2024-08-06"  # LLM that writes the summary
max_summary_length = 20

summary = transcript_analysis.run(
    downloaded_video,
    transcription_backend,
    llm_backend,
    prompt="",  # no custom prompt; rely on the preset outputs below
    generate_summary=True,  # request only the summary
    generate_title=False,
    generate_tags=False,
    generate_chapters=False,
    max_summary_length=max_summary_length,
)

for summary_object in summary:
    print(summary_object)
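transcript-analysis hides the details of handling long transcripts. If you built a similar pipeline yourself, one common approach for long videos is map-reduce summarization: split the transcript into chunks, summarize each chunk, then summarize the summaries. The sketch below shows only the chunking step. It uses a hypothetical Segment type and made-up sample segments. This is a generic technique, not a description of how transcript-analysis works internally.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of a transcript (hypothetical shape)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def chunk_segments(segments: list[Segment], max_chars: int) -> list[tuple[float, float, str]]:
    """Group consecutive segments into chunks whose joined text fits in max_chars.

    Each chunk keeps the start time of its first segment and the end time of its
    last. A single segment longer than max_chars becomes its own chunk, unsplit.
    """
    chunks: list[tuple[float, float, str]] = []
    current: list[Segment] = []
    current_len = 0
    for seg in segments:
        # Joining with a space costs one extra character unless the chunk is empty.
        extra = len(seg.text) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            chunks.append((current[0].start, current[-1].end, " ".join(s.text for s in current)))
            current, current_len = [], 0
            extra = len(seg.text)
        current.append(seg)
        current_len += extra
    if current:
        chunks.append((current[0].start, current[-1].end, " ".join(s.text for s in current)))
    return chunks


segments = [
    Segment(0.0, 4.0, "Adair scrambles."),
    Segment(4.0, 9.0, "He spots the bombers."),
    Segment(9.0, 15.0, "He attacks from above."),
]
for start, end, text in chunk_segments(segments, max_chars=40):
    print(f"[{start:.0f}s-{end:.0f}s] {text}")
```

Keeping timestamps on each chunk allows per-chunk summaries to double as chapter-like markers.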

How fast can we summarize?

Let’s examine how Sieve’s visual-qa and transcript-analysis functions perform on two distinct types of video: a 10-minute narrative history video and a 60-minute lecture.

The table below shows how long each function took to summarize each video:

Video | Length | Transcript Analysis | Visual QA
When 1 Pilot Fought 64 Japanese Planes | 10 min | 12 s | 2 min 15 s
CS50's AI Lecture by Prof. David Malan | 60 min | 50 s | 5 min 20 s
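The timings in the table can also be expressed as real-time factors, meaning seconds of video processed per second of wall-clock time. The sketch below computes those factors from the table. It also adds a small hypothetical timed helper, built only on the standard library's time.perf_counter, for benchmarking your own videos. Neither function is part of Sieve.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed wall-clock seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def realtime_factor(video_seconds: float, processing_seconds: float) -> float:
    """Seconds of video processed per second of wall-clock time."""
    return video_seconds / processing_seconds


# Timings from the table above: (video length, processing time), in seconds.
benchmarks = {
    ("10-minute video", "transcript-analysis"): (600, 12),
    ("10-minute video", "visual-qa"): (600, 135),
    ("60-minute lecture", "transcript-analysis"): (3600, 50),
    ("60-minute lecture", "visual-qa"): (3600, 320),
}
for (video, fn_name), (length_s, elapsed_s) in benchmarks.items():
    factor = realtime_factor(length_s, elapsed_s)
    print(f"{video:18} {fn_name:20} {factor:5.1f}x real time")
```

To time a Sieve call, you could pass the function's run method and its arguments to timed. Some calls may return a generator, as the transcript-analysis loop above suggests. In that case, wrap the call in a lambda that calls list() on the result, so the generator is consumed inside the timed region.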

Summary Comparison

To compare summary quality, here are the summaries each function generated for the 10-minute video:

  • sieve/transcript-analysis: In this video, viewers are transported to December 13th, 1943, as they follow 2nd Lieutenant Philip R. Adair, who finds himself alone against a formidable force of over 60 Japanese planes during World War II. Adair, piloting his P-40 aircraft, Lulubelle, decides to take decisive action despite the overwhelming odds to protect his home base, which includes hospitals. Executing high-risk attacks on bombers from an advantageous high position, he manages to disrupt their mission, though the bombers succeed in releasing their bombs off-target. With fearsome fighter planes known as Zeros attacking, Adair employs skillful escape maneuvers to survive. In a series of daring decisions, he engages both bombers and fighters until his ammunition runs out, yet manages to down several enemy planes. Adair narrowly escapes multiple attacks and even when faced with mechanical difficulties after his plane is hit, he resourcefully keeps the aircraft airborne using inverted flying techniques. His damaged plane perseveres through further threats as he heads back towards his base, narrowly avoiding crashes and even enemy pursuit. Ultimately, Adair masterfully lands his plane in a precarious upside-down state, bringing it back safely to base. This extraordinary mission, his 44th, earns him the Silver Star. The video credits Robin and Luke Adair for their contributions and invites viewers to check out more on the Burma Banshees' website. It concludes with a tribute to the squadron and provides information on where to purchase related merchandise.
  • sieve/visual-qa: On December 13th, 1943, 2nd Lieutenant Philip R. Adair was tasked with scrambling after spotting what appeared to be four enemy planes. As he got closer, Adair realized it was not four planes, but four flights of six bombers each, all escorted by fighters. Adair was outnumbered 64 to 1 and faced almost certain death, but he pushed that thought aside and headed straight for the enemy. He knew he had to take down the bombers first, and he decided to approach them from high above on the left, trying to get a few shots off. Adair started firing at long range in order to shake them up, and the bombers, unnerved by the tracers, started bouncing up and down. The fighters were not paying attention to Adair at this time, and soon he was behind the last flight and was able to pick off the last bomber, aiming at its left engine. He saw hits on the engine and what looked like fire, but he overtook the bomber so quickly he couldn't be sure. Adair then had to make a quick escape maneuver, pulling the stick forward and rolling into a negative G outside roll, causing a high-speed dive. Thankfully, the zeroes were returning to defend the bomber formation. The bomber crew, though rattled by Adair's attack, completed their mission, but their bombs fell short of the target area. Adair, after a successful escape, took another run at the bombers, but realized he was out of ammunition, and so he went by the last bomber on the right, only to find himself targeted by it. The bomber pulled up beside Adair and began firing at him, but missed and Adair escaped and headed home. Adair was 100 miles away from home when he realized he was having trouble maintaining his altitude, and so he decided to try an unusual strategy, flipping his plane upside down. This worked for a while, and he gained some altitude, but the engine started cutting out, and Adair flipped the plane back over. The engine started running smooth, and Adair continued home. 
    Adair landed upside down, dropping the landing gear at the last minute. He later remarked that Mission #44 was the most exciting mission he had ever flown, and that he wouldn't be able to stand that much excitement again, but he was convinced it was his lucky number as it was painted on the side of his plane, Lulu Belle. Adair received the Silver Star for his actions that day.

Key Insights:

  • The transcript-analysis function summarizes a 10-minute video in just 12 seconds and a 1-hour lecture in 50 seconds. That makes it the best choice for applications that prioritize speed, and for podcasts, lectures, and other content that is primarily audio-driven.
  • The visual-qa function was roughly 6 to 11 times slower in these tests. However, it can interpret visual elements, which makes it the better choice for applications where storytelling, visual context, or scene-by-scene analysis is critical.
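The trade-offs above can be encoded as a small routing rule. The sketch below is a hypothetical policy, not a Sieve feature. It returns the Sieve function names to call, in order. One design choice is labeled as an assumption: for latency-sensitive apps that still need visual context, it shows a fast transcript-based summary first and then swaps in the slower visual summary.

```python
def plan_summarizers(needs_visual_context: bool, latency_sensitive: bool) -> list[str]:
    """Return Sieve function names to call, in order (hypothetical routing policy)."""
    if not needs_visual_context:
        # Audio-driven content (podcasts, lectures): transcript analysis is fastest.
        return ["sieve/transcript-analysis"]
    if latency_sensitive:
        # Assumption: show a quick transcript-based summary first, then replace it
        # with the slower visual summary when it arrives.
        return ["sieve/transcript-analysis", "sieve/visual-qa"]
    # Storytelling or scene-by-scene analysis where waiting is acceptable.
    return ["sieve/visual-qa"]


print(plan_summarizers(needs_visual_context=True, latency_sensitive=True))
```

Each returned name can be passed to sieve.function.get, as in the earlier examples.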

Sieve lets you prioritize based on your application’s needs, whether that’s speed, visual detail, or both. If you have any thoughts or comments, please reach out by joining our Discord community or by emailing us at contact@sievedata.com.