State-of-the-art active speaker detection based on new, efficient face and speaker detection models.


This repository is an optimized, production-ready implementation of active speaker detection. Read more about the research area here.

It consists of two parts:

  • The open-source implementation of the active speaker detection application that runs on the Sieve platform.
  • The standalone, optimized implementation of TalkNet, a leading model for active speaker detection.

The TalkNet implementation significantly improves on the original, primarily in performance: the pre-processing and post-processing steps are faster, and it supports variable frame-rate videos (not just 25 FPS like the original). The active speaker detection application is a further productionized version of this implementation. It parallelizes processing across TalkNet and a separate standalone face detection model to deliver faster, higher-quality speaker tracking and detection results.
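The original TalkNet expects 25 FPS input. The text above does not say how this repository handles variable frame-rate video, so the following is only a sketch of one common approach, assumed here for illustration: snap the decoded frame timestamps onto an evenly spaced 25 FPS grid by nearest-neighbour lookup, so the model always sees uniformly spaced frames. The function `resample_to_fixed_fps` is hypothetical and not part of this repository.

```python
from bisect import bisect_left


def resample_to_fixed_fps(timestamps, target_fps=25.0):
    """Map variable-frame-rate frames onto an evenly spaced target grid.

    timestamps: sorted presentation times (in seconds) of the decoded frames.
    Returns one source-frame index per slot of the target grid, where slot k
    sits at timestamps[0] + k / target_fps and uses the nearest source frame.
    """
    if not timestamps:
        return []
    t0 = timestamps[0]
    span = timestamps[-1] - t0
    # Small epsilon guards against float error just below an integer.
    n_slots = int(span * target_fps + 1e-9) + 1
    indices = []
    for k in range(n_slots):
        t = t0 + k / target_fps
        i = bisect_left(timestamps, t)  # first frame at or after t
        if i == len(timestamps):
            i -= 1
        elif i > 0 and t - timestamps[i - 1] <= timestamps[i] - t:
            i -= 1  # the previous frame is at least as close to t
        indices.append(i)
    return indices


# There is no frame near 0.08 s, so that slot reuses the 0.04 s frame.
print(resample_to_fixed_fps([0.0, 0.04, 0.13, 0.16]))  # → [0, 1, 1, 2, 3]
```

Nearest-neighbour selection duplicates frames where the source is sparse and drops them where it is dense. That keeps the sketch simple, but it is a design choice; a real pipeline might instead re-encode the video at a constant frame rate before inference.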



If you only plan to use the standalone TalkNet implementation, follow these steps:

  1. Go to the talknet directory.
  2. Run pip install -r requirements.txt.
  3. Run python

You can change the input video file by modifying the main function in

Active Speaker Detection

The easiest way to run active speaker detection is to use the version already deployed on the Sieve platform available here.

While the core application can be run locally, it still calls public functions available on Sieve, such as the YOLO object detection model, so you will need to sign up for a free account and get an API key. You can do so here.
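The application parallelizes work across face detection and TalkNet, and when it runs locally some of that work is remote calls to Sieve functions, so overlapping calls can save wall-clock time. The repository's actual scheduling is not described here; the following is a minimal sketch of one plausible structure, assuming the video can be split into fixed-length time segments that are processed independently. Every function below (`detect_faces`, `score_speakers`, `process_segment`, `split_into_segments`, `run_pipeline`) is a hypothetical placeholder, not a Sieve or TalkNet API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration, chunk_seconds):
    """Return (start, end) pairs that cover [0, duration) in fixed-size chunks."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        segments.append((start, end))
        start = end
    return segments


def detect_faces(segment):
    # Hypothetical placeholder for a face detection call (local or remote).
    start, _ = segment
    return [f"face@{start:.1f}s"]


def score_speakers(segment, faces):
    # Hypothetical placeholder for TalkNet scoring of the detected faces.
    return {face: 0.5 for face in faces}


def process_segment(segment):
    # Within one segment the stages stay sequential: scoring needs the faces.
    faces = detect_faces(segment)
    return segment, score_speakers(segment, faces)


def run_pipeline(duration, chunk_seconds=5.0, workers=4):
    segments = split_into_segments(duration, chunk_seconds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the timeline stays sorted.
        return list(pool.map(process_segment, segments))


for segment, scores in run_pipeline(12.0):
    print(segment, scores)
```

Threads suit this sketch because the placeholders stand in for I/O-bound remote calls. For CPU- or GPU-bound local inference, separate processes or batched model calls would likely be a better fit.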

After you've signed up and run sieve login, you can run from the root directory of this repository to start the active speaker detection application.

© Copyright 2024. All rights reserved.