Speech Transcription
This app can precisely transcribe audio data, with additional options for auto-translation.
IMPORTANT (August 30, 2024): This function will change significantly over the next three weeks. If you have integrated it into your production code, please pin the version by including it in the function name you pass to either the API or the SDK. An example function slug with a version specified is posted below.
sieve/speech_transcriber:86b4f1f
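As a minimal sketch of what pinning looks like in practice, the helper below builds a versioned slug in the `owner/name:version` form shown above. The name `pin_version` is hypothetical, and the commented-out SDK call assumes the Sieve Python SDK exposes `sieve.function.get`; check the SDK documentation before relying on it.

```python
def pin_version(slug: str, version: str) -> str:
    """Return `slug` with `version` appended in the form owner/name:version."""
    if not version:
        raise ValueError("version must be a non-empty string")
    base = slug.split(":", 1)[0]  # drop any version that is already attached
    return f"{base}:{version}"


pinned = pin_version("sieve/speech_transcriber", "86b4f1f")
print(pinned)

# Assumed SDK usage (not executed here): pass the pinned slug as the function name.
# import sieve
# transcriber = sieve.function.get(pinned)
```

Keeping the version in one place makes it easy to upgrade deliberately once the announced changes have landed.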
For pricing, click here.
For detailed notes, click here.
Key Features
- Word-level Timestamps: Provides precise timestamps for each word in the transcript (not available in groq-whisper).
- Speaker Diarization: Identifies and labels different speakers in the audio.
- Speed Boost Option: Accelerates transcription speed with a slight trade-off in accuracy.
- Model Backend Options: Choose from various backend models to balance cost, quality, and speed.
- Auto Translation: Dynamically translates transcriptions into multiple languages.
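To illustrate how word-level timestamps and speaker labels can be combined, here is a rough sketch that merges consecutive words from the same speaker into turns. The word schema (`word`, `start`, `end`, `speaker`) and the sample data are assumptions made for illustration; the actual output format of the function may differ.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive words that share a speaker label into turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],   # start time of the first word in the turn
            "end": group[-1]["end"],      # end time of the last word in the turn
            "text": " ".join(w["word"] for w in group),
        })
    return turns


# Hypothetical word-level output, invented for this example.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4, "speaker": "SPEAKER_00"},
    {"word": "there.", "start": 0.4, "end": 0.8, "speaker": "SPEAKER_00"},
    {"word": "Hi!", "start": 1.0, "end": 1.3, "speaker": "SPEAKER_01"},
]

for turn in speaker_turns(words):
    print(f'{turn["speaker"]} [{turn["start"]}-{turn["end"]}]: {turn["text"]}')
```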
Pricing
Note (August 30, 2024): The pricing will change in the coming weeks. You can check the price of your job via the usage table or via API.
Pricing for this function is compute-based.
As an estimate, enabling word-level timestamps on the stable-ts backend cost us $0.15 / hr of audio, extrapolated from this example.
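For a back-of-the-envelope figure, the estimate above can be scaled linearly by audio duration. This is only a sketch: the function name `estimate_cost_usd` is hypothetical, the default rate is the $0.15 / hr estimate quoted above, and because pricing is compute-based the real cost of a job may differ.

```python
def estimate_cost_usd(audio_seconds: float, usd_per_audio_hour: float = 0.15) -> float:
    """Scale an hourly rate estimate linearly to the given audio duration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * usd_per_audio_hour


# Rough estimate for a 45-minute recording at the quoted rate.
print(f"${estimate_cost_usd(45 * 60):.4f}")
```

Treat the result as a planning number and confirm the actual price of a job via the usage table or the API.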
Notes
Picking the right settings
- backend options:
  - whisperx: A fast transcription option available.
  - stable-ts: Offers more accuracy, especially in timestamps.
  - whisper-timestamped: Similar to stable-ts, focuses on accurate timestamps.
  - whisper-zero: The highest quality option available, but also the slowest.
  - groq-whisper: The fastest and cheapest option available, optimized using Groq (costs ~$0.111 / hour of audio).
- Enabling speaker_diarization returns speaker IDs for each word in the transcript. This is useful if you want to know who said what.
- Enabling speed_boost will use smaller models with either decoding approach. This is useful if you want to get results faster and don't mind sacrificing some accuracy.
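The settings above can be collected and sanity-checked before submitting a job. The sketch below is one way to do that; `build_settings` is a hypothetical helper, and `word_level_timestamps` is an assumed parameter name (this page names only `backend`, `speaker_diarization`, `speed_boost`, and `language`). The check for groq-whisper reflects the note under Key Features that word-level timestamps are not available on that backend.

```python
BACKENDS = (
    "whisperx",
    "stable-ts",
    "whisper-timestamped",
    "whisper-zero",
    "groq-whisper",
)


def build_settings(
    backend: str = "stable-ts",
    word_level_timestamps: bool = True,  # assumed parameter name
    speaker_diarization: bool = False,
    speed_boost: bool = False,
    language: str = "",  # blank means the language is detected automatically
) -> dict:
    """Validate a combination of options and return it as a plain dict."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    if backend == "groq-whisper" and word_level_timestamps:
        raise ValueError("word-level timestamps are not available with groq-whisper")
    return {
        "backend": backend,
        "word_level_timestamps": word_level_timestamps,
        "speaker_diarization": speaker_diarization,
        "speed_boost": speed_boost,
        "language": language,
    }


# Cheapest and fastest backend, without word-level timestamps.
print(build_settings(backend="groq-whisper", word_level_timestamps=False))
```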
Languages
We support 99 total languages. You may enter a language code into the language parameter if you already know the language of the original audio. If you don't know the language of the original audio, you may leave the language parameter blank and we will automatically detect the language of the original audio. If you want to see the full list of supported languages, you may refer to the table below.
| Code | Language |
| --- | --- |
| en | English |
| zh | Chinese |
| de | German |
| es | Spanish |
| ru | Russian |
| ko | Korean |
| fr | French |
| ja | Japanese |
| pt | Portuguese |
| tr | Turkish |
| pl | Polish |
| ca | Catalan |
| nl | Dutch |
| ar | Arabic |
| sv | Swedish |
| it | Italian |
| id | Indonesian |
| hi | Hindi |
| fi | Finnish |
| vi | Vietnamese |
| he | Hebrew |
| uk | Ukrainian |
| el | Greek |
| ms | Malay |
| cs | Czech |
| ro | Romanian |
| da | Danish |
| hu | Hungarian |
| ta | Tamil |
| no | Norwegian |
| th | Thai |
| ur | Urdu |
| hr | Croatian |
| bg | Bulgarian |
| lt | Lithuanian |
| la | Latin |
| mi | Maori |
| ml | Malayalam |
| cy | Welsh |
| sk | Slovak |
| te | Telugu |
| fa | Persian |
| lv | Latvian |
| bn | Bengali |
| sr | Serbian |
| az | Azerbaijani |
| sl | Slovenian |
| kn | Kannada |
| et | Estonian |
| mk | Macedonian |
| br | Breton |
| eu | Basque |
| is | Icelandic |
| hy | Armenian |
| ne | Nepali |
| mn | Mongolian |
| bs | Bosnian |
| kk | Kazakh |
| sq | Albanian |
| sw | Swahili |
| gl | Galician |
| mr | Marathi |
| pa | Punjabi |
| si | Sinhala |
| km | Khmer |
| sn | Shona |
| yo | Yoruba |
| so | Somali |
| af | Afrikaans |
| oc | Occitan |
| ka | Georgian |
| be | Belarusian |
| tg | Tajik |
| sd | Sindhi |
| gu | Gujarati |
| am | Amharic |
| yi | Yiddish |
| lo | Lao |
| uz | Uzbek |
| fo | Faroese |
| ps | Pashto |
| tk | Turkmen |
| nn | Nynorsk |
| mt | Maltese |
| sa | Sanskrit |
| lb | Luxembourgish |
| my | Myanmar |
| bo | Tibetan |
| tl | Tagalog |
| mg | Malagasy |
| as | Assamese |
| tt | Tatar |
| haw | Hawaiian |
| ln | Lingala |
| ha | Hausa |
| ba | Bashkir |
| jw | Javanese |
| su | Sundanese |
| yue | Cantonese |
| my | Burmese |
| ca | Valencian |
| nl | Flemish |
| ht | Haitian |
| lb | Letzeburgesch |
| ps | Pushto |
| pa | Panjabi |
| ro | Moldavian |
| si | Sinhalese |
| es | Castilian |
| zh | Mandarin |
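The list above includes alternate names that share a code (for example, Castilian maps to es and Mandarin to zh). As a small sketch of how a caller might turn user input into a value for the language parameter, the helper below accepts a code or a language name and returns an empty string when the input is blank, which leaves detection to the service. `resolve_language` is a hypothetical helper, and the dictionaries hold only a short excerpt of the list.

```python
from typing import Optional

# Excerpt of the supported languages; extend with the full list as needed.
LANGUAGES = {"en": "English", "zh": "Chinese", "es": "Spanish", "yue": "Cantonese"}
# Alternate names that share a code with a primary entry (excerpt).
ALIASES = {"castilian": "es", "mandarin": "zh"}


def resolve_language(value: Optional[str]) -> str:
    """Return a language code, or "" to request automatic detection."""
    if value is None or not value.strip():
        return ""  # blank: the language is detected automatically
    key = value.strip().lower()
    if key in LANGUAGES:
        return key
    for code, name in LANGUAGES.items():
        if name.lower() == key:
            return code
    if key in ALIASES:
        return ALIASES[key]
    raise ValueError(f"unsupported language: {value!r}")


print(resolve_language("Castilian"))
print(repr(resolve_language("")))
```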