Search: transcription results

23 results

Speech to Text (STT)

…symbols which will never be output by the speech transcription system. Numbers: the transcription always includes „thirteen“ instead of „13“, which can occur in the annotation. Parentheses: transcription „parentheses“, annotation „( )“. Characters of national alphabets: transcription uses only a limited alphabet, annotation „ěščřžů,…“. Data in the annotation needs to be processed to include only characters allowed in the transcription to avoid quality impact…
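The annotation clean-up described in this snippet could be sketched as follows. This is only an illustration, not the procedure from the article: the `ALLOWED` alphabet and both helper functions are hypothetical, and the real set of allowed characters depends on the STT model. Digits are only flagged or dropped here; spelling them out („13“ to „thirteen“) would need a separate step.

```python
import re

# Hypothetical allowed alphabet (assumption): lowercase ASCII letters,
# apostrophe, and space. The real set depends on the STT model in use.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' ")


def find_disallowed(annotation):
    """Return the set of characters in the annotation that are not in ALLOWED."""
    return {ch for ch in annotation.lower() if ch not in ALLOWED}


def strip_disallowed(annotation):
    """Lowercase the text, replace disallowed characters with spaces,
    and collapse runs of whitespace into a single space."""
    cleaned = "".join(ch if ch in ALLOWED else " " for ch in annotation.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    text = "Call (me) at 13"
    print(sorted(find_disallowed(text)))
    print(strip_disallowed(text))
```

Running `find_disallowed` first makes it visible which annotation symbols would otherwise be silently dropped, so that numbers can be spelled out by hand or by a dedicated tool before the text is used.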

STT: What is Preferred Phrases feature and how to use it

…due to differences in pronunciations of certain graphemes in the transcription language. Therefore, it is recommended to define word pronunciations in the dictionary part described below. The complete list of preferred-phrase words and all their pronunciations is included in the transcription result for reference. This is useful for analysis of transcription errors and eventual tuning of words- or pronunciations…

Releases and Changelogs (SPE)

…STT and KWS with features of CS_CZ_6 (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder): TR_TR_6, SK_SK_6, FA_6. Improved: Models for STT and KWS, updated and aligned with CS_CZ_6 features (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder): AR_XL_6, SV_SE_6, HR_HR_6, EN_US_6, PS_6. Fixed:…

STT: Results explained

…is supported in realtime streams. Complete mode: This is the default mode for returning the transcription of a realtime stream. It is used if no other mode is explicitly selected when starting the transcription, or when the result_mode=complete parameter is used. In this mode, each request for transcription results returns the complete transcription since the beginning. Incremental mode: This mode is used if the…

STT: Adding words to language model on the fly

…word being ignored during transcription (see the warning_message parameter below). Transcription result: If preferred phrases and/or words were specified when starting the transcription, the result contains the same phrases and dictionary structures which were used as input for the transcription task. The dictionary structure is enriched with a pronunciations part, generated automatically for words which did not specify pronunciations in the…

Releases and Changelogs (Browser)

…now load transcription files that contain spaces in a word instead of ‘+’ signs. Fixed: Wrong file suffix when saving transcription on Windows. Phonexia Browser 3.59 (Public release): Phonexia Browser 3.59.0, BSAPI 3.59.0 (2023-06-20). New: Transcription can be saved in text formats supported by the transcription widget. Improved: SPE Output widget is now visible by default and gets focused when…

Q: What is the difference between on-the-fly and off-line type of speech to text transcription (STT)?

A: Similarly to a human, the ASR (STT) engine adapts to the acoustic channel, environment, and speaker. The ASR (STT) engine also learns more about the content over time, and this information is used to improve recognition. The dictate engine, also known as on-the-fly transcription, does not look into the future and has information about just a few…

Release Notes

…CS_CZ_6 (Czech) Custom words (not present in the baseline STT model – such as names, slang expressions, etc.) can now be used in preferred phrases. On top of that, this feature can replace the LMC functionality (add custom words dynamically with each transcription attempt with no permanent STT models created). Improved transcription accuracy in the 6th generation of STT –…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds, which extend the part…
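To get a feel for what the backward and forward extensions do, the following sketch pads detected speech segments using the default values quoted in the snippet (150 ms and 750 ms). It is an assumption about the general idea only, not the Speech Engine implementation: the function `extend_segments`, the `(start_ms, end_ms)` segment format, and the merging of overlapping segments are all hypothetical.

```python
def extend_segments(segments, backward_ms=150, forward_ms=750):
    """Pad each (start_ms, end_ms) speech segment backward and forward.

    Segments that overlap after padding are merged into one. The merging
    step is an assumption made for this sketch, not documented behavior.
    """
    extended = []
    for start, end in sorted(segments):
        start = max(0, start - backward_ms)  # never go before the stream start
        end = end + forward_ms
        if extended and start <= extended[-1][1]:
            # Overlaps the previous padded segment: merge the two.
            prev_start, prev_end = extended[-1]
            extended[-1] = (prev_start, max(prev_end, end))
        else:
            extended.append((start, end))
    return extended


if __name__ == "__main__":
    detected = [(1000, 1200), (1500, 1800), (5000, 5200)]
    print(extend_segments(detected))
```

With these defaults, two short words separated by a 300 ms pause end up in the same padded segment, while speech that resumes several seconds later starts a new one. Larger forward extensions therefore tend to produce fewer, longer segments.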

FAQs (PSP)

…In that case you must pre-process the audio recording before uploading it to the Phonexia SPE or using it in the Phonexia Browser. Q: What languages do you offer? It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT)…

STT: What is Words-To-Numbers feature and how to use it

…variants are provided), for both file and stream transcription. The reason for not having it available in the word-level outputs (One-best, Confusion Network) is that it would create difficulties in stream transcription – as new words keep coming, they may change the previous output: two… → 2; two thousand… → 2000; two thousand twenty… → 2020; two thousand twenty one → 2021. And…
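The effect described in this snippet, where each newly arriving word can change the number already shown, can be reproduced with a toy converter. This is a minimal sketch for illustration only: `words_to_number` and `incremental_outputs` are hypothetical helpers covering a small subset of English number words, and they are not the Words-To-Numbers feature itself.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(words):
    """Convert a list of English number words (below one million) to an int."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unsupported word: {word!r}")
    return total + current


def incremental_outputs(words):
    """Return the number rendered after each word arrives in a stream."""
    return [words_to_number(words[:i]) for i in range(1, len(words) + 1)]


if __name__ == "__main__":
    print(incremental_outputs("two thousand twenty one".split()))
    # → [2, 2000, 2020, 2021]
```

Every intermediate value replaces the previous one rather than extending it, which is why a word-level output that has already emitted "2" cannot simply append later words.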

Keyword Spotting (KWS)

…to reveal (or “transcribe”) pronunciation directly from the actual audio recording. Phoneme Recognizer: Phoneme Recognizer (PHNREC) reveals the phoneme transcription of a specified audio recording, or a part of it. This can be used to get the actual pronunciation of a keyword or phrase as it is actually spoken in the audio recording. This pronunciation can then be used in a keyword list for…

Voice Inspector – supporting technologies

This part requires higher (and non-anonymous) access level.

Key Features (VIN)

…Speaker Identification, Speaker Diarization, Phoneme Recognizer, Voice Activity Detection, Speech Quality Estimation. A search for repetitive sound patterns across all recordings, enabled by the automatic phonemic transcription. Input: Questioned recordings (a minimum of 1 recording); Suspected speaker recordings (a minimum of 1 recording); The Population set (a technical minimum of 10 speakers, and a recommended minimum of 50…

Recommended OS and HW (PSP)

…or 10th Gen Intel® Core Processor; RAM: 16 GB; Storage: 100 GB (depends on audio retention policy), SSD strongly recommended for superior performance over HDD; Configuration includes: SID4 XL4, GID XL4, LID L4, AGE L4, VAD, SQE. Transcription System, basic 100 hours/day package (***) files processing. CPU: 8 physical cores, 1x Intel® Xeon E5-2640 v4 or similar or 10th Gen…