
Search: transcription results

16 results

STT: Results explained

transcription is started using the result_mode=incremental parameter in the request. In this mode, each request for transcription results returns only the changes since the last request for results. In incremental mode, the received results may correct results received previously, e.g. when one request was sent in the middle of a word, the next request contains a correction, i.e. the entire correct word…
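A minimal sketch of how a client might fold incremental results into a running transcript. The word format (start time, end time, text) and the rule "a newer word replaces any stored word that starts at or after the earliest start time in the update" are assumptions made for illustration; the snippet above does not specify the actual response format.

```python
def merge_incremental(transcript, update):
    """Merge one incremental update into the accumulated transcript.

    Both arguments are lists of (start_s, end_s, text) tuples.
    ASSUMPTION (hypothetical rule): words already stored that start at or
    after the earliest word in the update are treated as superseded.
    """
    if not update:
        return list(transcript)
    first_start = min(word[0] for word in update)
    kept = [word for word in transcript if word[0] < first_start]
    return kept + sorted(update)


# First request was sent in the middle of the word "world".
transcript = merge_incremental([], [(0.0, 0.4, "hello"), (0.5, 0.7, "wor")])
# The next request corrects the partial word and adds a new one.
transcript = merge_incremental(
    transcript, [(0.5, 0.9, "world"), (1.0, 1.3, "again")]
)
words = [text for _, _, text in transcript]
print(words)  # ['hello', 'world', 'again']
```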

Releases and Changelogs (Browser)

…now load transcription files that contain spaces in a word instead of ‘+’ signs Fixed: Wrong file suffix when saving transcription on Windows Phonexia Browser 3.59 (Public release) Phonexia Browser 3.59.0, BSAPI 3.59.0 (2023-06-20) New: Transcription can be saved in text formats supported by the transcription widget Improved: SPE Output widget is now visible by default and gets focused when…

Releases and Changelogs (SPE)

…to the change in GID results content, all GID results will be removed from cache (database) during update! Speech Engine 3.17.3 (08/22/2019) – DB v1200, BSAPI 3.21.3 [G_#191] Fixed: KWS getting phonemes/graphemes in specific circumstances returns unknown error [G_BSAPI#413] Fixed: duplicated output from KWS Speech Engine 3.17.2 (08/02/2019) – DB v1200, BSAPI 3.21.2 [G_BSAPI#300] Fixed: KWS stream results are displayed…

Speech to Text (STT)

…symbols which will never be output by the speech transcription system. Numbers: transcription always includes „thirteen“ instead of „13“, which can occur in the annotation Parentheses; transcription – „parentheses“, annotation „( )“ Characters of national alphabets; transcription – only limited alphabet, annotation „ěščřžů,…“ Data in the annotation needs to be processed to include only characters allowed in transcription to avoid quality impact…

Release Notes

…BROWSER Update We finished small but important improvements: The Age column in Results pane now shows the numeric results instead of age groups; column name changed to Age (±10 years) to emphasize the results tolerance Added the Keyword Spotting highest confidence column in Results pane, showing the highest confidence value of all detected keywords in a recording (allowing to judge…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the real-time stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds, which extend the part…
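As a rough illustration, the default values quoted above can be read with Python's standard configparser. The extend_segment helper is hypothetical: it assumes that the backward extension moves a detected segment's start earlier and the forward extension moves its end later, which the truncated snippet suggests but does not confirm.

```python
import configparser

# Default segmenter section as quoted in the snippet above.
SEGMENTER_CONFIG = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"


def load_segmenter_params(text):
    """Parse the segmenter section into a plain dict of typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[SECTION]
    return {
        "backward_ms": section.getint("backward_extensions_length_ms"),
        "forward_ms": section.getint("forward_extensions_length_ms"),
        "speech_threshold": section.getfloat("speech_threshold"),
    }


def extend_segment(start_ms, end_ms, params):
    """ASSUMPTION: extensions widen the detected segment on both sides."""
    new_start = max(0, start_ms - params["backward_ms"])
    new_end = end_ms + params["forward_ms"]
    return new_start, new_end


params = load_segmenter_params(SEGMENTER_CONFIG)
print(extend_segment(1000, 2000, params))  # (850, 2750)
```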

FAQs (PSP)

…In that case you must pre-process the audio recording before uploading it to the Phonexia SPE or using it in the Phonexia Browser. Q: What languages do you offer? It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT)…

STT: What is Words-To-Numbers feature and how to use it

…variants are provided), for both file and stream transcription. The reason for not having it available in the word-level outputs (One-best, Confusion Network) is that it would create difficulties in stream transcription: as new words keep coming, they may potentially change the previous output: "two…" → 2; "two thousand…" → 2000; "two thousand twenty…" → 2020; "two thousand twenty one" → 2021. And…
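The effect described above can be reproduced with a toy converter. This is only an illustrative sketch with a deliberately tiny vocabulary, not the Words-To-Numbers implementation; the function name and word tables are made up for the example.

```python
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(words):
    """Convert a short list of English number words to an integer."""
    total = 0    # value of completed "thousand" groups
    current = 0  # value of the group being built
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"unsupported word: {word}")
    return total + current


# Simulate words arriving one by one in a stream: every new word
# changes the number that the words received so far represent.
stream = ["two", "thousand", "twenty", "one"]
outputs = [words_to_number(stream[:n]) for n in range(1, len(stream) + 1)]
print(outputs)  # [2, 2000, 2020, 2021]
```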

Keyword Spotting (KWS)

…Keyword Spotting. It’s a good idea to limit the start and end time of the Phoneme Recognizer transcription to only the time slot where the word or phrase of interest occurs. Thresholds Threshold is a numeric value from the interval between 0 and 1, limiting the output results. Only words with confidence exceeding the threshold are returned as results. Command line implementation of Keyword Spotting…
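The thresholding rule can be sketched in a few lines. The detection format (a dict with "keyword" and "confidence" keys) is an assumption for the example and not the actual Keyword Spotting output format.

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    # Strict comparison: the confidence has to exceed the threshold.
    return [d for d in detections if d["confidence"] > threshold]


# Hypothetical detections, for illustration only.
detections = [
    {"keyword": "invoice", "confidence": 0.92},
    {"keyword": "refund", "confidence": 0.50},
    {"keyword": "cancel", "confidence": 0.31},
]
kept = [d["keyword"] for d in filter_detections(detections, 0.5)]
print(kept)  # ['invoice']
```

A detection whose confidence equals the threshold is dropped here, matching the wording "exceeding the threshold".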

Key Features (VIN)

…speakers) Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, Mu-law; Sampling frequency 8 kHz or higher Output: A scoring table with the results of comparisons in a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio The graphical presentation of results in the form of a Probability Density Function plot and a…

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Therefore many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…

Input audio quality

Quality of the audio is extremely important for satisfactory results of any speech processing technology, be it simple voice activity detection, speech transcription, voice biometry, or other. There are two main aspects of audio quality: technical quality of the audio data (format, codec, bitrate, SNR, …); sound quality of the actual content (background noise, reverberations, …). Technical quality Using inappropriate…

FAQs (Browser)

…recordings coming from different audio environments or even different times of the day, additional details can be analyzed leading to better results. Warning: Any human error in evaluation set preparation (in speaker uniqueness, placing recordings into the wrong folder, etc.) affects the evaluation results, so it’s very important to prepare the data carefully. See SID Evaluation for more details in…

STT: Language Model Customization tutorial

…as a source and creates a new STT model with your customizations included as a target. To see the results of the customizations, you need to use the new STT model for the transcription. Currently supported language model customizations are: adding new words and/or pronunciations This is intended for adding client-, domain- or product-specific words like company names, product names, component…