
Search: transcription results

16 results

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can easily be indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme…
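Because phoneme output is plain text, ordinary text-indexing techniques apply to it. The sketch below is a generic illustration, not part of PHNREC: it builds an inverted index of phoneme trigrams and returns the recordings whose transcription contains every trigram of a query. The phoneme symbols and recording IDs are made up; real PHNREC output formats and phoneme sets will differ.

```python
from collections import defaultdict


def ngrams(phonemes, n):
    """Return all consecutive n-phoneme tuples of a phoneme sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_index(transcriptions, n=3):
    """Map each phoneme n-gram to the set of recording IDs that contain it."""
    index = defaultdict(set)
    for rec_id, phonemes in transcriptions.items():
        for gram in ngrams(phonemes, n):
            index[gram].add(rec_id)
    return index


def search(index, query, n=3):
    """Return IDs of recordings containing every n-gram of the query.

    This is a candidate filter only: it does not check that the n-grams
    are adjacent, so an exact-match pass may still be needed afterwards.
    """
    grams = ngrams(query, n)
    if not grams:
        return set()  # query is shorter than n phonemes
    result = set(index.get(grams[0], set()))  # copy, so the index is not mutated
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical phoneme transcriptions (space-separated symbols).
transcriptions = {
    "rec1": "h e l ou w e: r l d".split(),
    "rec2": "g u d b ai".split(),
}
index = build_index(transcriptions)
print(search(index, "h e l ou".split()))  # {'rec1'}
```

Trigrams are a common compromise: shorter n-grams match too many recordings, longer ones make a single misrecognized phoneme break the match.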

STT: Language Model Customization tutorial

…as a source and creates a new STT model, with your customizations included, as a target. To see the results of the customizations, you need to use the new STT model for transcription. Currently supported language model customizations are: adding new words and/or pronunciations. This is intended for adding client-, domain- or product-specific words like company names, product names, component…

Orbis 1.4.0 Release Notes

…model.
Speech to Text in Orbis
New editions of Orbis Investigator may also include Speech to Text technology. This technology converts audio into text for better and faster understanding of the content. The box with the transcribed text is directly below the recording. Limitation: transcription is provided for one chosen language per Orbis instance.
Search for…

STT: What is Words-To-Numbers feature and how to use it

…variants are provided), for both file and stream transcription. The feature is not available in the word-level outputs (One-best, Confusion Network) because it would cause difficulties in stream transcription: as new words keep coming, they may change the previous output:
two… → 2
two thousand… → 2000
two thousand twenty… → 2020
two thousand twenty one → 2021
And…
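The toy converter below illustrates the problem described above. It is a hypothetical sketch for a small subset of English number words (up to 999,999), not Phonexia's Words-To-Numbers implementation: re-running it on each growing word prefix, as a stream would, rewrites the numeric value of words that were already emitted.

```python
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def words_to_number(words):
    """Convert a list of English number words (below one million) to an int."""
    total = 0    # value of completed "thousand" groups
    current = 0  # value of the group currently being built
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current *= 100
        elif w == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {w}")
    return total + current


def stream_prefixes(words):
    """Yield (text so far, converted value) each time a new word arrives."""
    for i in range(1, len(words) + 1):
        yield " ".join(words[:i]), words_to_number(words[:i])


for text, value in stream_prefixes("two thousand twenty one".split()):
    print(f"{text} -> {value}")
# two -> 2
# two thousand -> 2000
# two thousand twenty -> 2020
# two thousand twenty one -> 2021
```

Every new word changes the value attached to the earlier ones, which is why a word-level output with fixed per-word timestamps cannot carry the converted number until the phrase is complete.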

Input audio quality

The quality of the audio is extremely important for satisfactory results from any speech processing technology, be it simple voice activity detection, speech transcription, voice biometry, or another. There are two main aspects of audio quality:
- technical quality of the audio data (format, codec, bitrate, SNR, …)
- sound quality of the actual content (background noise, reverberations, …)
Technical quality
Using inappropriate…
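SNR, listed above as one aspect of technical quality, is the ratio of speech power to noise power, usually expressed in decibels: SNR = 10 · log10(P_speech / P_noise). A minimal sketch, assuming you already have a speech segment and a noise-only segment of the same recording (for example, from voice activity detection) as lists of sample values. Because the speech segment also contains noise, the result is only an estimate.

```python
import math


def mean_power(samples):
    """Mean power of a signal: the average of the squared sample values."""
    if not samples:
        raise ValueError("empty signal")
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Estimate SNR in dB as 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech_samples) / mean_power(noise_samples))


speech = [1.0, -1.0] * 50  # toy "speech" segment, mean power 1.0
noise = [0.1, -0.1] * 50   # toy noise-only segment, mean power 0.01
print(round(snr_db(speech, noise), 6))  # 20.0
```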

Keyword Spotting (KWS)

…Keyword Spotting. It’s a good idea to limit the start and end times of the Phoneme Recognizer transcription to only the time slot where the word or phrase of interest occurs.
Thresholds
The threshold is a numeric value from the interval [0, 1] that limits the output results: only words with confidence exceeding the threshold are returned.
Command line implementation of Keyword Spotting…
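A minimal sketch of the thresholding rule described above, using a hypothetical `Detection` record (its field names are assumptions, not the KWS output format). Note the strict comparison: a detection whose confidence equals the threshold is not returned, matching the wording "exceeding the threshold".

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical keyword detection record."""
    word: str
    start_s: float
    end_s: float
    confidence: float


def apply_threshold(detections, threshold):
    """Keep only detections whose confidence strictly exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in the interval [0, 1]")
    return [d for d in detections if d.confidence > threshold]


detections = [
    Detection("invoice", 1.20, 1.75, 0.91),
    Detection("invoice", 8.40, 8.90, 0.50),
    Detection("invoice", 15.0, 15.6, 0.32),
]
for d in apply_threshold(detections, 0.5):
    print(d.word, d.start_s, d.confidence)  # invoice 1.2 0.91
```

Raising the threshold trades recall for precision: fewer false alarms, but more missed keywords.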

Key Features (VIN)

…speakers)
Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, or Mu-law; sampling frequency 8 kHz or higher
Output: a scoring table with the results of comparisons as a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio
Graphical presentation of results in the form of a Probability Density Function plot and a…
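The two log-likelihood variants differ only in the logarithm base: LLR10 = log10(LR) and LLRe = ln(LR) = ln(10) · log10(LR). A small conversion sketch, assuming the usual convention that LR = P(evidence | same speaker) / P(evidence | different speakers), so that LR > 1 (a positive LLR) supports the same-speaker hypothesis. The mapping to a Verbal Ratio is not shown.

```python
import math


def log_likelihood_ratios(lr):
    """Return (decimal LLR, natural LLR) for a likelihood ratio lr > 0."""
    if lr <= 0:
        raise ValueError("a likelihood ratio must be positive")
    return math.log10(lr), math.log(lr)


def supported_hypothesis(lr):
    """Which hypothesis the LR favours, under the convention stated above."""
    if lr > 1:
        return "same speaker"
    if lr < 1:
        return "different speakers"
    return "neither"


llr10, llre = log_likelihood_ratios(100.0)
print(llr10, round(llre, 4), supported_hypothesis(100.0))  # 2.0 4.6052 same speaker
```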

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the real-time stream STT results. The default segmenter parameters are:
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
Backward and forward extensions are intervals in milliseconds, which extend the part…
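To make the role of the three parameters concrete, here is an offline sketch of a threshold-and-extend segmenter. It assumes per-frame speech probabilities at a fixed, hypothetical 10 ms frame step, treats a frame as speech when its probability is at least `speech_threshold`, extends each speech run backward and forward by the configured amounts, and merges segments that then overlap. The parameter names mirror the configuration keys, but this is an illustration under those assumptions, not Speech Engine's actual segmenter.

```python
def segment(probs, frame_ms=10, speech_threshold=0.5,
            backward_extensions_length_ms=150, forward_extensions_length_ms=750):
    """Return merged (start_ms, end_ms) speech segments from frame probabilities."""
    total_ms = len(probs) * frame_ms

    # 1. Find runs of consecutive frames classified as speech.
    raw, start = [], None
    for i, p in enumerate(probs):
        is_speech = p >= speech_threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            raw.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        raw.append((start * frame_ms, total_ms))

    # 2. Extend each run, clip to the signal, and merge overlapping segments.
    segments = []
    for s, e in raw:
        s = max(0, s - backward_extensions_length_ms)
        e = min(total_ms, e + forward_extensions_length_ms)
        if segments and s <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], e))
        else:
            segments.append((s, e))
    return segments


# Two 100 ms speech bursts separated by 500 ms of non-speech (10 ms frames).
probs = [0.9] * 10 + [0.1] * 50 + [0.9] * 10 + [0.1] * 100
print(segment(probs))                                   # [(0, 1450)]
print(segment(probs, forward_extensions_length_ms=100)) # [(0, 200), (450, 800)]
```

With the defaults, the 750 ms forward extension bridges the 500 ms gap and the two bursts become one segment; with a 100 ms forward extension they stay separate. In a real-time stream, a longer forward extension presumably also means waiting longer before a segment can be closed.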

FAQs (PSP)

…In that case you must pre-process the audio recording before uploading it to the Phonexia SPE or using it in the Phonexia Browser.
Q: What languages do you offer? It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT)…

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Therefore, many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion.
Time slots and…
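The following is a generic sketch of a common approach, not necessarily the procedure the article recommends: in each time slot, take the alternative with the highest posterior probability and drop it if it is the empty (no-word) alternative. The data classes and the `<eps>` marker for the empty alternative are hypothetical, not the Speech Engine output schema.

```python
from dataclasses import dataclass

EPSILON = "<eps>"  # hypothetical marker for the empty ("no word") alternative


@dataclass
class Alternative:
    word: str
    posterior: float


@dataclass
class Slot:
    start_s: float
    end_s: float
    alternatives: list  # list of Alternative


def confusion_network_to_one_best(slots):
    """Pick the most probable alternative per slot, skipping empty ones."""
    words = []
    for slot in slots:
        best = max(slot.alternatives, key=lambda a: a.posterior)
        if best.word != EPSILON:
            words.append((best.word, slot.start_s, slot.end_s, best.posterior))
    return words


slots = [
    Slot(0.00, 0.40, [Alternative("hello", 0.8), Alternative("yellow", 0.2)]),
    Slot(0.40, 0.55, [Alternative(EPSILON, 0.6), Alternative("uh", 0.4)]),
    Slot(0.55, 1.00, [Alternative("world", 0.7), Alternative("word", 0.3)]),
]
print([w for w, *_ in confusion_network_to_one_best(slots)])  # ['hello', 'world']
```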

Speech to Text (STT)

…symbols which will never be output by the speech transcription system:
- Numbers: the transcription always includes "thirteen" instead of "13", which can occur in the annotation
- Parentheses: the transcription contains "parentheses", the annotation "( )"
- Characters of national alphabets: the transcription uses only a limited alphabet, while the annotation may contain "ěščřžů,…"
Data in the annotation needs to be processed to include only characters allowed in the transcription, to avoid an impact on quality….
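A minimal sketch of such annotation clean-up, assuming a model whose output alphabet is the hypothetical `ALLOWED` set below (lowercase English letters, apostrophe, and space). It lowercases the text, strips diacritics, and replaces every other character, such as parentheses, with a space. Numbers cannot be fixed by character filtering, because "13" has to become "thirteen", so the sketch rejects text containing digits and leaves spelling them out to a separate step.

```python
import re
import unicodedata

# Hypothetical output alphabet of the transcription system.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' ")


def strip_diacritics(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize_annotation(text, allowed=ALLOWED):
    """Map an annotation onto the allowed alphabet before scoring."""
    if any(ch.isdigit() for ch in text):
        raise ValueError("spell out numbers (e.g. '13' -> 'thirteen') first")
    text = strip_diacritics(text.lower())
    text = "".join(c if c in allowed else " " for c in text)
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of spaces


print(normalize_annotation("Žluťoučký (kůň)!"))  # zlutoucky kun
```

Stripping diacritics only makes sense when the model's alphabet lacks those letters; for a model that outputs them, add them to `allowed` instead.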

Release Notes

…BROWSER Update
We finished small but important improvements:
- The Age column in the Results pane now shows numeric results instead of age groups; the column name changed to Age (±10 years) to emphasize the tolerance of the results
- Added the Keyword Spotting highest confidence column in the Results pane, showing the highest confidence value of all keywords detected in a recording (allowing users to judge…

Releases and Changelogs (SPE)

…of all word confidences in a sentence; helps in judging the results' credibility
Reduced delay in obtaining results in the output; allows faster detection of barge-in, e.g. in voicebot applications
New: All 5th generation STT models now use Minimum Bayes-Risk Decoding for Confusion Network construction
Confusion Network results now contain precise start and end times for each individual…

Releases and Changelogs (Browser)

…now load transcription files that contain spaces in a word instead of ‘+’ signs
Fixed: Wrong file suffix when saving transcription on Windows
Phonexia Browser 3.59 (Public release)
Phonexia Browser 3.59.0, BSAPI 3.59.0 (2023-06-20)
New: Transcription can be saved in text formats supported by the transcription widget
Improved: SPE Output widget is now visible by default and gets focused when…