
Search: audio supported

28 results

STT: What is Preferred Phrases feature and how to use it

…it can help in other applications, too – e.g. when transcribing domain-specific audio, the frequently used domain-specific phrases can be boosted. How preferred phrases work The picture below shows a simplified standard speech transcription process – the digitized speech signal spectrum is analyzed in the neural network acoustic model (which describes the pronunciations of a given language) and goes into…

Keyword Spotting (KWS)

…to reveal (or “transcribe”) pronunciation directly from the actual audio recording. Phoneme Recognizer Phoneme Recognizer (PHNREC) reveals the phoneme transcription of a specified audio recording, or a part of it. This can be used to get the actual pronunciation of a keyword or phrase as it is spoken in the audio recording. This pronunciation can then be used in a keyword list for…

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme…

Speech to Text (STT)

…1 CPU core (e.g. a standard 8-CPU-core server (8 instances of STT) can process 1010 hours of audio in 1 day of computing time (flat load, depends on the technology model)) Supported languages: List of supported languages. Acoustic models An acoustic model is created by training on training data. It includes characteristics of the voices of a set of speakers provided…

Releases and Changelogs (Browser)

…(phxspe.browser.log located in SPE log directory) Phonexia Browser v3.16.1, BSAPI 3.20.1 – May 17 2019 [G#112] Fixed Denoiser which created duplicate recordings under specific circumstances [G#127] Fixed comparison of SID Evaluation sets using Audio Source Profile Phonexia Browser v3.16.0, BSAPI 3.20.0 – Apr 26 2019 Support for Audio Source Profiles SID Evaluation wizard supports SID4 Phonexia Browser v3.15.0, BSAPI 3.19.1…

STT: Results explained

…machines” vs. “eighty machines”. The technology provides various output types which show either a single transcription or multiple transcription alternatives. For processing realtime streams, two result modes are supported – one mode provides the complete transcription, the other provides incremental results. Output types One-best output provides transcription containing only the highest-scoring words N-best output provides multiple alternatives for entire sentences or longer sequences…

Waveform Denoiser (DENOISER)

…Speech Engine documentation); stream not supported, technology model name to be used for processing. Output: audio file (WAV or RAW), together with xml/json report (in SPE only). Fig.: Comparison of original recording (david_noisy.wav, top half of image) and same recording processed by Denoiser (david_denoised.wav, bottom half of the image). Typical Questions Q: What do you recommend for deploying this technology?…

Releases and Changelogs (VIN)

…1.3 2015-06-04 2016-12-04 2016-12-04 Public Changelogs Voice Inspector 5.2 Voice Inspector 5.2.0, BSAPI 3.61.0 (2024-04-04) New: New Case wizard checks for presence of Questioned and Reference recordings New: Number of audio channels is displayed in Case view Recording details view Score table view Report Fixed: Application crash with phoneme search Fixed: Generalized logistic distribution for Suspected speaker vs. Suspected speaker…

Key Features (VIN)

…speakers) Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, Mu-law; Sampling frequency 8kHz or higher Output: A scoring table with the results of comparisons in a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio The graphical presentation of results in the form of a Probability Density Function plot and a…

Download Speech Platform

…only English models for Speech To Text and Keyword Spotting. Additional supported languages are available upon request. ⓘ Click to show/hide the package content Speech Engine – technologies included: Speech To Text (STT) – model EN_US_6 (US English) Keyword Spotting (KWS) – model EN_US_6 (US English) Phoneme Recognizer (PHNREC) – model EN_US_6 (US English) Speaker Identification 4 (SID4) – model…

STT: Language Model Customization tutorial

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. In simplified terms, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio

SID4 performance on Intel® Xeon® Platinum 8124M

…w/o speech context) Methodology SID4 performance was measured on a virtual machine, Ubuntu 18.04 installed as host OS. SID4 v 3.21.3 command line was used, supported by VAD 3.22.1 command line used for collecting statistical metadata. The Virtual Machine was reserved only for this measurement experiment. Technical details: Driven by bash script in terminal emulator Measuring script was run 50…

Understand SPE technologies configuration file

…to be enabled in your SPE installation – typically, you may want to test various models during initial testing, to see how they perform on your audio… or, you may want to enable additional technologies during development of your application, etc. To select technologies/models to be enabled in your SPE, you can use one of the SPE administration tools, phxadmin…