Search: audio supported

37 results

Download Speech Platform

…only English models for Speech To Text and Keyword Spotting. Additional supported languages are available upon request. Speech Engine – technologies included: Speech To Text (STT) – model EN_US_6 (US English); Keyword Spotting (KWS) – model EN_US_6 (US English); Phoneme Recognizer (PHNREC) – model EN_US_6 (US English); Speaker Identification 4 (SID4) – model…

STT: Language Model Customization tutorial

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. In simplified terms, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio…

Orbis 1.1.0 Release Notes

…case number, suspects, etc. Maximal speech length for the voiceprint extraction: To optimize performance, we introduced a limit on the total length of speech captured during voiceprint extraction. This allows the Orbis system to perform more extractions per minute, especially for longer audio recordings. Only one channel processing: To optimize performance, the option of processing only one channel (out…

Releases and Changelogs (VIN)

…can more accurately model a skewed distribution. Such skewed distributions are produced by Phonexia's modern, highly accurate speaker identification system. Fixed: Audio files with Unicode characters in their name can also be opened on Windows. Changed: The histogram bins in the probability density function plot are now normalized. Changed: The case description (field in the information table…

Orbis Hardware Requirements

…for Orbis VM; Memory: 32 GB for Orbis VM; Disk space: 80 GB SSD (a typical installation of Orbis does not exceed 20 GB; depending on the size, length, and retention policy of your files, you may want to allocate more space*). *1 MB of uploaded audio = 2.7 MB of storage needed. Virtual Platform (hypervisor), minimal: VirtualBox 6.1.30 64-bit…
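
The storage note in this snippet (1 MB of uploaded audio = 2.7 MB of storage needed) lends itself to a quick estimate. The Python sketch below is illustrative only: the function names are hypothetical, the 1 GB = 1024 MB convention is an assumption, and the defaults simply reuse the 80 GB disk and 20 GB typical installation figures quoted above.

```python
# Rough storage estimate based on the ratio quoted in the snippet:
# 1 MB of uploaded audio = 2.7 MB of storage needed.
# Function names and the 1 GB = 1024 MB convention are assumptions.

STORAGE_FACTOR = 2.7  # MB of storage per MB of uploaded audio (from the snippet)
MB_PER_GB = 1024      # assumption: binary gigabytes


def storage_needed_mb(uploaded_audio_mb: float) -> float:
    """Storage (MB) consumed by a given amount of uploaded audio (MB)."""
    return uploaded_audio_mb * STORAGE_FACTOR


def audio_capacity_mb(disk_gb: float = 80.0, installation_gb: float = 20.0) -> float:
    """Uploaded audio (MB) that fits in the space left after installation."""
    free_mb = (disk_gb - installation_gb) * MB_PER_GB
    return free_mb / STORAGE_FACTOR


if __name__ == "__main__":
    print(f"100 MB of audio needs about {storage_needed_mb(100):.0f} MB of storage")
    print(f"Audio capacity on the default disk: about {audio_capacity_mb():.0f} MB")
```

The estimate ignores retention policy and any other data stored on the same disk, so it is best read as an upper bound on uploadable audio.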

Orbis 1.2.0 Release Notes

…size grows to 10,000 files: We increased the maximum case storage size from 1,000 to 10,000 recordings. Flow-through Case: In certain cases, there is a need to handle a continuous stream of high audio loads without having to store everything within the system. Now, you can switch such a Case to the Flow-through Case mode and upload a practically unlimited…

SID4 performance on Intel® Xeon® Platinum 8124M

…w/o speech context). Methodology: SID4 performance was measured on a virtual machine, with Ubuntu 18.04 installed as host OS. The SID4 v3.21.3 command line was used, supported by the VAD 3.22.1 command line for collecting statistical metadata. The virtual machine was reserved only for this measurement experiment. Technical details: Driven by a bash script in a terminal emulator. The measuring script was run 50…

Understand SPE technologies configuration file

…to be enabled in your SPE installation – typically, you may want to test various models during initial testing to see how they perform on your audio, or you may want to enable additional technologies during development of your application, etc. To select the technologies/models to be enabled in your SPE, you can use one of the SPE administration tools, phxadmin…

STT: Results explained

…machines” vs. “eighty machines”. The technology provides various output types, which show either a single transcription or multiple transcription alternatives. For processing real-time streams, two result modes are supported: one mode provides the complete transcription, the second mode provides incremental results. Output types: One-best output provides a transcription containing only the highest-scoring words. N-best output provides multiple alternatives for entire sentences or longer sequences…

Download Semantic Search demo

…an Ubuntu-based Linux operating system with a GUI. Supported languages (ISO code, name): af Afrikaans; ht Haitian_Creole; pt Portuguese; am Amharic; hu Hungarian; ro Romanian; ar Arabic; hy Armenian; ru Russian; as Assamese; id Indonesian; rw Kinyarwanda; az Azerbaijani; ig Igbo; si Sinhalese; be Belarusian; is Icelandic; sk Slovak; bg Bulgarian; it Italian; sl Slovenian…

Key Features (VIN)

…speakers). Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, Mu-law; sampling frequency 8 kHz or higher. Output: A scoring table with the results of comparisons in a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio. The graphical presentation of results in the form of a Probability Density Function plot and a…
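
The Likelihood Ratio and Log-Likelihood Ratio outputs mentioned in this snippet are related by a plain logarithm, in base 10 (decimal) or base e (natural). The Python sketch below shows that conversion under stated assumptions: the function names are hypothetical and are not part of any Phonexia tool or API, and the Verbal Ratio is left out because the snippet does not define its scale.

```python
import math

# Conversions between a likelihood ratio (LR) and its log-likelihood ratio (LLR).
# All function names are illustrative assumptions, not a product API.


def llr_decimal(likelihood_ratio: float) -> float:
    """Decimal (base-10) log-likelihood ratio of a positive LR."""
    return math.log10(likelihood_ratio)


def llr_natural(likelihood_ratio: float) -> float:
    """Natural (base-e) log-likelihood ratio of a positive LR."""
    return math.log(likelihood_ratio)


def lr_from_decimal_llr(llr: float) -> float:
    """Recover the LR from a decimal LLR."""
    return 10.0 ** llr


def lr_from_natural_llr(llr: float) -> float:
    """Recover the LR from a natural LLR."""
    return math.exp(llr)


if __name__ == "__main__":
    lr = 100.0
    print("decimal LLR:", llr_decimal(lr))
    print("natural LLR:", llr_natural(lr))
```

The two logarithmic forms differ only by a constant factor: the natural LLR equals the decimal LLR multiplied by ln(10).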

Waveform Denoiser (DENOISER)

…Speech Engine documentation); stream not supported, technology model name to be used for processing. Output: audio file (WAV or RAW), together with an XML/JSON report (in SPE only). Fig.: Comparison of the original recording (david_noisy.wav, top half of the image) and the same recording processed by Denoiser (david_denoised.wav, bottom half of the image). Typical Questions. Q: What do you recommend for deploying this technology?…

Orbis 1.4.0 Release Notes

…model. Speech to Text in Orbis: New editions of Orbis Investigator may also include Speech to Text technology. This technology converts audio into text for a better and faster understanding of the content. The box with the transcribed text is directly under the recording itself. Limitation: Transcription is provided for one chosen language per Orbis instance. Search for…

STT: What is Preferred Phrases feature and how to use it

…it can help in other applications, too – e.g. when transcribing domain-specific audio, the frequently used domain-specific phrases can be boosted. How preferred phrases work: The picture below shows a simplified standard speech transcription process – the digitized speech signal spectrum is analyzed by the neural network acoustic model (which describes the pronunciations of a given language) and goes into…

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme…