Search: audio format

40 results

Testing possibilities

…(GSM, VoIP,…); microphone placement (close-field vs. far-field); audio quality; formats; codecs; background noise; geographical locations; age distribution; style of speech; monolog vs. dialog; reading a text vs. live conversation. In some of the scenarios mentioned above, it is quite difficult to meet all of these requirements, which is why the best option for accuracy testing is definitely in…

Time Analysis Extraction (TAE)

…dialogue. This can be used to improve calls between operators and callers or to indicate potential stress points in phone calls (for example, a change of speech speed during the conversation). Input: TAE can process both audio files and streams (for format details, see the Speech Engine documentation). By its nature, TAE is usable mainly on two-channel phone call recordings, where…

STT: Results explained

…These can be recognized by a recording-level confidence value of -1. "one_best_result": { "confidence": -1, "segmentation": [ …   N-best output: { "phrase": "can you hear me okay i wanted to", "channel": 0, "score": 509.71384, "confidence": 0.33733934, "start": 1500000, "end": 28200000 } This format can be used by analytical applications to further process the alternatives. It can also be useful when…
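As a minimal sketch of how an analytical application might read one N-best entry like the one in the snippet above, the following parses the entry with Python's standard `json` module. The helper names `has_sentinel_confidence` and `summarize` are hypothetical and not part of any Phonexia API. The `start` and `end` values are passed through unchanged because the snippet does not state their unit.

```python
import json

# Entry copied from the snippet above, written with straight quotes so it is valid JSON.
nbest_entry = json.loads("""
{
  "phrase": "can you hear me okay i wanted to",
  "channel": 0,
  "score": 509.71384,
  "confidence": 0.33733934,
  "start": 1500000,
  "end": 28200000
}
""")


def has_sentinel_confidence(confidence: float) -> bool:
    """Hypothetical helper: True for the special confidence value of -1 mentioned in the snippet."""
    return confidence == -1


def summarize(entry: dict) -> str:
    """Hypothetical helper: one-line summary; start/end are kept in their original (unstated) unit."""
    return (
        f"ch{entry['channel']} [{entry['start']}-{entry['end']}] "
        f"{entry['phrase']!r} (confidence {entry['confidence']:.2f})"
    )


print(summarize(nbest_entry))
print(has_sentinel_confidence(nbest_entry["confidence"]))
```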

Understand SPE technologies configuration file

…to be enabled in your SPE installation – typically, you may want to test various models during initial testing, to see how they perform on your audio… or, you may want to enable additional technologies during development of your application, etc. To select technologies/models to be enabled in your SPE, you can use one of SPE administration tools, phxadmin…

Q: What are the requirements for SID evaluation dataset?

…in each recording (i.e. usually 2+ minutes recording length); only one speaker in each recording; a wide variety of gender and age is recommended; recordings should be as similar to the target use case as possible (device, channel, distance from mic, language distribution); audio files should be mono, lin16 format, 8 kHz+ sample rate. *Note: splitting a single recording into multiple shorter…
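The audio requirements in the snippet above (mono, lin16, 8 kHz or higher) can be checked before building a dataset. The sketch below uses Python's standard `wave` module and assumes that "lin16" means 16-bit linear PCM stored in a WAV container. The function name `check_sid_recording` is hypothetical.

```python
import wave


def check_sid_recording(path: str, min_rate_hz: int = 8000) -> list[str]:
    """Return a list of problems found in a WAV file; an empty list means it passes.

    Assumption: "lin16" is read as 16-bit linear PCM (sample width of 2 bytes).
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() < min_rate_hz:
            problems.append(f"sample rate {wav.getframerate()} Hz is below {min_rate_hz} Hz")
    return problems
```

Calling `check_sid_recording("speaker_01.wav")` on a hypothetical file returns an empty list when all three conditions hold. The check covers only the file format, not the content requirements such as recording length or the number of speakers.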

Waveform Denoiser (DENOISER)

…software cannot remove unwanted speech or music in the background. Denoiser is used to remove noise from the recording and at the same time to amplify the speech signal for: better intelligibility for human listeners (recommended use); better results with automatic speech recognition technologies (necessary to test on customer data first). Input: audio file (format details – see…

Speech Quality Estimation (SQE)

…channels. The statistics of all channels include the numbers for many aspects of recording quality, and the overall global score. Technology: The technology is language-, accent-, text-, and channel-independent. Compatibility with the widest possible range of audio sources (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input: Input format for processing: WAV or RAW (8 or 16 bits…

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme…

Understand SPE metafiles

Certain SPE entities – SID Speaker models, SID Audio source profiles, LID Language packs – can have additional information associated with them in the form of “metafiles”. This article explains the intended usage of metafiles. In general, SPE is intended as an under-the-hood engine, focusing purely on speech-related audio processing. Any additional functionality should be done on the application layer,…

Keyword Spotting (KWS)

…to reveal (or “transcribe”) pronunciation directly from the actual audio recording. Phoneme Recognizer: Phoneme Recognizer (PHNREC) reveals the phoneme transcription of a specified audio recording, or a part of it. This can be used to get the pronunciation of a keyword or phrase as it is actually spoken in the audio recording. This pronunciation can then be used in a keyword list for…

Orbis 1.1.0 Release Notes

…Recording metadata formats: Orbis doesn’t support metadata files in proprietary formats. Only Orbis JSON format is supported for metadata upload in version 1.0. Solution: Convert your proprietary metadata format into the specified JSON format. Hit feature: Due to performance issues, the Hits are automatically calculated only on recording upload. When a new rule is defined the Hits recalculation is…

Orbis 1.2.0 Release Notes

…number of recordings for analysis. A FIFO algorithm is used to automatically remove the older entries. Limitations (known issues): Recording metadata formats: Orbis doesn’t support metadata files in proprietary formats. Only Orbis JSON format is supported for metadata upload in the current version. Solution: Convert your proprietary metadata format into the specified JSON format. Hit feature: Due to performance issues,…

Key Features (VIN)

…speakers) Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, Mu-law; Sampling frequency 8kHz or higher Output: A scoring table with the results of comparisons in a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio The graphical presentation of results in the form of a Probability Density Function plot and a…
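The snippet above lists a Likelihood Ratio and a Log-Likelihood Ratio in decimal or natural logarithm. As a small illustration of how these three quantities relate mathematically, the sketch below converts a likelihood ratio to both logarithmic forms. The function name and the example value are illustrative assumptions, not taken from VIN, and the Verbal Ratio is left out because the snippet does not define it.

```python
import math


def log_likelihood_ratios(likelihood_ratio: float) -> tuple[float, float]:
    """Return (decimal LLR, natural LLR) for a positive likelihood ratio."""
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log10(likelihood_ratio), math.log(likelihood_ratio)


# Illustrative value only: a ratio of 100 gives a decimal LLR of 2.
decimal_llr, natural_llr = log_likelihood_ratios(100.0)
print(decimal_llr, natural_llr)
```

The two logarithmic forms differ only by a constant factor: the natural LLR equals the decimal LLR multiplied by ln(10).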

FAQs (Voice Verify)

…requirements? A: Please note that audio recordings can only be used for enrollment, not for verification. Voice Verify expects the audio coming from the user’s device to originate from a relatively calm environment. The user should avoid extensive background noise such as loud music, a street with heavy traffic, etc. The agent should warn the user during the enrollment call…

Speaker Identification (SID)

…are monitoring a large number of audio recordings or streams and we are looking for the occurrence of a specific speaker(s). Speaker spotting can be deployed for the purpose of Fraud Alert. Speaker Verification is the case when we are asking “Is this Peter Smith’s voice?”, such as when a person calls the bank and says, “Hello, this is Peter…