
Search: task

20 results

FAQs (PSP)

…a recording is worth further processing by other speech technologies, so it is part of our Speech Quality Estimation. However, calculating SNR automatically is not a trivial task. We use the fact that the statistical distribution of the frequencies in the waveform of speech follows a Gamma distribution, whereas noise follows a Gaussian distribution. So we can estimate the SNR by…

Testing possibilities

…the real production data. If integration into a contact center is not a straightforward task and the testing needs to be done very quickly, Phonexia is able to provide a complete testing environment including the Voice Verify evaluation package, the open source Asterisk IP PBX (https://www.asterisk.org/) and a guide on how to use them together with any kind of softphone…

Documentation (SPE)

…files in [SPE]/doc in the standard software package and installation. You can also find the REST API reference (Speech Engine) documentation online. You might be interested in reading the following information in the manual: REST API reference; Structure of API queries; Asynchronous request; Task prioritization; Authentication; Audio requirements; RTP/HTTP streams; Error responses; API Commands; Usage examples; API Requirements; Installation guide; And much more…

Speaker Identification (SID)

…technical capabilities of text-independent speaker recognition. The objective is to drive the technology forward and, through competition, to find the most promising algorithmic approaches for our future production-grade technology. Basic use cases and application areas: The technology can be used for various speaker recognition tasks. One basic distinction is based on the kind of question we want to answer. Speaker…

Q: How do you calculate SNR in Speech Quality Estimation?

A: Signal-to-Noise Ratio (SNR) is an important indicator of whether a recording is worth further processing by other speech technologies, so it is part of our Speech Quality Estimation. However, calculating SNR automatically is not a trivial task. We use the fact that the statistical distribution of the frequencies in the waveform of speech follows a Gamma distribution. In contrast, noise…

STT: Adding words to language model on the fly

…word being ignored during transcription (see the warning_message parameter below). Transcription result: If preferred phrases and/or words were specified when starting the transcription, the result contains the same phrases and dictionary structures that were used as input for the transcription task. The dictionary structure is enriched with a pronunciations part, generated automatically for words that did not specify pronunciations in the…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…configurations. And vice versa – using the same metric, you can compare software from different vendors on the same HW configuration and for the same processing task. We recognize two measurable metrics: Audio based FtRT is calculated from actual audio in its original form, i.e. containing parts with spoken speech and also parts with silence or other non-speech signal (background…
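A minimal sketch of the audio-based metric, assuming the usual definition of FtRT as audio duration divided by processing time. The function name `ftrt` and the sample numbers are hypothetical and do not come from the article; only the audio-based variant mentioned in the snippet is shown.

```python
def ftrt(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the processing ran.

    Assumption: FtRT = audio duration / processing time, where the audio
    duration covers the whole recording (speech, silence and noise alike).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must not be negative")
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Hypothetical measurement: a 600 s recording processed in 20 s.
speed = ftrt(600.0, 20.0)
print(f"{speed:.1f}x faster than real time")
```

A value above 1 means the software processes audio faster than it plays back; a value below 1 means it is slower than real time.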

Q: How do I get results for a pending operation?

A: If the server responds to a pending request with status 200 – OK, the body of the response contains the result (the server already has the result in cache memory and there is no need to process the file again). If the server responds to a pending request with status 202 – Accepted, the server will create a task and will begin to…
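The 200/202 distinction in this answer can be sketched as a small polling loop. This is an illustration only: `wait_for_result`, its parameters and the injected `fetch` callable are hypothetical names, no real SPE endpoint is called, and simply retrying on 202 is an assumption, since the snippet is cut off before it describes what the client should do next.

```python
import time
from typing import Callable, Tuple


def wait_for_result(
    fetch: Callable[[], Tuple[int, str]],
    interval: float = 1.0,
    max_attempts: int = 30,
) -> str:
    """Call fetch() until it reports HTTP 200 and return the response body.

    fetch is a hypothetical callable that returns (status_code, body).
    200 means the result is in the body; 202 means the task is still being
    processed, so we wait and ask again (assumed client behaviour).
    """
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected HTTP status {status}")
        time.sleep(interval)
    raise TimeoutError(f"result not ready after {max_attempts} attempts")


# Simulated server: two "202 Accepted" answers, then the finished result.
responses = iter([(202, ""), (202, ""), (200, '{"result": "done"}')])
print(wait_for_result(lambda: next(responses), interval=0.0))
```

Injecting `fetch` keeps the sketch independent of any particular HTTP client; in real use it would wrap whatever request the REST API reference prescribes.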

Orbis 1.3.2 Release Notes

Speaker clustering: We have added a new feature that helps uncover situations in which the same speaker is speaking on multiple unrelated devices. The speaker clustering feature groups recordings into clusters, where each cluster represents one speaker. Notification center: We have added a notification center to monitor background tasks. It can be accessed by clicking a bell icon in the top left corner…

Sizing of the computing units for speech technologies

…VT features can’t help performance). Also look for CPUs with a large L3 cache; the better CPUs are those with a higher l3_cache_size/#_of_physical_CPU_cores ratio. We currently assume that 4th-generation CPUs from the current Intel Xeon family are the best. (For small computation tasks, i7 family CPUs also have a reasonable price/performance ratio.) Big challenge: correct SPE3/Speech platform…

Release Notes

…consuming) from audio only once and sent for comparison (fast) to both SID4_XL4 and GID_XL4. VAD has been upgraded to a new generation (tech. model GENERIC_3). The model (GENERIC_3) was released for standalone Voice Activity Detection (VAD as part of SPE). It brings higher accuracy to the fundamental task of correctly distinguishing speech from non-speech (silence, ringing, etc.). Using…

LID: Terminology and adaptation

…to train a language using just a few long audio files (like 5 files, 1 hour each). Acoustic channels should be as close as possible to the channel of the intended deployment. Adaptation using REST API (SPE 3.38 or newer): SPE 3.38 and newer include LID adaptation tasks in the REST API, which makes the adaptation significantly easier than in previous versions….

STT: What is Preferred Phrases feature and how to use it

Preferred phrases is a feature available for 5th or newer generation STT models and Speech Engine 3.32 or later. This article explains what the feature is good for and how it works internally, and gives some tips for practical implementation. What are preferred phrases: In speech transcription tasks, there may be situations where similar-sounding words get confused,…

Understand SPE user accounts

…name and password (obviously 😉); whether the account is active or not (accounts may be turned off and on); user role, which is any combination of: user (allowed to use all SPE functionality, except for the /admin/… endpoints), admin (allowed to use all SPE functionality, including the /admin/… endpoints), prioritize (allowed to prioritize SPE tasks, see Task

Understand SPE workers configuration

…the maximum number of simultaneously running tasks.

    # Multithread settings
    server.n_workers = 8
    server.n_realtime_workers = 8

Requests for additional file processing tasks are put in a queue and processed according to their order and priorities. Requests for additional stream processing tasks are refused with HTTP status 403 (the realtime nature of stream processing does not allow any queuing). File processing can…
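The difference between the two limits can be illustrated with a toy admission model. This is a sketch under stated assumptions, not SPE's implementation: the `WorkerPool` class, its method names and return strings are hypothetical, and treating a larger number as a higher priority is an assumption the snippet does not confirm.

```python
import heapq
from itertools import count


class WorkerPool:
    """Toy model: file tasks queue up when workers are busy, streams are refused."""

    def __init__(self, n_workers: int = 8, n_realtime_workers: int = 8):
        self.n_workers = n_workers                    # mirrors server.n_workers
        self.n_realtime_workers = n_realtime_workers  # mirrors server.n_realtime_workers
        self.running_files = 0
        self.running_streams = 0
        self._queue = []         # heap of (-priority, arrival_index, name)
        self._arrival = count()  # preserves arrival order among equal priorities

    def submit_file(self, name: str, priority: int = 0) -> str:
        if self.running_files < self.n_workers:
            self.running_files += 1
            return "running"
        # Assumption: a larger number means a higher priority.
        heapq.heappush(self._queue, (-priority, next(self._arrival), name))
        return "queued"

    def submit_stream(self, name: str) -> str:
        if self.running_streams < self.n_realtime_workers:
            self.running_streams += 1
            return "accepted"
        return "refused-403"  # realtime streams cannot wait in a queue

    def finish_file(self):
        """One file worker finished; return the queued task it picks up next, if any."""
        if self._queue:
            _, _, name = heapq.heappop(self._queue)
            return name  # the freed worker slot is reused immediately
        if self.running_files > 0:
            self.running_files -= 1
        return None


pool = WorkerPool(n_workers=1, n_realtime_workers=1)
print(pool.submit_file("a.wav"))              # the only file worker is free
print(pool.submit_file("b.wav"))              # worker busy, request waits
print(pool.submit_file("c.wav", priority=5))  # waits too, but ahead of b.wav
print(pool.finish_file())                     # next task taken from the queue
print(pool.submit_stream("s1"), pool.submit_stream("s2"))
```

The heap key combines priority with arrival order, which matches the snippet's statement that queued file requests are processed according to their order and priorities.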