
Search: streamed_task_maximum

26 results

Speech Engine update

…of software and/or API (for example, REST Server 2.1 -> SPE 3.0). It includes changes in components or technology models. Speech Engine update procedure: The update procedure is purely manual and relies heavily on your own detailed knowledge of your Speech Engine installation and its internal functionality and structures. This knowledge is crucial for tuning the Speech Engine for maximum

Recommended OS and HW (PSP)

…performance over HDD. Configuration includes: SID4 XL4, GID XL4, LID L4, AGE L4, STT 6th generation (2 languages, half load each), KWS 6th generation (2 languages), VAD, SQE. (***) The amount of hours/day refers to the Phonexia pricing package; it does NOT mean the maximum throughput of such a configuration. In other words, this is a recommended configuration, not a minimal configuration…

Understand SPE metafiles

…can store whatever type of data helps your application. The files are physically stored in the SPE user’s “home”, in the data subdirectory (see the Understanding SPE home directory article for details). The maximum size of a single metafile can be set using the server.max_metadata_size setting in the SPE configuration file. Example: The picture below shows how Phonexia Browser uses SPE metafiles…
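If your application generates metadata programmatically, it can be useful to check the size before uploading, so a metafile is not rejected for exceeding server.max_metadata_size. This is a minimal sketch under two assumptions not stated in the article: that your application serializes its metadata as JSON (SPE accepts whichever format you choose) and that the configured limit is expressed in bytes. The function name `metafile_fits` is hypothetical.

```python
import json
from typing import Any


def metafile_fits(metadata: Any, max_size_bytes: int) -> bool:
    """Return True if the JSON-serialized metadata fits within the limit.

    ASSUMPTIONS: metadata is stored as UTF-8 JSON, and max_size_bytes mirrors
    the server.max_metadata_size value from the SPE configuration, in bytes.
    """
    payload = json.dumps(metadata, ensure_ascii=False).encode("utf-8")
    return len(payload) <= max_size_bytes
```

Measuring the encoded bytes rather than the string length matters for non-ASCII annotations, which take more than one byte per character in UTF-8.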

Input audio quality

…with speech in mind). Lossy MP3 format is not preferred. If MP3 really has to be used, it must use a bitrate of at least 32 kbit/s per channel. Stereo audio must use full stereo encoding, not joint stereo¹. Do not push for the smallest possible audio file sizes by trying to squeeze the maximum number of recordings into minimal storage space. Brutal compressions like…
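The two MP3 rules above (at least 32 kbit/s per channel, and full stereo rather than joint stereo for two-channel audio) can be expressed as a simple pre-check. This is a sketch only: the function and parameter names are hypothetical, and reading the bitrate and stereo mode out of the actual files is left to whatever audio tooling you already use.

```python
def mp3_settings_ok(total_bitrate_kbps: float, channels: int, joint_stereo: bool) -> bool:
    """Check MP3 encoder settings against the minimal requirements.

    total_bitrate_kbps: overall bitrate of the stream (all channels together).
    joint_stereo: True if the encoder used joint-stereo mode.
    """
    if channels < 1:
        raise ValueError("channels must be >= 1")
    # Rule 1: at least 32 kbit/s for each channel.
    if total_bitrate_kbps / channels < 32:
        return False
    # Rule 2: stereo must be encoded as full stereo, not joint stereo.
    if channels == 2 and joint_stereo:
        return False
    return True
```

Passing this check only means the file meets the stated minimum; MP3 remains a non-preferred format.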

Q: How do I get results for a pending operation?

A: If the server responds to a pending request with status 200 OK, the response body contains the result (the server already has the result in its cache, so there is no need to process the file again). If the server responds to a pending request with status 202 Accepted, the server creates a task and begins to…
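The 200/202 handling described in this answer can be sketched as a small polling loop. This is a minimal sketch, assuming that re-sending the same pending request eventually returns 200 OK once processing has finished; `wait_for_result`, `send_request`, and the retry parameters are hypothetical names, and the sketch deliberately avoids any SPE task endpoints, which the answer does not spell out.

```python
import time
from typing import Any, Callable, Tuple

# (HTTP status code, parsed response body) as returned by your own HTTP client.
Response = Tuple[int, Any]


def wait_for_result(send_request: Callable[[], Response],
                    poll_interval: float = 1.0,
                    max_attempts: int = 60) -> Any:
    """Repeat a pending request until the server returns the finished result.

    send_request is a hypothetical callable that issues the pending request
    with any HTTP library and returns (status, body).
    """
    for _ in range(max_attempts):
        status, body = send_request()
        if status == 200:
            # 200 OK: the result is already available (cached), so return it.
            return body
        if status == 202:
            # 202 Accepted: a task exists but is not finished; wait and retry.
            time.sleep(poll_interval)
            continue
        # Any other status is treated as an error in this sketch.
        raise RuntimeError(f"unexpected HTTP status {status}: {body!r}")
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

A fixed poll interval keeps the sketch short; in practice you might increase the delay between attempts to avoid flooding the server with requests.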

Sizing of the computing units for speech technologies

…VT features can’t help performance.) Also look for CPUs with a large L3 cache: the better CPUs are those with a higher l3_cache_size/#_of_physical_CPU_cores ratio. We currently consider CPUs from the 4th generation of the Intel Xeon family to be the best. (For small computation tasks, i7 family CPUs also have a reasonable price/performance ratio.) Big challenge: correct SPE3/Speech platform…
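The l3_cache_size/#_of_physical_CPU_cores criterion is easy to compute when comparing candidate CPUs. The sketch below ranks CPUs by L3 cache per physical core; the class, function names, and the sample CPUs are hypothetical placeholders, so fill in the real cache sizes and core counts from vendor datasheets. This ratio is only one of the criteria mentioned in the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cpu:
    name: str
    l3_cache_mb: float   # total L3 cache size in MB
    physical_cores: int  # physical cores, not hyper-threads

    @property
    def l3_per_core_mb(self) -> float:
        return self.l3_cache_mb / self.physical_cores


def rank_by_l3_per_core(cpus: List[Cpu]) -> List[Cpu]:
    """Sort CPUs from the highest to the lowest L3-cache-per-core ratio."""
    return sorted(cpus, key=lambda c: c.l3_per_core_mb, reverse=True)


if __name__ == "__main__":
    # Hypothetical example values, not real CPU specifications.
    candidates = [Cpu("cpu_a", 30.0, 16), Cpu("cpu_b", 32.0, 8)]
    for cpu in rank_by_l3_per_core(candidates):
        print(f"{cpu.name}: {cpu.l3_per_core_mb:.2f} MB L3 per core")
```

Counting physical cores rather than logical threads matches the ratio as stated in the article.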

Speaker Identification (SID)

…technical capabilities of text-independent speaker recognition. The objective is to drive the technology forward and, through the competition, to find the most promising algorithmic approaches for our future production-grade technology. Basic use cases and application areas: The technology can be used for various speaker recognition tasks. One basic distinction is based on the kind of question we want to answer. Speaker…

Q: How do you calculate SNR in Speech Quality Estimation?

A: Signal-to-Noise Ratio (SNR) is an important indicator of whether a recording is worth further processing by other speech technologies, so it is part of our Speech Quality Estimation. However, calculating SNR automatically is not a trivial task. We use the fact that the statistical distribution of the frequencies in the waveform of speech follows a Gamma distribution. In contrast, noise…
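For reference, the quantity being estimated is the ratio of signal power to noise power in decibels, SNR = 10·log10(P_signal / P_noise). The sketch below computes it the textbook way, assuming you already know which samples are noise-only; that knowledge is exactly what the Gamma-distribution-based automatic method avoids needing, so this is not Phonexia's estimator. Function names are hypothetical.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average power (mean of squared sample values) of a segment."""
    if not samples:
        raise ValueError("empty segment")
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_plus_noise: Sequence[float], noise_only: Sequence[float]) -> float:
    """SNR in dB from a segment with speech (plus noise) and a noise-only segment.

    ASSUMPTION: speech and noise are uncorrelated, so signal power is estimated
    as total power minus noise power.
    """
    p_noise = mean_power(noise_only)
    p_signal = mean_power(speech_plus_noise) - p_noise
    if p_noise <= 0:
        return math.inf       # no measurable noise
    if p_signal <= 0:
        return -math.inf      # no power above the noise floor
    return 10.0 * math.log10(p_signal / p_noise)
```

With a speech segment of power 1.0 and noise of power 0.01, the estimated signal power is 0.99, giving about 19.96 dB.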

Measuring software processing speed: what is FtRT (Faster than Real Time)

…configurations. And vice versa: using the same metric, you can compare software from different vendors on the same HW configuration and for the same processing task. We distinguish two measurable metrics: Audio-based FtRT is calculated from the actual audio in its original form, i.e. containing parts with spoken speech as well as parts with silence or other non-speech signal (background…
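A minimal sketch of audio-based FtRT, assuming it is computed as the total duration of the original audio (speech and non-speech parts alike) divided by the wall-clock processing time, so a value of 30 means one hour of audio is processed in two minutes. The function name is hypothetical, and the second metric mentioned in the article is not covered here.

```python
from typing import Iterable


def audio_based_ftrt(audio_durations_s: Iterable[float], processing_time_s: float) -> float:
    """How many times faster than real time a batch of recordings was processed.

    audio_durations_s: full length of each recording in seconds, including
    silence and other non-speech parts.
    processing_time_s: wall-clock time needed to process the whole batch.
    """
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    return sum(audio_durations_s) / processing_time_s
```

Because the numerator counts the full audio duration, recordings with long silences yield a higher audio-based FtRT than speech-dense recordings of the same length processed in the same time.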

Documentation (SPE)

…files in [SPE]/doc in the standard software package and installation. You can also find the REST API reference (Speech Engine) documentation online. You might be interested in reading the following information in the manual: REST API reference, Structure of API queries, Asynchronous request, Task prioritization, Authentication, Audio requirements, RTP/HTTP streams, Error responses, API Commands, Usage examples, API Requirements, Installation guide, and much more…

STT: Adding words to language model on the fly

…word being ignored during transcription (see the warning_message parameter below). Transcription result: If preferred phrases and/or words were specified when starting the transcription, the result contains the same phrases and dictionary structures that were used as input for the transcription task. The dictionary structure is enriched with a pronunciations part, generated automatically for words that did not specify pronunciations in the…
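To illustrate the enrichment step described above, the sketch below fills in pronunciations only for dictionary entries that lack them and leaves user-supplied pronunciations untouched. The entry field names (`word`, `pronunciations`) and the `generate` callable, which stands in for the server's automatic pronunciation generator, are hypothetical; the actual result structure is defined in the SPE documentation.

```python
from typing import Callable, Dict, List

# Hypothetical entry shape, e.g. {"word": "...", "pronunciations": ["..."]}.
Entry = Dict[str, object]


def enrich_dictionary(entries: List[Entry],
                      generate: Callable[[str], List[str]]) -> List[Entry]:
    """Return copies of entries, adding generated pronunciations where missing."""
    enriched: List[Entry] = []
    for entry in entries:
        new_entry = dict(entry)  # copy so the input list is not modified
        if not new_entry.get("pronunciations"):
            new_entry["pronunciations"] = generate(str(new_entry["word"]))
        enriched.append(new_entry)
    return enriched
```

Returning new entries instead of mutating the input mirrors the described behavior: the result echoes the input dictionary, extended only where pronunciations were missing.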