Search: process

63 results

Understand SPE processing priority

SPE has a simple built-in system of task prioritization. This allows for flexible management of the processing queue, which is especially useful in mass audio processing. For example, if there is a long queue of files waiting to be processed, and one needs to urgently process another bunch of files, these files can be sent for processing using higher priority… and…

Understand SPE processing queue

…can be handled simultaneously is defined by the server.n_workers setting for audio file processing and the server.n_realtime_workers setting for realtime stream processing in the SPE configuration file. By default, this is set automatically, based on your hardware and software configuration – see the How to configure Speech Engine workers article. The picture below demonstrates the queue processing (for the sake of simplicity, technologies assignments to…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…in our example is 36 seconds. After stripping silence, 14 seconds remain – this means that the original audio contains 38% of net speech and 62% of silence. Phonexia speech technologies analyze the entire recording, but pick only the speech segments for AI processing, i.e. the absolute processing time will be practically the same… Creating voiceprint by Speaker Identification took:…
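
As a rough illustration of the figures in this excerpt, the sketch below computes the share of net speech and a Faster-than-Real-Time factor in Python. It assumes FtRT is defined as audio length divided by processing time. The 2-second processing time is a made-up value, and the function names are hypothetical, not part of any Phonexia API.

```python
def speech_ratio(total_seconds, speech_seconds):
    """Fraction of the recording that is net speech."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return speech_seconds / total_seconds


def faster_than_realtime(audio_seconds, processing_seconds):
    """FtRT factor, assumed here to be audio length / processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# 36 s recording with 14 s of net speech, as in the excerpt above.
ratio = speech_ratio(36, 14)

# Hypothetical processing time of 2 s, chosen only for illustration.
ftrt = faster_than_realtime(36, 2)

print(f"net speech: {ratio:.1%}, FtRT: {ftrt:.0f}x")
```

With the rounded 36 s and 14 s figures the speech share comes out at roughly 0.39. Whether the factor is computed against the full recording length or only the net speech length changes the result, so the two should not be compared directly.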

Understand SPE configuration file

…threads fetch TCP connections from TCP queue and process them # Default is 16 server.tcp.threads = 16 Sets the size of the internal pool of TCP processing threads. These processing threads take TCP connections waiting in the queue (see server.tcp.queue) and process them. The default value is 16 and normally you should not need to change it. Changing the value can be useful…

Understand SPE database

…rest_model_kws KWS keyword lists – keyword list JSON data, keyword list name, owner (SPE user), technology model to which the keyword list belongs Processing results data Tables containing cached processing results (if results caching is enabled): rest_result_age AGE processing results – file, used technology model, results JSON data rest_result_diar DIAR processing results – file, used technology model, used processing parameters,…

Releases and Changelogs (SPE)

…some time after task is finished Minor fixes in documentation Speech Engine 3.3.0 (07/11/2016) – BSAPI 3.6.1 Phonexia Server renamed to Speech Engine Fixed some pending operations are not processed until new pending operation is created Fixed early access to stream SID result may cause server crash Fixed check if user is active during authentication process Fixed custom pronunciation in…

Understand SPE workers configuration

…the maximum number of simultaneously running tasks. # Multithread settings server.n_workers = 8 server.n_realtime_workers = 8 Requests for additional file processing tasks are put in a queue and processed according to their order and priorities. Requests for additional stream processing tasks are refused with HTTP status 403 (the realtime nature of stream processing does not allow any queuing). File processing can…
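
The queuing behaviour described in this excerpt can be modelled with a small toy example in Python. This is only a sketch under stated assumptions, not SPE's implementation: the class and function names are hypothetical, and it assumes that a larger number means higher priority and that tasks with equal priority keep their arrival order.

```python
import heapq
import itertools


class FileTaskQueue:
    """Toy queue: highest priority first, arrival order within a priority."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker preserving order

    def put(self, name, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the
        # highest priority first (assumption: larger number = more urgent).
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def get(self):
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)


def admit_stream(active_streams, n_realtime_workers):
    """Streams are never queued: accept if a realtime worker is free,
    otherwise refuse with status 403, as the excerpt describes."""
    return 200 if active_streams < n_realtime_workers else 403


queue = FileTaskQueue()
queue.put("a.wav")
queue.put("b.wav")
queue.put("urgent.wav", priority=5)

order = [queue.get() for _ in range(len(queue))]
print(order)
print(admit_stream(active_streams=8, n_realtime_workers=8))
```

The arrival counter is what keeps equal-priority files in the order they were submitted. Without it, ties would be broken by comparing file names.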

Understand SPE benchmark

The SPE benchmark feature is a great tool for quick and simple evaluation of processing speed directly on your hardware and using your audio files – simply call the …/benchmark endpoint corresponding to the technology you want to benchmark and wait for the result. The benchmark result summarizes the length of the processed speech, the processing time and the resulting Faster-than-Realtime…

Terms of Service

…order to adjust it to fit the technical requirements of processing the content. PHONEXIA will encrypt all Member content before transmitting or distributing for processing purposes. Member agrees to permit PHONEXIA to take these actions. 8. Service Subscription – this section is only applicable if Member agreed to the subscription payments for particular PHONEXIA Services. 8.1. Subscription Payments. Once the…

Understand SPE technologies, instances and workers

…for. Staffing the post office should then be done accordingly – ideally, there should be enough workers to allow having all counter desks open all the time. File processing workers cannot process realtime streams, and vice versa. Configuration of Speech Engine workers should then be done accordingly – ideally, there should be enough workers of each type to allow processing…

Release Notes

…and fixes Speech Engine: General Reduced RAM consumption (since 3.58.0) RAM consumption can be up to several gigabytes lower, depending on technologies configuration and processed audio. This is mainly visible in Speech To Text when processing many audios or longer audios (or both). The effect may be less visible in other technologies. Fixed issues with non-ASCII / Unicode file names…

STT: Results explained

…with newly received ones. Hint: These corrections never go back beyond the end-of-segment boundary (</segment> token). In other words, they may happen only within the boundaries of a single segment. Realtime stream processing output Historically, realtime stream processing provided only a single output type – one-best. The one-best results are updated continuously, i.e. as soon as a new speech element is recognized, it’s immediately…

SPE and Browser installation: standalone SPE

In this post, we break down the complexities of the initial installation process of Phonexia Speech Engine (SPE), as a standalone installation. This means the SPE has to be started separately from the Phonexia Browser GUI (unlike in the embedded SPE mode, where Browser starts SPE as its background subprocess). By the end of the guide, you will be able…

Phonexia Speech Engine

…SPE manages its own queue of incoming REST requests and serves them according to the available capacity of the current installation. This means that the application layer can submit any number of requests and then just wait until they are processed. Processing priority management To allow off-queue high-priority or low-priority processing, SPE also allows setting the priority of individual REST requests. Basic…

Speech to Text (STT)

…including discriminative training and neural network-based features Output One-best transcription – i.e. a file with a time-aligned speech transcript (time of word’s start and end) Variants for transcriptions – i.e. hypotheses for words at each moment (confusion network) or hypotheses for utterances at each slot (n-best transcription) Processing speed – several versions available: from 8x faster than real-time processing on…