
Search: real-time

34 results

STT: What is Preferred Phrases feature and how to use it

…from the preferred phrases and interpolate it in real time with the generic language model: P(word|history) = P_generic(word|history) + α · P_preferred(word|history). The preferred words and phrases are favored, while retaining the existing accuracy on common text. Preferred phrases in Speech Engine: use the POST /technologies/stt or POST /technologies/stt/input_stream call to start a transcription with a list of preferred phrases. To be precise, these actually…
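
As an illustration only, here is a minimal Python sketch of starting a file transcription with preferred phrases via the REST API. The two endpoints are taken from the excerpt above; the server URL, the credentials and the "path" and "preferred_phrases" parameter names are assumptions, so check the Speech Engine REST API reference for the actual request format.

    import requests  # third-party HTTP client (pip install requests)

    SPE_URL = "http://localhost:8600"   # assumed Speech Engine address and port
    AUTH = ("user", "password")         # assumed SPE account credentials

    # Hypothetical request: start transcription of a file already available to the
    # account and attach a list of preferred phrases. The "path" and
    # "preferred_phrases" parameter names are assumptions for illustration only.
    response = requests.post(
        f"{SPE_URL}/technologies/stt",
        params={
            "path": "recordings/call.wav",
            "preferred_phrases": "Phonexia, voice biometrics",
        },
        auth=AUTH,
    )
    print(response.status_code, response.json())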

Understand SPE user accounts

…prioritization section in the REST API documentation; maximum pending requests – a legacy REST Server 2.x attribute, ignored in SPE 3.x. It's important to realize that each SPE user account has its own home directory, where SPE stores the account's data; see the Understanding SPE home directory article. This means that by default the accounts' data is isolated from each other. Therefore,…

STT: What is Words-To-Numbers feature and how to use it

…that would require retroactively changing text which was already output earlier… which is impossible. Alternatively, the output would have to be somehow delayed… which is undesirable in real-time stream processing, of course. So, the best compromise is to keep the word-level outputs untouched and do the conversion only on the segment/sentence level. How does it work? The words to…
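
A toy Python sketch of the compromise described above: the word-level output is left untouched and only the finished segment text is converted. The conversion logic below is invented for illustration and is much simpler than the real feature.

    # Illustrative sketch of segment-level words-to-numbers conversion.
    # Word-level results stay exactly as recognized; only the segment text changes.
    WORD_VALUES = {"twenty": 20, "five": 5, "hundred": 100}  # toy vocabulary

    def words_to_numbers(segment_text: str) -> str:
        """Convert simple number words in a finished segment to digits (toy logic)."""
        out, number = [], None
        for token in segment_text.split():
            if token in WORD_VALUES:
                value = WORD_VALUES[token]
                number = value if number is None else number + value
            else:
                if number is not None:
                    out.append(str(number))
                    number = None
                out.append(token)
        if number is not None:
            out.append(str(number))
        return " ".join(out)

    words = ["the", "price", "is", "twenty", "five", "dollars"]  # word-level output, untouched
    segment = words_to_numbers(" ".join(words))                  # segment-level output
    print(words)    # ['the', 'price', 'is', 'twenty', 'five', 'dollars']
    print(segment)  # the price is 25 dollars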

Q: What are the requirements for SID evaluation dataset?

…recordings in order to meet the criterion of at least 3 recordings for each speaker is not the right way to proceed. This way you are not adding any new information. You are essentially analyzing details of a single recording five times. In contrast, by using 5 unique recordings coming from different audio environments or even different times of the day,…

Licensing (technical details)

…section for details about where to put the license file). Start Phonexia FLS. After starting FLS, start the computing machines. FLS deployment examples: Basic Phonexia FLS connection scheme – the basic configuration makes it possible to distribute license files dynamically to computing machines based on real-time and capacity needs. Figure: Example of a basic Phonexia FLS connection scheme/topology. Advanced Phonexia…

Designing and Developing Application

Before designing and developing the application, we encourage Partners to find clear answers to the following questions: Customer requirements: Do my customers need file processing (audio files) or real-time stream processing? How much manpower does the customer have available to analyze the results? How many minutes per day, or how many parallel streams, does my customer need to process?…

Phonexia Partner Program for Government Partners

…the Starter Kit during the onboarding period? Yes, the Starter Kit can be purchased anytime during our cooperation. Can I purchase the Starter Kit multiple times? Yes, for each project, proof of concept, and product line, you can purchase a Starter Kit again. Phonexia consultants can’t wait to support your business. How do you deliver technical training? Phonexia technical training…

Understand SPE database scripts

…for SPE database content update. As SPE evolves and new features are added or existing functionality is improved, the database structure needs to change from time to time. So, when updating from an older SPE version to a newer one, the database content needs to be updated as well. Therefore, the database structure is versioned – the database version is listed in the SPE changelog together with…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect real-time stream STT results. The default segmenter parameters are shown below:

    [vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
    backward_extensions_length_ms=150
    forward_extensions_length_ms=750
    speech_threshold=0.5

Backward and forward extensions are intervals in milliseconds which extend the part…
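
Purely as a numeric illustration of these two parameters, assuming they are simply added before the start and after the end of detected speech (the engine's internal handling may differ):

    # Illustrative arithmetic only: how the default extension values quoted above
    # would widen a detected speech interval under the stated assumption.
    backward_extension_ms = 150   # backward_extensions_length_ms (default)
    forward_extension_ms = 750    # forward_extensions_length_ms (default)

    def extend_segment(start_ms: int, end_ms: int) -> tuple[int, int]:
        """Return the segment boundaries after applying both extensions."""
        return (max(0, start_ms - backward_extension_ms),
                end_ms + forward_extension_ms)

    # Speech detected between 10.000 s and 12.300 s of the stream:
    print(extend_segment(10_000, 12_300))   # (9850, 13050)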

Understand SPE connectors for external TTS

…little-endian mono audio data. In SPE 3.46 and newer, the audio sampling frequency must be set to the naturalSampleRateHertz value provided in the TTS service capabilities information. In SPE 3.45 and older, the audio sampling frequency must be fixed to 8000 Hz. SPE then reads the audio and writes it either to a file, or to an output real-time stream,…
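
A small Python sketch of the sampling-frequency rule stated above; the function, the version tuple and the capabilities dictionary are simplified assumptions, not the actual connector interface.

    # Sketch of the sampling-frequency rule from the excerpt above.
    def required_sample_rate(spe_version: tuple[int, int], capabilities: dict) -> int:
        """Return the sampling frequency the TTS connector must deliver to SPE."""
        if spe_version >= (3, 46):
            # SPE 3.46 and newer: use the rate advertised by the TTS service
            return capabilities["naturalSampleRateHertz"]
        # SPE 3.45 and older: fixed 8 kHz
        return 8000

    print(required_sample_rate((3, 46), {"naturalSampleRateHertz": 24000}))  # 24000
    print(required_sample_rate((3, 45), {"naturalSampleRateHertz": 24000}))  # 8000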

Speaker Identification

…30 seconds 3.63 3.36 3.08. The above tables show the accuracy of an out-of-the-box solution measured for the enrollment/verification time-length combinations. The numbers reflect real-world use, as the datasets are selected to be as close to real-world usage as possible. Accuracy can be further improved by calibration. During the development of the algorithms, Phonexia achieved the best results…

What is User configuration file and how to use it

…example: When using Czech STT on real-time streams, the results show that the system outputs end-of-segment too often, i.e. longer pauses between words made by the speakers are misidentified as ends of sentences, while in fact the speakers continue to speak. So it is desirable to fine-tune the system to accept a longer delay between words without ending a…
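
Purely as an illustration, and assuming the user configuration file can override the segmenter section quoted in the stream-transcription result above, such a tuning might look like the following (the value 1500 is an arbitrary example, not a recommendation):

    [vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
    # assumed override: a larger forward extension so that longer pauses between
    # words do not close the segment (default is 750 ms per the excerpt above)
    forward_extensions_length_ms=1500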

Understand SPE technologies, instances and workers

…for. Staffing the post office should then be done accordingly – ideally, there should be enough workers to allow having all counter desks open all the time. File processing workers cannot process real-time streams, and vice versa. Configuration of Speech Engine workers should then be done accordingly – ideally, there should be enough workers of each type to allow processing…

Age Estimation (AGE)

…coding), A-law or Mu-law, PCM, 8 kHz+ sampling. Voiceprints: the AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself. Output: log file with processed information (age estimate). Processing speed: approx. 20x faster than real-time processing on 1 CPU core, i.e. a standard 8-CPU-core server processes 3,840 hours of audio in 1 day of computing…
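
A quick arithmetic check of the throughput figure quoted above (20x real time per core, 8 cores, 24 hours of computing):

    # Worked check of the stated throughput figure.
    speedup_per_core = 20      # hours of audio processed per core per hour of computing
    cores = 8
    hours_of_computing = 24

    audio_hours_per_day = speedup_per_core * cores * hours_of_computing
    print(audio_hours_per_day)   # 3840 hours of audio per day, matching the excerpt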