
Search: speed

21 results

Terms of Service

…of this content. 4. Use of PHONEXIA Services 4.1. Equipment Requirements. PHONEXIA Members must provide all equipment required to use the PHONEXIA service, including, but not limited to, a computer and a phone, as well as the respective services such as a high-speed Internet connection and telephone service (land-line or cellular phone) through a third-party provider. PHONEXIA does not provide…

Releases and Changelogs (Browser)

…wizard can’t create a report if the server doesn’t support Diarization [G#21] Unified SID terminology. Phonexia Browser v3.10.1, BSAPI 3.14.0 (Dec 6 2017): [#5068] Sped up preparation of the calibration set [#5036] Use own configuration file for local SPE; the original SPE configuration file is no longer changed [#4542] Better error message when the calibration set contains invalid recordings [#5195] Added…

Phonexia Speech Engine

…✓ Voice Activity Detection (VAD) ✓ ✓ Time Analysis Extraction (TAE) ✓ ✓ Speech Quality Estimation (SQE) ✓ ✓ Language Identification (LID) ✓ Gender Identification (GID) ✓ Age Estimation (AGE) ✓ Speaker Diarization (DIAR) ✓ Results caching: Processing results can optionally be stored in the results cache database to speed up any subsequent re-processing of the same recordings by the same technology…
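The results-caching idea can be sketched as a lookup keyed on the recording and the technology, so a repeat request for the same pair skips processing. This is a minimal illustration only: the in-memory dict stands in for the SPE results cache database, and keying on a SHA-256 hash of the audio bytes is an assumption, not a description of how SPE identifies recordings.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical in-memory stand-in for the results cache database.
_results_cache: Dict[Tuple[str, str], str] = {}


def process_with_cache(audio: bytes, technology: str,
                       run: Callable[[bytes], str]) -> str:
    """Return the cached result for (recording, technology), computing it only once."""
    # Assumption: the recording is identified by a hash of its content. The same
    # recording processed by a different technology gets a separate entry.
    key = (hashlib.sha256(audio).hexdigest(), technology)
    if key not in _results_cache:
        _results_cache[key] = run(audio)
    return _results_cache[key]
```

A second call with the same audio and technology returns the stored result without invoking `run` again.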

Understand SPE workers configuration

…process data faster than real time, which allows them to utilize 100% of a physical CPU core. This means that for file processing technologies the number of workers should be set to the number of physical CPU cores in the server; there is no point in configuring more workers. Stream processing can process data at real-time speed at most…
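The file-processing rule above amounts to capping the worker count at the number of physical cores. Below is a small sketch of that rule. The function names are hypothetical, and the physical core count is passed in explicitly because Python's `os.cpu_count()` reports logical CPUs, which on hyper-threaded servers is typically twice the physical count (the third-party `psutil.cpu_count(logical=False)` returns physical cores). The stream-processing rule is not covered because the text above is truncated.

```python
def recommended_file_workers(physical_cores: int) -> int:
    """Worker count for a file-processing technology: one per physical core."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    # Each file-processing worker can saturate one physical core,
    # so more workers than physical cores brings no extra throughput.
    return physical_cores


def capped_file_workers(requested: int, physical_cores: int) -> int:
    """Clamp a requested worker count to the range 1..physical_cores."""
    return max(1, min(requested, recommended_file_workers(physical_cores)))
```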

SID: Speaker Identification: Results Enhancement

…user (for more information, have a look at SID Evaluation). Audio Source Profile caching (SPE only): To maximize the speed of continuous Speaker Identification with calibration, we have implemented a caching mechanism with two separate caches (which are themselves separate from the basic SPE result cache). Audio Source Profiles cache: Persistence: no (in-memory only); Cache type: LRU cache…
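An in-memory LRU (least recently used) cache like the one described for audio source profiles can be sketched with `collections.OrderedDict`. This is a generic illustration of LRU eviction, not SPE's implementation; the capacity and the use of profile names as keys are assumptions.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class LRUCache(Generic[V]):
    """In-memory LRU cache, e.g. for audio source profiles keyed by name."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        if key not in self._items:
            return None
        # A hit marks the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: Hashable, value: V) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        # Over capacity: evict the least recently used entry (the oldest end).
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
```

Because nothing is written to disk, the cache starts empty after a restart, matching the "in-memory only" persistence noted above.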

Orbis 1.4.0 Release Notes

Newest generation of Speaker Identification technology added. Speaker Identification technology verifies and authenticates speakers in seconds. The new generation increases accuracy by 1 percentage point (a relative improvement of 33 %) with the XL5 model vs. the XL4 model that was previously in Orbis. The processing speed of the XL5 model is the same as or faster than that of the XL4…

Age Estimation (AGE)

…coding), A-law or Mu-law, PCM, 8 kHz+ sampling. Voiceprints: the AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself. Output: log file with processed information (age estimate). Processing speed: approx. 20x faster than real time on 1 CPU core, i.e. a standard 8-core server processes 3,840 hours of audio in 1 day of computing…
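The daily capacity figure follows from multiplying the per-core speed-up by the core count and the hours of computing time: 20 × 8 × 24 = 3,840 hours. A one-line sketch of that arithmetic, assuming throughput scales linearly with cores (the function name is illustrative):

```python
def hours_per_day(ftrt: float, cores: int, compute_hours: float = 24.0) -> float:
    """Hours of audio processed in `compute_hours` of wall-clock computing time."""
    # Each core processes `ftrt` hours of audio per hour of computing;
    # linear scaling across cores is assumed.
    return ftrt * cores * compute_hours
```

The same formula reproduces the SQE figure of 2,000 × 8 × 24 = 384,000 hours.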

Speaker Identification (SID)

…signal captured in a recording are also more or less unique, so the technology can be language-, accent-, text-, and channel-independent. Automatic speaker recognition systems are based on extracting unique features from voices and comparing them. These systems therefore usually comprise two distinct steps: voiceprint extraction (speaker enrollment) and voiceprint comparison. The processing speed depends on the…
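The comparison step can be illustrated with a generic sketch. It assumes voiceprints are fixed-length numeric vectors and uses cosine similarity as a stand-in scoring function. Phonexia's actual scoring and calibration are not described in this text, and the decision threshold below is a hypothetical value. Voiceprint extraction itself (a trained model) is out of scope.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two voiceprint vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length voiceprint")
    return dot / norm


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    # Hypothetical threshold; real systems calibrate it on evaluation data.
    return cosine_score(a, b) >= threshold
```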

Speech Quality Estimation (SQE)

…of bits used by the waveform absolute value; if it is less than 8, the signal has insufficient quality. wfilter_technical_signal_length: the length of technical signals (tones, wide-band noise, etc.), measured in seconds. Processing speed: approx. 2,000x faster than real time on 1 CPU core, i.e. a standard 8-core server processes 384,000 hours of audio in 1 day of computing time…
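The "fewer than 8 bits" rule can be sketched as follows. The definition above is truncated, so this assumes the measure is the bit length of the peak absolute value of integer PCM samples. Treat it as an illustration of the threshold, not as SQE's exact computation.

```python
from typing import Sequence


def bits_used(samples: Sequence[int]) -> int:
    """Bits needed to represent the largest absolute sample value."""
    # Assumption: the measure is based on the peak absolute value of integer PCM samples.
    peak = max((abs(s) for s in samples), default=0)
    return peak.bit_length()


def has_sufficient_amplitude(samples: Sequence[int], min_bits: int = 8) -> bool:
    # Fewer than 8 bits used is flagged as insufficient quality.
    return bits_used(samples) >= min_bits
```

For 16-bit audio whose peak is 100, only 7 bits are used, so the recording would be flagged as too quiet.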

Speech to Text (STT)

…including discriminative training and neural-network-based features. Output: one-best transcription, i.e. a file with a time-aligned speech transcript (start and end time of each word); transcription variants, i.e. hypotheses for words at each moment (confusion network) or hypotheses for utterances at each slot (n-best transcription). Processing speed: several versions are available, from 8x faster than real-time processing on…
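To clarify how a confusion network relates to a one-best transcript, here is a sketch of generic consensus decoding. It assumes the network is a list of slots, each mapping candidate words to posterior probabilities, with an empty string for a "no word" arc. The engine's own one-best output is time-aligned and may be derived differently.

```python
from typing import Dict, List

# A confusion network: one dict per slot, mapping candidate words to posteriors.
# "" stands for an empty (deletion) arc.
ConfusionNetwork = List[Dict[str, float]]


def consensus_one_best(network: ConfusionNetwork) -> List[str]:
    """Pick the highest-posterior word in every slot, skipping empty arcs."""
    words = []
    for slot in network:
        best = max(slot, key=lambda w: slot[w])
        if best:  # skip slots where "no word" wins
            words.append(best)
    return words
```

For example, the slots `{"hello": 0.7, "yellow": 0.3}`, `{"": 0.6, "uh": 0.4}`, `{"world": 0.9, "word": 0.1}` collapse to the transcript "hello world".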

Release Notes

…Fixes Speech Engine: The 5th generation of Speaker Identification (SID). We are pleased to introduce XL5, the new 5th-generation model of our Speaker Identification. Its main highlights are: accuracy increased by 1 p.p. (33 % lower EER) over the XL4 model, especially on 16 kHz audio (VoLTE); the same or faster processing speed than the XL4 model; optional backward voiceprint compatibility with…
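The two accuracy figures (1 p.p. absolute, 33 % relative) are consistent only for one baseline EER. Assuming both figures refer to the same evaluation, the implied values follow from the definition of relative improvement. The baseline is derived here and is not stated in the release notes:

```latex
\frac{\mathrm{EER}_{\mathrm{XL4}} - \mathrm{EER}_{\mathrm{XL5}}}{\mathrm{EER}_{\mathrm{XL4}}}
  = \frac{1\,\text{p.p.}}{\mathrm{EER}_{\mathrm{XL4}}} \approx \frac{1}{3}
\quad\Longrightarrow\quad
\mathrm{EER}_{\mathrm{XL4}} \approx 3\,\%, \qquad
\mathrm{EER}_{\mathrm{XL5}} \approx 2\,\%
```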

SID4 performance on Intel® Xeon® Platinum 8124M

…times for each number of cores used (physical and virtual). The collected data are saved in a CSV file. FTRT values are calculated as the median of the collected measurements. Total system performance is obtained by simple multiplication of the computed FTRT equivalent. Measuring software processing speed: what is FTRT (Faster than Real Time)? Understanding the methodology: At the beginning, our…
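The measurement steps above can be sketched as: compute FTRT for each run as audio duration divided by processing time, take the median, then multiply up. The CSV column names (`audio_s`, `processing_s`) are hypothetical. Interpreting "simple multiplication" as multiplying the per-core FTRT by the number of cores is an assumption.

```python
import csv
import io
import statistics


def ftrt(audio_seconds: float, processing_seconds: float) -> float:
    """Faster-than-real-time factor: audio duration / processing time."""
    return audio_seconds / processing_seconds


def median_ftrt(csv_text: str) -> float:
    """Median FTRT over measurements stored as CSV rows (hypothetical columns)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [ftrt(float(r["audio_s"]), float(r["processing_s"])) for r in rows]
    return statistics.median(values)


def total_system_ftrt(per_core_ftrt: float, cores: int) -> float:
    # Assumption: total performance = per-core FTRT multiplied by the core count.
    return per_core_ftrt * cores
```

The median makes the result robust to an occasional slow run, for example one disturbed by other load on the machine.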

Gender Identification (GID)

…7+ sec recommended with XL4 and L4 models (9+ sec for the previous generation of XL3 and L3 models). Output scoring: likelihood ratio and percentage metric (0-100%). Typical use cases: filtering calls by gender, playing advertisements targeted at a specific gender, getting a quick demographic analysis of the recordings. The speed of Gender Identification is up to 150 FTRT (depending on the model)…

Sizing of the computing units for speech technologies

…cores = 64 GB. Conclusion: The best computing performance can be expected from a CPU with l3_cache_size / #_of_physical_CPU_cores >= 2.5 MB. Memory bandwidth and speed are more important than CPU base frequency. Intel's TLB-related fixes for the Meltdown and Spectre vulnerabilities affect performance. Important notice (valid for SPE3): due to internal SPE3 requirements, you must multiply the required number of…
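The L3-cache guideline is a single ratio check. Here is a small sketch with illustrative function names. The example numbers in the tests are hypothetical CPUs, not real models.

```python
def l3_cache_per_core_mb(l3_cache_mb: float, physical_cores: int) -> float:
    """L3 cache available per physical core, in MB."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return l3_cache_mb / physical_cores


def meets_l3_guideline(l3_cache_mb: float, physical_cores: int,
                       min_mb_per_core: float = 2.5) -> bool:
    # Guideline from the sizing notes: l3_cache_size / physical cores >= 2.5 MB.
    return l3_cache_per_core_mb(l3_cache_mb, physical_cores) >= min_mb_per_core
```

A hypothetical 12-core CPU with 30 MB of L3 cache meets the guideline exactly (2.5 MB per core), while a 16-core CPU with 20 MB (1.25 MB per core) does not.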

Voice Activity Detection (VAD)

…VAD is usually part of a rapid filtering process in deployment. Typical use cases are: detecting the presence or absence of human speech for voice processing, filtering out non-speech parts of a recording, filtering out recordings with too little net speech to be processed by other technologies, voice-activated processes, etc. The speed of Voice Activity Detection is 140 FTRT per instance…
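The "not enough net speech" filter can be sketched as summing the speech segments VAD reports for each recording and keeping only recordings that reach a threshold. The segment format (start and end times in seconds) and the 7-second threshold in the tests are illustrative assumptions, not VAD's actual output format or a recommended value.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one speech segment from VAD


def net_speech(segments: List[Segment]) -> float:
    """Total speech duration in seconds (segments assumed non-overlapping)."""
    return sum(end - start for start, end in segments)


def recordings_to_process(vad: Dict[str, List[Segment]],
                          min_speech_s: float) -> List[str]:
    """Keep only recordings whose net speech reaches the threshold."""
    return [name for name, segs in vad.items() if net_speech(segs) >= min_speech_s]
```

Running this filter first means slower downstream technologies only process recordings that contain enough speech.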