
Search: Audio Source Profile

72 results

Phonexia End User License Agreement

…downloading any necessary materials or software, even if Phonexia has been advised of the possibility of such damages. 4.2 Phonexia recognizes and agrees that the Client remains the sole owner of the title to any data provided to Phonexia while using the Web demo license, including audio recordings, transcripts, personal information, or any intellectual property rights contained therein (the “Provided…

STT: Language Model Customization tutorial

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. Put simply, it can be thought of as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio

STT: Results explained

…outputs The outputs can contain the following special tokens:
Token (5th STT generation and newer) | Token (legacy STT generations) | Meaning
<segment> | <s> | start of utterance
</segment> | </s> | end of utterance
<silence/> | _SILENCE_ or <sil/> | silent part (or no speech detected)
<null/> | _DELETE_ | time slot should not go to one-best output
Realtime stream processing output modes NOTE: Only single-channel (mono) audio

Understand SPE workers configuration

…no one can really speak faster than realtime 😉 – so a single physical CPU core can actually process multiple realtime tasks simultaneously, depending on how much faster than realtime a particular technology is (and also on how much speech the audio contains). This means that for stream processing technologies it makes sense to configure a higher number of workers than physical…

SID4 performance on Intel® Xeon® Platinum 8124M

…enforcement agencies might use different methods of gathering recordings, but the principle is very similar. Based on measurements on the data set described above, we can draw the following conclusions for Intel® Xeon® Platinum 8124M: Phonexia SID4 using the L4 model can perform up to 180 FTRT using 1 physical CPU core when processing audio data containing 44% of speech. Optimal system performance…
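To put the 180 FTRT figure in concrete terms, here is a minimal arithmetic sketch. It assumes FTRT ("faster than realtime") means the ratio of audio duration to processing time; the function names are hypothetical and not part of any Phonexia API. The figure itself applies to one physical core and to audio containing 44% of speech, so other speech ratios will give different results.

```python
def processing_seconds(audio_seconds: float, ftrt: float) -> float:
    """Seconds needed to process audio_seconds of audio at the given FTRT factor.

    Assumption: FTRT = audio duration / processing time.
    """
    if ftrt <= 0:
        raise ValueError("ftrt must be positive")
    return audio_seconds / ftrt


def audio_hours_per_day(ftrt: float) -> float:
    """Hours of audio one core could process in 24 hours at the given FTRT factor."""
    if ftrt <= 0:
        raise ValueError("ftrt must be positive")
    return ftrt * 24


if __name__ == "__main__":
    # One hour of audio at 180 FTRT on a single core
    print(processing_seconds(3600, 180))
    # Audio hours processed per day by a single core at 180 FTRT
    print(audio_hours_per_day(180))
```

Under that assumption, one hour of audio takes about 20 seconds on a single core, and a single core running continuously could cover roughly 4320 hours of comparable audio per day.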

What is User configuration file and how to use it

Advanced users with appropriate knowledge (gained e.g. by taking the Phonexia Academy Advanced Training) may want to fine-tune the behavior of the technologies to adapt to the nature of their audio data. Modifying original BSAPI configuration files directly can be dangerous – inappropriate changes may cause unpredictable behavior, and without a backup of the unmodified file it’s difficult to restore…

Download Voice Inspector 5.2

…models VIN application (graphical user interface, GUI) with the following technologies built in: Speaker Identification (SID4_XL5), Speaker Diarization (DIAR), Voice Activity Detection (VAD), Speech Quality Estimator (SQE), Phoneme Recogniser (PHNREC); example population sets and audio (in ./examples/) and example report templates (in ./templates/). Hardware requirements: minimum – CPU: Intel® Core™ i5, RAM: 4 GB, Required HDD space: 0.5 GB for software…

Documentation (SPE)

…files in [SPE]/doc in the standard software package and installation. You can also find the REST API reference (Speech Engine) documentation online. You might be interested in reading the following information in the manual: REST API reference, Structure of API queries, Asynchronous request, Task prioritization, Authentication, Audio requirements, RTP/HTTP streams, Error responses, API Commands, Usage examples, API Requirements, Installation guide, and much more…

Understand SPE processing priority

SPE has a simple built-in system of task prioritization. This allows for flexible management of the processing queue, which is especially useful in mass audio processing. For example, if there is a long queue of files waiting to be processed, and one needs to urgently process another batch of files, these files can be sent for processing using higher priority… and…
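The general idea of a prioritized processing queue can be sketched as follows. This is an illustration only, not SPE's implementation: the `TaskQueue` class, its method names, and the convention that a larger number means higher priority are all assumptions made for the example.

```python
import heapq
import itertools


class TaskQueue:
    """Hypothetical priority queue: higher priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving submission order

    def submit(self, name, priority=0):
        # heapq is a min-heap, so the priority is negated to pop larger values first
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def next_task(self):
        # Return the name of the task that should be processed next
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = TaskQueue()
    queue.submit("backlog-1.wav")
    queue.submit("backlog-2.wav")
    queue.submit("urgent.wav", priority=5)  # submitted last, processed first
    while len(queue):
        print(queue.next_task())
```

In this sketch the urgent file jumps ahead of the backlog, while files with equal priority keep their submission order.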

Understand SPE processing queue

…can be handled simultaneously is defined by the server.n_workers setting (for audio file processing) and the server.n_realtime_workers setting (for realtime stream processing) in the SPE configuration file. By default, these are set automatically, based on your hardware and software configuration – see the How to configure Speech Engine workers article. The picture below demonstrates the queue processing (for the sake of simplicity, technologies assignments to…

Understand SPE technologies configuration file

…to be enabled in your SPE installation – typically, you may want to test various models during initial testing, to see how they perform on your audio… or, you may want to enable additional technologies during development of your application, etc. To select the technologies/models to be enabled in your SPE, you can use one of the SPE administration tools, phxadmin…

Time Analysis Extraction (TAE)

…dialogue. This can be used to improve calls between operators and callers or to indicate potential stress points in phone calls (for example, a change of speech speed during the conversation). Input TAE can process both audio files and streams (for format details see Speech Engine documentation). By its nature, TAE is usable mainly on two-channel phone call recordings, where…