
Search: n-best

40 results

KWS: Results explained

This article aims to give more details about Keyword Spotting outputs and hints on how to tailor Keyword Spotting to best suit your needs. Scoring: Keyword Spotting works by calculating the likelihood that a keyword occurs at a given spot and the likelihood that just any other speech occurs there, and comparing the two as a likelihood ratio (LR). The following scheme shows Background model for anything…
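As a rough illustration of the scoring idea in the excerpt above, the following minimal sketch compares a keyword model against a background model using a log-likelihood ratio. The function names, the example likelihood values, and the threshold are hypothetical and are not part of any Phonexia API.

```python
import math


def log_likelihood_ratio(keyword_likelihood: float, background_likelihood: float) -> float:
    """Return log(P(audio | keyword model) / P(audio | background model))."""
    if keyword_likelihood <= 0 or background_likelihood <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log(keyword_likelihood) - math.log(background_likelihood)


def is_detection(llr: float, threshold: float = 0.0) -> bool:
    """Accept a keyword hypothesis when the ratio reaches the (assumed) threshold."""
    return llr >= threshold


# Hypothetical likelihoods for one candidate spot in the audio.
llr = log_likelihood_ratio(keyword_likelihood=0.02, background_likelihood=0.01)
print(is_detection(llr))
```

A positive ratio means the keyword model explains the spot better than the background model. Raising the threshold trades missed keywords for fewer false alarms.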

Understand SPE configuration file

…processing audio files (server.n_workers) and realtime streams (server.n_realtime_workers). Starting from SPE 3.51, the default value for both is -1, which tells SPE to set the number of workers automatically for best performance according to local conditions: the number of file-processing workers is set equal either to the total number of configured file-processing technologies or to the number of physical CPU cores, whichever…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…computing performance is better by ~17% compared with Intel® Xeon® E5 2860 v4. FtRTaudio shows that the real requirements for HW and its computing power are approx. 62% lower than with the traditional approach using FtRTnet_speech, for an audio dataset with a similar ratio between speech and non-speech (silence), and this has been confirmed by measurement. Best practices: Use FtRTaudio when calculating hardware sizing and…
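The sketch below illustrates the difference between the two figures named in the excerpt. It assumes the usual definition of FtRT as media duration divided by processing time, with FtRTaudio based on the full audio length and FtRTnet_speech based on the net speech length only. The function name and all numbers are hypothetical and are not measurements.

```python
def ftrt(media_seconds: float, processing_seconds: float) -> float:
    """Faster-than-real-time factor: seconds of media handled per second of processing."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return media_seconds / processing_seconds


# Hypothetical recording: 1 hour of audio, half of it speech, processed in 60 s.
audio_seconds = 3600.0
net_speech_seconds = 1800.0
processing_seconds = 60.0

ftrt_audio = ftrt(audio_seconds, processing_seconds)
ftrt_net_speech = ftrt(net_speech_seconds, processing_seconds)
print(ftrt_audio, ftrt_net_speech)
```

Because the same processing time is divided into a longer duration, the audio-based figure is higher than the net-speech-based one whenever the recording contains silence.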

SID: Speaker Identification: Results Enhancement

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system…

LID: Terminology and adaptation

This article describes various ways of Language Identification adaptation. Basic terminology: Languageprint (*.lp file) – a numeric representation of the audio, extracted from an audio file for the purpose of language identification (similar to a “voiceprint”, but representing the sound of the spoken language, not the sound of the speaking person). Languageprint archive (*.lpa file) – multiple languageprints combined into a single archive. Languageprint archives come pre-created…

Understand SPE workers configuration

A worker is a working thread that performs the actual processing of files or realtime streams in Speech Engine. This article helps to understand the Speech Engine workers and provides information on how to configure workers for optimal performance and server utilization. Starting from SPE 3.51, new defaults in settings/phxspe.properties make SPE configure workers automatically according to local conditions (physical CPU cores, configured…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below:
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
Backward and forward extensions are intervals in milliseconds, which extend the part…
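The default values quoted above can be read with Python's standard configparser, as in the minimal sketch below. The helper names are hypothetical, and the extend_segment function reflects only an assumed reading of the truncated excerpt (that the detected interval is widened by the backward and forward extension lengths); it is not the Speech Engine implementation.

```python
import configparser

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"

SEGMENTER_CONFIG = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""


def load_segmenter_params(text: str) -> dict:
    """Parse the segmenter section into typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[SECTION]
    return {
        "backward_ms": section.getint("backward_extensions_length_ms"),
        "forward_ms": section.getint("forward_extensions_length_ms"),
        "speech_threshold": section.getfloat("speech_threshold"),
    }


def extend_segment(start_ms: int, end_ms: int, backward_ms: int, forward_ms: int) -> tuple:
    """Assumed behavior: widen a detected interval, never starting before 0 ms."""
    return max(0, start_ms - backward_ms), end_ms + forward_ms


params = load_segmenter_params(SEGMENTER_CONFIG)
extended = extend_segment(1000, 2000, params["backward_ms"], params["forward_ms"])
print(params, extended)
```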

What is User configuration file and how to use it

Advanced users with appropriate knowledge (gained e.g. by taking the Phonexia Academy Advanced Training) may want to fine-tune the behavior of the technologies to adapt to the nature of their audio data. Modifying original BSAPI configuration files directly can be dangerous – inappropriate changes may cause unpredictable behavior, and without a backup of the unmodified file it’s difficult to restore…

Understand SPE technologies, instances and workers

Configuring Speech Engine to effectively utilize the full power of the underlying hardware can get challenging – one can easily get lost in all the strange terms like technologies, instances, slots, or workers… This article should shed some light on it. Speech Engine is like a post office: Thinking about Speech Engine, there is actually a very nice analogy with a post office…

STT: What is Preferred Phrases feature and how to use it

Preferred phrases is a feature available for the 5th or newer generation of STT models and Speech Engine 3.32 or later. This article explains what the feature is good for and how it works internally, and gives some tips for practical implementation. What are preferred phrases: In speech transcription tasks, there may be situations where similarly sounding words get confused,…

Understand SPE directory structure

A good understanding of the SPE directory structure helps to better understand the inner workings of SPE and simplifies troubleshooting. It’s also useful for expert-level tuning of parameters of individual technologies and for optimizing SPE configuration, e.g. for deployments with shared resources or deployments in virtualized environments. The SPE directory structure looks like this (the tree depth is limited for better readability):…

Understand SPE database scripts

This article explains the details and usage of the SQL database scripts stored in the /data/database subdirectory of the SPE installation directory. These scripts are intended for setup and maintenance of the SPE database for supported database types, currently SQLite and MariaDB (from SPE 3.46) / MySQL (up to SPE 3.45). Script types: For each database type, there are two directories with two types of…

Understand SPE database

SPE database serves multiple purposes: it stores SPE internal data; it stores various information about SPE entities created by the SPE user (audio files metadata, speaker models and their voiceprints, speaker groups and their voiceprints, calibration sets, keyword lists, language packs, audio source profiles); it stores cached processing results (ON by default, can be set in SPE configuration file); optionally it also stores SPE log…

Phonexia Academy

…Technical Training Advanced – 2 courses: Voice Biometrics Course (in-person, 2 days), Speech Analytics Course (in-person, 2 days). In Technical Training Advanced courses, we share best practices, detailed use-case analysis, and hands-on practice. Both courses are adjusted to our partners’ requests, considering their typical projects. You might also be interested in reading the following: How to prepare for course?  …