
Search: speaker

68 results

What is a user configuration file and how to use it

…example: When using Czech STT on realtime streams, the results show that the system outputs end of segment too often, i.e. longer pauses between words are misidentified as the end of a sentence while the speaker in fact continues to speak. It is therefore desirable to fine-tune the system to accept a longer delay between words without ending a…

Speech To Text results explained

…confidence. N-best output: The N-best output provides multiple alternatives for entire sentences or longer sequences of words. Analytical applications can use this format to take the alternatives and work with them further. It can also be useful when a speaker does not pronounce a word correctly and the one-best results do not correspond to what the speaker actually said. Each element…

FAQs (Voice Verify)

…no mechanism to detect which channel in a stereo or multi-channel recording contains the voice of the desired speaker. The admin of Voice Verify must ensure that recordings used for voiceprint creation are mono and contain the voice of the desired speaker only. Q: What are the audio/stream quality requirements? A: Please note that audio recordings…

Multi-server deployment

…after 24 hours due to storage capacity reasons. Monitoring: Voice Verify contains an advanced monitoring tool, Grafana, accessible at https://grafana.mydomain.com (scalable). Login credentials can be obtained from Phonexia’s Pre-Sale/Consulting teams, as the monitoring tool requires a deeper understanding of the whole Voice Verify architecture. Calibration: As the calibration process requires in-depth knowledge of Speaker Identification technology, Phonexia takes care of it…

Phonexia technologies introduction

…and their usages
Filtering and supporting technologies
04:32 Speech Quality Estimation (SQE)
05:27 Voice Activity Detection (VAD)
06:37 Diarization (DIAR)
07:41 Age Estimation (AGE)
08:14 Waveform Denoiser
Voice Biometrics technologies
08:56 Speaker Identification (SID)
10:18 Language Identification (LID)
11:10 Gender Identification (GID)
Speech Analytics technologies
11:43 Speech Transcription (STT)
12:30 Keyword Spotting (KWS)
13:32 Phoneme Recognition (PHNREC)
13:54 Time Analysis…

Installation Guide (PSP)

…link: https://partner.phonexia.com/kb/sp/speech-platform/evaluation-package/ Package content: The installation package contains the following components, each in a separate directory. Speech Engine (SPE) is the core of the Phonexia speech platform. It is a backend application that performs all the work. It processes files and returns the desired result from each speech technology (age estimation, transcript, speaker identification and others). Communication with SPE is handled exclusively via REST API….

Understanding SPE home directory

…Data: The data directory holds additional data files for entities created by that user, e.g. SID Speaker Models or LID language packs. If no such entities exist for that user, this directory is empty. Unlike the storage, the content of this directory is intended to be manipulated by SPE only and should not be manipulated directly at the filesystem level….

Software Vetting



Gender Identification (GID)

Gender Identification is a language-, domain- and channel-independent technology that uses the acoustic characteristics of the recording to determine the gender of the speaker in question. This technology is able to distinguish between two genders: Male (M) and Female (F). Minimum of speech signal for identification: 7+ sec recommended (with XL4 and L4 model (9+ sec for previous generation of…

LID adaptation

…requirements below
Creating a custom language pack consisting of your chosen set of languages, either pre-trained or created from your audio files
Audio recording requirements:
Format: WAV, FLAC, RAW with linear coding 16bit/8bit, sampling rate 8kHz+
A wide variety of speakers (50+) of various ages and genders is required to ensure a rich variety of “language sounds”
Only a single language in the dataset…

FAQs (PSP)

…using it in the Phonexia Browser. Q: Difference between on-the-fly and off-line type of transcription (STT) Like a human, the ASR (STT) engine adapts to the acoustic channel, environment and speaker. The ASR (STT) engine also learns more information about the content over time, which is used to…

Single-server deployment

…of the whole Voice Verify architecture. Calibration: As the calibration process requires in-depth knowledge of Speaker Identification technology, Phonexia takes care of it for its Clients. Please note that for this step Phonexia needs purpose-bound and limited access to Client data. Calibration is part of the Proof of Concept or Set Up phase and belongs under Professional Services. The Client…