Search: speaker

59 results

FAQs (Browser)

…FAQ Voice Inspector Q: What are the requirements for an SID evaluation dataset? To evaluate Phonexia Speaker Identification technology in a real-life scenario, the system needs to be calibrated with an SID dataset. SID dataset (minimum requirements): To measure SID performance precisely, it is important to prepare the set of evaluation recordings very carefully. The requirements are: 50+ known speakers; 200+ recordings in…

FAQs (PSP)

…performance precisely, it is important to prepare the set of evaluation recordings very carefully. The requirements are: 50+ known speakers; 200+ recordings in total (i.e. 3 to 5 recordings per speaker*); 1+ minute of net speech in each recording (i.e. usually 2+ minutes of recording length); only one speaker in each recording; a wide variety of gender and age is recommended; recordings should be as…
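The minimum requirements listed in this excerpt can be sketched as a simple pre-flight check. This is only an illustration: the `Recording` class, its field names, and `check_sid_dataset` are hypothetical and not part of any Phonexia tool, and only the thresholds that are fully visible in the excerpt are checked (the truncated items are left out).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recording:
    """Hypothetical description of one evaluation recording."""
    speaker_id: str
    net_speech_seconds: float
    speaker_count: int = 1  # number of distinct speakers heard in the file


def check_sid_dataset(recordings: List[Recording]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # 50+ known speakers
    speakers = {r.speaker_id for r in recordings}
    if len(speakers) < 50:
        problems.append(f"only {len(speakers)} speakers, need 50+")

    # 200+ recordings in total
    if len(recordings) < 200:
        problems.append(f"only {len(recordings)} recordings, need 200+")

    for index, rec in enumerate(recordings):
        # 1+ minute of net speech in each recording
        if rec.net_speech_seconds < 60:
            problems.append(f"recording {index}: less than 60 s of net speech")
        # only one speaker in each recording
        if rec.speaker_count != 1:
            problems.append(f"recording {index}: more than one speaker")

    return problems


if __name__ == "__main__":
    # 50 speakers with 4 recordings each, 70 s of net speech per recording
    dataset = [Recording(f"spk{s}", 70.0) for s in range(50) for _ in range(4)]
    print(check_sid_dataset(dataset))
```

An empty result only means the visible thresholds are met; the recommendations on speaker variety and recording conditions still need a manual review.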

Download Speech Platform

…only English models for Speech To Text and Keyword Spotting. Additional supported languages are available upon request. Speech Engine – technologies included: Speech To Text (STT) – model EN_US_6 (US English); Keyword Spotting (KWS) – model EN_US_6 (US English); Phoneme Recognizer (PHNREC) – model EN_US_6 (US English); Speaker Identification 4 (SID4) – model…

Multi-server deployment

…Grafana accessible at https://grafana.mydomain.com (scalable). Login credentials can be obtained from Phonexia’s Pre-Sale/Consulting teams, as the monitoring tool requires a deeper understanding of the whole Voice Verify architecture. Calibration: As the calibration process requires in-depth knowledge of Speaker Identification technology, Phonexia takes care of it for its clients. Please note that for this step Phonexia needs purpose-bound and limited access to…

STT: What is Preferred Phrases feature and how to use it

…specifying longer phrases does not bring any benefit; the number of preferred phrases is not limited… but from a practical perspective, using hundreds or thousands of phrases is questionable (such a huge number means “prefer basically anything the speaker says”, which defeats the purpose); only 5th or newer STT generations support preferred phrases. Question: So, what to put in the preferred…

Understand SPE user accounts

…not visible to SPE and to the account. A similar trick can be done with the data directory, making it possible to share LID language models and language packs, SID speaker models, etc. between accounts. User accounts management: SPE user accounts can be managed using the REST API (see the Administration section of the API documentation), or using the command-line administration utilities phxadmin or…

Age Estimation (AGE)

Phonexia Age Estimation (AGE) estimates the age of a speaker from an audio recording or a voiceprint. Technology: trained with emphasis on spontaneous telephony conversation; language-, accent-, text-, and channel-independent; compatible with the widest possible range of audio sources (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input Audio: WAV or RAW (8 or 16 bits linear…

Audio Quality Estimation

…expected streams, the initialization of the technology could take some additional time. We recommend waiting 3-4 seconds for the technology to become ready. To verify the status of the Audio Quality Estimation, please call: GET /api/v2/maintenance/technologies The endpoint will return this: { "message": "List of supported technologies and their status.", "status": "OK", "technologies": { "audio_quality_estimation": "enabled", "speaker_change_detection": "disabled" } } …
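As a rough sketch of how a client might interpret that response, the snippet below parses a body shaped like the one in the excerpt and reports whether a given technology is enabled. The helper name `is_technology_enabled` is hypothetical, the sample body is the excerpt's response with straight quotes and a comma between the two entries, and the HTTP call itself is left out because the host, port, and authentication are not shown in the excerpt.

```python
import json

# Sample body modelled on the excerpt above (JSON syntax normalised).
SAMPLE_RESPONSE = """
{
  "message": "List of supported technologies and their status.",
  "status": "OK",
  "technologies": {
    "audio_quality_estimation": "enabled",
    "speaker_change_detection": "disabled"
  }
}
"""


def is_technology_enabled(response_body: str, technology: str) -> bool:
    """Return True only if the overall status is OK and the technology is enabled."""
    payload = json.loads(response_body)
    if payload.get("status") != "OK":
        return False
    # A technology missing from the map is treated as not enabled.
    return payload.get("technologies", {}).get(technology) == "enabled"


if __name__ == "__main__":
    for name in ("audio_quality_estimation", "speaker_change_detection"):
        print(name, is_technology_enabled(SAMPLE_RESPONSE, name))
```

In practice a caller would fetch the body from GET /api/v2/maintenance/technologies and could repeat the check after the recommended 3-4 second wait.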

Video – Voice Biometrics technologies

MODULE 3: Voice Biometrics technologies (23 min) Common generic rules for CLI, REST and GUI Speaker Identification (SID) in CLI, REST and GUI Language Identification (LID) in CLI, REST and GUI Gender Identification (GID) in CLI, REST and GUI Summary https://www.youtube.com/watch?v=AyEoPfYVel8…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…in our example is 36 seconds. After stripping silence, 14 seconds remain – this means that the original audio contains 38% net speech and 62% silence. Phonexia speech technologies analyze the entire recording, but pick only the speech segments for AI processing, i.e. the absolute processing time will be practically the same… Creating a voiceprint by Speaker Identification took:…

Phonexia technologies introduction

…and their usages Filtering and supporting technologies 04:32 Speech Quality Estimation (SQE) 05:27 Voice Activity Detection (VAD) 06:37 Diarization (DIAR) 07:41 Age Estimation (AGE) 08:14 Waveform Denoiser Voice Biometrics technologies 08:56 Speaker Identification (SID) 10:18 Language Identification (LID) 11:10 Gender Identification (GID) Speech Analytics technologies 11:43 Speech Transcription (STT) 12:30 Keyword Spotting (KWS) 13:32 Phoneme Recognition (PHNREC) 13:54 Time Analysis…

LID: Terminology and adaptation

…with linear coding 16bit/8bit, sampling rate 8kHz+. A wide variety of speakers (50+) of various ages and genders is required to ensure a rich variety of “language sounds”. Only a single language in the dataset. NOTE: mixing in a different language negatively affects the resulting recognition accuracy. Audio length: ideally between 1 and 5 minutes of speech signal. NOTE: it is not possible…