
Search: speaker

43 results

Key Features (PSP)

…The Speech Platform includes the following technologies. Technologies are available in the Speech Engine component based on its particular configuration (Voice Biometrics, Transcription System, etc.). Speaker Identification (SID) – recognizes a speaker automatically based on their voice; Speaker Diarization (DIAR) – separates multiple speakers in mono audio automatically; Language Identification (LID) – detects the language or dialect spoken in a…

Q: What are the requirements for SID evaluation dataset?

To evaluate Phonexia Speaker Identification technology in a real-life scenario, the system needs to be calibrated with a SID dataset. SID dataset (minimum requirements): To measure SID performance precisely, it’s important to prepare the set of evaluation recordings very carefully. The requirements are: 50+ known speakers; 200+ recordings in total (i.e. 3 to 5 recordings per speaker*); 1+ minute of net speech…

Speech to Text (STT)

…As an example, for English the following acoustic models can be trained: US English – to be used with US speakers; British English – to be used with UK speakers. Language models: A language model consists of a list of words. This is a limitation of the technology, as only words from this list can appear in the transcription. Together with…

FAQs (PSP)

…performance precisely, it’s important to prepare the set of evaluation recordings very carefully. The requirements are: 50+ known speakers; 200+ recordings in total (i.e. 3 to 5 recordings per speaker*); 1+ minute of net speech in each recording (i.e. usually 2+ minutes recording length); only one speaker in each recording; a wide variety of gender and age is recommended; recordings should be as…
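The numeric minimums listed above can be checked mechanically before an evaluation run. The following is a minimal sketch, not part of any Phonexia tool: the function name `check_sid_dataset` and the input format (a list of `(speaker_id, net_speech_seconds)` pairs) are assumptions made for illustration. It covers only the three numeric thresholds; whether a recording contains a single speaker cannot be verified from this metadata alone.

```python
from collections import Counter
from typing import List, Tuple

# Thresholds taken from the minimum requirements quoted above.
MIN_SPEAKERS = 50
MIN_RECORDINGS = 200
MIN_NET_SPEECH_S = 60.0  # 1+ minute of net speech per recording


def check_sid_dataset(recordings: List[Tuple[str, float]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the
    numeric minimums are met.

    `recordings` is a hypothetical input format: one (speaker_id,
    net_speech_seconds) pair per recording.
    """
    problems = []
    per_speaker = Counter(speaker for speaker, _ in recordings)

    if len(per_speaker) < MIN_SPEAKERS:
        problems.append(
            f"only {len(per_speaker)} speakers, need {MIN_SPEAKERS}+")
    if len(recordings) < MIN_RECORDINGS:
        problems.append(
            f"only {len(recordings)} recordings, need {MIN_RECORDINGS}+")

    too_short = sum(1 for _, net in recordings if net < MIN_NET_SPEECH_S)
    if too_short:
        problems.append(
            f"{too_short} recording(s) have less than "
            f"{MIN_NET_SPEECH_S:.0f} s of net speech")
    return problems


if __name__ == "__main__":
    # Synthetic example: 50 speakers with 4 recordings each, 75 s net speech.
    dataset = [(f"spk{i:03d}", 75.0) for i in range(50) for _ in range(4)]
    print(check_sid_dataset(dataset) or "numeric minimums met")
```

Returning a list of problems rather than a single boolean makes it easier to see which requirement a candidate dataset fails.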

Understand SPE metafiles

…for storing SID speaker model metadata – textual properties like name, date of birth, etc. are stored as a JSON file (note: the structure and meaning are defined and understood by Browser itself, not by SPE), while the speaker photo and any other files attached to the speaker model are stored as separate files. Another example would be the information about content of…

FAQs (Browser)

…FAQ Voice Inspector Q: What are the requirements for SID evaluation dataset? To evaluate Phonexia Speaker Identification technology in a real-life scenario, the system needs to be calibrated with a SID dataset. SID dataset (minimum requirements): To measure SID performance precisely, it’s important to prepare the set of evaluation recordings very carefully. The requirements are: 50+ known speakers; 200+ recordings in…

Understand SPE configuration

…0022 Data storage and multithread settings The home directory of SPE contains all user data including audio recordings and metadata files from speech processing (speaker models, description etc.). This is another good example of using environment variables if your topology design requires multiple instances of SPE processing the same payload. This is great for sharing raw data between multiple physical…

Arabic dialects in Phonexia LID and STT

…difficulty in collecting spontaneous speech in dialect It might be tricky to create annotations for STT training – the dialect speakers write words down as they hear them, but given the missing standard for writing, different speakers can write words in different ways… i.e. annotations in dialect need to be double-checked and unified TEXT (used for STT language model training)…

Phonexia Speech Engine

Phonexia Speech Engine (SPE) is the main part of the Phonexia Speech Platform. SPE is a server application for 64-bit Linux or Windows, providing a REST API to the entire portfolio of Phonexia speech technologies. SPE capabilities overview: Audio files and stream processing Audio files RTP / HTTP streams Speaker Identification (SID) ✓ ✓ Speech To Text (STT) ✓ ✓ Keyword Spotting (KWS) ✓…

STT: Results explained

…a speaker does not pronounce a word correctly and the one-best results do not correspond to what the speaker actually said. Start and end times are in HTK units. 1 HTK unit equals 100 nanoseconds, so dividing the values by 10000 gives the time in milliseconds. Score is a rate of match with the acoustic and language model from…
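The unit conversion described above (1 HTK unit = 100 ns, so 10000 HTK units = 1 ms) can be sketched as follows. The helper names and the sample `word` dictionary are hypothetical and only illustrate the arithmetic; they do not reflect the actual structure of an STT result.

```python
# 1 HTK unit = 100 ns, therefore 10,000 HTK units = 1 ms.
HTK_UNITS_PER_MS = 10_000


def htk_to_ms(htk_units: int) -> float:
    """Convert a time in HTK units (100 ns each) to milliseconds."""
    return htk_units / HTK_UNITS_PER_MS


def htk_to_seconds(htk_units: int) -> float:
    """Convert a time in HTK units (100 ns each) to seconds."""
    return htk_units / (HTK_UNITS_PER_MS * 1000)


if __name__ == "__main__":
    # Hypothetical start/end values for one recognized word.
    word = {"start": 12_500_000, "end": 17_300_000}
    print(htk_to_ms(word["start"]), htk_to_ms(word["end"]))  # 1250.0 1730.0
```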

What is User configuration file and how to use it

…example: When using Czech STT on realtime streams, the results show that the system outputs end of segment too often, i.e. longer pauses between words made by the speakers are misidentified as end of sentence, while in fact the speakers continue to speak. So it is desirable to fine-tune the system to accept a longer delay between words without ending a…

Understand SPE directory structure

…data directory holds additional data files for entities created by that user – e.g. SID Speaker Models, or LID language packs. If no such entities exist for that user, this directory is empty. Here is an example of admin’s data directory containing a custom LID language pack for model L4 and SID speaker models named “David” and “Paul” (the tree…

Download Voice Inspector 5.2

…models VIN application (graphical user interface, GUI) with the following technologies built in: Speaker Identification (SID4_XL5), Speaker Diarization (DIAR), Voice Activity Detection (VAD), Speech Quality Estimator (SQE), Phoneme Recogniser (PHNREC); example population sets and audio (in ./examples/) and example report templates (in ./templates/). Hardware requirements: minimum – CPU: Intel® Core™ i5, RAM: 4 GB, Required HDD space: 0.5 GB for software…

Input audio quality

…of speech technologies (precision of speaker identification, transcription accuracy, etc.). Therefore it is essential to have audio that is as clean as possible. DO’S and DON’TS: Capture the sound as close to the source as possible, i.e. as close to the speaker’s mouth as possible, as close to the recording source as possible, to minimize the amount of ambient sounds and…