LID adaptation

Relevance: 100%      Posted on: 2021-03-02

This article describes various ways of adapting Language Identification (LID). Basic terminology: Languageprint (*.lp file): a numeric representation of the audio, extracted from an audio file for the purpose of language identification (similar to a “voiceprint”, but representing the spoken language, not the speaking person). Languageprint archive (*.lpa file): multiple languageprints combined into a single archive. SPE does not support creating languageprint archives; they are supported as input only. Language model: the digital characteristics of a specific language. A language model can be trained from languageprints (*.lp), languageprint archives (*.lpa), or a combination of both. LID language model should not be…

Language Identification (LID)

Relevance: 94%      Posted on: 2021-02-25

Phonexia Language Identification (LID) will help you distinguish the spoken language or dialect. It will enable your system to automatically route valuable calls to your experts in the given language or to send them to other software for analysis. Application areas: preselecting multilingual sources and routing audio files to language-dependent technologies (transcribing, indexing, etc.); analyzing network traffic media (language statistics); routing particular calls (languages) to human operators (language experts). Recognized languages: languages pre-trained in the default language pack are listed in the table below, each LID generation is a separate column (in the 4th generation we switched to using language…

SPE configuration file explained

Relevance: 82%      Posted on: 2021-05-03

In this article we explain the details of the Speech Engine configuration file, located in the settings subdirectory of the SPE installation location. Settings in this configuration file affect Speech Engine behavior and performance. The configuration file is usually created after SPE installation: on first use of phxadmin, a default configuration is created in the settings directory. The file is loaded during SPE startup, i.e. you need to restart SPE to apply any changes made in the file. If Speech Engine is used together with Phonexia Browser in so-called "embedded" mode (see details about "embedded SPE" mode in Browser…

Understanding SPE directory structure

Relevance: 78%      Posted on: 2021-05-15

A good understanding of the SPE directory structure helps to better understand the inner workings of SPE and simplifies troubleshooting. It's also useful for expert-level tuning of parameters of individual technologies and for optimizing SPE configuration, e.g. for deployments with shared resources or deployments in virtualized environments. The SPE directory structure looks like this (the tree depth is limited for better readability):

    {SPE_installation_directory}
    ├── bsapi
    │   ├── age
    │   │   ├── data
    │   │   ├── example
    .   .   └── settings
    .   .
    .   .
    │   └── vad
    │       ├── data
    │       ├── example
    │       └── settings
    ├── data
    │   ├── benchmark
    │…

Understanding SPE database

Relevance: 68%      Posted on: 2021-06-05

SPE database serves multiple purposes: it stores SPE internal data; stores various information about SPE entities created by the SPE user (audio file metadata, speaker models and their voiceprints, speaker groups and their voiceprints, calibration sets, keyword lists, language packs, audio source profiles); stores cached processing results (optional, can be set in the SPE configuration file); stores SPE log data (optional and MySQL only, can be set in the SPE configuration file). To cache or not to cache? Well, that's a question... ;-) It depends on the particular use case AND on the design of your application, whether using the built-in results caching would be…

Speaker Identification (SID)

Relevance: 40%      Posted on: 2019-06-13

Phonexia Speaker Identification uses the power of voice biometry to recognize speakers by their voice, i.e. to decide whether the voice in two recordings belongs to the same person or to two different people. Our goal as a regular participant in the NIST Speaker Recognition Evaluations (SRE) series is to contribute to the direction of research efforts and the calibration of technical capabilities of text-independent speaker recognition. The objective is to drive the technology forward and, through competition, to find the most promising algorithmic approaches for our future production-grade technology. Basic use cases and application areas: The technology can be used…

STT Language Model Customization tutorial

Relevance: 39%      Posted on: 2019-04-24

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. In simplified terms, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio signals into the proper text equivalents. Because spoken language is so diverse, the default generic language model may not acknowledge the importance of certain words over other words in certain situations. Language model customization is a way to inform…

Speaker Identification: Results Enhancement

Relevance: 38%      Posted on: 2019-05-29

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system is robust to such factors, several result enhancement procedures can provide even better results and stronger evidence. Audio Source Profile: An Audio Source Profile is a representation of the speech source, e.g., device, acoustic channel, distance from microphone, language, gender,…

SPE3 – Quick Start Guide

Relevance: 31%      Posted on: 2018-04-16

Do you want to run SPE3 for the first time? This post can help you. Distribution, installation and configuration: SPE is distributed by Phonexia in .zip archives. These are downloaded from the Phonexia package manager using a link provided by a Phonexia employee. Installation is done by simply unzipping the content of the downloaded .zip archive to the SPE installation folder. Configuration of SPE is done in two places. The first is the executable file ./phxadmin or .\phxadmin.exe, which serves to: set the configuration and license files; configure speech technologies; configure user accounts; set up a few other settings. Running the ./phxadmin or .\phxadmin.exe command…