
Search: model L

58 results

Gender Identification (GID)

…generation of XL3 and L3 models). Output scoring: log-likelihood ratio (LLR) and score (0-1). The score can be interpreted as a percentage by multiplying it by 100. Typical use cases: filtering calls by gender, playing advertisements targeted at a specific gender, and getting a quick demographic analysis of the recordings. Gender Identification runs at up to 150 FtRT (150× faster than real time), depending on the model…

Age Estimation (AGE)

…coding), A-law or Mu-law, PCM, 8 kHz+ sampling. Voiceprints: the AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself. Output: log file with the processed information (age estimate). Processing speed: approx. 20× faster than real time on 1 CPU core, i.e. a standard 8-core server processes 3,840 hours of audio in 1 day of computing…
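The 3,840-hour figure follows from simple arithmetic: 20 hours of audio per hour of computing, times 24 hours, times 8 cores. A minimal sketch of that calculation, assuming throughput scales linearly with the number of cores (one audio stream per core); the function name is hypothetical:

```python
def audio_hours_per_day(speedup: float, cores: int, compute_hours: float = 24.0) -> float:
    """Hours of audio processed in `compute_hours` of wall-clock time.

    Assumes each core independently processes `speedup` hours of audio
    per hour of computing, so total throughput scales linearly with cores.
    """
    return speedup * cores * compute_hours


# 20x faster than real time on each of 8 cores, for one day of computing.
print(audio_hours_per_day(20, 8))  # 3840.0
```

Real deployments may fall short of linear scaling because of I/O, memory bandwidth, or other load on the server.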

STT: Results explained

…can go on in time <sil\/> can you hear me <sil\/> Okay <sil\/> I wanted to call you and give you an update on what's going on.", "channel" : 0, "score" : 19204.861, "confidence" : 0.20003852 }, { "phrase" : "I guess we can go on in time <sil\/> can you hear me <sil\/> Okay <sil\/> I want to call…
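The fragment above shows alternative transcriptions, each with "phrase", "channel", "score" and "confidence" fields, and <sil\/> silence markers (JSON-escaped "<sil/>"). A minimal sketch of picking the most confident alternative and stripping the silence markers, assuming the alternatives arrive as a plain JSON list of such objects; the real enclosing structure of the SPE response is not shown here, and the helper names are hypothetical:

```python
import json
import re

# Matches a <sil/> marker together with any surrounding whitespace.
SIL = re.compile(r"\s*<sil/>\s*")


def clean_phrase(phrase: str) -> str:
    """Replace each <sil/> marker with a single space."""
    return SIL.sub(" ", phrase).strip()


def best_phrase(results: list) -> str:
    """Return the cleaned phrase of the alternative with the highest confidence."""
    best = max(results, key=lambda r: r["confidence"])
    return clean_phrase(best["phrase"])


# Assumed shape: a JSON list of alternatives (values taken from the fragment above).
raw = r'[{"phrase": "can you hear me <sil\/> Okay <sil\/> I wanted to call you", "channel": 0, "score": 19204.861, "confidence": 0.20003852}]'
print(best_phrase(json.loads(raw)))  # can you hear me Okay I wanted to call you
```

Ranking by "confidence" rather than by the raw "score" is a design choice of this sketch; check the STT documentation for which field is meant for ranking.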

FAQs (Browser)

…are under one model than the other. LR takes values in the interval [0, +inf). LLR – abbreviation for the log-likelihood ratio statistic, the logarithm of LR. LLR takes values in the interval (-inf, +inf). Percentage (normalised) score – a commonly used mathematical transformation of the LLR to a percentage. This number is easier for humans to read but may be misleading if the LLR values are too…
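The exact LLR-to-percentage transformation is not stated here. One common choice, assuming a natural-log LLR and equal prior probabilities for both models, is the logistic function, which yields the posterior probability LR / (1 + LR). A minimal sketch under that assumption (SPE's own normalisation may differ, for example through calibration):

```python
import math


def llr_to_percent(llr: float) -> float:
    """Map a natural-log LLR to 0-100 with the logistic function (assumed transform)."""
    # Two branches keep exp() from overflowing for large |llr|.
    if llr >= 0:
        p = 1.0 / (1.0 + math.exp(-llr))
    else:
        e = math.exp(llr)
        p = e / (1.0 + e)
    return 100.0 * p


print(llr_to_percent(0.0))  # 50.0
```

The sketch also shows why percentages can mislead: LLRs of 10 and 20 both map to values above 99.99 %, so large differences in evidence become nearly invisible.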

Understand SPE metafiles

Certain SPE entities (SID Speaker models, SID Audio source profiles, LID Language packs) can have additional information associated with them in the form of "metafiles". This article explains the intended usage of metafiles. In general, SPE is intended as an under-the-hood engine, focusing purely on speech-related audio processing. Any additional functionality should be implemented in the application layer,…

Q: What do LLR, LR and score mean?

A: These abbreviations mean the following: LR – likelihood ratio, the result of a statistical test comparing two models. It is a number expressing how many times more likely the data are under one model than under the other. LR takes values in the interval [0, +inf). LLR – abbreviation for the log-likelihood ratio statistic, the logarithm of LR. LLR takes values in the interval…
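The relationship between LR and LLR can be illustrated in a few lines. A minimal sketch, assuming the natural logarithm (the base is not stated above) and hypothetical function names: data 4 times more likely under model A give LR = 4 and LLR = ln 4 ≈ 1.386, while LR = 1 (no preference) gives LLR = 0.

```python
import math


def likelihood_ratio(likelihood_a: float, likelihood_b: float) -> float:
    """How many times more likely the data are under model A than under model B."""
    return likelihood_a / likelihood_b


def llr_from_lr(lr: float) -> float:
    """Natural-log LLR: maps LR in [0, +inf) onto (-inf, +inf)."""
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    return math.log(lr) if lr > 0 else -math.inf


print(llr_from_lr(likelihood_ratio(0.08, 0.02)))  # ln(4), about 1.386
```

Positive LLR values favour model A, negative values favour model B, and swapping the models only flips the sign.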

Q: What are the recommendations for LID adaptation set?

A: The following is recommended. For adding a new language to a language pack: 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% speech), with only 1 language per recording. For adapting an existing language model (discriminative training): 10+ hours of audio for each language; this may be done on the customer site. May be done in…
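Before submitting an adaptation set, it can help to total the usable speech per language. A minimal sketch, assuming the thresholds above refer to net speech (25 hours of audio at 80% speech counts as 25 × 0.8 = 20 hours) and conservatively applying the same net-speech reading to the 10-hour adaptation figure; the class, task names and functions are hypothetical. Each recording carries exactly one language, matching the "only 1 language per recording" rule.

```python
from dataclasses import dataclass

# Assumed reading of the recommendations: minimum net speech hours per language.
MIN_SPEECH_HOURS = {"add_language": 20.0, "adapt_language": 10.0}


@dataclass
class Recording:
    language: str        # exactly one language per recording
    audio_hours: float   # total duration of the recording
    speech_ratio: float  # fraction of the recording that is speech (0-1)


def speech_hours_by_language(recordings):
    """Sum net speech hours (audio hours x speech ratio) per language."""
    totals = {}
    for rec in recordings:
        totals[rec.language] = totals.get(rec.language, 0.0) + rec.audio_hours * rec.speech_ratio
    return totals


def languages_below_minimum(recordings, task: str):
    """Return the languages that fall short of the assumed minimum for `task`."""
    minimum = MIN_SPEECH_HOURS[task]
    totals = speech_hours_by_language(recordings)
    return sorted(lang for lang, hours in totals.items() if hours < minimum)


recs = [Recording("cs", 30.0, 0.8), Recording("sk", 12.0, 1.0)]
print(languages_below_minimum(recs, "add_language"))    # ['sk']
print(languages_below_minimum(recs, "adapt_language"))  # []
```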

SID: TUTORIAL: Speaker Identification – How to Do a Basic Test

Phonexia Speaker Identification is a voice biometrics tool for recognizing speakers by their voice. In this video, we will show you how to start using this technology! You will learn how to create a "Speaker Model" to identify a speaker in a set of data. Ready to test it? Start with our video. What else is needed? 1. Phonexia…

Understand SPE database scripts

…MySQL command line client), use the create_schema.sql script, then the init_data.sql script. When you need to clean your SPE DB (and don't want to delete and re-create the entire DB for some reason), either use drop.sql to completely erase the DB content, followed by re-creating the content using create_schema.sql and init_data.sql, or use clean.sql to clean the "rest_directory_type", "rest_role", "rest_user", "rest_technology_model" and "rest_model_lid" tables. Scripts…
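The script order above can be captured in a small helper that feeds each file to the standard MySQL command-line client (equivalent to `mysql -u USER -p DATABASE < script.sql`). A minimal sketch: the task names, the script directory, the database name and the user are placeholders, and the start of the snippet (which describes when to run create_schema.sql) is truncated, so the "create" task is an assumed label.

```python
import subprocess
from pathlib import Path

# Order in which the SPE SQL scripts are applied for each (hypothetical) task name.
SCRIPT_ORDER = {
    "create": ["create_schema.sql", "init_data.sql"],
    "recreate": ["drop.sql", "create_schema.sql", "init_data.sql"],
    "clean": ["clean.sql"],
}


def scripts_for(task: str) -> list:
    """Return the scripts to run for `task`, in order."""
    return list(SCRIPT_ORDER[task])


def run_scripts(task: str, script_dir: str, database: str, user: str) -> None:
    """Feed each script to the mysql client via stdin; stops on the first failure."""
    for name in scripts_for(task):
        with open(Path(script_dir) / name, "rb") as sql:
            # -p makes the client prompt for the password (once per script here).
            subprocess.run(["mysql", "-u", user, "-p", database], stdin=sql, check=True)


print(scripts_for("recreate"))  # ['drop.sql', 'create_schema.sql', 'init_data.sql']
```

Note that "recreate" erases all DB content, whereas "clean" only empties the tables listed above.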

Waveform Denoiser (DENOISER)

Phonexia Waveform Denoiser (DENOISER) performs automatic dereverberation (removal of echoes caused by sound reflecting in rooms) and automatic noise reduction of the speech signal. The model is usually trained on various types of noise using the latest generation of neural-network-based algorithms. The noises removed most effectively are those similar to the ones the software was trained on. Conversely, the…

Phoneme Recogniser (PHNREC)

…Input: "Hi, this is Lewis." (WAV file containing speech) Output: sil hh ay dh ow s ih s l uw uw th sil (plain-text or XML/JSON output). Note: The outputs can contain the following special tokens: sil (silent part, or no speech detected). The list of phonemes is available in the document phonemes_for_stt_and_kws.pdf (delivered as part of manuals in SPE…
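The plain-text output is a space-separated sequence of phonemes in which sil marks silence. A minimal sketch that splits such a string into speech segments at the sil tokens, assuming exactly the space-separated format shown above; the function name is hypothetical:

```python
def split_on_silence(output: str, sil: str = "sil") -> list:
    """Split a space-separated phoneme string into speech segments at `sil` tokens."""
    segments, current = [], []
    for token in output.split():
        if token == sil:
            # A silence token closes the current segment, if there is one.
            if current:
                segments.append(current)
                current = []
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments


print(split_on_silence("sil hh ay dh ow s ih s l uw uw th sil"))
# [['hh', 'ay', 'dh', 'ow', 's', 'ih', 's', 'l', 'uw', 'uw', 'th']]
```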

Understand SPE user accounts

…not visible to SPE or to the account. A similar trick can be applied to the data directory, allowing LID language models and language packs, SID speaker models, etc. to be shared between accounts. User account management: SPE user accounts can be managed using the REST API (see the Administration section of the API documentation), or using the command-line administration utilities phxadmin or…

Phonexia Speech Engine

…providers via a simple plugin-like connector interface. Flexible integration: SPE can provide results in JSON or XML format. Results can be obtained by polling, via websockets, or via webhooks (callbacks). Status information: SPE can provide various status information to the application layer, e.g. license status, configuration info, current overall load, pending operations status, … Quick start: The following tutorial describes the…
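Of the three result-delivery options, polling is the simplest to sketch without knowing the API details. A minimal, transport-agnostic loop: `get_state` stands for whatever call your application uses to query the pending operation, and the state names "finished" and "failed" are hypothetical placeholders, not documented SPE values.

```python
import time
from typing import Callable


def poll_until_done(get_state: Callable[[], str], interval: float = 1.0,
                    timeout: float = 60.0, done_states=("finished",),
                    failed_states=("failed",)) -> str:
    """Call `get_state` every `interval` seconds until a terminal state or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"operation ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        time.sleep(interval)


# Usage with a stand-in for a real status query:
states = iter(["pending", "running", "finished"])
print(poll_until_done(lambda: next(states), interval=0.0))  # finished
```

Websockets or webhooks avoid the repeated requests and the latency of the polling interval, at the cost of the application having to accept incoming connections or callbacks.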