Search: model L

58 results

Gender Identification (GID)

…generation of XL3 and L3 models). Output scoring: log-likelihood ratio (LLR) and score (0-1). The score can be interpreted as a percentage by multiplying it by 100. Typical use cases: filtering calls by gender, playing advertisements targeted at a specific gender, and getting a quick demographic analysis of the recordings. The speed of Gender Identification is up to 150 FtRT (faster than real time), depending on the model…
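As a rough illustration of the "score × 100 = percentage" rule and the call-filtering use case, here is a minimal sketch. It assumes, hypothetically, that each call comes with one 0-1 score per gender; the `GenderResult` container and the `filter_by_gender` helper are illustrative names, not part of the GID output format or API.

```python
from dataclasses import dataclass

@dataclass
class GenderResult:
    # Hypothetical container: GID's real output format may differ.
    call_id: str
    male_score: float    # score in the 0-1 range
    female_score: float  # score in the 0-1 range

def as_percent(score: float) -> float:
    """Interpret a 0-1 score as a percentage by multiplying it by 100."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score * 100.0

def filter_by_gender(results: list[GenderResult], gender: str,
                     min_percent: float = 50.0) -> list[str]:
    """Return IDs of calls whose score for `gender` reaches `min_percent`."""
    attr = {"male": "male_score", "female": "female_score"}[gender]
    return [r.call_id for r in results
            if as_percent(getattr(r, attr)) >= min_percent]

calls = [GenderResult("call-1", 0.9, 0.1), GenderResult("call-2", 0.2, 0.8)]
print(filter_by_gender(calls, "female"))  # ['call-2']
```

The 50 % default threshold is an arbitrary choice for the example; a real deployment would pick it from its own data.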

Age Estimation (AGE)

…coding), A-law or Mu-law, PCM, 8 kHz+ sampling. Voiceprints: the AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself. Output: log file with processed information (age estimate). Processing speed: approx. 20x faster than real time on 1 CPU core, i.e. a standard 8-core server processes 3,840 hours of audio in 1 day of computing…
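The 3,840-hour figure follows from 20x real time per core × 8 cores × 24 hours. A small sketch of that arithmetic, assuming (as the figure implies) that throughput scales linearly with the number of cores; the function name is illustrative:

```python
def audio_hours_per_day(speed_factor: float, cores: int,
                        wall_clock_hours: float = 24.0) -> float:
    """Hours of audio processed in `wall_clock_hours` of computing.

    Assumes each core works independently at `speed_factor` times real
    time, so throughput scales linearly with the number of cores.
    """
    return speed_factor * cores * wall_clock_hours

# AGE figures from the text: 20x real time per core, 8-core server, 1 day.
print(audio_hours_per_day(20, 8))  # 3840.0
```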

STT: Results explained

…milliseconds. Score is the logarithm of a probability and ranges from -inf to 0: the higher the score, the higher the probability that the word was spoken in that time interval. Confidence is a probability ranging from 0 to 1. It is calculated from the score using the formula confidence = e^score. Multiplying the value by 100 gives the confidence percentage. NOTE: Some older legacy models do not support confidence…
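A minimal sketch of the score-to-confidence conversion described above (confidence = e^score, then × 100 for a percentage). It assumes the score is a natural logarithm, which is what the e^score formula implies; the function names are illustrative.

```python
import math

def confidence(score: float) -> float:
    """Convert an STT word score (a natural-log probability, <= 0)
    into a confidence between 0 and 1 via confidence = e^score."""
    if score > 0:
        raise ValueError("score is a log probability and cannot be positive")
    return math.exp(score)

def confidence_percent(score: float) -> float:
    """Confidence expressed as a percentage."""
    return confidence(score) * 100.0

print(confidence(0.0))                      # 1.0, the highest possible score
print(round(confidence_percent(-0.5), 1))   # 60.7
```

A score of -inf maps to a confidence of 0, which matches the -inf end of the score range.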

FAQs (Browser)

…score sharpness value to calibrate the recalculation. Please see Calibration in the technology documentation.

Understand SPE metafiles

…for storing SID speaker model metadata: textual properties like name, date of birth, etc. are stored as a JSON file (note: the structure and meaning are defined and understood by Browser itself, not by SPE), while the speaker photo and any other files attached to the speaker model are stored as separate files. Another example would be the information about the content of…
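To make the storage split concrete (textual properties in one JSON file, attachments as separate files), here is a sketch. The directory layout, the `metadata.json` file name and the property keys are all hypothetical; the actual structure is defined by Browser, not SPE.

```python
import json
import shutil
import tempfile
from pathlib import Path

# Hypothetical file name: the text only says properties go into a JSON file.
METADATA_FILE = "metadata.json"

def save_speaker_metadata(model_dir: Path, properties: dict) -> Path:
    """Write textual speaker properties (name, date of birth, ...) as JSON."""
    model_dir.mkdir(parents=True, exist_ok=True)
    path = model_dir / METADATA_FILE
    path.write_text(json.dumps(properties, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return path

def load_speaker_metadata(model_dir: Path) -> dict:
    """Read the JSON properties back into a dictionary."""
    return json.loads((model_dir / METADATA_FILE).read_text(encoding="utf-8"))

def attach_file(model_dir: Path, source: Path) -> Path:
    """Store an attachment (e.g. a photo) as a separate file beside the JSON."""
    model_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, model_dir / source.name))

# Usage: a throwaway directory stands in for a speaker model's storage.
with tempfile.TemporaryDirectory() as tmp:
    speaker_dir = Path(tmp) / "speaker-001"
    save_speaker_metadata(speaker_dir, {"name": "Jana Nováková"})
    print(load_speaker_metadata(speaker_dir))  # {'name': 'Jana Nováková'}
```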

Q: What do LLR, LR and score mean?

A: These abbreviations mean the following: LR – likelihood ratio, the result of a statistical test comparing two models. It is a number expressing how many times more likely the data are under one model than under the other. LR takes values in the interval [0, +inf). LLR – log-likelihood ratio statistic, the logarithm of LR. LLR takes values in the interval…
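A short sketch of the LR/LLR relationship for two models. The answer above is truncated before it states the log base, so this assumes the natural logarithm; `llr` and `lr` are illustrative helpers, not Phonexia functions.

```python
import math

def llr(likelihood_a: float, likelihood_b: float) -> float:
    """Log-likelihood ratio ln(L_a / L_b), computed as a difference of logs.

    Both likelihoods must be positive. Positive values favour model A,
    negative values favour model B, and 0 means both are equally likely.
    """
    return math.log(likelihood_a) - math.log(likelihood_b)

def lr(llr_value: float) -> float:
    """Recover the likelihood ratio LR, a value in [0, +inf), from an LLR."""
    return math.exp(llr_value)

value = llr(0.02, 0.01)   # data twice as likely under model A
print(round(value, 4), round(lr(value), 4))  # 0.6931 2.0
```

Working with the difference of logs rather than the ratio itself avoids underflow when both likelihoods are very small.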

Q: What are the recommendations for LID adaptation set?

A: The following is recommended. For adding a new language to a language pack: 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% speech); only 1 language per recording. For adapting an existing language model (discriminative training): 10+ hours of audio for each language; may be done on the customer site. May be done in…
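The two thresholds can be turned into a quick sanity check of an adaptation set. This sketch reads the parenthetical (25 h × 80 % speech = 20 h) as meaning that the 20-hour requirement is about net speech; that reading, the `Recording` class and the helper names are assumptions for illustration. Each list is assumed to hold the recordings for a single language.

```python
from dataclasses import dataclass

@dataclass
class Recording:
    # Hypothetical description of one adaptation recording.
    audio_hours: float
    speech_ratio: float          # fraction of the audio that is speech, 0-1
    languages: frozenset[str]    # languages spoken in the recording

NEW_LANGUAGE_MIN_SPEECH_HOURS = 20.0  # "20+ hours" per new language model
ADAPTATION_MIN_HOURS = 10.0           # "10+ hours" per adapted language

def speech_hours(recordings: list[Recording]) -> float:
    """Total net speech, e.g. 25 h of audio at 80% speech gives 20 h."""
    return sum(r.audio_hours * r.speech_ratio for r in recordings)

def ready_for_new_language(recordings: list[Recording]) -> bool:
    """Check one language per recording and the 20 h speech minimum."""
    if any(len(r.languages) != 1 for r in recordings):
        return False
    return speech_hours(recordings) >= NEW_LANGUAGE_MIN_SPEECH_HOURS

def ready_for_adaptation(recordings: list[Recording]) -> bool:
    """Check the 10 h audio minimum for adapting an existing language."""
    return sum(r.audio_hours for r in recordings) >= ADAPTATION_MIN_HOURS

print(ready_for_new_language([Recording(25.0, 0.8, frozenset({"de"}))]))  # True
```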

Q: Can I add words to the dictionary?

A: Yes, you can use Language Model Customization (LMC). For more details, please read the STT Language Model Customization tutorial…

Voice Inspector – Interpretation of results

SID: TUTORIAL: Speaker Identification – How to Do a Basic Test

Phonexia Speaker Identification is a voice biometry tool that recognizes speakers by their voice. In this video, we will show you how to start using this technology! You will learn how to create a “Speaker Model” to identify a speaker in a set of data. Ready to test it? Start with our video. What else is needed? 1. Phonexia…

Understand SPE database scripts

…MySQL command line client), use the create_schema.sql script, then the init_data.sql script. When you need to clean your SPE DB (and don’t want to delete and re-create the entire DB for some reason), either use drop.sql to completely erase the DB content, followed by re-creating the content using create_schema.sql and init_data.sql, or use clean.sql to clean the “rest_directory_type”, “rest_role”, “rest_user”, “rest_technology_model” and “rest_model_lid” tables. Scripts…
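The script order described above can be captured in a small helper. This is a sketch only: the action names (`init`, `reset`, `clean`) and the helper functions are hypothetical, and `run_scripts` simply pipes each file into the standard `mysql` client, assuming credentials come from the client's option file (e.g. `~/.my.cnf`).

```python
import subprocess
from pathlib import Path

# Order taken from the text: create the schema, then load initial data.
INIT_SCRIPTS = ["create_schema.sql", "init_data.sql"]

def plan(action: str) -> list[str]:
    """Return the SQL scripts to run, in order, for a (hypothetical) action."""
    if action == "init":
        return list(INIT_SCRIPTS)
    if action == "reset":
        # Erase everything with drop.sql, then re-create the content.
        return ["drop.sql", *INIT_SCRIPTS]
    if action == "clean":
        # clean.sql only empties the rest_* tables listed in the text.
        return ["clean.sql"]
    raise ValueError(f"unknown action: {action!r}")

def run_scripts(scripts: list[str], script_dir: Path, database: str) -> None:
    """Feed each script to the MySQL command line client via stdin.

    Credentials are assumed to come from the client's option file.
    """
    for name in scripts:
        with open(script_dir / name, "rb") as sql:
            subprocess.run(["mysql", database], stdin=sql, check=True)

print(plan("reset"))  # ['drop.sql', 'create_schema.sql', 'init_data.sql']
```

Keeping the plan separate from execution makes it easy to print or review the order before touching a live database.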

Waveform Denoiser (DENOISER)

Phonexia Waveform Denoiser (DENOISER) performs automatic dereverberation (removal of echoes caused by sound reflecting in rooms) and automatic noise reduction of the speech signal. The model is usually trained on various types of noise using the latest generation of neural-network-based algorithms. The noises removed automatically are mainly those similar to the ones the software was trained on. Conversely, the…

Phoneme Recogniser (PHNREC)

…user can add to the language model of the speech-to-text technology (for better accuracy of the KWS technology). Input: audio file (for format details, see the Speech Engine documentation; streams are not supported) and the technology model name (i.e. language code) to be used for phoneme transcription. Output: in the process of transcribing speech to phonemes, the Phoneme Recogniser usually identifies individual speech segments and converts them to pronunciations. Example…

Understand SPE user accounts

…not visible to SPE or to the account. A similar trick can be done with the data directory, allowing LID language models and language packs, SID speaker models, etc. to be shared between accounts. User accounts management: SPE user accounts can be managed using the REST API (see the Administration section of the API documentation), or using the command line administration utilities phxadmin or…

Phonexia Speech Engine

…main binary file itself. SPE requires a database, which can be SQLite (delivered inside the Phonexia package) or MySQL. No other components are needed. Structure of technologies and technology models: from the technical point of view, every technology can work with different technology models. These are, for example, various languages for STT (CS_CZ4, EN_US4) or various sizes for SID (L3, XL3). A technology can work…
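One way to picture the technology/model structure is as a mapping from each technology to the models it can work with. The sketch below uses only the example models named in the text; the mapping and helper names are illustrative and do not reflect how SPE stores this internally.

```python
# Example models named in the text; a real installation may list others.
TECHNOLOGY_MODELS: dict[str, tuple[str, ...]] = {
    "STT": ("CS_CZ4", "EN_US4"),  # STT models are languages
    "SID": ("L3", "XL3"),         # SID models are sizes
}

def models_for(technology: str) -> tuple[str, ...]:
    """Return the known models for a technology (empty if unknown)."""
    return TECHNOLOGY_MODELS.get(technology, ())

def has_model(technology: str, model: str) -> bool:
    """Check whether a technology can work with the given model."""
    return model in models_for(technology)

print(has_model("STT", "EN_US4"), has_model("SID", "EN_US4"))  # True False
```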