Search: models

43 results

Installation of Phonexia Browser

Some packages are distributed with only a limited set of speech technologies and languages, or without speech technologies. First installation: Our software is distributed as a ZIP file. The installation procedure is as simple as: unzip the archive; paste additional KWS, STT… models; paste the license.dat file to the root directory where you have the BROWSER folder and the run_browser(.exe) script; run the…

Understand SPE administration and backup

…where the temporary results are stored; see Understand SPE database for details. Backup: A system backup should be performed before any update or upgrade of the SPE. It is strongly recommended to back up mainly the following components of the system: the SPE database (the technology models, SPE user accounts, etc. are stored here); the SPE configuration file (usually /settings/phxspe.properties); technologies configuration…

Age Estimation (AGE)

…coding), A-law or Mu-law, PCM, 8kHz+ sampling. Voiceprints: the AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself. Output: log file with processed information (age estimate). Processing speed: approx. 20x faster than real time on 1 CPU core, i.e. a standard 8-core server processes 3,840 hours of audio in 1 day of computing…
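The throughput figure in the snippet above can be checked with a short calculation. This is only a sketch: the function name is made up for illustration, and it assumes throughput scales linearly with the number of cores, which the snippet implies but does not state.

```python
def audio_hours_per_day(speedup_per_core: float, cores: int,
                        wall_hours: float = 24.0) -> float:
    """Hours of audio processed in `wall_hours` of computing.

    Assumes (hypothetically) perfect linear scaling across CPU cores.
    """
    return speedup_per_core * cores * wall_hours


# 20x real time per core, 8 cores, 24 hours of computing
print(audio_hours_per_day(20, 8))  # 3840.0
```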

STT: Language Model Customization tutorial

…names, etc. Note: LMC works only with 5th-generation or newer STT models. LMC is available as part of the phxcmd command line tool (version 3.55 or newer); in older versions it is part of the Speech To Text package for command line (or a separate download). Customizing STT language model. 1) Creating word list: The word list is a UTF-8 encoded text file,…

STT: Results explained

…milliseconds. Score is the logarithm of a probability, from the (-inf, 0] interval: the higher the score, the higher the probability that the word was spoken in that time interval. Confidence is a probability from the [0, 1] interval. It is calculated from the score value using the formula e^score. Multiplying the value by 100 gives the confidence percentage. NOTE: Some very old legacy models do not support confidence….
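The score-to-confidence relation described in this snippet (confidence is e raised to the score, times 100 for a percentage) can be sketched as follows. The function names are illustrative and are not part of any Phonexia API.

```python
import math


def confidence_from_score(score: float) -> float:
    """Convert a log-probability score (<= 0) to a probability in [0, 1]."""
    if score > 0:
        raise ValueError("a log-probability score cannot be positive")
    return math.exp(score)


def confidence_percent(score: float) -> float:
    """Confidence expressed as a percentage."""
    return 100.0 * confidence_from_score(score)


print(confidence_from_score(0.0))  # 1.0, i.e. certain
```

A score of 0 maps to a confidence of 1 (100 %), and increasingly negative scores approach 0.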

KWS: Results explained

…before the keyword (1), the Keyword model (2) and a Background model of any speech parallel with the keyword model (3). Models 2 and 3 produce two likelihoods, Lkw and Lbg (any speech = background). The raw score is calculated as a log likelihood ratio (LLR): score = log_e(Lkw / Lbg). Confidence is calculated from the raw score using a sigmoid function: where:…
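The raw-score step from this snippet can be sketched in a few lines. The function names are hypothetical. The sigmoid calibration is left out on purpose, because its parameters are cut off in the snippet.

```python
import math


def llr_score(l_kw: float, l_bg: float) -> float:
    """Raw KWS score: natural log of the keyword/background likelihood ratio."""
    if l_kw <= 0 or l_bg <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log(l_kw / l_bg)


def llr_from_logs(log_l_kw: float, log_l_bg: float) -> float:
    """Same score computed from log-likelihoods, which avoids underflow."""
    return log_l_kw - log_l_bg


print(llr_score(1.0, 1.0))  # 0.0: both models explain the audio equally well
```

A positive score means the keyword model explains the audio better than the background model; a negative score means the opposite.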

FAQs (PSP)

…Q: What do LLR, LR and score mean? A: These abbreviations mean the following: LR – likelihood ratio, the result of a statistical test comparing two models. It returns a number which expresses how many times more likely the data are under one model than under the other. LR takes values in the interval [0, +inf). LLR…

FAQs (Browser)

…score sharpness value to calibrate the recalculation. Please see Calibration in the technology documentation. Q: What do LLR, LR and score mean? A: These abbreviations mean the following: LR – likelihood ratio, the result of a statistical test comparing two models. It returns a number which expresses how many times more likely the data…

Download Voice Inspector 5.2

…models VIN application (graphical user interface, GUI) with the following technologies built in: Speaker Identification (SID4_XL5), Speaker Diarization (DIAR), Voice Activity Detection (VAD), Speech Quality Estimator (SQE), Phoneme Recogniser (PHNREC); example population sets and audio (in ./examples/) and example report templates (in ./templates/). Hardware requirements: minimum – CPU: Intel® Core™ i5, RAM: 4 GB, required HDD space: 0.5 GB for software…

Understand SPE home directory

…Data: The data directory holds additional data files for entities created by that user, e.g. SID Speaker Models or LID language packs. If no such entities exist for that user, this directory is empty. Unlike the storage, the content of this directory is intended to be manipulated by SPE only and should not be manipulated directly at the filesystem level….

Understand SPE metafiles

Certain SPE entities – SID Speaker models, SID Audio source profiles, LID Language packs – can have additional information associated with them in the form of “metafiles”. This article explains the intended usage of metafiles. In general, SPE is intended as an under-the-hood engine, focusing purely on speech-related audio processing. Any additional functionality should be done on the application layer,…

SPE and Browser installation: standalone SPE

…LID technology, simply increase the number 1 to e.g. 5: <name>DIAR</name> <models> <item> <name>XL4</name> <n_instances>1</n_instances> <config_file/> </item> </models> 5. Optional: configure the multimedia converter. By default, the Speech Engine accepts only a limited list of audio formats. To process non-native formats, install a multimedia converter. The recommended software for this is FFmpeg. FFMPEG on Windows…
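The instance-count change described in this snippet could also be scripted. The sketch below edits a copy of the XML fragment using Python's standard library. The `<technology>` wrapper element and the `set_instances` helper are assumptions made so the fragment parses as a single document; the real configuration file's layout is not shown in the snippet.

```python
import xml.etree.ElementTree as ET

# Fragment from the snippet, wrapped in a hypothetical <technology> root.
FRAGMENT = """
<technology>
  <name>DIAR</name>
  <models>
    <item>
      <name>XL4</name>
      <n_instances>1</n_instances>
      <config_file/>
    </item>
  </models>
</technology>
"""


def set_instances(xml_text: str, model_name: str, count: int) -> str:
    """Return the XML with <n_instances> of the named model set to `count`."""
    root = ET.fromstring(xml_text)
    for item in root.findall("./models/item"):
        if item.findtext("name") == model_name:
            node = item.find("n_instances")
            if node is None:
                raise ValueError("model item has no <n_instances> element")
            node.text = str(count)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(model_name)


updated = set_instances(FRAGMENT, "XL4", 5)
print(updated)
```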

Language Identification – Languages

Information about release dates, support and maintenance periods of Phonexia Language Identification technology models – languages….