Search Results for: config%

Results 1 - 6 of 6 Page 1 of 1

How to configure STT realtime stream word detection parameters

Relevance: 100%      Posted on: 2020-03-28

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below:

    [vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
    backward_extensions_length_ms=150
    forward_extensions_length_ms=750
    speech_threshold=0.5

Backward and forward extensions are intervals in milliseconds which extend the part of the signal going to the decoder. The decoder is a component which determines what a particular part of the signal contains (speech, silence, etc.). Based on that, the decoder also decides whether a segment has finished or not. Unlike in file processing…
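As a rough illustration of what the extension parameters mean, the sketch below parses the default section quoted above and widens a detected speech interval by the backward and forward extensions. The helper names (`load_segmenter_params`, `extend_segment`) and the example interval are hypothetical, and the arithmetic is only an assumption about how "extending the part of the signal" could be read; it is not the Speech Engine implementation.

```python
import configparser

# Default segmenter section as quoted in the article excerpt.
SEGMENTER_INI = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"


def load_segmenter_params(ini_text):
    """Parse the INI-style text and return the three parameters as numbers."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser[SECTION]
    return {
        "backward_ms": section.getint("backward_extensions_length_ms"),
        "forward_ms": section.getint("forward_extensions_length_ms"),
        "speech_threshold": section.getfloat("speech_threshold"),
    }


def extend_segment(start_ms, end_ms, backward_ms, forward_ms):
    """Widen a detected interval (hypothetical helper, illustration only).

    The start is moved earlier by backward_ms (never below 0) and the end
    is moved later by forward_ms.
    """
    return max(0, start_ms - backward_ms), end_ms + forward_ms


params = load_segmenter_params(SEGMENTER_INI)
# Hypothetical detected speech interval: 1000 ms to 2000 ms.
extended = extend_segment(1000, 2000, params["backward_ms"], params["forward_ms"])
print(params)
print(extended)
```

With the default values, the hypothetical 1000–2000 ms interval would be widened to 850–2750 ms under this reading.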

STT Language Model Customization tutorial

Relevance: 100%      Posted on: 2019-04-24

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. Put simply, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio signals into the proper text equivalents. Due to the general diversity of spoken speech, the default generic language model may not acknowledge the importance of certain words over other words in certain situations. Language model customization is a way to inform…

LID adaptation

Relevance: 60%      Posted on: 2021-03-02

This article describes various ways of adapting Language Identification. Basic terminology: Languageprint (*.lp file) – a numeric representation of the audio, extracted from an audio file for the purpose of language identification (similar to a “voiceprint”, but representing the spoken language, not the speaking person). Languageprint archive (*.lpa file) – multiple languageprints combined into a single archive. Creation of languageprint archives is not supported by SPE; these are supported as input only. Language model – digital characteristics of a specific language. A language model can be trained from languageprints (*.lp), languageprint archives (*.lpa), or a combination of both. LID language model should not be…

What is a user configuration file and how to use it

Relevance: 40%      Posted on: 2020-03-28

Advanced users with appropriate knowledge (gained e.g. by taking the Phonexia Academy Advanced Training) may want to fine-tune the behavior of the technologies to adapt to the nature of their audio data. Modifying the original BSAPI configuration files directly can be dangerous – inappropriate changes may cause unpredictable behavior, and without a backup of the unmodified file it is difficult to restore a working state. User configuration files provide a way to override processing parameters without modifying the original BSAPI configuration files. WARNING: Inappropriate configuration changes may cause serious issues! Make sure you really know what you are doing. User configuration file is a…
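The general idea of overriding parameters without touching the original file can be sketched as a layered configuration, where a second source is read on top of the first. This is only a conceptual illustration: the excerpt does not show the actual user configuration file format or how BSAPI applies it, so the INI-style override text, the value 500, and the `effective_config` helper are all assumptions made for the example.

```python
import configparser

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"

# Stand-in for an original configuration (values from the segmenter article).
BASE_TEXT = """[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

# Hypothetical override: only the parameter being changed is listed.
USER_TEXT = """[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
forward_extensions_length_ms=500
"""


def effective_config(base_text, user_text):
    """Return the merged settings; the base text itself is never modified."""
    parser = configparser.ConfigParser()
    parser.read_string(base_text)
    # Values read later replace values read earlier for the same key.
    parser.read_string(user_text)
    return {name: dict(parser[name]) for name in parser.sections()}


merged = effective_config(BASE_TEXT, USER_TEXT)
print(merged[SECTION])
```

Keeping the overrides in a separate source means the original settings remain available as a known-good fallback, which is the motivation the article gives for user configuration files.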

Speech Engine 3.35.0

Relevance: 20%      Posted on: 2020-10-01

Speech Engine 3.35.0, DB v1600, BSAPI 3.35.0 (2020-10-01). New LID model L4 was promoted to production (LID BETA_L4 renamed to LID L4); Added new language tag documentation (doc/Technology_LID_L4_Language_tags.pdf); Updated STT model CS_CZ_5 to version 5.2.1 (fixes faulty transcription of numbers into Roman format); Added configurable STT Confusion Network threshold (in technology configuration file). Fixed: STT didn't work with 4th and older generation models after introduction of the Preferred phrases feature in SPE 3.32; Update from SPE 3.30 causes errors in STT result cache; memory leak in logging system; Typo in name of es-XA language in LID model L4 default language…

SPE3 – Releases and Changelogs

Relevance: 20%      Posted on: 2021-02-26

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). Releases Changelogs Speech Engine 3.38.0, DB v1700, BSAPI 3.38.0 (2021-02-25) Non-public Feature Preview release. New: Training of LID Language Packs (no more need for command line tools... finally!); New: LID Language Packs allow storing meta-files; New: New entity "LID Language Model" (equivalent of *.lpa LanguagePrint Archive); Improved: Updated STT model RU_RU_A to version 4.6.0 (updated language model); Removed: Support for RLS-enforced licences in command line applications; Removed: FeaturePasterRepeat warning…