Search: Language Model

42 results

STT: Language Model Customization tutorial

…STT model, put its name in the model parameter, like this: GET /technologies/stt?path=foobar.wav&model=<customized_model_name>. Using a customized STT model in command-line STT: to use a customized STT model in command-line STT, simply specify the new configuration file belonging to the customized STT model in the -config parameter. For example, assuming that the original pl_pl_5 model was customized, specifying updated as the model…
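The REST request quoted in the snippet above can be assembled programmatically. The following is a minimal sketch that only builds the request URL; the base address `http://localhost:8600`, the model name `my_customized_model`, and the helper name `build_stt_url` are hypothetical placeholders, not values taken from the documentation.

```python
from urllib.parse import urlencode


def build_stt_url(base_url: str, audio_path: str, model_name: str) -> str:
    """Build the GET /technologies/stt URL with the path and model parameters."""
    # urlencode escapes any characters that are not safe in a query string.
    query = urlencode({"path": audio_path, "model": model_name})
    return f"{base_url.rstrip('/')}/technologies/stt?{query}"


# Hypothetical server address and customized model name, for illustration only.
url = build_stt_url("http://localhost:8600", "foobar.wav", "my_customized_model")
print(url)
```

Sending the request and handling authentication are left out on purpose, since they depend on the actual SPE deployment.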

Releases and Changelogs (SPE)

…(2022-05-10) Fixed: Processing slowdown and high CPU usage on Windows platform – the following technologies/models used multiple threads when they should not: STT/KWS – all 6th generation models; LID – L4 model; DIAR, GID, SID4 – XL4 model; SQE – GENERIC model; VAD – GENERIC3 model. Speech Engine 3.50.3, DB v1901, BSAPI 3.50.3 (2022-04-28) New: Models for STT and KWS…

LID: Terminology and adaptation

Languageprints created using model L4 can be combined into a languageprint archive and/or language model only with languageprints created using model L4… and a language pack for model L4 must consist only of language models created using languageprints/archives of model L4. Adaptation types overview: Creating a new language model from your own audio files, to add a new language not supported out-of-the-box at least…

Release Notes

…XL4 model (compatibility must be explicitly enabled) Speech Engine: Speech to Text (STT) We have several exciting new features relevant to STT and KWS technologies: Czech (Czech Republic) language model updated (tech. model name: CS_CZ_6): We added new words to the language model, so recent frequent words like “COVID” are correctly transcribed. Slovak (Slovakia) language model updated (tech. model name:…

STT: Adding words to language model on the fly

Adding words to the STT language model on the fly is possible in SPE 3.45 or newer as part of the preferred phrases feature. The POST /technologies/stt or POST /technologies/stt/input_stream API calls actually serve two purposes: specify the actual preferred phrases (in the phrases part); specify words to be added to the STT language model (in the dictionary part). Each part can be used independently,…

Language Identification (LID)

…Routing particular calls (languages) to human operators (language experts). Scoring and results: The LID language pack defines a set of recognizable languages (represented by language models). When identifying the language in an audio recording (or languageprint), LID does the following: creates a languageprint of the recording (if the input is an audio recording); compares that languageprint with each language model in a…

Understand SPE database

…user), technology model used to create the profile, file with the profile content, hash; rest_profile_sid4_metafiles: list of files used as SID4 Audio Source Profiles metafiles; rest_model_lid: list of LID language packs – name, owner (SPE user), technology model to which the language pack belongs (i.e. technology model used to create source languageprints/language models); rest_model_lid_metafiles: list of LID language packs metafiles…

Speech to Text (STT)

…1 CPU core (e.g. a standard 8 CPU core server (8 instances of STT) can process 1010 hours of audio in 1 day of computing time (flat load, depends on technology model)). Supported languages: List of supported languages. Acoustic models: An acoustic model is created by training on training data. It includes characteristics of the voices of a set of speakers provided…
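The throughput figure in the snippet above can be converted into a per-instance speed with simple arithmetic. This is only an illustrative sketch: the function name is hypothetical, and the only inputs are the numbers quoted in the snippet (1010 hours of audio, 8 STT instances, 1 day of computing time).

```python
def audio_hours_per_instance_hour(audio_hours: float, instances: int,
                                  wall_clock_hours: float) -> float:
    """Hours of audio processed by one instance per hour of computing time."""
    return audio_hours / (instances * wall_clock_hours)


# Figures quoted in the snippet: 1010 hours, 8 instances, 24 hours (1 day).
speed = audio_hours_per_instance_hour(1010, 8, 24)
print(round(speed, 2))  # 5.26
```

Under these figures, each instance processes roughly 5.26 hours of audio per hour of computing time; as the snippet notes, the actual value depends on the technology model.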

SPE and Browser installation: standalone SPE

…nr. 23) 1) Age Estimation [active model: XL5(1x)] 2) Denoiser Technology [active model: EN_US(1x)] 3) Diarization [active model: XL4(1x)] 4) Gender Identification [active model: XL5(1x)] 5) Keyword Spotting [active model: EN_US_6(1x)] 6) Phoneme Recognition [active model: EN_US_6(1x)] 7) Keyword Spotting Stream [active model: EN_US_6(1x)] 8) Language Identification LanguagePrint Comparator [active model: L4(1x)] 9) Language Identification LanguagePrint Extractor [active model: L4(1x)]…

STT: What is Preferred Phrases feature and how to use it

…a decoder. The decoder uses the information from the acoustic model, combines it with information from the language model recognition network (which describes the statistics about word grouping and sentences of a given language) and provides the transcription output. (See the Speech To Text article for more details about speech transcription principles.) When using preferred phrases, we build an additional language model…

Adding new language or technology model (Browser)

…our example, we are adding a new Spanish model (ES_6 technology model) of Speech to Text and Keyword Spotting (with Phoneme Recognizer). When you install new languages or models, they are turned off by default and need to be enabled in Phonexia Browser. To turn new models on, open Phonexia Browser: go to Settings; switch to the Speech Engine tab; open STT…

Releases and Changelogs (Browser)

…results to clipboard [#4975] Keyword spotting button in toolbar is no longer disabled when an invalid keyword list is selected [#4976] Waveform editor now distinguishes KWS/Diar technology models (it is possible to open results for more models at once) [#4979] SID models status indication [#4979] User can prepare SID model/group by context menu [#4980] Show speech length for speaker models…

FAQs (PSP)

…may use them for both training a new language pack and testing/comparing against an existing language pack. The language-prints need to be compatible only with the model of LID used for language-print extraction. Q: What are the recommendations for a LID adaptation set? A: The following is recommended: For adding a new language to a language pack 20+…

Arabic dialects in Phonexia LID and STT

…for each – North Levantine (apc) and South Levantine (ajp). Our models were trained using data from both varieties, therefore we followed RFC 5646, section 2.2.4 and created a custom language code ar-XL, where the XL means “cross-Levantine” 😉 NOTE: To get the best STT results, use the model that corresponds to the given dialect. The AR_XL_* model is best suited for…

Understand SPE directory structure

…for individual models settings BSAPI configuration files (*.bs) and optionally manually created user configs (*.bs.usr). There is one exception – LID – which has two additional directories containing pre-built languageprint archives (*.lpa) and language packs: lprints and models. The schemes below show examples of directories for GID (Gender Identification), STT (Speech To Text) and LID (Language Identification): – GID and LID…