Search: Server configuration

64 results

Releases and Changelogs (SPE)

…BSAPI 3.35.3 (2020-11-24) New: Internal support for SAMPA phonetic alphabet New: Updated STT model RU_RU_A to version 4.5.0 (updated language model) New: Updated STT/KWS/PHNREC model AR_XL to version 5.2.0 (updated language model, changed phonemes notation to X-SAMPA) Fixed: Cannot create new output stream due to hanging unfinished tasks Fixed: Task is not removed from pool when result is delivered…

Release Notes

…XL4 model (compatibility must be explicitly enabled) Speech Engine: Speech to Text (STT) We have several exciting new features relevant to STT and KWS technologies: Czech (Czech Republic) language model updated (tech. model name: CS_CZ_6): We added new words to the language model, so recent frequent words like “COVID” are correctly transcribed. Slovak (Slovakia) language model updated (tech. model name:…

STT: Results explained

…For example, when saying the word “happiness”, the first result may contain the word “happy”, and the next result the word “happiness” (with the “start” and “end” times changed accordingly… and the “score” and “confidence” values too). Such corrections are indicated by a delete_n_words value in the results. This value indicates how many previously received words should be deleted and replaced…
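The correction mechanism described in this snippet can be sketched as follows. This is a minimal illustration, not the actual client logic: the `delete_n_words`, `start` and `end` names come from the snippet, while the `words` list, the `word` key, the `apply_result` helper and the timing values are assumptions made for the example.

```python
from typing import Any

Word = dict[str, Any]


def apply_result(transcript: list[Word], result: dict[str, Any]) -> list[Word]:
    """Return a new transcript with one incremental result applied.

    The last `delete_n_words` words received so far are dropped, then the
    words carried by this result are appended in their place.
    """
    n_delete = int(result.get("delete_n_words", 0))
    # Never drop more words than have been received so far.
    keep = max(0, len(transcript) - n_delete)
    return transcript[:keep] + list(result.get("words", []))


# Hypothetical result stream for the "happy" -> "happiness" case.
# The "words" and "word" keys and the times are illustrative assumptions.
results = [
    {"delete_n_words": 0, "words": [{"word": "happy", "start": 0.0, "end": 0.4}]},
    {"delete_n_words": 1, "words": [{"word": "happiness", "start": 0.0, "end": 0.7}]},
]

transcript: list[Word] = []
for result in results:
    transcript = apply_result(transcript, result)

print([w["word"] for w in transcript])
```

After both results are applied, the transcript holds only “happiness”, because the second result deletes the one word delivered by the first.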

STT: Language Model Customization tutorial

…STT model, put its name in the model parameter, like this: GET /technologies/stt?path=foobar.wav&model=<customized_model_name> Using customized STT model in command line STT To use customized STT model in command line STT, simply specify the new configuration file belonging to the customized STT model in the -config parameter. For example, assuming that original pl_pl_5 model was customized, specifying updated as the model…
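As a rough sketch, the request shown in this snippet could be assembled like this. The `/technologies/stt` endpoint and the `path` and `model` parameters come from the snippet; the base URL, the model name and the `build_stt_url` helper are hypothetical placeholders, and any authentication or session handling the server requires is left out.

```python
from urllib.parse import urlencode


def build_stt_url(base_url: str, audio_path: str, model: str) -> str:
    """Build the GET URL for an STT request that names a specific model."""
    # urlencode percent-escapes characters that are not safe in a query string.
    query = urlencode({"path": audio_path, "model": model})
    return f"{base_url.rstrip('/')}/technologies/stt?{query}"


# Hypothetical server address and customized model name.
url = build_stt_url("http://localhost:8600", "foobar.wav", "my_customized_model")
print(url)
```

Building the query with `urlencode` rather than string concatenation keeps the URL valid when a file path or model name contains spaces or other reserved characters.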

Understand SPE database

…voiceprints – voiceprint data, technology model used to create the voiceprint, speaker model to which the voiceprint belongs (speaker model voiceprints), calibration set to which the voiceprint belongs (FAR calibration set voiceprints) rest_model_sid_calib_voiceprint SID speaker model voiceprints calibrated to FAR – voiceprint data, speaker model, technology model used to create the voiceprint, max. FAR, calibration set used to calibrate the…

LID: Terminology and adaptation

…languageprints created using model L4 can be combined into a languageprint archive and/or language model only with languageprints created using model L4… and a language pack for model L4 must consist only of language models created using languageprints/archives of model L4. Adaptation types overview Creating a new language model from your own audio files, to add a new language not supported out-of-the-box at least…

Speech to Text (STT)

…data are similar to the desired usage of the resulting technology model, which is usually spontaneous speech. However, as it is difficult to obtain enough data of this type, other sources are also used. Adaptation The technology can be adapted at two levels – in the Acoustic Model or the Language Model. Adapting the Acoustic Model to speakers from a…

SPE and Browser installation: standalone SPE

…nr. 23) 1) Age Estimation [active model: XL5(1x)] 2) Denoiser Technology [active model: EN_US(1x)] 3) Diarization [active model: XL4(1x)] 4) Gender Identification [active model: XL5(1x)] 5) Keyword Spotting [active model: EN_US_6(1x)] 6) Phoneme Recognition [active model: EN_US_6(1x)] 7) Keyword Spotting Stream [active model: EN_US_6(1x)] 8) Language Identification LanguagePrint Comparator [active model: L4(1x)] 9) Language Identification LanguagePrint Extractor [active model: L4(1x)]…

STT: What is Preferred Phrases feature and how to use it

…a decoder. The decoder uses the information from acoustic model, combines it with information from language model recognition network (which describes the statistics about word grouping and sentences of a given language) and provides the transcription output. (See the Speech To Text article for more details about speech transcription principles) When using preferred phrases, we build additional language model…

Adding new language or technology model (Browser)

…our example, we are adding new Spanish model (ES_6 technology model) of Speech to Text and Keyword Spotting (with Phoneme Recognizer). When you install new languages or models, they are turned off by default and need to be enabled in Phonexia Browser. To turn new models on, open Phonexia Browser: go to Settings Switch to Speech Engine tab Open STT…

Understand SPE executable files

…that all the technologies/models are available in that SPE installation, this command adds(*) the following to the technologies configuration file: SIDE_STREAM for both L3 and XL3 model, 3 instances of each SIDC_STREAM for both L3 and XL3 model, 3 instances of each SID4E_STREAM for both L4 and XL4 model, 1 instance of each SID4C_STREAM for both L4 and XL4 model,…

DELETE – Software Vetting (Best Practice)

Download Speech Platform

…XL5 Diarization (DIAR) – model XL4 Language Identification (LID) – model L4 Gender Identification (GID) – model XL5 Age Estimation (AGE) – model XL5 Voice Activity Detection (VAD) – model GENERIC_3 and SID4_XL5 Speech Quality Estimation (SQE) Time Analysis Extraction (TAE) Waveform Denoiser (DENOISER) Phonexia Browser example audio (in ./BROWSER/example/ and ./SPE/bsapi/{technology}/example/) Step #2 – First start To get…

STT: Adding words to language model on the fly

Adding words to the STT language model on the fly is possible in SPE 3.45 or newer as part of the preferred phrases feature. The POST /technologies/stt or POST /technologies/stt/input_stream API calls actually serve two purposes: to specify the actual preferred phrases (in the phrases part), and to specify words to be added to the STT language model (in the dictionary part). Each part can be used independently,…

Releases and Changelogs (VIN)

…the Phonexia sales representative. Phonexia Voice Inspector 5.0 brings a Speaker Identification model XL5, that provides more accurate results for telephony data in comparison with previous generations of Speaker Identification models such as SID4 XL4. Users can observe that the SID4 XL5 model returns different values of LLR scores which are used for evidence calculation. Therefore Speaker Identification score distribution…