Search: model L

58 results

Understand SPE configuration file

…value is ‘${application.dir}shared’ server.shared.path = ${application.dir}shared Path to a directory intended to hold (customized) technology models shared by all SPE users. Defaults to shared subdirectory of SPE application directory and exists only in SPE 3.41 or newer. For additional details about shared models directory, see Understanding SPE directory structure article. NOTE: If you change the server.shared.path, you might also want…

Download Speech Platform

…only English models for Speech To Text and Keyword Spotting. Additional supported languages are available upon request. Speech Engine – technologies included: Speech To Text (STT) – model EN_US_6 (US English); Keyword Spotting (KWS) – model EN_US_6 (US English); Phoneme Recognizer (PHNREC) – model EN_US_6 (US English); Speaker Identification 4 (SID4) – model…

Releases and Changelogs (VIN)

…the Phonexia sales representative. Phonexia Voice Inspector 5.0 brings a Speaker Identification model XL5, that provides more accurate results for telephony data in comparison with previous generations of Speaker Identification models such as SID4 XL4. Users can observe that the SID4 XL5 model returns different values of LLR scores which are used for evidence calculation. Therefore Speaker Identification score distribution…

STT: Adding words to language model on the fly

Adding words to the STT language model on the fly is possible in SPE 3.45 or newer as part of the preferred phrases feature. The POST /technologies/stt or POST /technologies/stt/input_stream API calls actually serve two purposes: specifying the actual preferred phrases (in the phrases part) and specifying words to be added to the STT language model (in the dictionary part). Each part can be used independently,…

Arabic dialects in Phonexia LID and STT

…for each – North Levantine (apc) and South Levantine (ajp). Our models were trained using data from both varieties, therefore we followed RFC 5646, section 2.2.4 and created custom language code ar-XL, where the XL means “cross-Levantine” 😉 NOTE: To get the best STT results, use the model that corresponds to the given dialect. The AR_XL_* model is best suited for…

FAQs (PSP)

…hours of audio for each new language model (or 25+ hours of audio containing 80% of speech) Only 1 language per record For adapting the existing language model (discriminative training) 10+ hours of audio for each language May be done on customer site. May be done in Phonexia using anonymized data (= language-prints extracted from a .wav audio) in FAQ…

Understand SPE benchmark

…SPE by replacing the content of the default directory with your own audio files; creating a directory with a name according to the name-matching rule (see above) and putting audio files in the corresponding language in that directory. For example: a directory named es would be matched for es_6 and es_es_5 models, but not the old spanish_american model; a directory named cs_cz_fin would be matched only…

Speech Engine update

…technology models configuration usually introduces new features or major fixes, which may change communication between server and client, or other changes which may affect customer processes can also include new technology models; with such update you can add only the new technology, without SPE installation Upgrade changes the first version number (e.g. x.y.z to x+1) and is a major change…

Understand SPE multithreaded technologies initialization

…rather for stable environments, e.g. production deployments. Note that separate threads are used only for distinct technology–model combinations. Multiple instances of the same technology–model combination are NOT initialized in parallel. The number of threads used for the multi-threaded initialization can be configured using server.technology_multithread_initialization.n_threads setting. Default value is 0, which determines the number of threads automatically according to number of…

Speech To Text / Keyword Spotting supported languages

…(Ukraine) UK_UA_6 2023-04 8th gen. Standard Vietnamese (Vietnam) VI_VN_6 2021-10 8th gen. Standard Deprecated languages/models (not supported, after end-of-life) Older/other languages or models not listed in the above table are no longer supported and reached end-of-life. These are 1st, 2nd, 3rd or 4th generation models, typically marked with a number 1, 2, 3 or 4 in the model name. …

Understand SPE configuration

…0022 Data storage and multithread settings The home directory of SPE contains all user data including audio recordings and metadata files from speech processing (speaker models, description etc.). This is another good example of using environment variables if your topology design requires multiple instances of SPE processing the same payload. This is great for sharing raw data between multiple physical…

KWS: Results explained

…before the keyword (1), the Keyword model (2) and a Background model of any speech parallel with the keyword model (3). Models 2 and 3 produce two likelihoods – L_kw and L_bg (any speech = background). Raw score is calculated as log likelihood ratio (LLR): score = log_e(L_kw / L_bg). Confidence is calculated from the raw score using a sigmoid function: where:…
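The scoring described in this snippet can be sketched in a few lines of Python. The raw score follows the log likelihood ratio given above. The confidence function is an assumption: the snippet cuts off before the sigmoid formula, so a plain logistic function is used here as a stand-in, and the real formula may scale or shift the score. The function names are hypothetical.

```python
import math


def raw_score(l_kw: float, l_bg: float) -> float:
    """Log likelihood ratio (natural log) of keyword vs. background likelihood."""
    return math.log(l_kw / l_bg)


def confidence(score: float) -> float:
    """Map a raw score to the range (0, 1).

    ASSUMPTION: a plain logistic sigmoid. The documented formula is not
    visible in the snippet and may include extra scaling parameters.
    """
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


if __name__ == "__main__":
    s = raw_score(0.02, 0.005)  # keyword model four times as likely as background
    print(f"score={s:.3f} confidence={confidence(s):.3f}")
```

Under this sketch, equal likelihoods give a raw score of 0 and a confidence of 0.5, and the confidence rises as the keyword model becomes more likely than the background model.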

Speaker Identification (SID)

…technological model and can range from 5 to 50 times faster than real time on 1 server CPU core. Voiceprint extraction is the most time-consuming part of the process. Voiceprint comparison, on the other hand, is extremely fast – millions of voiceprint comparisons can be done in 1 second. Voiceprint extraction (Speaker enrollment) Speaker enrollment starts with the extraction…

STT: What is Words-To-Numbers feature and how to use it

…numbers conversion is based on a set of grammar rules describing how the conversion should work. Conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory inside the STT model directory. For example: in Czech 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm; in Spanish 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_es_6/grm. Can it be extended or tuned? You can edit…
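As a rough illustration, the two example locations in this snippet suggest a path pattern that can be built programmatically. The sketch below is an assumption generalized from those two examples only: the `models_<lowercased model name>` directory naming is inferred, and the helper name `w2n_rules_path` is hypothetical.

```python
from pathlib import PurePosixPath


def w2n_rules_path(spe_directory: str, model: str) -> PurePosixPath:
    """Build the expected location of numeric.pegjs for an STT model.

    ASSUMPTION: the model directory is named "models_" plus the lowercased
    model name, as in the Czech and Spanish examples.
    """
    return (
        PurePosixPath(spe_directory)
        / "bsapi" / "stt" / "data"
        / f"models_{model.lower()}"
        / "grm" / "numeric.pegjs"
    )


if __name__ == "__main__":
    # "/opt/spe" is a placeholder for the actual SPE directory.
    print(w2n_rules_path("/opt/spe", "CS_CZ_6"))
```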

Language Identification (LID)

…Routing particular calls (languages) to human operators (language experts) Scoring and results The LID language pack defines a set of recognizable languages (represented by language models). When identifying the language in an audio recording (or languageprint), LID does the following: creates a languageprint of the recording (if the input is an audio recording); compares that languageprint with each language model in a…