Search: wer

36 results

Q: How to choose answer format from server (xml/json)?

A: Via the HTTP header “Accept” parameter (application/json; application/xml), or via the request query “format=json/xml”. If the format is not defined (or the HTTP header “Accept” parameter has one of these values: application/*,*/*,*), the server will return json….
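The two ways of selecting the response format described above can be sketched in Python as follows. This is only an illustration using the standard library: the server address and the `/example` path are hypothetical placeholders, and the helper name `build_request` is not part of any Phonexia API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical server address, used only for illustration.
BASE_URL = "http://localhost:8600"

# MIME types named in the answer above.
MIME_TYPES = {"json": "application/json", "xml": "application/xml"}


def build_request(path, fmt="json", use_query=False):
    """Build a request that asks the server for the given response format.

    use_query=False -> format is selected via the HTTP "Accept" header
    use_query=True  -> format is selected via the "format" query parameter
    """
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    url = BASE_URL + path
    headers = {}
    if use_query:
        url += "?" + urlencode({"format": fmt})
    else:
        headers["Accept"] = MIME_TYPES[fmt]
    return Request(url, headers=headers)


# "/example" is a placeholder path, not a documented endpoint.
by_header = build_request("/example", fmt="xml")
by_query = build_request("/example", fmt="json", use_query=True)
print(by_header.get_header("Accept"))
print(by_query.full_url)
```

Neither request is actually sent here; pass the object to `urllib.request.urlopen` to send it to a real server.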

Voice Inspector – supporting technologies

Releases and Changelogs (SPE)

…files results in premature end without error message + all changes included in Feature Preview releases 3.23 to 3.26 (see below). NOTE: STT output format has changed in 5th generation: _DELETE_ token was changed to <null/>; _SILENCE_ and <sil/> tokens were changed to <silence/>; <s> and </s> tokens were changed to <segment> and </segment> respectively. Speech Engine 3.26 Speech Engine…
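For code that still expects or produces the older tokens, the renaming listed in the note above can be captured in a small lookup table. This is a minimal sketch: the function name `upgrade_tokens` and the assumption that the output is already split into a list of word tokens are illustrative, not part of the Speech Engine API.

```python
# Token renames taken from the release note above (older -> 5th generation).
LEGACY_TO_GEN5 = {
    "_DELETE_": "<null/>",
    "_SILENCE_": "<silence/>",
    "<sil/>": "<silence/>",
    "<s>": "<segment>",
    "</s>": "</segment>",
}


def upgrade_tokens(words):
    """Replace older special tokens with their 5th-generation names.

    Ordinary words are passed through unchanged.
    """
    return [LEGACY_TO_GEN5.get(word, word) for word in words]


old_output = ["<s>", "hello", "_SILENCE_", "world", "_DELETE_", "</s>"]
print(upgrade_tokens(old_output))
```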

STT: Language Model Customization tutorial

…names, etc. Note: LMC works only with 5th or newer generation STT models. LMC is available as part of the phxcmd command line tool (version 3.55 or newer), and in older versions as part of the Speech To Text package for command line (or as a separate download). Customizing STT language model 1) Creating word list: A word list is a UTF-8 encoded text file,…

Speaker Identification (SID)

…transition between lower and higher percentages, and only small differences in the low and high percentages; lower sharpness means a less steep function – i.e. the transition being more linear. Note: It’s important to properly understand the correlation between score and confidence via the sigmoid function steepness, controlled by the sharpness value. To get a better idea about the correlation, check…
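The effect of sharpness on steepness can be illustrated with a generic logistic (sigmoid) function. The formula below is an assumption made for illustration only; the exact score-to-confidence mapping used by SID is not given in this snippet, and the function name `confidence` is hypothetical.

```python
import math


def confidence(score, sharpness):
    """Map a score to a percentage with a generic logistic function.

    Assumed form (not the documented SID formula):
        100 / (1 + exp(-sharpness * score))
    A higher sharpness gives a steeper transition around score 0.
    """
    return 100.0 / (1.0 + math.exp(-sharpness * score))


# Compare a low and a high sharpness over a few small scores.
for s in (-2.0, -1.0, 0.0, 1.0, 2.0):
    low = confidence(s, sharpness=0.5)
    high = confidence(s, sharpness=5.0)
    print(f"score={s:+.1f}  sharpness 0.5 -> {low:5.1f} %   sharpness 5.0 -> {high:5.1f} %")
```

With the lower sharpness the percentages change gradually across the score range, while the higher sharpness pushes them quickly toward 0 % or 100 %.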

STT: What is Preferred Phrases feature and how to use it

…phrases list? Short answer: “Hints” for the system… phrases that help the system recognize the correct terms and transcribe them correctly… i.e. those which are expected in the text, but get mixed up with other words and transcribed incorrectly. Longer answer: For example, in a voicebot implementation, good candidates would be phrases extracted from utterances where the…

Understand SPE configuration file

…value is ‘${application.dir}shared’
server.shared.path = ${application.dir}shared
Path to a directory intended to hold (customized) technology models shared by all SPE users. Defaults to the shared subdirectory of the SPE application directory and exists only in SPE 3.41 or newer. For additional details about the shared models directory, see the Understanding SPE directory structure article. NOTE: If you change the server.shared.path, you might also want…

Release Notes

…and fixes Speech Engine: General. Reduced RAM consumption (since 3.58.0): RAM consumption can be up to several gigabytes lower, depending on technologies configuration and processed audio. This is mainly visible in Speech To Text when processing many or long audio files (or both). The effect may be less visible in other technologies. Fixed issues with non-ASCII / Unicode file names…

LID: Terminology and adaptation

…to train a language using just a few long audio files (like 5 files, 1 hour each). Acoustic channels should be as close as possible to the channel of the intended deployment. Adaptation using REST API (SPE 3.38 or newer): SPE 3.38 and newer include LID adaptation tasks in the REST API, which makes the adaptation significantly easier than in previous versions….

Understand SPE database scripts

…scripts: vXXXX (e.g. v1601 or v1700, etc.) – scripts for SPE database initial setup and maintenance, i.e. when SPE is set up for the first time, or re-installed; update – scripts for the SPE database content update, i.e. when SPE is updated to a newer version.
database
├── MariaDB
│   ├── update
│   └── v1900
├── SQLite
│   ├── v1900
│…

Understand SPE connectors for external TTS

…} { "vendor": string, "author": string, "version": string, "voices": [ { "name": string, "languageCodes": [string, string, …] }, . . . ] } Where: apiVersion denotes the version of the capabilities structure/API (2: SPE 3.46 and newer; the apiVersion property is not present at all for SPE 3.45 and older). vendor is the name of the TTS provider. This name is then used…

Designing and Developing Application

Before designing and developing the application, we encourage the Partner to find clear answers to the following questions: Customer requirements: Do my customers need file processing (audio) or stream processing in real time? How much manpower does the customer have to analyze the results? How many minutes per day or streams in parallel do my customers need to process?…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…computing performance is better by ~17% compared with Intel® Xeon® E5 2860 v4. FtRTaudio shows that the real requirements for HW and its computing power are approx. 62% lower than with the traditional approach using FtRTnet_speech for an audio dataset with a similar ratio between speech and non-speech (silence), and this is confirmed by measurement. Best practices: Use FtRTaudio when calculating hardware sizing and…
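The difference between the two metrics can be sketched as below. The definitions are assumptions inferred from the metric names, since the snippet does not spell them out: FtRTaudio is taken as total audio duration divided by processing time, and FtRTnet_speech as net speech duration divided by processing time. The input numbers are invented purely for illustration and are not measurements.

```python
def ftrt(duration_s, processing_s):
    """Return how many times faster than real time the processing ran.

    Assumed definition: duration of the material (seconds) divided by the
    time needed to process it (seconds).
    """
    if processing_s <= 0:
        raise ValueError("processing time must be positive")
    return duration_s / processing_s


# Invented example values, not measured data.
audio_s = 3600.0       # total length of the recording
net_speech_s = 1350.0  # speech only, silence removed
processing_s = 100.0   # wall-clock processing time

ftrt_audio = ftrt(audio_s, processing_s)
ftrt_net_speech = ftrt(net_speech_s, processing_s)
print(f"FtRT_audio      = {ftrt_audio:.1f}")
print(f"FtRT_net_speech = {ftrt_net_speech:.1f}")
```

Under these assumed definitions, the same processing run looks slower when only net speech is counted, which is why the choice of metric affects hardware sizing estimates.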

Phonexia Partner Program for Government Partners

…access to Phonexia Academy, state-of-the-art Phonexia technology, and an allocated senior consultant ensures that you can answer all your customer’s technical and business questions. Partner Level Prerequisites Silver Partner Gold Partner Basic Technical Knowledge X X Advanced Technical Knowledge X Active Partner Portal Account X X Quarterly Pipeline Updates X Frequently Asked Questions I am already a long-term Phonexia partner,…

Understand SPE technologies, instances and workers

Configuring Speech Engine to effectively utilize the full power of the underlying hardware can get challenging – one can easily get lost in all the strange terms like technologies, instances, slots, or workers… This article should shed some light on it. Speech Engine is like a post office: Thinking about Speech Engine, there is actually a very nice analogy with a post office…