
Search: multi

41 results

Understand SPE technologies, instances and workers

…Different post offices may provide different (sets of) services – smaller offices may provide only a small set of services, while big ones can have multiple floors with a wide service portfolio. Different Speech Engine installations may provide different (sets of) technologies – smaller installations may have only a single technology configured, while big ones can have a wide set of various technologies…

FAQs (Browser)

…this conversion automatically in the background, see the Understand SPE audio converter article. Good tools for converting unsupported formats to supported ones are FFmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are multiplatform software tools for Microsoft Windows, Linux and Apple OS X. Example of usage with FFmpeg:
ffmpeg -i <source_audio_file_name> <output_audio_base_name>.wav
This command converts an audio file in any format/codec supported by FFmpeg to normalized WAV audio…

Understand SPE home directory

…location might be useful e.g. in complex deployments with multiple separate SPEs which need to access a single centralized file storage placed on a high-performance networked disk array, etc. Similarly to operating systems, the SPE home directory contains subdirectories for each SPE user (see the SPE user management article). These subdirectories contain data belonging to the respective users: – user’s file…

Key Features (PSP)

…The Speech Platform includes the following technologies. Technologies are available in the Speech Engine component based on its particular configuration (Voice Biometrics, Transcription System, etc.).
Speaker Identification (SID) – recognizes a speaker automatically based on their voice
Speaker Diarization (DIAR) – separates multiple speakers in mono audio automatically
Language Identification (LID) – detects the language or dialect spoken in a…

Q: What are the requirements for SID evaluation dataset?

…in each recording (i.e. usually 2+ minutes recording length)
– only one speaker in each recording
– a wide variety of gender and age is recommended
– recordings should be as similar to the target use case as possible (device, channel, distance from mic, language distribution)
– audio files should be mono, lin16 format, 8 kHz+ sample rate
*Note: splitting a single recording into multiple shorter…
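A quick way to screen candidate files against the technical items above is to read the WAV header with Python's standard wave module. This is an illustration, not a Phonexia tool: check_sid_recording, WavCheck and make_wav are hypothetical names, the 120 s threshold is the approximate recording length from the list (net speech is not measured), and the header cannot tell you the speaker count, gender/age variety or channel similarity. The wave module reads only uncompressed PCM WAV and raises wave.Error for other formats.

```python
import io
import wave
from dataclasses import dataclass


@dataclass
class WavCheck:
    channels: int
    sample_width_bytes: int
    sample_rate: int
    duration_s: float
    problems: list


def check_sid_recording(f, min_duration_s: float = 120.0) -> WavCheck:
    """Check a WAV file (path or binary file object) against the header-level items:
    mono, lin16 (2 bytes per sample), at least 8 kHz, roughly 2+ minutes long."""
    with wave.open(f, "rb") as w:
        channels = w.getnchannels()
        width = w.getsampwidth()
        rate = w.getframerate()
        duration = w.getnframes() / rate
    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if width != 2:
        problems.append(f"expected 16-bit samples, got {8 * width}-bit")
    if rate < 8000:
        problems.append(f"sample rate {rate} Hz is below 8 kHz")
    if duration < min_duration_s:
        problems.append(f"duration {duration:.1f} s is below {min_duration_s:.0f} s")
    return WavCheck(channels, width, rate, duration, problems)


def make_wav(seconds: int, rate: int = 8000, channels: int = 1, width: int = 2) -> io.BytesIO:
    """Hypothetical demo helper: an in-memory WAV file filled with silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (seconds * rate * channels * width))
    buf.seek(0)
    return buf
```

For example, check_sid_recording(make_wav(10, channels=2)).problems lists two issues (stereo audio and a 10 s duration), while a 130 s mono 8 kHz file passes with an empty list.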

Release Notes

…the important known issues we see and plan to work on:
BROWSER: Only one VAD model is shown even if multiple VAD models are available on SPE.
SPE: Preferred phrases currently work only in CS_CZ_6 STT – we will add them to other languages in upcoming updates.
Release Plan for future
For the next public release, we plan to: Upgrade…

Understand SPE technologies configuration file

This article explains the purpose and structure of the SPE technologies configuration file, technologies.xml (or technologies.json, created by Phonexia Browser). An SPE installation usually includes multiple speech technologies (e.g. Speaker Identification, Speech To Text, etc.), available in various technological models (e.g. L4, XL4, etc.) or supporting various languages (e.g. 6th generation of EN_US, CS_CZ, etc.). You can select from these technologies/models those…

Understand SPE metafiles

…DELETE methods to upload, download or delete any kind of file with metadata of your choice, associated with the corresponding SPE entity. There are no limits on the content of the metafiles, their names, etc. (apart from those imposed by the underlying operating system and/or filesystem). Plain text files, structured formats like JSON or XML, pictures, documents, multimedia files… you…

Understand SPE benchmark

…lengths and speech/non-speech ratios, it is recommended to run the benchmark using multiple different audio files and calculate the average FtRT processing speed yourself. Alternatively, you can tune (or hack) SPE to prepare your own set of benchmarking recordings, or replace the default one – see further below…
Benchmark recording sets
The default sets of audio files supplied with SPE are…
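Averaging FtRT over several files can be done in two reasonable ways, and they give different numbers. The sketch below assumes FtRT means audio length divided by processing time (a value of 10 meaning ten times faster than real time); BenchmarkRun, ftrt, average_ftrt and pooled_ftrt are hypothetical names, and the timings would come from your own benchmark runs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BenchmarkRun:
    audio_seconds: float       # length of the benchmarked recording
    processing_seconds: float  # wall-clock time SPE needed to process it


def ftrt(run: BenchmarkRun) -> float:
    """Faster-than-real-time factor, assumed to be audio length / processing time."""
    return run.audio_seconds / run.processing_seconds


def average_ftrt(runs: list[BenchmarkRun]) -> float:
    """Arithmetic mean of per-file FtRT values; every file counts equally."""
    return mean(ftrt(r) for r in runs)


def pooled_ftrt(runs: list[BenchmarkRun]) -> float:
    """Total audio over total processing time; long files weigh more."""
    return (sum(r.audio_seconds for r in runs)
            / sum(r.processing_seconds for r in runs))
```

For example, runs of 60 s of audio processed in 2 s and 600 s processed in 40 s give per-file values of 30 and 15, so average_ftrt returns 22.5 while pooled_ftrt returns about 15.7, because the long file dominates the pooled figure. The pooled value is closer to the throughput you would see on a continuous workload.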

STT: Adding words to language model on the fly

…possible to define multiple pronunciations – this can be especially useful for uncommon or foreign words, slang words, etc., which people tend to mispronounce.
Allowed characters
In general, words should use only letters (graphemes) allowed in the given STT language (use GET /technologies/stt/graphemes to get the list of allowed graphemes). However, it is actually allowed to use any letters, even from…
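Before adding words, you can flag characters that fall outside the language's grapheme list. The sketch assumes you have already extracted the graphemes from the GET /technologies/stt/graphemes response as single-character strings (the response format is not shown here); find_unsupported is a hypothetical helper. Since other letters are actually allowed, treat its output as a warning rather than a hard error.

```python
from collections.abc import Iterable


def find_unsupported(words: Iterable[str], graphemes: Iterable[str]) -> dict[str, list[str]]:
    """Map each word to the sorted characters in it that are missing from `graphemes`.

    Assumes `graphemes` holds single characters; a multi-character grapheme
    (e.g. a digraph) would need a different check. Words whose characters
    are all allowed are left out of the result.
    """
    allowed = set(graphemes)
    report = {}
    for word in words:
        bad = sorted({ch for ch in word if ch not in allowed})
        if bad:
            report[word] = bad
    return report
```

For example, with the hypothetical grapheme set "abcdefghijklmnopqrstuvwxyz'", find_unsupported(["hello", "café", "naïve"], ...) returns {"café": ["é"], "naïve": ["ï"]}.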

Input audio quality

…noise, reverberations, or artifacts caused by possible multiple re-encodings during transfer. Store the audio in an appropriate format (see above) to avoid distorting the sound with compression artifacts. In general, the following recording methods or sources negatively affect the sound quality:
– surveillance camera microphone
– notebook built-in microphone
– smartphone lying on a desk, or even hidden under the desk, etc.
– hidden bug…