
Search: engine mono stereo

69 results

Releases and Changelogs (SPE)

…recordings
Fixed: Unable to initialize technologies when SPE is launched using UNC path on Windows
Speech Engine 3.57 (Public release)
Speech Engine 3.57.0, DB v1901, BSAPI 3.57.0 (2023-02-01)
New: AGE XL4 and XL5 models (for compatibility with SID4 XL4 and XL5 voiceprints)
Speech Engine 3.56 (Public release)
Speech Engine 3.56.0, DB v1901, BSAPI 3.56.0 (2022-12-15)
New: GID XL5 model (for…

Input audio quality

…with speech in mind). Lossy MP3 format is not preferred. If MP3 really has to be used, it must use a bitrate of at least 32 kbit/s per channel. Stereo audio must use full stereo encoding, not joint-stereo¹. Do not push for the smallest possible audio file sizes by trying to squeeze the maximum number of recordings into minimal storage space. Brutal compression like…
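Because the MP3 bitrate rule applies per channel, a full-stereo MP3 needs at least 64 kbit/s in total. A minimal Python sketch of that check, assuming you know the encoder's total bitrate, channel count, and whether joint stereo is enabled (the function and parameter names are hypothetical, not part of any Phonexia API):

```python
# Hypothetical helper encoding the MP3 guidance: >= 32 kbit/s per channel,
# and full (not joint) stereo for two-channel audio.
MIN_KBPS_PER_CHANNEL = 32


def mp3_settings_ok(total_bitrate_kbps: int, channels: int, joint_stereo: bool = False) -> bool:
    """Return True if an MP3 encoding setting meets the input-quality guideline."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    # Stereo must use full stereo encoding, not joint stereo.
    if channels == 2 and joint_stereo:
        return False
    # The bitrate requirement is per channel, so divide the total bitrate.
    return total_bitrate_kbps / channels >= MIN_KBPS_PER_CHANNEL


print(mp3_settings_ok(64, 2))        # True: 32 kbit/s per channel
print(mp3_settings_ok(48, 2))        # False: only 24 kbit/s per channel
print(mp3_settings_ok(128, 2, True)) # False: joint stereo
```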

Release Notes

…Speech Platform. We mainly improved and released Phonexia Speech Engine v3.50 (SPE, REST API).
Major Changes: New Features and Fixes
Speech Engine: Speech to Text (STT)
We have several exciting new features related to the STT and KWS technologies. Both technologies are part of the Speech Engine (SPE) component:
Spanish (General) Model Released (Tech. Model Name: ES_6)
It is an upgrade…

Time Analysis Extraction (TAE)

…or XML file. The results contain information about monologues and conversations.
Monologue
The Monologue section describes the statistics of a recording for each channel. It answers the following questions: how long this speaker was talking alone, how much of that time was net speech, and what the average speed of the speech was.
Conversation
This section describes reactions of one channel…
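One plausible reading of "talking alone" is the part of a channel's speech that does not overlap with speech on the other channel. A minimal Python sketch of that interval arithmetic, assuming speech segments are available as hypothetical `(start, end)` tuples in seconds; it illustrates the idea only and is not how TAE computes its statistics:

```python
Segment = tuple[float, float]  # (start, end) in seconds; hypothetical input format


def merge(segments: list[Segment]) -> list[Segment]:
    """Sort segments and merge overlapping or touching ones into disjoint intervals."""
    merged: list[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a: list[Segment], b: list[Segment]) -> float:
    """Total time covered by both a and b; both lists must already be merged."""
    i = j = 0
    shared = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def alone_time(own: list[Segment], other: list[Segment]) -> float:
    """Time this channel speaks while the other channel is silent."""
    own_m, other_m = merge(own), merge(other)
    own_total = sum(end - start for start, end in own_m)
    return own_total - overlap(own_m, other_m)


print(alone_time([(0, 5), (10, 12)], [(4, 11)]))  # 5.0
```

Merging first keeps the overlap computation a single linear pass over two sorted, disjoint lists.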

Phonexia Speech Engine

…first steps with Speech Engine, after obtaining a license file from Phonexia and downloading the Speech Engine package using a link provided by Phonexia.
https://www.youtube.com/watch?v=4qrB-GfFdWY
In short, these are the steps described in the tutorial:
1. Unzip the package to a directory.
2. Copy the license file into the same directory.
3. Run phxadmin --configure-tech in a console to configure technologies.
4. Edit the settings/phxspe.properties configuration…

FAQs (PSP)

…initialization of SPE takes too long. Phonexia Browser treats this as an initialization failure and kills the server. You can fix this in any of the following ways: increase the timeout in Settings > Speech Engine tab > First connection timeout; use fewer instances of technologies, which lets the Speech Engine start faster; or use smaller models of technologies.

Understand SPE technologies, instances and workers

Configuring Speech Engine to effectively utilize the full power of the underlying hardware can be challenging: one can easily get lost in unfamiliar terms like technologies, instances, slots, or workers. This article should shed some light on them.
Speech Engine is like a post office
When thinking about Speech Engine, there is a very nice analogy with a post office…

Key Features (PSP)

…The Speech Platform includes the following technologies. Technologies are available in the Speech Engine component based on its particular configuration (Voice Biometrics, Transcription System, etc.):
Speaker Identification (SID): automatically recognizes a speaker based on their voice
Speaker Diarization (DIAR): automatically separates multiple speakers in mono audio
Language Identification (LID): detects the language or dialect spoken in a…

LID: Terminology and adaptation

…Engine chapter for details.
Using a custom LID language pack in Speech Engine
To use a customized LID language pack in Speech Engine, it’s necessary to ensure that the language pack is placed in the correct location, so that Speech Engine can find it, and to register and enable the language pack in SPE using phxadmin.
1) Put the language pack in the correct location
In order…

Understand SPE configuration file

In this article, we explain the details of the Speech Engine configuration file phxspe.properties, located in the settings subdirectory of the SPE installation directory. Settings in this file affect Speech Engine behavior and performance. The configuration file is usually created right after SPE installation: on the first use of phxadmin, a default phxspe.properties file is created in the settings directory. The file…
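Since phxspe.properties is a plain key/value settings file, it can be inspected programmatically. A simplified Python reader, assuming the file follows basic Java .properties syntax (`key = value` or `key: value` lines, `#` or `!` comments); continuation lines, escape sequences and whitespace-only separators are deliberately not handled, and the keys in the sample are hypothetical:

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key/value lines.

    Not supported: line continuations, escapes, and "key value" lines that use
    only whitespace as the separator.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        # The key ends at the first '=' or ':' (whichever comes first).
        cuts = [pos for pos in (line.find("="), line.find(":")) if pos != -1]
        if not cuts:
            props[line] = ""  # a bare key with no value
            continue
        cut = min(cuts)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


sample = """
# hypothetical keys, for illustration only
example.timeout = 30
example.name: demo
"""
print(parse_properties(sample))  # {'example.timeout': '30', 'example.name': 'demo'}
```

To inspect a real installation, pass it the text of settings/phxspe.properties, for example read with `pathlib.Path.read_text`.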

SPE and Browser installation: standalone SPE

…package. In other words, merge the contents of the /bsapi/ directory with the /SPE/bsapi/ directory.
4. Configure Speech Engine
To configure the Speech Engine, navigate to the /SPE/ directory and start the configuration utility called phxadmin.
SPE on Windows
In the /SPE/ directory, type cmd in the Address bar to open the Command line. In the command…

Understand SPE connectors for external TTS

…via the service's native API and with SPE via standard input (stdin) and standard output (stdout). The connector should behave as follows: if the connector is started with the --info parameter, it outputs information about the TTS service capabilities in JSON format to stdout; if the connector is started without a parameter, it reads input JSON data from stdin and outputs raw PCM signed 16-bit little-endian mono…
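The two connector modes can be sketched as a small Python program. This sketch rests on assumptions that are not in the text: the parameter is spelled `--info`, the capabilities JSON content and the `text` field of the input JSON are hypothetical, the 8 kHz sample rate is arbitrary, and the PCM is written to stdout (the description above is truncated). `synthesize_placeholder()` is a hypothetical stand-in that generates a tone where a real connector would call the TTS service's native API:

```python
import json
import math
import struct
import sys

SAMPLE_RATE = 8000  # hypothetical; match the rate your TTS service actually produces

# Hypothetical capabilities payload; the real schema is defined by SPE.
CAPABILITIES = {"name": "example-tts", "languages": ["en"]}


def synthesize_placeholder(text: str) -> bytes:
    """Stand-in for the real TTS call: 50 ms of a 440 Hz tone per character."""
    n_samples = (SAMPLE_RATE // 20) * max(len(text), 1)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
        for i in range(n_samples)
    )
    # "<h" packs each sample as signed 16-bit little-endian; one value per sample = mono.
    return b"".join(struct.pack("<h", s) for s in samples)


def handle(argv: list[str], stdin_text: str) -> bytes:
    """Return the bytes the connector writes to stdout for one invocation."""
    if "--info" in argv[1:]:
        # Started with --info: report capabilities as JSON.
        return (json.dumps(CAPABILITIES) + "\n").encode("utf-8")
    # Started without a parameter: parse the request JSON read from stdin.
    request = json.loads(stdin_text)
    # "text" is a hypothetical field name for the text to synthesize.
    return synthesize_placeholder(request.get("text", ""))


def main() -> int:
    # Only read stdin when synthesizing; --info needs no input.
    stdin_text = "" if "--info" in sys.argv[1:] else sys.stdin.read()
    # Binary stdout: JSON for --info, otherwise raw PCM with no WAV header.
    sys.stdout.buffer.write(handle(sys.argv, stdin_text))
    sys.stdout.buffer.flush()
    return 0
```

To make it an executable connector, end the script with `if __name__ == "__main__": sys.exit(main())`. Keeping the logic in `handle()` separates the stdin/stdout plumbing from the two modes, so each mode can be checked without spawning a process.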

STT: Language Model Customization tutorial

…copy of the word list file, as a backup). See below for the best location for use in Speech Engine.
Using a customized STT model in Speech Engine STT
To use a customized STT model in Speech Engine STT, it’s necessary to place the customized model in the correct location, so that Speech Engine can find it, and to register and enable the customized…

Releases and Changelogs (Browser)

Phonexia Browser is a tool for testing Phonexia speech technologies available via the Speech Engine API.
Releases
Version   Release Date   End of Support   Maintained Until   Release type
3.60      2023-12-05     2025-06-01       n/a                Public
3.59      2023-06-20     2025-01-01       n/a                Public
3.58      2023-04-03     2024-10-01       n/a                Public
3.57      2023-02-02     2024-08-01       n/a                Public
3.56      2022-12-15     2024-06-01       n/a                Public
3.55      2022-10-03     2024-04-01       3.60               Public
3.52      2021-07-01     2021-09-30       3.55…

Speaker Diarization (DIAR)

Speaker Diarization labels segments of the same voice(s) in a single mono-channel audio recording based on each individual speaker’s voice. It is a language-, domain- and channel-independent technology. It segments not only speakers but also technical signals and silence. The outputs of the technology can be log files with labels and/or split audio files/one new…
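Labelled diarization output lends itself to simple per-speaker summaries. A minimal Python sketch, assuming the labels have already been parsed into hypothetical `(start, end, label)` tuples in seconds, with made-up label names for silence and technical signals; the real DIAR log format and label names may differ:

```python
from collections import defaultdict

# Hypothetical non-speech label names; the real DIAR labels may differ.
NON_SPEECH_LABELS = {"silence", "technical"}


def speaker_durations(segments: list[tuple[float, float, str]]) -> dict[str, float]:
    """Sum the labelled duration (seconds) per speaker, skipping non-speech labels."""
    totals: defaultdict[str, float] = defaultdict(float)
    for start, end, label in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(start, end, label)}")
        if label in NON_SPEECH_LABELS:
            continue  # silence and technical signals are not speakers
        totals[label] += end - start
    return dict(totals)


segments = [
    (0.0, 2.5, "S1"),
    (2.5, 3.0, "silence"),
    (3.0, 7.0, "S2"),
    (7.0, 8.0, "S1"),
]
print(speaker_durations(segments))  # {'S1': 3.5, 'S2': 4.0}
```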