Phonexia Speech Platform is provided as a set of several components:
- The Speech Engine (SPE) component is a REST API that includes technologies for the automated processing of audio files and audio streams. This component is usually provided in a specific configuration that meets the customer’s use case.
- The Phonexia Browser component is an expert-level application built on top of the Speech Engine, with a graphical user interface for advanced evaluation.
- The Reporting and Licensing Server (RLS) component either limits usage to the capacity purchased by the customer (for on-premises deployments without internet access) or collects usage and calculates the pay-as-you-go amount (for deployments with internet access).
Speech Technologies Available
The Speech Platform includes the following technologies. Which of them are available in a given Speech Engine component depends on its particular configuration (Voice Biometrics, Transcription System, etc.).
- Speaker Identification (SID) – recognizes a speaker automatically based on their voice,
- Speaker Diarization (DIAR) – separates multiple speakers in mono audio automatically,
- Language Identification (LID) – detects the language or dialect spoken in a recording,
- Speech to Text (STT) – several languages supported – converts speech into plain text (words or sentences) automatically,
- Keyword Spotting (KWS) – several languages supported – detects specific keywords/phrases automatically without conversion to text,
- Gender Identification (GID) – identifies whether a speaker is male or female,
- Age Estimation (AGE) – estimates the speaker’s age group,
- Voice Activity Detection (VAD) – detects the parts of the audio that contain speech,
- Speech Quality Estimation (SQE) – measures the quality of speech,
- Phoneme Recognizer (PHNREC) – several languages supported – converts speech into phonemes (written characters representing pronunciation),
- Waveform Denoiser (DENOISER) – automatically improves the audibility of speech for human listeners.
The LID, STT and KWS technologies support various languages as listed in the Languages Available section.
Supported Audio Input
The Speech Engine server supports various audio formats as listed in API reference > Audio requirements.
It also supports processing of RTP/HTTP streams, as described in API reference > RTP/HTTP streams.
The Speech Engine can use external audio conversion tools; it has been tested with sox and ffmpeg. For the configuration of this functionality, see
Note: Audio format conversion can decrease the accuracy of speech technologies, e.g., when the original audio is highly compressed.
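As a minimal sketch of an offline pre-conversion step, the Python snippet below builds an ffmpeg command that converts an input file to mono 16-bit PCM WAV. The target format (mono, 16-bit PCM, 8 kHz by default) and the helper names `ffmpeg_to_wav_cmd` and `convert` are assumptions for illustration only; check API reference > Audio requirements for the formats your Speech Engine actually accepts. This is not the Speech Engine's built-in conversion configuration.

```python
import shlex
import subprocess


def ffmpeg_to_wav_cmd(src: str, dst: str, sample_rate: int = 8000) -> list:
    """Build an ffmpeg argument list converting `src` to mono 16-bit PCM WAV.

    Hypothetical helper: the target format is an assumption, not a
    documented Speech Engine requirement.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input in any format ffmpeg can decode
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the requested rate in Hz
        "-c:a", "pcm_s16le",      # encode as 16-bit little-endian PCM
        dst,
    ]


def convert(src: str, dst: str, sample_rate: int = 8000) -> None:
    """Run the conversion; requires the ffmpeg binary on PATH."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst, sample_rate), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert(...) to actually run ffmpeg.
    print(shlex.join(ffmpeg_to_wav_cmd("call.mp3", "call.wav")))
    # prints: ffmpeg -y -i call.mp3 -ac 1 -ar 8000 -c:a pcm_s16le call.wav
```

Building the argument list separately from running it keeps the command easy to inspect and log, and passing a list to `subprocess.run` avoids shell-quoting problems with file names that contain spaces. Re-encoding highly compressed audio does not restore information lost in the original compression.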
Phonexia Speech Platform can be integrated into a partner’s application using the Speech Engine component (REST API).
Packages, Updates vs. Upgrades
Our packages follow the bug fix > updates > upgrades approach.
The Speech Platform package is available with a typical set of technologies for download here.
Some packages are distributed with a limited set of speech technologies or even completely without speech technologies.
Find out more information in Packages, Updates vs. Upgrades.
You might also want to browse our product support lifecycle policy to see which of our versions are supported and maintained.
A partner or customer needs to deploy the capacity and license files (capacity.dat and license.dat) to their deployment.
Several licensing variants are available to partners and customers, depending on the stage of the project:
1. Demo/Evaluation Stage
- Capacity – the user is given a small amount of capacity (usually capacity.dat is copied to the RLS component) to test whether the Speech Platform meets their use case.
- License – the “NET” license type – the license (usually one license.dat is copied to all Speech Platform components) is validated against the Phonexia.com licensing server over the internet (no audio is sent to Phonexia).
2. Production Stage
This depends on the business agreement with Phonexia; it is usually one of the following:
- Pay-as-you-go solution (for the commercial market)
  - Capacity and license are handled by the RLS component, whose only role is to calculate the amount processed during each monthly period. The RLS component deployed on the customer’s side reports this amount to the RLS installed on the Phonexia side. The RLS requires the capacity and license files for security reasons only.
- Prepaid-amount-per-day solution (for the government market)
  - Capacity – the user is given the purchased amount of capacity (usually capacity.dat is copied to the RLS component).
  - License – the “HW” license type – the license (usually one license.dat is copied to all Speech Platform components) is bound to the hardware profile of the machine where the Speech Platform is deployed (an offline license).
- Project-based solution
  - For big-data processing solutions, special licensing variants are arranged on a project basis. Please reach out to your Phonexia contact.
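To make the pay-as-you-go calculation more concrete, the following Python sketch sums processed audio duration per calendar month. The record shape (a timestamp plus seconds of processed audio), the unit, and the function name `monthly_totals` are hypothetical; the snippet only illustrates the kind of monthly aggregation described for the RLS component and is not its actual implementation or billing formula.

```python
from collections import defaultdict
from datetime import datetime


def monthly_totals(records):
    """Sum processed audio duration (seconds) per (year, month).

    Hypothetical data model: `records` is an iterable of
    (datetime, seconds_processed) pairs, not the real RLS format.
    """
    totals = defaultdict(float)
    for ts, seconds in records:
        totals[(ts.year, ts.month)] += seconds
    return dict(totals)


if __name__ == "__main__":
    usage = [
        (datetime(2024, 1, 5, 10, 0), 120.0),
        (datetime(2024, 1, 20, 14, 30), 300.0),
        (datetime(2024, 2, 1, 9, 15), 60.0),
    ]
    for (year, month), seconds in sorted(monthly_totals(usage).items()):
        print(f"{year}-{month:02d}: {seconds / 60:.1f} min")
    # prints:
    # 2024-01: 7.0 min
    # 2024-02: 1.0 min
```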
Note: Learn more about Licensing (technical details).