
Release Notes (PSP)

Speech Platform v3.45 release (Fall 2021)

Hello, and welcome to the page introducing the Fall 2021 release of Phonexia Speech Platform. We released two components:

  • Phonexia Speech Engine v3.45 (SPE 3.45.0 released 2021-10-06) – the encapsulation of speech technologies (REST API)
  • Phonexia Browser v3.45 (BROWSER 3.45.0 released 2021-10-08) – the client application on top of SPE (for evaluating the technologies)

This page summarizes major changes, bug fixes, and known issues. It also announces important future plans and end-of-life dates.

Major Changes: New Features and Fixes

Speech Engine: Speech to Text (STT)

We have several very interesting new features relevant to STT and KWS technologies. Both technologies are part of the Speech Engine (SPE) component:

  • English (United States) model (tech.model name: EN_US_6) released
    This upgrades the existing STT/KWS language model (EN_US_5) to the 6th generation (EN_US_6). STT accuracy increases by 8% (WAcc, absolute; model EN_US_6 vs. EN_US_5), which leads to more precise search results in audio content.
  • Vietnamese (as spoken in Vietnam) model (tech.model name: VI_VN_6) released
    This adds Vietnamese to the languages supported by STT/KWS. Partners/customers can transcribe Vietnamese audio or extend their search-in-audio applications. Again, it uses the 6th generation of our automatic speech recognition technologies (i.e., STT, KWS, PHNREC), which leads to more precise search results in audio content.
  • Improved preferred phrases in STT – only in tech. model CS_CZ_6 (Czech)
    Custom words (not present in the baseline STT model, such as names, slang expressions, etc.) can now be used in preferred phrases. On top of that, this feature can replace the LMC functionality: custom words are added dynamically with each transcription request, and no permanent STT models are created.
  • Improved transcription accuracy in the 6th generation of STT – only in tech. model CS_CZ_6 (Czech) and model EN_US_6 (English)
    Thanks to the remastering of several internal STT components, transcription accuracy improved by more than 2% WER (absolute) on average compared to the original 6th generation model (SPE 3.40.5). An additional benefit is more accurate transcription of the most important information (such as names and addresses). These improvements will be rolled out to other 6th generation models/languages soon.
  • Language Model Customization (LMC) functionality added to SPE (production level BETA)
    LMC is now a native part of the Speech Engine.
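
The accuracy figures above are quoted as WAcc and WER in absolute terms. As a reading aid, here is a minimal Python sketch of how these metrics are conventionally computed, assuming the standard definitions (WER as the word-level edit distance divided by the reference length, and WAcc = 1 - WER). The exact scoring procedure behind the numbers in these notes is not stated here, and all function names and sample values below are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """WAcc under the assumed convention WAcc = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


def absolute_gain(old: float, new: float) -> float:
    """Difference in percentage points (what 'absolute' refers to)."""
    return new - old


def relative_gain(old: float, new: float) -> float:
    """Difference as a fraction of the old value."""
    return (new - old) / old
```

With made-up numbers, a model going from 0.80 to 0.88 WAcc gains 8 points absolute, which is a relative gain of about 10%. The release notes quote the absolute figure.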

Speech Engine: Other technologies

  • XL4 technology model added to GenderID
    It brings compatibility with SID4_XL4. This significantly reduces processing time in integrations where the voiceprint is extracted from audio only once (the resource-intensive step) and then sent for comparison (the fast step) to both SID4_XL4 and GID_XL4.
  • VAD has been upgraded to a new generation (tech. model GENERIC_3)
    The model (GENERIC_3) was released for standalone Voice Activity Detection (VAD as part of SPE). It brings higher accuracy in the fundamental task of correctly distinguishing speech from non-speech (silence, ringing, etc.). Using this new generation as the built-in VAD in STT CS_CZ_6 (Czech language), we see accuracy (WAcc) increase by approx. 2% absolute. Integration into other STT tech. models (i.e., languages) will follow. It does not influence the *ID technologies.
  • SQE: Added Perceptual Evaluation of Speech Quality (PESQ) score estimation
    The PESQ estimation was added as another available metric of SQE. PESQ is a standard way of expressing speech quality as perceived by human beings.
  • SQE: Real-time processing
    A new technology model SQE_STREAM was added for real-time quality estimation on streams.
  • Added Speaker Clustering endpoint for SID4 (SURPRISE of this release)
    It compares a set of voiceprints and returns them grouped into clusters, which makes finding similar speakers considerably more efficient. This functionality is available only for SID4 technology (tech. model XL4 (recommended default) or L4 (faster but less precise than XL4)).
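
To illustrate what grouping a set of voiceprints into clusters means, here is a minimal Python sketch. It is not the algorithm used by SPE, which these notes do not describe. It assumes, purely for illustration, that each voiceprint is a plain numeric vector, that similarity is the cosine of two vectors, and that any pair scoring above a hypothetical threshold belongs to the same cluster.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_voiceprints(vectors, threshold=0.8):
    """Group vector indices so that every pair scoring at or above the
    threshold ends up in one cluster (single linkage via union-find)."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, `cluster_voiceprints([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]])` groups the first two and the last two toy vectors together. Real voiceprints are opaque binary objects, so in practice the comparison and clustering would be delegated to the SPE endpoint.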

Speech Engine: Generic functionalities

  • Custom request ID can be specified in the HTTP header X-Request-ID
    Useful for tracking down issues during application development
  • Possibility to set a source port for an output stream
    Useful when symmetric RTP communication is needed
  • Added /doc endpoint for serving REST API documentation in HTML format
    Get API documentation for your particular SPE version remotely, without physical access to SPE installation files
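
As a rough illustration of the first and third items, the Python sketch below builds (but does not send) an HTTP request for the /doc endpoint with a custom X-Request-ID header. The header name and the /doc path come from these notes. The base URL, the port, the helper name `build_spe_request`, and the choice to generate a random ID when none is given are assumptions made for the example.

```python
import uuid
from urllib.request import Request


def build_spe_request(base_url, path, request_id=None):
    """Build a GET request carrying a custom X-Request-ID header.

    base_url is a hypothetical SPE address, e.g. "http://localhost:8600".
    """
    # Fall back to a random ID so every request can still be traced.
    request_id = request_id or uuid.uuid4().hex
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, headers={"X-Request-ID": request_id}, method="GET")


# Example: a request for the HTML API documentation, tagged for debugging.
doc_request = build_spe_request("http://localhost:8600", "/doc", "debug-001")
```

Passing `doc_request` to `urllib.request.urlopen` would send it. Authentication and any other headers a particular SPE deployment requires are left out of this sketch.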

Phonexia Browser updates

We provide Phonexia Browser (a component of Speech Platform) for the basic evaluation of speech technologies. This is to help with the first use of our SPE component.

  • LID language models and language packs management in Browser
    Users can, for example, easily customize the set of languages in LID language packs. Customers benefit from more precise results because false positive scores on their data are reduced. Available for all LID technology models. See the Browser manual (PDF) for details on how to use it.

Deprecated Features

BSAPI (C++ API) discontinued – ANNOUNCEMENT

We set the End of Life for BSAPI (our C++ API) to 2023-03-31 after discussion with partners/customers who actively gave us feedback on the C++ API.
What does this mean for partners/customers?

  • Partners/customers with an installed BSAPI version and valid Maintenance & Support can update to BSAPI v3.40.x (March 2021 release; x = latest) and ask for bug fixes. BSAPI v3.40.x will receive long-term bug fixes (18+6 months, i.e., until 2023-03-31). BSAPI v3.45 was not released.
  • New and existing partners can use Speech Engine (REST API), our command-line interface version (CLI/CMD), or GUI applications (Phonexia Browser or Voice Inspector).

End of Life for technical models

Several new technical models have been released. In accordance with our Phonexia Product Support Lifecycle Policy, we therefore announce the end of life for the following models:

technology | model to be deprecated | note: latest model
STT + KWS + PHNREC | EN4 (English (United States)) | EN_US_6 (English (United States))
STT + KWS + PHNREC | HR_HR4 (Croatian (Croatia)) | HR_HR_6 (Croatian (Croatia))

Known Issues

Some of the important known issues we see and plan to work on:

  • BROWSER: Only one VAD model is presented even if multiple VAD models are available on SPE
  • SPE: Preferred phrases currently work in CS_CZ_6 STT only – we will add them to other languages in upcoming updates

Release Plan for the Future

For the next public release, we plan to:

  • Upgrade the Spanish model (technologies STT / KWS / PHNREC, the expected model name ES_ES_6) – improved accuracy expected.
  • Add a new decoder and a new VAD configuration to the existing STT technology models (i.e., languages in STT) – improved accuracy expected.

We also work hard on other products:

  • Phonexia Voice Verify – a voice verification solution for contact centers to enhance the security layer.
  • Phonexia Orbis – an on-premises software solution that enables the rapid investigation of audio recordings.

The next public release (v3.50) of the Speech Platform components (Speech Engine and Phonexia Browser) is planned for 2022-03-30. The diagram below shows the release plan.

 

Sources and How to Get Help

How to get help / Support updated – we updated the ticketing system for our partners/customers to make bug reporting easier. Support is still available at partner.phonexia.com/support

Where to read more / Partner Portal updated – we updated the information in the knowledge base. You are more than welcome to browse through it.

For a complete list of changes in SW, see Changelogs:

  • SPE → CHANGELOG.txt included in the distribution or on Partner Portal
  • BROWSER → CHANGELOG.txt included in the distribution or on Partner Portal
