
Releases and Changelogs (SPE)

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI.

Releases

Version | Release date | End of support | Maintained until | Release type
3.59 | 2023-06-20 | 2024-04-01 | 3.60 | Public
3.58 | 2023-04-03 | 2024-04-01 | 3.60 | Public
3.57 | 2023-02-01 | 2024-04-01 | 3.60 | Public
3.56 | 2022-12-15 | 2024-04-01 | 3.60 | Public
3.55 | 2022-10-03 | 2024-04-01 | 3.60 | Public
3.52 | 2022-07-01 | 2022-09-30 | 3.55 | Feature
3.51 | 2022-06-14 | 2022-09-30 | 3.55 | Feature
3.50 | 2022-03-23 | 2023-10-01 | 3.55 | Public
3.46 | 2022-02-07 | 2022-04-01 | 3.50 | Feature
3.45 | 2021-10-06 | 2023-05-01 | 3.50 | Public
3.42 | 2021-08-24 | 2021-09-30 | 3.45 | Feature
3.41 | 2021-07-15 | 2021-09-30 | 3.42 | Feature
3.40 | 2021-03-26 | 2022-10-01 | 3.45 | Public
3.38 | 2021-02-25 | 2021-03-30 | 3.40 | Feature
3.37 | 2021-02-17 | 2021-03-30 | 3.38 | Feature
3.36 | 2020-12-01 | 2021-03-30 | 3.37 | Feature
3.35 | 2020-10-01 | 2022-05-01 | 3.40 | Public
3.32 | 2020-08-28 | 2020-09-30 | 3.35 | Feature
3.31 | 2020-07-01 | 2020-09-30 | 3.32 | Feature
3.30 | 2020-03-27 | 2022-04-01 | 3.35 | Public
3.26 | 2020-03-02 | 2022-04-01 | 3.30 | Feature
3.25 | 2020-01-31 | 2022-04-01 | 3.26 | Feature
3.24 | 2019-12-18 | 2022-04-01 | 3.25 | Feature
3.23 | 2019-11-01 | 2022-04-01 | 3.24 | Feature
3.18 | 2019-10-01 | 2022-04-01 | 3.19 | Public
3.17 | 2019-06-28 | 2021-12-28 | 3.18 | Public
3.16 | 2019-04-26 | 2021-10-26 | 3.17 | Public
3.15 | 2019-02-28 | 2021-08-28 | 3.16 | Public
3.14 | 2018-12-21 | 2020-06-21 | 3.15 | Public
3.13 | 2018-11-19 | 2020-05-19 | 3.14 | Public
3.12 | 2018-08-17 | 2020-02-17 | 3.13 | Public
3.11 | 2018-03-15 | 2019-09-15 | 3.12 | Public
3.10 | 2017-12-06 | 2019-06-06 | 3.11 | Public
3.9 | 2017-09-08 | 2019-03-08 | 3.10 | Public
3.8 | 2017-06-26 | 2018-12-26 | 3.9 | Public
3.7 | 2017-03-27 | 2018-09-27 | 3.8 | Public
3.6 | 2016-12-14 | 2018-06-14 | 3.7 | Public
3.5 | 2016-10-04 | 2018-04-04 | 3.6 | Public
3.4 | 2016-09-19 | 2018-03-19 | 3.5 | Public
3.3 | 2016-07-11 | 2018-02-11 | 3.4 | Public
3.2 | 2016-04-22 | 2017-10-22 | 3.3 | Public
3.1 | 2016-02-15 | 2017-08-15 | 3.2 | Public
3.0 | 2016-02-09 | 2017-08-09 | 3.1 | Public
2.1 | 2015-09-16 | 2017-09-16 | 2017-09-16 | Public
2.0 | 2015-01-06 | 2016-07-06 | 2.1 | Public

Changelogs

Speech Engine 3.59 (Public release)

Speech Engine 3.59.0, DB v1901, BSAPI 3.59.0 (2023-06-20)

  • New: PESQ estimation in SQE now considers only voice segments (via newly added VAD)
  • New: Grammar rules for words-to-numbers conversion added to ES, EN_US, EN_US_A and PL_PL 6th generation STT models
  • New: Empty grammar rules for words-to-numbers conversion added to all remaining 6th generation STT models
  • New: Added changelogs for STT models (located in bsapi/stt/data/models_<model> directory)
  • Improved: Updated language models of 6th generation STT
    • RU_RU_6 (version 6.1.0)
    • RU_RU_A_6 (version 6.1.0)
  • Fixed: *.bs.usr configuration files are ignored by phxadmin and phxadmin2
  • Fixed: ❗❗❗ Unable to open audio files with non-ASCII/Unicode characters in the name or path on Windows
Speech Engine 3.58 (Public release)

Speech Engine 3.58.0, DB v1901, BSAPI 3.58.0 (2023-04-03)

  • New: Added 6th generation models for STT and KWS
    • UK_UA_6 (Ukrainian)
    • SR_RS_6 (Serbian)
  • New: Added VAD SID4_XL5 model, which detects speech/non-speech exactly the same way as SID4 XL5 model does
  • Improved: Reduced memory consumption by tuning ONNX runtime parameters. Mostly visible when processing (many) long recordings in STT.
  • Improved: SID4 comparator now returns actual score for recordings containing less than 3 seconds of speech (it returned score -9999 in versions 3.55 to 3.57)
    NOTE: It is still strongly recommended to NOT rely on results from such very short recordings. Only longer recordings give results with appropriate confidence.
  • Improved: Better Windows version detection
  • Fixed: Generic model for SQE may fail when processing very short recordings
  • Fixed: Unable to initialize technologies when SPE is launched using UNC path on Windows
Speech Engine 3.57 (Public release)

Speech Engine 3.57.0, DB v1901, BSAPI 3.57.0 (2023-02-01)

  • New: AGE XL4 and XL5 models (for compatibility with SID4 XL4 and XL5 voiceprints)
Speech Engine 3.56 (Public release)

Speech Engine 3.56.0, DB v1901, BSAPI 3.56.0 (2022-12-15)

  • New: GID XL5 model (for compatibility with SID4 XL5 voiceprints)
  • Fixed: Incorrect timestamp values in STT N-best results of stream transcription
  • Fixed: Training of LID may get stuck in an infinite loop in some cases
Speech Engine 3.55 (Public release)

Speech Engine 3.55.1, DB v1901, BSAPI 3.55.1 (2022-11-09)

  • New: Added 6th generation models for STT and KWS
    • KK_KZ_6 (Kazakh)
    • BN_6 (Bengali)
  • Fixed: phxcmd lpextract with -archive parameter and input directory creates empty archive
  • Fixed: Misleading error message if a HW license with an invalid HW profile is supplied

Speech Engine 3.55.0, DB v1901, BSAPI 3.55.0 (2022-10-03)

  • New: Added SID4 model XL5 with optional backwards compatibility with model XL4
  • New: Added phxcmd[.exe] command-line interface for technologies
  • New: Added 6th generation models for STT and KWS
    • DE_DE_6
    • NL_6
    • KA_GE_6
  • Improved: Updated language models of 6th generation STT
    • CS_CZ_6 (version 6.5.2)
    • SK_SK_6 (version 6.1.1)
  • Fixed: Cannot create Audio Source Profile in some circumstances when using only user calibration
  • Fixed: Possible deadlock in TTS output stream
  • Fixed: A corrupt audio file crashes the VAD technology and causes an “Unexpected exception: null pointer” error in further processing
  • Fixed: Errors such as “Could not parse word class definitions” caused by unnecessarily required write permissions for certain files
  • Removed: Removed old SID model S
  • + all changes included in Feature Preview releases 3.51 and 3.52 (see below)

NOTE: Windows 7 / Windows Server 2008 R2 and older are no longer supported!

Speech Engine 3.52

Speech Engine 3.52.0, DB v1901, BSAPI 3.52.0 (2022-07-01)

  • New: Models for STT and KWS with features of CS_CZ_6 (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
    • IT_IT_6
    • RU_RU_6
  • Fixed: Significantly improved speed of diarization model XL4 on longer audios
  • Removed: Removed old STT/KWS model for IT_IT
Speech Engine 3.51

Speech Engine 3.51.0, DB v1901, BSAPI 3.51.0 (2022-06-14)

  • New: Added option to set number of workers and RTP input streams limit automatically
  • New: Added “Settings Adviser” feature, which checks the technology and worker configuration against the current CPU and, if needed, suggests changes (in log messages) for optimal performance
  • New: Added 6th generation of ZH_CN STT, KWS and PHNREC (with features of CS_CZ_6: new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
  • Fixed: Use even ports for input RTP streams (according to RFC 3550) instead of odd ports
  • Fixed: Wrong detection of OS version on newer Windows
  • Changed: Windows version now requires Universal C Runtime (UCRT) installed (normally installed as part of Windows Update)
  • Changed: Logging is set to both file and console by default (this may be undesirable when running SPE as a service, so consider changing the logging setting)
  • Changed: Number of workers and RTP input streams limit are set automatically by default
  • Changed: Require initialization of all configured technologies by default
  • Removed: Dropped support for Windows 7 / Windows Server 2008 R2 and older
  • Removed: Removed 4th generation of STT/KWS model for PL_PL
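RFC 3550 recommends that RTP use an even port and that the matching RTCP stream use the next higher (odd) port. The sketch below only illustrates that convention; the helper names are hypothetical and this is not how SPE itself allocates input ports.

```python
from typing import List, Tuple


def is_rtp_port(port: int) -> bool:
    """True for an even port in the valid UDP range (the RTP side of an RTP/RTCP pair)."""
    return 0 < port <= 65534 and port % 2 == 0


def rtp_port_pairs(first_port: int, count: int) -> List[Tuple[int, int]]:
    """Return `count` (RTP, RTCP) port pairs starting at the first even port >= first_port."""
    start = first_port + (first_port % 2)  # round an odd start up to the next even port
    pairs = [(start + 2 * i, start + 2 * i + 1) for i in range(count)]
    if pairs and pairs[-1][1] > 65535:
        raise ValueError("port range exceeds 65535")
    return pairs
```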
Speech Engine 3.50 (Public release)

Speech Engine 3.50.7, DB v1901, BSAPI 3.50.6 (2022-08-25)

  • Fixed: Processing of audio files bigger than 1 GiB stops prematurely with empty result

Speech Engine 3.50.6, DB v1901, BSAPI 3.50.5 (2022-07-29)

Speech Engine 3.50.5, DB v1901, BSAPI 3.50.5 (2022-07-11)

  • Fixed: Timestamps in stream STT may not be in chronological order, resulting in decreased accuracy

Speech Engine 3.50.4, DB v1901, BSAPI 3.50.4 (2022-05-10)

  • Fixed: Processing slowdown and high CPU usage on Windows platform – the following technologies/models used multiple threads when they should not:
    • STT/KWS – all 6th generation models
    • LID – L4 model
    • DIAR, GID, SID4 – XL4 model
    • SQE – GENERIC model
    • VAD – GENERIC3 model

Speech Engine 3.50.3, DB v1901, BSAPI 3.50.3 (2022-04-28)

  • New: Models for STT and KWS with features of CS_CZ_6 (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
    • PL_PL_6
    • AR_KW_6

Speech Engine 3.50.2, DB v1901, BSAPI 3.50.2 (2022-04-15)

  • Fixed: Licensing subsystem fails to get license when multiple applications run under different OS user accounts

Speech Engine 3.50.1, DB v1901, BSAPI 3.50.1 (2022-04-05)

  • Fixed: Wrong pronunciations of some foreign words in STT model CS_CZ_6
  • Fixed: SPE may respond with STT result version 5 instead of version 6 if result was found in cache
  • Fixed: Hardware profile file contains extra NULL character

Due to change in STT results content, all STT results are removed from cache (database) during update!

Speech Engine 3.50.0, DB v1900, BSAPI 3.50.0 (2022-03-23)

  • New: Support for word classes in STT preferred phrases (currently in CS_CZ_6 only)
  • New: phxadmin2 was promoted from BETA to production
  • New: phxadmin2 automatically applies the new names of renamed technologies or models, and shows information about them
  • Improved: Updated VAD GENERIC_3 model
  • Improved: Updated the following 6th generation models for STT and KWS (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
    • VI_VN_6
    • FR_FR_6
    • CS_CZ_6 (updated VAD, tuned LM)
    • ES_6 (STT only)
    • EN_US_A_6 (STT only)
  • Fixed: Bad example voiceprints in SID4 L4 model
  • Fixed: STT grapheme checking inconsistent behavior
  • Fixed: STT NL_NL_5 (possibly also other 5th gen models) is much slower in 3.45 than in 3.30
  • Fixed: Missing more specific error messages when opening file fails on Windows platform
  • Removed: Old S and L models for AGE technology (now only XL and L4 are supported)
  • Removed: Old S and L models for LID technology (now only L3, XL3 and L4 are supported)
  • + all changes included in Feature Preview release 3.46 (see below)
Speech Engine 3.46

Speech Engine 3.46.0, DB v1900, BSAPI 3.46.0 (2022-02-07)

  • New: STT: Single word in preferred phrase is now preferred in any sentence context (previously it was preferred only in single-word utterances)
  • New: STT: Faster initialization of “dynamic adding of words” feature (now ~0.1 s, before ~0.5 s)
  • New: phxadmin2 added to installation manual
  • New: Models for STT and KWS with features of CS_CZ_6 (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
    • TR_TR_6
    • SK_SK_6
    • FA_6
  • Improved: Models for STT and KWS, updated and aligned with CS_CZ_6 features (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
    • AR_XL_6
    • SV_SE_6
    • HR_HR_6
    • EN_US_6
    • PS_6
  • Fixed: Dynamic adding of words in STT does not support UTF-8 characters wider than 16 bits
  • Fixed: STT crashes on zero probability in LM
  • Fixed: Invalid products in the license file are skipped instead of causing an error
  • Changed: MySQL database is no longer supported in favor of MariaDB (more details about migration in doc/UPDATE.txt)
  • Changed: Better phxadmin2 usage descriptions
  • Changed: phxadmin2 --version output format (3.45.0, 6118, 2021-10-07)
  • Changed: phxadmin2 technology show now uses colon to separate technology and model (consistent with “technology enable”)
  • Removed: user setting for maximum number of pending requests (the setting was ignored anyway)
Speech Engine 3.45 (Public release)

Speech Engine 3.45.7, DB v1701, BSAPI 3.45.8 (2022-05-06)

  • Fixed: Processing slowdown and high CPU usage on Windows platform – the following technologies/models used multiple threads when they should not:
    • STT/KWS – all 6th generation models
    • LID – L4 model
    • DIAR, GID, SID4 – XL4 model
    • SQE – GENERIC model
    • VAD – GENERIC3 model

Speech Engine 3.45.6, DB v1701, BSAPI 3.45.7 (2022-04-14)

  • Fixed: Licensing subsystem fails to get license when multiple applications run under different OS user accounts

Speech Engine 3.45.5, DB v1701, BSAPI 3.45.6 (2022-02-22)

  • Fixed: Some STT models may fail to initialize with BsapiException: SPhxBasicDecoderI(2): <unspecified file>: cannot open file error
  • Fixed: Exception during license acquisition: License system failure (1303) on Windows with NET or FLS-distributed licenses when license expiration was in year 2038 or later
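The 2038 boundary in the licensing fix above is consistent with the well-known Year 2038 problem, where seconds since the Unix epoch overflow a signed 32-bit integer. That cause is an assumption (the changelog does not state it); the hypothetical helper below only shows where the boundary lies.

```python
from datetime import datetime, timezone

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit seconds counter


def fits_in_int32_seconds(dt: datetime) -> bool:
    """True if a timezone-aware `dt` fits in signed 32-bit seconds since 1970-01-01 UTC."""
    return INT32_MIN <= int(dt.timestamp()) <= INT32_MAX


# Last representable second: 2038-01-19 03:14:07 UTC
LAST_INT32_SECOND = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
```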

Speech Engine 3.45.4, DB v1800, BSAPI 3.45.5 (2022-01-14)

 ❗❗❗ STT users are strongly encouraged to update ❗❗❗ 

  • Fixed: Gradual speed drop and memory leak in STT
  • Fixed: Words-to-numbers conversion significantly decreases STT performance

Speech Engine 3.45.3, DB v1800, BSAPI 3.45.4 (2021-12-08)

  • Fixed: KWS fails with “Can not create temporary file” exception

Speech Engine 3.45.2, DB v1800, BSAPI 3.45.3 (2021-11-29)

  • Fixed: LMC does not work with 6th generation of Czech and Spanish models
  • Fixed: More specific error messages when opening file fails
  • Changed: Spanish 6th generation of STT/KWS renamed from ES_ES_6 to ES_6
    (incorrect name was used in SPE 3.45.1)

Speech Engine 3.45.1, DB v1800, BSAPI 3.45.1 (2021-11-22)

  • New: Added 6th generation of EN_US KWS/PHNREC
  • New: Added 6th generation of ES_ES STT, KWS and PHNREC
  • Fixed: Memory leak in STT CS_CZ_6 model
  • Fixed: STT CS_CZ_6 with preferred phrases reports class words as OOV (out of vocabulary)
  • Fixed: STT returns error on models which don’t support preferred phrases even if phrases were not specified
  • Fixed: STT slowdown if “words to be added to language model” are not specified
  • Fixed: KWS sometimes saves keyword list with minus infinity log probability
  • Fixed: PESQ score in SQE is not always in range <-0.5, 4.5>
  • Fixed: DIAR XL4 incorrectly detects various technical signals and noises as a speaker
  • Fixed: TTS “info” output lists voices twice
  • Fixed: Output stream does not accept “localhost” as destination address on some OSs

Speech Engine 3.45.0, DB v1800, BSAPI 3.45.0 (2021-10-06)

  • New: Added 6th generation of EN_US and EN_US_A STT (KWS/PHNREC will be added in one of the upcoming updates)
  • New: Added XL4 model for GID (for compatibility with SID4 XL4 voiceprints)
  • New: STT preferred phrases v2 with ability to dynamically add words to language model (currently in CS_CZ_6 only)
  • New: Endpoint /technologies/speakerid/clustervpset for clustering a voiceprint set
  • New: Input streams over WebSocket (see GET /input_stream/websocket)
  • New: SQE: Added enable_pesq switch for Perceptual Evaluation of Speech Quality (PESQ) score estimation (PESQ is turned off by default for performance reasons)
  • Fixed: Empty “info” in VAD result when recording contains 0 seconds of speech for model GENERIC_3
  • Fixed: Incorrect timestamps in PHNREC results
  • Fixed: Segmentation fault when dynamically changing preferred phrases with new STT decoder (new decoder is currently used only in CS_CZ_6)
  • Fixed: Word separator is considered an invalid grapheme for CZ models in LMC
  • Improved: RLS-related messages are now logged at “debug” level, not “trace” level
  • Changed: STT language model customization marked as BETA
  • Removed: 4th generation of STT/KWS/PHNREC model for HR_HR
  • + all changes included in Feature Preview releases 3.41 and 3.42 (see below)
Speech Engine 3.42

Speech Engine 3.42.0, DB v1701, BSAPI 3.42.1 (2021-08-24)

  • New: Added /doc endpoint for serving REST API documentation in HTML format
  • New: New VAD model GENERIC_3 with improved accuracy + new VAD for 6th generation of CS_CZ STT, KWS and PHNREC
  • New: Added 6th generation of VI_VN STT, KWS and PHNREC
  • Fixed: New decoder does not propagate error messages
  • Improved: Updated doc/Phonemes_for_STT_and_KWS.pdf document for 6th generation of VI_VN
  • Improved: Updated decoder in 6th generation of CS_CZ STT, which should slightly increase recognition precision

Known issues:

  • When using preferred phrases containing some of the class words with 6th generation of CS_CZ STT, these words are reported as “out of vocabulary” and the phrase is ignored
  • New VAD model GENERIC_3 does not work in VAD_STREAM technology
Speech Engine 3.41

Speech Engine 3.41.0, DB v1701, BSAPI 3.41.0 (2021-07-15)

  • New: STT language model customization (LMC) via REST API (see Usage examples -> Speech To Text -> Create customized model in API documentation)
    NOTE: The customized model is placed in the shared directory; see the SPE directories article for more information.
  • New: Request ID can be specified in HTTP header X-Request-ID
  • New: Possibility to set source port for output stream
  • New: Added SQE technology on stream
  • New: Added Perceptual Evaluation of Speech Quality (PESQ) score estimation to SQE results
  • New: The following word classes are transcribed more accurately in 6th generation of CS_CZ STT
    • male/female first name and surname
    • municipality
    • street
  • Fixed: LMC may use wrong paths on Windows platform
  • Improved: Removed + symbol from LMC phrases in STT output
  • Improved: Updated decoder in 6th generation of CS_CZ STT, which should slightly increase recognition precision
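A minimal sketch of supplying your own request ID through the X-Request-ID header, using Python's standard urllib. The base URL (host and port 8600) and the helper name are assumptions for illustration; the request is only built here, not sent.

```python
import urllib.request
import uuid
from typing import Optional

# Hypothetical base URL: replace host and port with those of your SPE instance.
SPE_URL = "http://localhost:8600"


def make_request(path: str, request_id: Optional[str] = None,
                 method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request carrying an explicit X-Request-ID header."""
    rid = request_id or str(uuid.uuid4())  # generate an ID if the caller has none
    return urllib.request.Request(SPE_URL + path, method=method,
                                  headers={"X-Request-ID": rid})
```

For example, urllib.request.urlopen(make_request("/input_stream/rtp/info", "trace-42")) would send the request with the ID trace-42.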

Known issue: When using preferred phrases containing some of the class words with 6th generation of CS_CZ STT, these words are reported as “out of vocabulary” and the phrase is ignored.

Speech Engine 3.40 (Public release)

Speech Engine 3.40.10, DB v1701, BSAPI 3.40.11 (2022-02-21)

  • Fixed: Exception during license acquisition: License system failure (1303) on Windows with NET or FLS-distributed licenses when license expiration was in year 2038 or later

Speech Engine 3.40.9, DB v1701, BSAPI 3.40.10 (2022-01-13)

 ❗❗❗ STT users are strongly encouraged to update ❗❗❗ 

  • Fixed: Voices in TTS info output were listed twice
  • Fixed: Gradual speed drop and memory leak in STT
  • Fixed: Words-to-numbers conversion significantly decreases STT performance

Speech Engine 3.40.8, DB v1701, BSAPI 3.40.5 (2021-08-18)

  • Improved: Better audio resampler in player (/utils/player/output_stream) and TTS (/external/technologies/tts/*) for higher output audio quality
  • Fixed: phxadmin2 error when disabling technology and specifying technology name twice
  • Fixed: Language name is truncated in LID result when name contains space character
  • Fixed: Fixes and improvements in numeric grammar for STT SK_SK_5 (words not converted to numbers in various cases)

Speech Engine 3.40.7, DB v1701, BSAPI 3.40.4 (2021-06-30)

  • Fixed: Invalid SQL statement on update of SPE – fixed SQLite update script from v1601 to v1602

Speech Engine 3.40.6, DB v1701, BSAPI 3.40.4 (2021-06-22)

  • Fixed: Getting information about the language model containing the LPA caused an internal server error
  • Fixed: Acapela connector works again (was broken in 3.40.4)
  • Fixed: Fixes from 3.35.8 (MySQL database schema update required)

Speech Engine 3.40.5, DB v1700, BSAPI 3.40.4 (2021-05-09)

  • Fixed: When trying to register webhook over existing webhook for any stream technology, SPE returns HTTP 400 (1069) error instead of HTTP 500
  • Fixed: Invalid SQL syntax when overwriting voiceprint in a database

Speech Engine 3.40.4, DB v1700, BSAPI 3.40.4 (2021-05-28)

  • Fixed: BSAPI 3.40.3 does not include fixes from 3.40.2
  • Fixed: Different results in LID L4 for waveform and languageprint input
  • Fixed: Requested segment is out of waveform range error in TAE
  • Fixed: End time may be before start time in STT “one best” transcription
  • Fixed: When creating a new LID language pack, hash of the file contained in the custom language pack report is incorrectly calculated (occurs mainly in Windows)
  • Fixed: Items builtin_language_models and custom_language_models in a body of POST /technologies/languageid/languagepacks/{name} are now optional. At least one of them must not be empty.
  • Fixed: Better server response message when language model was not found during creation of new LID language pack
  • Fixed: Minor bugs in licensing subsystem
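Since 3.40.4, both lists in the body of POST /technologies/languageid/languagepacks/{name} are optional, but at least one must be non-empty. The sketch below enforces that rule on the client side. The helper name and the assumption that each list holds plain model names are hypothetical; check the REST API documentation for the actual item format and any further fields.

```python
import json
from typing import Optional, Sequence


def language_pack_body(builtin: Optional[Sequence[str]] = None,
                       custom: Optional[Sequence[str]] = None) -> str:
    """Return a JSON body containing only the non-empty model lists (hypothetical helper)."""
    builtin_list = list(builtin or [])
    custom_list = list(custom or [])
    if not builtin_list and not custom_list:
        raise ValueError("builtin_language_models or custom_language_models must be non-empty")
    body = {}
    if builtin_list:
        body["builtin_language_models"] = builtin_list
    if custom_list:
        body["custom_language_models"] = custom_list
    return json.dumps(body)
```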

Speech Engine 3.40.3, DB v1700, BSAPI 3.40.3 (2021-05-12)

  • New: Added 6th generation of HR_HR, FR_FR, PS, AR_XL and SV_SE STT, KWS and PHNREC with improved accuracy
  • Fixed: Various log and error messages fixed
  • Fixed: Acapela TTS connector puts incorrectly named item languages in output JSON
  • Improved: Updated doc/Phonemes_for_STT_and_KWS.pdf document with phonemes for 6th generation of HR_HR, FR_FR, PS, AR_XL and SV_SE

Speech Engine 3.40.2, DB v1700, BSAPI 3.40.2 (2021-04-30)

  • Fixed: LMC does not work with CS_CZ_6 online (stream) configuration
  • Fixed: Sample rate in Opus files is incorrect
  • Fixed: Various “[ERRFMT]” log messages fixes

Speech Engine 3.40.1, DB v1700, BSAPI 3.40.1 (2021-04-16)

  • Fixed: 6th generation STT/KWS stream result may start with words from end of previous stream
  • Fixed: Some licensing error messages are not shown in log
  • Fixed: Missing file names in log messages in SID and SID4 tasks
  • Fixed: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used
  • Fixed: phxadmin2 cannot configure VAD_STREAM technology
  • Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf

Speech Engine 3.40.0, DB v1700, BSAPI 3.40.0 (2021-03-26)

  • New: Added 6th generation of CS_CZ STT, KWS and PHNREC with improved accuracy
  • Changed: Using new licensing system under the hood (internal change)
    • NOTE: When using SPE with FLS (Floating License Server), you need to upgrade FLS to version 2.x in order to be able to use SPE 3.40+ with FLS.
  • + all changes included in Feature Preview releases 3.36, 3.37 and 3.38 (see below)

Known bug: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used. There is no problem when using JSON as input.

Speech Engine 3.38

Speech Engine 3.38.0, DB v1700, BSAPI 3.38.0 (2021-02-25)

  • New: Training of LID Language Packs (no more need for command line tools… finally!)
  • New: LID Language Packs can store meta-files
  • New: New entity “LID Language Model” (equivalent of *.lpa LanguagePrint Archive)
  • Improved: Updated STT model RU_RU_A to version 4.6.0 (updated language model)
  • Removed: Support for RLS-enforced licenses in command-line applications
  • Removed: FeaturePasterRepeat warning on null/empty repeat vector
Speech Engine 3.37

Speech Engine 3.37.1, DB v1601, BSAPI 3.37.0 (2021-02-18)

  • Fixed: Missing phxadmin2 tool in the Windows package

Speech Engine 3.37.0, DB v1601, BSAPI 3.37.0 (2021-02-17)

  • New: New administration tool phxadmin2, which allows phxadmin actions to be performed non-interactively, e.g. from scripts
  • New: Added 5th generation of PS (Pashto) STT, KWS and PHNREC
  • Fixed: Internal subsystems are uninitialized in the wrong order (the reverse of the correct one)
  • Fixed: Creation of SID4 audio source profile fails if path parameter is empty
  • Improved: Better log message when switching to webhook
  • Improved: Debug log level now shows task start and finish messages
Speech Engine 3.36

Speech Engine 3.36.0, DB v1601, BSAPI 3.35.3 (2020-12-01)

  • New: Added some useful information to log messages:
    • Stream ID in task-related log messages
    • Audio length in debug log messages
    • Workers and streams info in debug log messages
  • New: Possibility to obtain information about input RTP connection (see GET /input_stream/rtp/info)
  • New: Endpoint to get languageprint information (see POST /technologies/languageid/lpinfo)
  • Improved: Result of languageprint extraction now contains speech length for each languageprint (see GET /technologies/languageid/extractlp)
  • Improved: Output RTP packet payload size changed from 480 to 160 bytes
  • Fixed: SSRC in output RTP packet is now set to random 32-bit value
  • Fixed: RTP packets with payload type >=95 in input RTP streams are now ignored
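To illustrate the 3.36 RTP changes, the sketch below packs and inspects the fixed 12-byte RTP header defined in RFC 3550: it draws a random 32-bit SSRC and rejects packets with payload type 95 or higher. The helper names are hypothetical, and the duration helper assumes 8 kHz, 8-bit G.711 audio, for which a 160-byte payload lasts 20 ms (480 bytes lasted 60 ms); the codec SPE actually uses is not stated here.

```python
import secrets
import struct
from typing import Optional

# Fixed 12-byte RTP header (RFC 3550): V/P/X/CC, M/PT, sequence number, timestamp, SSRC
RTP_HEADER = struct.Struct("!BBHII")


def build_rtp_header(payload_type: int, seq: int, timestamp: int,
                     ssrc: Optional[int] = None, marker: bool = False) -> bytes:
    """Pack a version-2 RTP header with no padding, no extension and no CSRCs."""
    if ssrc is None:
        ssrc = secrets.randbits(32)  # random 32-bit SSRC, as RFC 3550 recommends
    first = 2 << 6  # version 2 in the top two bits
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(first, second, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)


def accept_packet(packet: bytes) -> bool:
    """Accept only RTP version-2 packets whose payload type is below 95."""
    if len(packet) < RTP_HEADER.size:
        return False  # too short to hold an RTP header
    version = packet[0] >> 6
    payload_type = packet[1] & 0x7F
    return version == 2 and payload_type < 95


def payload_duration_ms(payload_bytes: int, sample_rate: int = 8000,
                        bytes_per_sample: int = 1) -> float:
    """Audio duration of a payload; the defaults assume 8 kHz, 8-bit G.711."""
    return payload_bytes * 1000 / (sample_rate * bytes_per_sample)
```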
Speech Engine 3.35 (Public release)

Speech Engine 3.35.9, DB v1602, BSAPI 3.35.5 (2021-06-30)

  • Fixed: Invalid SQL statement on update of SPE – fixed SQLite update script from v1601 to v1602

Speech Engine 3.35.8, DB v1602, BSAPI 3.35.5 (2021-06-21)

  • Fixed: Race condition in speaker models may lead to inconsistency in database, causing e.g. “Extraction error: value already extracted” exception (MySQL database schema update required)
  • Fixed: Prevent creating a duplicate speaker model (or calibration set, audio source profile) with a different letter case in the name

Speech Engine 3.35.7, DB v1601, BSAPI 3.35.5 (2021-05-09)

  • Fixed: Invalid SQL syntax when overwriting voiceprint in a database

Speech Engine 3.35.6, DB v1601, BSAPI 3.35.5 (2021-03-24)

  • Fixed: One more issue in detection of certain USB license tokens

Speech Engine 3.35.5, DB v1601, BSAPI 3.35.4 (2021-02-22)

  • Fixed: Creation of SID4 audio source profile fails if path parameter is empty
  • Improved: Better log message when switching to webhook
  • Improved: Debug log level now shows task start and finish messages

Speech Engine 3.35.4, DB v1601, BSAPI 3.35.4 (2020-12-14)

  • Fixed: STT/KWS model AR_XL_5 has incorrect name and does not start
  • Fixed: Missing KWS model AR_XL_5
  • Fixed: Processing of some short recordings causes a “TwoGmmCalibThreshold is not finite” error
  • Fixed: STT preferred phrases “out of vocabulary” (OOV) warning message is now more verbose

Speech Engine 3.35.3, DB v1601, BSAPI 3.35.3 (2020-11-24)

  • New: Internal support for SAMPA phonetic alphabet
  • New: Updated STT model RU_RU_A to version 4.5.0 (updated language model)
  • New: Updated STT/KWS/PHNREC model AR_XL to version 5.2.0 (updated language model, changed phonemes notation to X-SAMPA)
  • Fixed: Cannot create new output stream due to hanging unfinished tasks
  • Fixed: Task is not removed from pool when result is delivered via Webhook
  • Fixed: Some log messages contain format placeholder instead of numbers
  • Fixed: Missing <silence/> label in STT confusion network output
  • Fixed: STT confusion network contains <silence/> tags with confidence greater than 1.0
  • Fixed: Diarization crashes during processing
  • Fixed: Diarization XL4 crashes on file with no speech
  • Fixed: SID voiceprint extraction on stream is affected by previous run
  • Fixed: Incorrect number of LID L4 languages in documentation
  • Improved: Database drop scripts
  • Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf

Speech Engine 3.35.2, DB v1600, BSAPI 3.35.2 (2020-10-22)

  • Fixed: Detection of certain USB license tokens

Speech Engine 3.35.1, DB v1600, BSAPI 3.35.1 (2020-10-13)

  • Fixed: Missing input stream task name in log messages
  • Fixed: Missing arguments in “word not found” error messages (when using preferred phrases)
  • Changed: Configurable STT Confusion Network threshold min_word_posterior_probability changed from log probability to normal probability (i.e. the value visible in Confusion Network results)
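Values of min_word_posterior_probability written for versions before 3.35.1 were log probabilities and need converting to plain probabilities. The sketch assumes the old values used the natural logarithm (pass base=10 if yours were base-10 logs); the changelog does not state the base, and the helper name is hypothetical.

```python
import math


def log_to_probability(log_p: float, base: float = math.e) -> float:
    """Convert a log probability (natural log by default) to a probability in [0, 1]."""
    if log_p > 0:
        raise ValueError("a log probability must be <= 0")
    return base ** log_p
```

For example, an old natural-log setting of about -4.6 corresponds to a probability of roughly 0.01.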

Speech Engine 3.35.0, DB v1600, BSAPI 3.35.0 (2020-10-01)

  • New: LID model L4 was promoted to production (LID BETA_L4 renamed to LID L4)
  • New: Added new language tag documentation (doc/Technology_LID_L4_Language_tags.pdf)
  • New: Updated STT model CS_CZ_5 to version 5.2.1 (fixes faulty transcription of numbers into Roman format)
  • New: Added configurable STT Confusion Network threshold (in technology configuration file)
  • Fixed: STT didn’t work with 4th and older generation models after introduction of the Preferred phrases feature in SPE 3.32
  • Fixed: Update from SPE 3.30 causes errors in STT result cache
  • Fixed: Memory leak in logging system
  • Fixed: Typo in name of es-XA language in LID model L4 default language pack (es-XA7 -> es-XA)
  • Fixed: Time Analysis segfaults on audio with 3+ channels
  • Fixed: vpextract_s_calib.bs config file not working
  • Fixed: WebSocket reply to PING control frame does not follow the protocol specification
  • + all changes included in Feature Preview releases 3.31 and 3.32 (see below)

NOTE: Due to the change in STT results content, all STT results will be removed from cache (database) during update!

Speech Engine 3.32

Speech Engine 3.32.0, DB v1500, BSAPI 3.32.0 (2020-08-28)

  • New: Added support for Webhooks and WebSockets in stream processing
  • New: Added support for preferred phrases in 5th generation of STT (see POST /technologies/stt or POST /technologies/stt/input_stream)
  • New: Added possibility to get multiple STT result types at once using single request (result_type query parameter now supports multiple values)
  • New: Added phrase start- and end times in STT “n-best” result
  • New: Added new Diarization model XL4
  • Fixed: Results of STT stream and SID/SID4 stream voiceprint do not contain task ID, stream ID and task execution time
Speech Engine 3.31

Speech Engine 3.31.2, DB v1500, BSAPI 3.31.0 (2020-08-17)

  • Fixed: MySQL session is not returned to the session pool if the RELOAD privilege is not granted in the database, which eventually exhausts all sessions, after which the server stops working

Speech Engine 3.31.1, DB v1500, BSAPI 3.31.0 (2020-07-02)

  • Fixed: SQLite database update from version v1401 fails

Speech Engine 3.31.0, DB v1500, BSAPI 3.31.0 (2020-07-01)

  • New: SPE now requires CentOS 7 or another Linux-based OS with glibc >= 2.17
  • New: Added instructions for updating SPE (see doc/UPDATE.txt file)
  • New: Added new LID model BETA_L4
  • New: Audio Source Profile can now be stored in SPE storage without the need for registration
  • Fixed: STT 5th generation confusion network output contains extra legacy _SILENCE_ tokens with weird timestamps
  • Fixed: Stream ID missing in debug log record
  • Fixed: SID4 cannot use Audio Source Profile created with different number of calibration chunks
  • Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf
  • Removed: Removed VBS plugin
  • Removed: Following STT models are obsolete and not available and supported anymore:
    CZ_PROMPT3, CZ_IT1, CZ_TELCO2, CZ2, CZ_ENERGY1, CZ_FIN1, SK_TELCO3, SK1, EN_L1, EN_N1, EN_GB2, EN_S1, ES_AMER1, FR1, RU_A7, RU7
Speech Engine 3.30 (Public release)

Speech Engine 3.30.14, DB v1401, BSAPI 3.30.14 (2021-03-24)

  • Fixed: One more issue in detection of certain USB license tokens

Speech Engine 3.30.13, DB v1401, BSAPI 3.30.13 (2020-09-11)

  • New: Updated STT and KWS model AR_XL to version 5.1.0

Speech Engine 3.30.12, DB v1401, BSAPI 3.30.11 (2020-08-17)

  • Fixed: MySQL session is not returned to the session pool if the RELOAD privilege is not granted in the database, which eventually exhausts all sessions, after which the server stops working

Speech Engine 3.30.11, DB v1401, BSAPI 3.30.11 (2020-08-11)

  • Fixed: Words with probability lower than 0.01 are no longer included in STT Confusion Network output (to remove “irrelevant clutter” from the output)

Speech Engine 3.30.10, DB v1401, BSAPI 3.30.10 (2020-07-29)

  • New: Updated STT model RU_RU_A to version 4.4.0

Speech Engine 3.30.9, DB v1401, BSAPI 3.30.9 (2020-07-01)

  • New: Added 5th generation of HR_HR (Croatian) STT, KWS and PHNREC
  • Fixed: SPE crashes due to buffer overflow on corrupted recording

Speech Engine 3.30.8, DB v1401, BSAPI 3.30.8 (2020-06-16)

  • Fixed: STT failure during text-to-number translation in SK_SK_5 model

Speech Engine 3.30.7, DB v1401, BSAPI 3.30.7 (2020-06-03)

  • Fixed: Increasing memory consumption of SPE
  • Fixed: KWS delay for some 5th generation stream configurations

Speech Engine 3.30.6, DB v1401, BSAPI 3.30.6 (2020-05-22)

  • Fixed: New stream is counted towards running streams even if stream creation fails
  • Fixed: Incorrect start timestamps on silence tags in STT output
  • Fixed: Incorrect start timestamps on null words in STT confusion network output
  • Fixed: STT n-best output is missing channel info

Speech Engine 3.30.5, DB v1401, BSAPI 3.30.5 (2020-05-14)

  • New: Added new STT model EN_US_A_5
  • Fixed: Wrong example data in STT model EN_US_5
  • Fixed: Segmentation fault in G2P in KWS when no pronunciation was generated

Speech Engine 3.30.3, DB v1401, BSAPI 3.30.3 (2020-04-27)

  • Fixed: Corrected code to SV_SE for Swedish STT, KWS and PHNREC
  • Fixed: “Invalid SQL statement: no such table” error in SPE log when using SQLite after update to database schema v1300
  • Fixed: When the task limit is reached, the server now responds with HTTP status 503 Service Unavailable instead of 500 Internal Server Error
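With this change, a client can treat 503 as a transient “try again later” condition rather than a server fault. A minimal retry sketch follows; send_request is a hypothetical callable that performs the HTTP call and returns the status code, and the attempt count and backoff delays are arbitrary illustrative values, not SPE recommendations.

```python
import time


def submit_with_retry(send_request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send_request() until it returns a status other than 503.

    send_request is a hypothetical callable returning an HTTP status code.
    503 Service Unavailable (task limit reached) is retried with exponential
    backoff; any other status is returned immediately.
    """
    status = None
    for attempt in range(max_attempts):
        status = send_request()
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
    return status  # still 503 after all attempts


# Usage with a fake transport and a recording "sleep" instead of real waits.
responses = iter([503, 503, 200])
delays = []
status = submit_with_retry(lambda: next(responses), sleep=delays.append)
print(status, delays)  # 200 [0.5, 1.0]
```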

Speech Engine 3.30.2, DB v1400, BSAPI 3.30.2 (2020-04-23)

  • New: Added 5th generation SE_SV (Swedish) models for STT, KWS and PHNREC
  • Fixed: Playing TTS via output stream may not be smooth
  • Fixed: RTP output stream produces packets without timestamp which may cause problems with some RTP clients

Speech Engine 3.30.1, DB v1400, BSAPI 3.30.1 (2020-04-08)

  • Fixed: TTS Acapela connector does not work due to renamed parameters
  • Fixed: SPE fails to read reformatted but still valid technologies.xml
  • Fixed: Zero start and end timestamps for “null” words in STT confusion network output
  • Improved: Words in STT confusion network are now sorted by confidence

Speech Engine 3.30.0, DB v1400, BSAPI 3.30.0 (2020-03-25)

  • New: Added 5th generation FR_FR (French) models for STT, KWS and PHNREC
  • New: Updated and significantly improved phonemes document for STT and KWS (see doc directory)
  • New: Added n-best output to all 5th generation STT stream results
  • New: Added support for native notation of numbers and dates in n-best output of 5th generation CS_CZ and SK_SK STT (in both file and stream processing)
  • New: Each request in the SPE log gets a unique ID, allowing better request tracing. In case of an error, the HTTP status and REST error code are also logged
  • New: Updated STT model RU_RU_A to version 4.3.0
  • Changed: All utterance_length parameters (introduced in 3.24) renamed to speech_length in endpoints returning voiceprint
  • Changed: Parameters languageCode and languageCodes (introduced in 3.25) renamed to language_code and language_codes in TTS endpoints
  • Changed: Parameter target (introduced in 3.25) in POST /external/technologies/tts query renamed to path
  • Improved: Better error message when uploading or registering a new file that cannot be opened
  • Fixed: Processing long files results in premature end without error message
  • + all changes included in Feature Preview releases 3.23 to 3.26 (see below)
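Applications that used the Feature Preview parameter names must switch to the new ones. A minimal sketch of a key-renaming helper follows; the helper, and the assumption that parameters are held in a flat dict, are hypothetical, and only the old and new names come from the entries above.

```python
# Old name -> new name, as listed in the 3.30.0 entries.
# Caution: "target" -> "path" applies only to POST /external/technologies/tts,
# so apply this map only to requests/results where these names have that meaning.
RENAMED_KEYS = {
    "utterance_length": "speech_length",
    "languageCode": "language_code",
    "languageCodes": "language_codes",
    "target": "path",
}


def migrate_keys(params, renames=RENAMED_KEYS):
    """Return a copy of params with pre-3.30 key names replaced by new ones."""
    return {renames.get(key, key): value for key, value in params.items()}


print(migrate_keys({"languageCode": "en-US", "text": "Hello"}))
# {'language_code': 'en-US', 'text': 'Hello'}
```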

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively
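Applications that parse output from both older and 5th generation models may want to normalize tokens to one vocabulary. A minimal sketch, assuming tokens have already been extracted from the STT output as plain strings (how they are encoded in the actual result payload is not covered here); the mapping table and function name are hypothetical.

```python
# Legacy STT token -> 5th generation equivalent, per the note above.
LEGACY_TO_GEN5 = {
    "_DELETE_": "<null/>",
    "_SILENCE_": "<silence/>",
    "<sil/>": "<silence/>",
    "<s>": "<segment>",
    "</s>": "</segment>",
}


def normalize_tokens(tokens):
    """Map legacy tokens to 5th generation names; other words pass through."""
    return [LEGACY_TO_GEN5.get(token, token) for token in tokens]


print(normalize_tokens(["<s>", "hello", "_SILENCE_", "world", "</s>"]))
# ['<segment>', 'hello', '<silence/>', 'world', '</segment>']
```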
Speech Engine 3.26

Speech Engine 3.26.0, DB v1400, BSAPI 3.26.0 (2020-02-28)

  • New: Added new SID4 XL4 model
Speech Engine 3.25 (Public release)

Speech Engine 3.25.1, DB v1400, BSAPI 3.25.0 (2020-02-07)

  • New: Improved handling of “Accept” HTTP header for better CORS support
  • Fixed: TTS saves raw file and returns internal server error
  • Fixed: TTS connector gets stuck when recoding takes a long time

Speech Engine 3.25.0, DB v1400, BSAPI 3.25.0 (2020-01-30)

  • New: Added input stream statistics to result of DELETE /input_stream/rtp call
  • New: Added support for CORS (can be enabled using the server.cors_enable property)
  • New: Added Acapela TTS integration, see External Text To Speech (supported only in Linux SPE builds!)
Speech Engine 3.24

Speech Engine 3.24.0, DB v1400, BSAPI 3.24.0 (2019-12-10)

  • New: Significantly improved 5th generation STT stream performance
    • Added neural network based voice activity detection – improves the end-of-utterance detection
    • Decoder is now restarted after each segment – i.e. “word corrections” never go beyond the segment boundary
    • Added per-segment confidence, computed as an average of all word confidences in a sentence – helps in judging the credibility of the results
    • Reduced delay before results appear in the output – allows for faster detection of barge-in, e.g. in voicebot applications
  • New: All 5th generation STT models now use Minimum Bayes-Risk Decoding for Confusion Network construction
    • Confusion Network results now contain precise start- and end times for each individual alternative word
  • New: KWS confidence value calculation can be modified using confidence_shift and confidence_sharpness values (see KWS results explained article for more details)
  • New: Added utterance_length to SID/SID4 voiceprint results
  • New: Added /output_stream and audio file player (/utils/player/output_stream) endpoints
  • New: Added 5th generation AR_XL (Arabic Levantine) models (Beta version) for STT, KWS and PHNREC
    (combines both North- and South Levantine, hence the custom code AR_XL)
  • Changed: Endpoints, results and properties using the term ‘stream’ now use ‘input_stream’ instead
  • Changed: Technology models named DEFAULT are renamed to GENERIC
    • stop SPE and then run phxadmin --configure-tech to automatically update the configuration of affected technologies
    • modify the SPE REST API calls in your application accordingly, if applicable
  • Fixed: STT doesn’t work with models customized using LMC
  • Fixed: Incorrect end times for <segment/> token in STT results
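The per-segment confidence described above is a plain arithmetic mean of word confidences. A minimal sketch, assuming the input holds the confidences of the segment's words only (whether SPE includes or skips tokens such as <silence/> is not stated in this entry); the function name is hypothetical.

```python
def segment_confidence(word_confidences):
    """Return the mean of the word confidences in one segment.

    Returns None for a segment without words; SPE's own behavior in that
    case is not described in the changelog.
    """
    if not word_confidences:
        return None
    return sum(word_confidences) / len(word_confidences)


print(segment_confidence([0.5, 1.0, 0.75]))  # 0.75
```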

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively
Speech Engine 3.23

Speech Engine 3.23.0, DB v1300, BSAPI 3.23.0 (2019-11-01)

  • Changed version to 3.23.0 to synchronize with BSAPI
  • Fixed: SPE sends IP address in Host: HTTP header instead of hostname
  • Fixed: SPE sometimes outputs “[ERRFMT]” string to log messages instead of actual value
Speech Engine 3.18 (Public release)

Speech Engine 3.18.3, DB v1300, BSAPI 3.22.2 (2019-12-09)

  • Fixed: STT on stream may cause assert violation when waiting for stream timeout on no input data
  • Fixed: SPE sends IP address in Host: HTTP header instead of hostname
  • Fixed: SPE sometimes outputs “[ERRFMT]” string to log messages instead of actual value

Speech Engine 3.18.2, DB v1300, BSAPI 3.22.1 (2019-10-14)

  • Fixed: Customized STT model fails on Windows with the “Request for next state but ending state reached.” error message

Speech Engine 3.18.1, DB v1300, BSAPI 3.22.0 (2019-10-01)

  • New: DICTATE technology has been renamed to STT_STREAM (/technologies/dictate -> /technologies/stt/stream)
    (for backward compatibility, the /technologies/dictate endpoint is internally redirected)
  • New: SID/SID4 stream now allows gradually obtaining a voiceprint from the stream (see /technologies/speakerid4/stream/voiceprint)
  • New: Unicode characters in file names are now supported on Windows platform
  • New: Added LLR score to GID result (as score_llr value, see /technologies/genderid)
  • New: Added ‘per_channel’ parameter to Diarization for processing multi-channel recordings
  • New: Added a configuration option that prevents SPE from starting if any configured technology fails to start (server.require_all_configured_technologies)
  • Fixed: Random SIGSEGV crashes in CS_CZ_5 STT
  • Fixed: KWS CS_CZ_5 ignores keyword thresholds
  • Fixed: Duplicated output from KWS
  • Fixed: KWS online configurations for models CS_CZ_5 and NL_NL_5
  • Fixed: phxadmin increases number of instances in configuration instead of setting it
  • Fixed: phxclient is streaming slower than expected
  • Fixed: Redefinition of a block in the configuration in use causes segmentation faults

NOTE: Due to the change in GID results content, all GID results will be removed from cache (database) during update!