Releases and Changelogs (SPE)

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI.
SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x).

Releases

Version  Release Date  End of Support  Maintained Until  Release type
3.45     2021-10-06    2023-05-01      3.50              Public
3.42     2021-08-24    2021-09-30      3.45              Feature
3.41     2021-07-15    2021-09-30      3.42              Feature
3.40     2021-03-26    2022-10-01      3.45              Public
3.38     2021-02-25    2021-03-30      3.40              Feature
3.37     2021-02-17    2021-03-30      3.38              Feature
3.36     2020-12-01    2021-03-30      3.37              Feature
3.35     2020-10-01    2022-05-01      3.40              Public
3.32     2020-08-28    2020-09-30      3.35              Feature
3.31     2020-07-01    2020-09-30      3.32              Feature
3.30     2020-03-27    2022-04-01      3.35              Public
3.26     2020-03-02    2022-04-01      3.30              Feature
3.25     2020-01-31    2022-04-01      3.26              Feature
3.24     2020-12-18    2022-04-01      3.25              Feature
3.23     2020-11-01    2022-04-01      3.24              Feature
3.18     2019-10-01    2022-04-01      3.19              Public
3.17     2019-06-28    2021-12-28      3.18              Public
3.16     2019-04-26    2021-10-26      3.17              Public
3.15     2019-02-28    2021-08-28      3.16              Public
3.14     2018-12-21    2020-06-21      3.15              Public
3.13     2018-11-19    2020-05-19      3.14              Public
3.12     2018-08-17    2020-02-17      3.13              Public
3.11     2018-03-15    2019-09-15      3.12              Public
3.10     2017-12-06    2019-06-06      3.11              Public
3.9      2017-09-08    2019-03-08      3.10              Public
3.8      2017-06-26    2018-12-26      3.9               Public
3.7      2017-03-27    2018-09-27      3.8               Public
3.6      2016-12-14    2018-06-14      3.7               Public
3.5      2016-10-04    2018-04-04      3.6               Public
3.4      2016-09-19    2018-03-19      3.5               Public
3.3      2016-07-11    2018-02-11      3.4               Public
3.2      2016-04-22    2017-10-22      3.3               Public
3.1      2016-02-15    2017-08-15      3.2               Public
3.0      2016-02-09    2017-08-09      3.1               Public
2.1      2015-09-16    2017-09-16      2017-09-16        Public
2.0      2015-01-06    2016-07-06      2.1               Public
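
As an illustration of how the End of Support column can be used, the following minimal Python sketch checks whether a release is still supported on a given date. The dates are copied from three rows of the table above. The names END_OF_SUPPORT and is_supported are hypothetical, and treating the End of Support date itself as still supported is an assumption, not something this page states.

```python
from datetime import date

# End of Support dates copied from three rows of the release table.
END_OF_SUPPORT = {
    "3.45": date(2023, 5, 1),
    "3.40": date(2022, 10, 1),
    "3.35": date(2022, 5, 1),
}


def is_supported(version: str, on: date) -> bool:
    """Return True if `version` has not passed its End of Support date.

    Assumption: the End of Support date itself still counts as supported.
    """
    return on <= END_OF_SUPPORT[version]


print(is_supported("3.45", date(2022, 12, 31)))  # True
print(is_supported("3.40", date(2022, 12, 31)))  # False
```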

Changelogs

Speech Engine 3.45.0, DB v1800, BSAPI 3.45.0 (2021-10-06) (Public release)

  • New: Added 6th generation of EN_US and EN_US_A STT (KWS/PHNREC will be added in one of the upcoming updates)
  • New: Added XL4 model for GID (for compatibility with SID4 XL4 voiceprints)
  • New: STT preferred phrases v2 with ability to dynamically add words to language model (currently in CS_CZ_6 only)
  • New: Endpoint /technologies/speakerid/clustervpset for clustering voiceprint set
  • New: Input streams over WebSocket (see GET /input_stream/websocket)
  • New: SQE: Added enable_pesq switch for Perceptual Evaluation of Speech Quality (PESQ) score estimation (PESQ is turned off by default for performance reasons)
  • Fixed: Empty “info” in VAD result when recording contains 0 seconds of speech for model GENERIC_3
  • Fixed: Incorrect timestamps in PHNREC results
  • Fixed: Segmentation fault when dynamically changing preferred phrases with new STT decoder (new decoder is currently used only in CS_CZ_6)
  • Fixed: Word separator is considered an invalid grapheme for CZ models in LMC
  • Improved: RLS-related messages are now logged at “debug” level, not “trace” level
  • Changed: STT language model customization marked as BETA
  • Removed: 4th generation of STT/KWS/PHNREC model for HR_HR
  • + all changes included in Feature Preview releases 3.41 and 3.42 (see below)

Speech Engine 3.42.0, DB v1701, BSAPI 3.42.1 (2021-08-24) (Feature Preview release)

  • New: Added /doc endpoint for serving REST API documentation in HTML format
  • New: New VAD model GENERIC_3 with improved accuracy + new VAD for 6th generation of CS_CZ STT, KWS and PHNREC
  • New: Added 6th generation of VI_VN STT, KWS and PHNREC
  • Fixed: New decoder does not propagate error messages
  • Improved: Updated doc/Phonemes_for_STT_and_KWS.pdf document for 6th generation of VI_VN
  • Improved: Updated decoder in 6th generation of CS_CZ STT, which should slightly increase recognition precision

Known issues:

  • When preferred phrases contain some of the entity words and are used with the 6th generation of CS_CZ STT, these words are reported as “out of vocabulary” and the phrase is ignored
  • New VAD model GENERIC_3 does not work in VAD_STREAM technology

Speech Engine 3.40.8, DB v1701, BSAPI 3.40.5 (2021-08-18) (Public release)

  • Improved: Better audio resampler in player (/utils/player/output_stream) and TTS (/external/technologies/tts/*) for better audio quality output
  • Fixed: phxadmin2 error when disabling technology and specifying technology name twice
  • Fixed: Language name is truncated in LID result when name contains space character
  • Fixed: Fixes and improvements in numeric grammar for STT SK_SK_5 (words not converted to numbers in various cases)

Speech Engine 3.41.0, DB v1701, BSAPI 3.41.0 (2021-07-15) (Feature Preview release)

  • New: STT language model customization (LMC) via REST API (see Usage examples -> Speech To Text -> Create customized model in API documentation)
    NOTE: The customized model is placed in a shared directory; see the SPE directories article for more information.
  • New: Request ID can be specified in HTTP header X-Request-ID
  • New: Possibility to set source port for output stream
  • New: Added SQE technology on stream
  • New: Added Perceptual Evaluation of Speech Quality (PESQ) score estimation to SQE results
  • New: The following entities are transcribed more accurately in 6th generation of CS_CZ STT:
    • male/female first name and surname
    • municipality
    • street
  • Fixed: LMC may use wrong paths on Windows platform
  • Improved: Removed + symbol from LMC phrases in STT output
  • Improved: Updated decoder in 6th generation of CS_CZ STT, which should slightly increase recognition precision

Known issue: When preferred phrases contain some of the entity words and are used with the 6th generation of CS_CZ STT, these words are reported as “out of vocabulary” and the phrase is ignored.
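
The X-Request-ID header introduced in 3.41.0 can be set by any HTTP client. Below is a minimal Python sketch using only the standard library. The base URL (host and port) is an assumption about a local deployment, build_request is a hypothetical helper name, and /doc is used only because it is an endpoint named elsewhere in this changelog. How SPE uses the supplied ID is not described here, so the sketch only builds the request and does not send it.

```python
import urllib.request
import uuid
from typing import Optional

# Assumption: address of a locally running SPE instance; adjust as needed.
BASE_URL = "http://localhost:8600"


def build_request(path: str, request_id: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request that carries a caller-chosen X-Request-ID header."""
    rid = request_id or str(uuid.uuid4())  # fall back to a random ID
    req = urllib.request.Request(BASE_URL + path, method="GET")
    req.add_header("X-Request-ID", rid)
    return req


req = build_request("/doc", request_id="demo-123")
# urllib stores the header name as "X-request-id"; HTTP header names are
# case-insensitive, so this is equivalent on the wire.
print(req.header_items())
```

Passing the returned object to urllib.request.urlopen would send the request to the server.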

Speech Engine 3.40.7, DB v1701, BSAPI 3.40.4 (2021-06-30) (Public release)

  • Fixed: Invalid SQL statement on update of SPE – fixed SQLite update script from v1601 to v1602

Speech Engine 3.35.9, DB v1602, BSAPI 3.35.5 (2021-06-30) (Public release)

  • Fixed: Invalid SQL statement on update of SPE – fixed SQLite update script from v1601 to v1602

Speech Engine 3.40.6, DB v1701, BSAPI 3.40.4 (2021-06-22) (Public release)

  • Fixed: Getting information about the language model containing the LPA caused an internal server error
  • Fixed: Acapela connector works again (was broken in 3.40.4)
  • Fixed: Fixes from 3.35.8 (MySQL database schema update required)

Speech Engine 3.35.8, DB v1602, BSAPI 3.35.5 (2021-06-21) (Public release)

  • Fixed: Race condition in speaker models may lead to inconsistency in database, causing e.g. “Extraction error: value already extracted” exception (MySQL database schema update required)
  • Fixed: Prevent creating a duplicate speaker model (or calibration set, audio source profile) with a different letter case in the name

Speech Engine 3.40.5, DB v1700, BSAPI 3.40.4 (2021-05-09) (Public release)

  • Fixed: When trying to register webhook over existing webhook for any stream technology, SPE returns HTTP 400 (1069) error instead of HTTP 500
  • Fixed: Invalid SQL syntax when overwriting voiceprint in a database

Speech Engine 3.35.7, DB v1601, BSAPI 3.35.5 (2021-05-09) (Public release)

  • Fixed: Invalid SQL syntax when overwriting voiceprint in a database

Speech Engine 3.40.4, DB v1700, BSAPI 3.40.4 (2021-05-28) (Public release)

  • Fixed: BSAPI 3.40.3 does not include fixes from 3.40.2
  • Fixed: Different results in LID L4 for waveform and languageprint input
  • Fixed: “Requested segment is out of waveform range” error in TAE
  • Fixed: End time may be before start time in STT “one best” transcription
  • Fixed: When creating a new LID language pack, hash of the file contained in the custom language pack report is incorrectly calculated (occurs mainly in Windows)
  • Fixed: Items builtin_language_models and custom_language_models in a body of POST /technologies/languageid/languagepacks/{name} are now optional. At least one of them must not be empty.
  • Fixed: Better server response message when language model was not found during creation of new LID language pack
  • Fixed: Minor bugs in licensing subsystem
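
The rule for the body of POST /technologies/languageid/languagepacks/{name} (both items optional, but at least one non-empty) can be checked on the client side before sending. The sketch below is illustrative only: it assumes both items are lists of model names, which this changelog does not specify, and validate_language_pack_body is a hypothetical helper, not part of SPE.

```python
def validate_language_pack_body(body: dict) -> None:
    """Raise ValueError unless at least one model list is present and non-empty.

    Assumption: both items, when present, are lists of model names.
    """
    builtin = body.get("builtin_language_models") or []
    custom = body.get("custom_language_models") or []
    if not builtin and not custom:
        raise ValueError(
            "builtin_language_models and custom_language_models "
            "must not both be missing or empty"
        )


# Only one of the two items is given, which satisfies the rule.
validate_language_pack_body({"custom_language_models": ["my_model"]})

# Neither item is given, so the rule is violated.
try:
    validate_language_pack_body({})
except ValueError as exc:
    print("rejected:", exc)
```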

Speech Engine 3.40.3, DB v1700, BSAPI 3.40.3 (2021-05-12) (Public release)

  • New: Added 6th generation of HR_HR, FR_FR, PS, AR_XL and SV_SE of STT, KWS and PHNREC with improved accuracy
  • Fixed: Various log and error messages fixed
  • Fixed: Acapela TTS connector puts incorrectly named item languages in output JSON
  • Improved: Updated doc/Phonemes_for_STT_and_KWS.pdf document with phonemes for 6th generation of HR_HR, FR_FR, PS, AR_XL and SV_SE

Speech Engine 3.40.2, DB v1700, BSAPI 3.40.2 (2021-04-30) (Public release)

  • Fixed: LMC does not work with CS_CZ_6 online (stream) configuration
  • Fixed: Sample rate in Opus files is incorrect
  • Fixed: Various fixes of “[ERRFMT]” log messages

Speech Engine 3.40.1, DB v1700, BSAPI 3.40.1 (2021-04-16) (Public release)

  • Fixed: 6th generation STT/KWS stream result may start with words from end of previous stream
  • Fixed: Some licensing error messages are not shown in log
  • Fixed: Missing file names in log messages in SID and SID4 tasks
  • Fixed: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used
  • Fixed: phxadmin2 cannot configure VAD_STREAM technology
  • Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf

Speech Engine 3.40.0, DB v1700, BSAPI 3.40.0 (2021-03-26) (Public release)

  • New: Added 6th generation of CS_CZ of STT, KWS and PHNREC with improved accuracy
  • Changed: Using new licensing system under the hood (internal change)
    • NOTE: When using SPE with FLS (Floating License Server), you need to upgrade FLS to version 2.x in order to be able to use SPE 3.40+ with FLS.
  • + all changes included in Feature Preview releases 3.36, 3.37 and 3.38 (see below)

Known bug: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used. There is no problem when using JSON as input.

Speech Engine 3.35.6, DB v1601, BSAPI 3.35.5 (2021-03-24) (Public release)

  • Fixed: One more issue in detection of certain USB license tokens

Speech Engine 3.30.14, DB v1401, BSAPI 3.30.14 (2021-03-24) (Public release)

  • Fixed: One more issue in detection of certain USB license tokens

Speech Engine 3.38.0, DB v1700, BSAPI 3.38.0 (2021-02-25) (Feature Preview release)

  • New: Training of LID Language Packs (no more need for command line tools… finally!)
  • New: LID Language Packs can store meta-files
  • New: New entity “LID Language Model” (equivalent of *.lpa LanguagePrint Archive)
  • Improved: Updated STT model RU_RU_A to version 4.6.0 (updated language model)
  • Removed: Support for RLS-enforced licences in command line applications
  • Removed: FeaturePasterRepeat warning on null/empty repeat vector

Speech Engine 3.35.5, DB v1601, BSAPI 3.35.4 (2021-02-22) (Public release)

  • Fixed: Creation of SID4 audio source profile fails if path parameter is empty
  • Improved: Better log message when switching to webhook
  • Improved: Debug log level now shows task start and finish messages

Speech Engine 3.37.1, DB v1601, BSAPI 3.37.0 (2021-02-18) (Feature Preview release)

  • Fixed: Missing phxadmin2 tool in the Windows package

Speech Engine 3.37.0, DB v1601, BSAPI 3.37.0 (2021-02-17) (Feature Preview release)

  • New: New administration tool phxadmin2, which allows performing phxadmin actions non-interactively, e.g. from scripts
  • New: Added 5th generation of PS (Pashto) of STT, KWS and PHNREC
  • Fixed: Internal subsystems are uninitialized in the reverse of the intended order
  • Fixed: Creation of SID4 audio source profile fails if path parameter is empty
  • Improved: Better log message when switching to webhook
  • Improved: Debug log level now shows task start and finish messages

Speech Engine 3.35.4, DB v1601, BSAPI 3.35.4 (2020-12-14) (Public release)

  • Fixed: STT/KWS model AR_XL_5 has incorrect name and does not start
  • Fixed: Missing KWS model AR_XL_5
  • Fixed: Processing of some short recordings causes a “TwoGmmCalibThreshold is not finite” error
  • Fixed: STT preferred phrases “out of vocabulary” (OOV) warning message is now more verbose

Speech Engine 3.36.0, DB v1601, BSAPI 3.35.3 (2020-12-01) (Feature Preview release)

  • New: Added some useful information to log messages:
    • Stream ID in task-related log messages
    • Audio length in debug log messages
    • Workers and streams info in debug log messages
  • New: Possibility to obtain information about input RTP connection (see GET /input_stream/rtp/info)
  • New: Endpoint to get languageprint information (see POST /technologies/languageid/lpinfo)
  • Improved: Result of languageprint extraction now contains speech length for each languageprint (see GET /technologies/languageid/extractlp)
  • Improved: Output RTP packet payload size changed from 480 to 160 bytes
  • Fixed: SSRC in output RTP packet is now set to random 32-bit value
  • Fixed: RTP packets with payload type >=95 in input RTP streams are now ignored
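
The RTP-related items above (160-byte payload, random 32-bit SSRC, ignoring payload types >=95) can be illustrated with a short Python sketch based on the fixed 12-byte RTP header layout from RFC 3550. This is not SPE's implementation: the function names are hypothetical, the payload is zero-filled dummy data, and payload type 0 is an arbitrary choice for the example.

```python
import secrets
import struct

# Fixed RTP header (RFC 3550): V/P/X/CC, M/PT, sequence, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def build_rtp_packet(payload_type: int, seq: int, timestamp: int,
                     payload: bytes, ssrc: int) -> bytes:
    """Pack a minimal RTP packet (version 2, no padding/extension/CSRC)."""
    first = 2 << 6                 # version 2 in the top two bits
    second = payload_type & 0x7F   # marker bit cleared, 7-bit payload type
    header = RTP_HEADER.pack(first, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def payload_type_of(packet: bytes) -> int:
    """Extract the 7-bit payload type from the second header byte."""
    return packet[1] & 0x7F


def should_ignore(packet: bytes) -> bool:
    """Mirror the changelog rule: input packets with payload type >= 95 are ignored."""
    return payload_type_of(packet) >= 95


ssrc = secrets.randbits(32)  # random 32-bit SSRC
pkt = build_rtp_packet(0, 1, 160, bytes(160), ssrc)
print(len(pkt))            # 172: 12-byte header + 160-byte payload
print(should_ignore(pkt))  # False
```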