Releases and Changelogs (SPE)

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI.

Releases

Version	Release Date	End of Support	Maintained Until	Release type
3.60	2023-12-05	2025-06-01	n/a	Public
3.59	2023-06-20	2025-01-01	n/a	Public
3.58	2023-04-03	2024-10-01	n/a	Public
3.57	2023-02-01	2024-08-01	n/a	Public
3.56	2022-12-15	2024-06-01	n/a	Public
3.55	2022-10-03	2024-04-01	3.60	Public
3.52	2022-07-01	2022-09-30	3.55	Feature
3.51	2022-06-14	2022-09-30	3.55	Feature
3.50	2022-03-23	2023-10-01	3.55	Public
3.46	2022-02-07	2022-04-01	3.50	Feature
3.45	2021-10-06	2023-05-01	3.50	Public
3.42	2021-08-24	2021-09-30	3.45	Feature
3.41	2021-07-15	2021-09-30	3.42	Feature
3.40	2021-03-26	2022-10-01	3.45	Public
3.38	2021-02-25	2021-03-30	3.40	Feature
3.37	2021-02-17	2021-03-30	3.38	Feature
3.36	2020-12-01	2021-03-30	3.37	Feature
3.35	2020-10-01	2022-05-01	3.40	Public
3.32	2020-08-28	2020-09-30	3.35	Feature
3.31	2020-07-01	2020-09-30	3.32	Feature
3.30	2020-03-27	2022-04-01	3.35	Public
3.26	2020-03-02	2022-04-01	3.30	Feature
3.25	2020-01-31	2022-04-01	3.26	Feature
3.24	2019-12-18	2022-04-01	3.25	Feature
3.23	2019-11-01	2022-04-01	3.24	Feature
3.18	2019-10-01	2022-04-01	3.19	Public
3.17	2019-06-28	2021-12-28	3.18	Public
3.16	2019-04-26	2021-10-26	3.17	Public
3.15	2019-02-28	2021-08-28	3.16	Public
3.14	2018-12-21	2020-06-21	3.15	Public
3.13	2018-11-19	2020-05-19	3.14	Public
3.12	2018-08-17	2020-02-17	3.13	Public
3.11	2018-03-15	2019-09-15	3.12	Public
3.10	2017-12-06	2019-06-06	3.11	Public
3.9	2017-09-08	2019-03-08	3.10	Public
3.8	2017-06-26	2018-12-26	3.9	Public
3.7	2017-03-27	2018-09-27	3.8	Public
3.6	2016-12-14	2018-06-14	3.7	Public
3.5	2016-10-04	2018-04-04	3.6	Public
3.4	2016-09-19	2018-03-19	3.5	Public
3.3	2016-07-11	2018-02-11	3.4	Public
3.2	2016-04-22	2017-10-22	3.3	Public
3.1	2016-02-15	2017-08-15	3.2	Public
3.0	2016-02-09	2017-08-09	3.1	Public
2.1	2015-09-16	2017-09-16	2017-09-16	Public
2.0	2015-01-06	2016-07-06	2.1	Public
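
The table above can be turned into a simple support check. The following is a minimal sketch that copies four rows from the table; the function name is_supported is hypothetical, and treating the End of Support date as inclusive is an assumption.

```python
from datetime import date

# A few rows copied from the table above: version -> (release date, end of support).
RELEASES = {
    "3.60": (date(2023, 12, 5), date(2025, 6, 1)),
    "3.59": (date(2023, 6, 20), date(2025, 1, 1)),
    "3.55": (date(2022, 10, 3), date(2024, 4, 1)),
    "3.52": (date(2022, 7, 1), date(2022, 9, 30)),
}


def is_supported(version: str, on: date) -> bool:
    """Return True if `version` is released and not past End of Support on `on`.

    Assumption: the End of Support date itself still counts as supported.
    """
    released, end_of_support = RELEASES[version]
    return released <= on <= end_of_support


print(is_supported("3.60", date(2024, 3, 27)))  # True
print(is_supported("3.52", date(2024, 3, 27)))  # False
```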

Changelogs

Speech Engine 3.61 (Public release)

Speech Engine 3.61.0, DB v1901, BSAPI 3.61.0 (2024-03-27)

New: Added endpoints /server/restart and /server/shutdown to restart/shutdown the server
New: Added endpoint POST /technologies to configure technologies on server
New: Added show_all parameter to GET /technologies endpoint, to return all technologies available on the server, i.e. including those not currently running
New: Added possibility to set score thresholds when calling GET/POST /technologies/speakerid4 (useful to limit the number of returned results when using large speaker groups)
New: Added base docker image without any technology (phonexia/spe:3.61.0). Useful as a base image. Doesn’t work on its own.
Fixed: Incorrect timestamps in multichannel diarization results
Fixed: Different/incorrect output in STT for empty streams (all 6^th generation models)
Fixed: Time Analysis Extraction does not return correct total length of audio
Fixed: Time Analysis Extraction returns crosstalks even for channel that is reported to not contain speech
Deprecated: Legacy Speaker Identification technology, i.e. all endpoints under /technologies/speakerid
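
As an illustration of the show_all parameter listed above, here is a minimal sketch that only builds the request URL for GET /technologies. The base URL, the helper name technologies_url and the value encoding "true" are assumptions, not taken from the changelog; consult the REST API documentation for the exact form.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL; replace it with the address of your own SPE server.
BASE_URL = "http://localhost:8600/"


def technologies_url(base_url: str, show_all: bool = False) -> str:
    """Build the URL for GET /technologies, optionally adding show_all."""
    url = urljoin(base_url, "technologies")
    if show_all:
        # Assumption: the parameter is passed as show_all=true.
        url += "?" + urlencode({"show_all": "true"})
    return url


print(technologies_url(BASE_URL, show_all=True))
```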

Speech Engine 3.60 (Public release)

Speech Engine 3.60.1, DB v1901, BSAPI 3.60.1 (2024-01-17)

Fixed: Docker image phonexia/spe:3.60-stt-en_us_6 does not start
Fixed: Phoneme Recognizer does not work in STT EN_US_A model
Fixed: Reading wave files in ieee_float format with out-of-range sample values, causing e.g. invalid voiceprints or “null” comparison scores

Speech Engine 3.60.0, DB v1901, BSAPI 3.60.0 (2023-12-05)

New: Added floating_window parameter to /technologies/speakerid4/input_stream and /technologies/speakerid4/input_stream/voiceprint endpoints, for using only last N seconds of speech from input stream
New: Added HU_HU_6 model for STT and KWS
New: All 6^th generation STT packages now also include Phoneme Recognizer (previously included only with KWS), which can be useful for fine-tuning pronunciations for preferred phrases or for adding new/unknown words to the model
Improved: Added stream configuration for VAD SID4_XL5 model introduced in version 3.58.0
Improved: Automatic detection of the number of workers now starts at least 1 worker, so that non-technology endpoints can work even if no technology is started
Fixed: Random “Records response could not be parsed” errors after receiving reply from RLS (RLS v0.14.2 is required to fix the issue)
Fixed: Websocket connection error may lead to accessing deallocated memory
Fixed: Deleting tasks can leave files locked forever
Fixed: Files are locked forever when task limit is reached
Fixed: Misleading and confusing licensing error when HW profile does not match
Fixed: Trace log messages during model initialization say “Unknown interface”
Fixed: STT_STREAM fails to process request with preferred phrases when no instance of STT with the same model is running
Fixed: STT: Different/incorrect output for empty streams (fixed only in CS_CZ_6 and SK_SK_6 for now)
Fixed: STT: Space is not allowed as separator in wordlist file in CLI interface
Fixed: STT: Still incorrect timestamp values in N-best output of stream transcription
Fixed: STT: Extra “+” character shown in Confusion Network output
Fixed: STT: A “+” character gets removed from wordlist backup file in customized STT model

Speech Engine 3.59 (Public release)

Speech Engine 3.59.0, DB v1901, BSAPI 3.59.0 (2023-06-20)

New: PESQ estimation in SQE now considers only voice segments (via newly added VAD)
New: Grammar rules for words-to-numbers conversion added to ES, EN_US, EN_US_A and PL_PL 6^th generation STT models
New: Empty grammar rules for words-to-numbers conversion added to all remaining 6^th generation STT models
New: Added changelogs for STT models (located in bsapi/stt/data/models_<model> directory)
Improved: Updated language models of 6^th generation STT
- RU_RU_6 (version 6.1.0)
- RU_RU_A_6 (version 6.1.0)
Fixed: *.bs.usr configuration files are ignored by phxadmin and phxadmin2
Fixed: ❗❗❗ Unable to open audio files with non-ASCII/Unicode characters in the name or path on Windows

Speech Engine 3.58 (Public release)

Speech Engine 3.58.0, DB v1901, BSAPI 3.58.0 (2023-04-03)

New: Added 6^th generation models for STT and KWS
- UK_UA_6 (Ukrainian)
- SR_RS_6 (Serbian)
New: Added VAD SID4_XL5 model, which detects speech/non-speech exactly the same way as SID4 XL5 model does
Improved: Reduced memory consumption by tuning ONNX runtime parameters. Mostly visible when processing (many) long recordings in STT.
Improved: SID4 comparator now returns actual score for recordings containing less than 3 seconds of speech (it returned score -9999 in versions 3.55 to 3.57)
NOTE: It is still strongly recommended to NOT rely on results from such very short recordings. Only longer recordings give results with appropriate confidence.
Improved: Better Windows version detection
Fixed: Generic model for SQE may fail when processing very short recordings
Fixed: Unable to initialize technologies when SPE is launched using UNC path on Windows
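
Client code that may still talk to versions 3.55 to 3.57 has to tell the -9999 placeholder mentioned above apart from a real SID4 score. A minimal sketch follows; the helper name usable_score is hypothetical and the score is assumed to arrive as a plain number.

```python
from typing import Optional

# Placeholder returned by the SID4 comparator in SPE 3.55 to 3.57 for
# recordings with less than 3 seconds of speech (per the changelog entry).
NO_SCORE_SENTINEL = -9999


def usable_score(score: float) -> Optional[float]:
    """Return the score, or None when it is the -9999 placeholder."""
    return None if score == NO_SCORE_SENTINEL else score


print(usable_score(-9999))  # None
print(usable_score(3.2))    # 3.2
```

Even with 3.58 and later, where an actual score is returned, the note above still applies: results from such short recordings should not be relied on.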

Speech Engine 3.57 (Public release)

Speech Engine 3.57.0, DB v1901, BSAPI 3.57.0 (2023-02-01)

New: AGE XL4 and XL5 models (for compatibility with SID4 XL4 and XL5 voiceprints)

Speech Engine 3.56 (Public release)

Speech Engine 3.56.0, DB v1901, BSAPI 3.56.0 (2022-12-15)

New: GID XL5 model (for compatibility with SID4 XL5 voiceprints)
Fixed: Incorrect timestamp values in STT N-best results of stream transcription
Fixed: Training of LID may get stuck in infinite loop in some cases

Speech Engine 3.55 (Public release)

Speech Engine 3.55.1, DB v1901, BSAPI 3.55.1 (2022-11-09)

New: Added 6^th generation models for STT and KWS
- KK_KZ_6 (Kazakh)
- BN_6 (Bengali)
Fixed: phxcmd lpextract with -archive parameter and input directory creates empty archive
Fixed: Misleading error message if a HW license with an invalid HW profile is supplied

Speech Engine 3.55.0, DB v1901, BSAPI 3.55.0 (2022-10-03)

New: Added SID4 model XL5 with optional backwards compatibility with model XL4
New: Added phxcmd[.exe] command line interface for technologies
New: Added 6^th generation models for STT and KWS
- DE_DE_6
- NL_6
- KA_GE_6
Improved: Updated language models of 6^th generation STT
- CS_CZ_6 (version 6.5.2)
- SK_SK_6 (version 6.1.1)
Fixed: Cannot create Audio Source Profile in some circumstances when using only user calibration
Fixed: Possible deadlock in TTS output stream
Fixed: Corrupt audio file will crash the VAD technology and cause an “Unexpected exception: null pointer” error for further processing
Fixed: Errors like “Could not parse word class definitions” due to unnecessarily required write permissions for certain files
Removed: Removed old SID model S
+ all changes included in Feature Preview releases 3.51 and 3.52 (see below)

NOTE: Windows 7 / Windows Server 2008 R2 and older are no longer supported!

Speech Engine 3.52

Speech Engine 3.52.0, DB v1901, BSAPI 3.52.0 (2022-07-01)

New: Models for STT and KWS with features of CS_CZ_6 (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
- IT_IT_6
- RU_RU_6
Fixed: Significantly improved speed of diarization model XL4 on longer audios
Removed: Removed old STT/KWS model for IT_IT

Speech Engine 3.51

Speech Engine 3.51.0, DB v1901, BSAPI 3.51.0 (2022-06-14)

New: Added option to set number of workers and RTP input streams limit automatically
New: Added “Settings Adviser” feature, which checks technologies and workers configuration against the current CPU and, where appropriate, suggests (in log messages) changes for optimal performance
New: Added 6^th generation of ZH_CN STT, KWS and PHNREC (with features of CS_CZ_6: new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
Fixed: Use even ports for input RTP streams (according to RFC 3550) instead of odd ports
Fixed: Wrong detection of OS version on newer Windows
Changed: Windows version now requires Universal C Runtime (UCRT) installed (normally installed as part of Windows Update)
Changed: Logging is set to both file and console by default (this might be undesired when running SPE as a service, so consider changing the logging setting)
Changed: Number of workers and RTP input streams limit are set automatically by default
Changed: Require initialization of all configured technologies by default
Removed: Dropped support for Windows 7 / Windows Server 2008 R2 and older
Removed: Removed 4^th generation of STT/KWS model for PL_PL
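
The even-port fix above follows the RFC 3550 convention that RTP uses an even port and the matching RTCP stream the next higher odd port. The sketch below shows one way to lay out such port pairs; the function names and the starting port are hypothetical, and the changelog does not say whether SPE itself uses the odd port for RTCP.

```python
def next_even_port(port: int) -> int:
    """Round `port` up to the nearest even number."""
    return port + (port % 2)


def rtp_rtcp_ports(start: int, count: int) -> list[tuple[int, int]]:
    """Return `count` (RTP, RTCP) pairs: even RTP port, next odd RTCP port."""
    first = next_even_port(start)
    return [(first + 2 * i, first + 2 * i + 1) for i in range(count)]


print(rtp_rtcp_ports(10001, 3))
# [(10002, 10003), (10004, 10005), (10006, 10007)]
```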

Speech Engine 3.50 (Public release)

Speech Engine 3.50.7, DB v1901, BSAPI 3.50.6 (2022-08-25)

Fixed: Processing of audio files bigger than 1 GiB stops prematurely with empty result

Speech Engine 3.50.6, DB v1901, BSAPI 3.50.5 (2022-07-29)

Fixed: SPE now gracefully stops in Phonexia’s pre-built docker images (see https://hub.docker.com/r/phonexia/spe)

Speech Engine 3.50.5, DB v1901, BSAPI 3.50.5 (2022-07-11)

Fixed: Timestamps in stream STT may not be in chronological order, resulting in decreased accuracy

Speech Engine 3.50.4, DB v1901, BSAPI 3.50.4 (2022-05-10)

Fixed: Processing slowdown and high CPU usage on Windows platform – the following technologies/models used multiple threads when they should not:
- STT/KWS – all 6^th generation models
- LID – L4 model
- DIAR, GID, SID4 – XL4 model
- SQE – GENERIC model
- VAD – GENERIC3 model

Speech Engine 3.50.3, DB v1901, BSAPI 3.50.3 (2022-04-28)

New: Models for STT and KWS with features of CS_CZ_6 (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
- PL_PL_6
- AR_KW_6

Speech Engine 3.50.2, DB v1901, BSAPI 3.50.2 (2022-04-15)

Fixed: Licensing subsystem fails to get license when multiple applications run under different OS user accounts

Speech Engine 3.50.1, DB v1901, BSAPI 3.50.1 (2022-04-05)

Fixed: Wrong pronunciations of some foreign words in STT model CS_CZ_6
Fixed: SPE may respond with STT result version 5 instead of version 6 if result was found in cache
Fixed: Hardware profile file contains extra NULL character

Due to the change in STT results content, all STT results are removed from cache (database) during update!

Speech Engine 3.50.0, DB v1900, BSAPI 3.50.0 (2022-03-23)

New: Support for word classes in STT preferred phrases (currently in CS_CZ_6 only)
New: phxadmin2 was promoted from BETA to production
New: phxadmin2 automatically renames renamed technologies or models, and shows information about renamed technologies or models
Improved: Updated VAD GENERIC_3 model
Improved: Updated following 6^th generation models for STT and KWS (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
- VI_VN_6
- FR_FR_6
- CS_CZ_6 (updated VAD, tuned LM)
- ES_6 (STT only)
- EN_US_A_6 (STT only)
Fixed: Bad example voiceprints in SID4 L4 model
Fixed: STT grapheme checking inconsistent behavior
Fixed: STT NL_NL_5 (possibly also other 5^th gen models) is much slower in 3.45 than in 3.30
Fixed: Missing more specific error messages when opening file fails on Windows platform
Removed: Old S and L models for AGE technology (now only XL and L4 are supported)
Removed: Old S and L models for LID technology (now only L3, XL3 and L4 are supported)
+ all changes included in Feature Preview release 3.46 (see below)

Speech Engine 3.46

Speech Engine 3.46.0, DB v1900, BSAPI 3.46.0 (2022-02-07)

New: STT: Single word in preferred phrase is now preferred in any sentence context (previously it was preferred only in single-word utterances)
New: STT: Faster initialization of “dynamic adding of words” feature (now ~0.1 s, before ~0.5 s)
New: phxadmin2 added to installation manual
New: Models for STT and KWS with features of CS_CZ_6 (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
- TR_TR_6
- SK_SK_6
- FA_6
Improved: Models for STT and KWS, updated and aligned with CS_CZ_6 features (new VAD generation, dynamic adding of words in preferred phrases, increased transcription precision via updated decoder)
- AR_XL_6
- SV_SE_6
- HR_HR_6
- EN_US_6
- PS_6
Fixed: Dynamic adding of words in STT does not support UTF-8 characters wider than 16 bit
Fixed: STT crashes on zero probability in LM
Fixed: Invalid products in the license file are skipped instead of causing an error
Changed: MySQL database is no longer supported in favor of MariaDB (more details about migration in doc/UPDATE.txt)
Changed: Better phxadmin2 usage descriptions
Changed: phxadmin2 --version output format (3.45.0, 6118, 2021-10-07)
Changed: phxadmin2 technology show now uses colon to separate technology and model (consistent with “technology enable”)
Removed: User setting for maximum number of pending requests (the setting was ignored anyway)
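
Scripts that call phxadmin2 --version may want to parse the new output format shown above. This is a minimal sketch based only on the sample string in the changelog; the meaning of the middle field is not documented here, so calling it build is an assumption, as are the names VersionInfo and parse_version_output.

```python
from datetime import date
from typing import NamedTuple


class VersionInfo(NamedTuple):
    version: tuple[int, ...]
    build: int  # assumption: the changelog does not name this field
    released: date


def parse_version_output(text: str) -> VersionInfo:
    """Parse a string such as '3.45.0, 6118, 2021-10-07'."""
    # Surrounding parentheses, if present, are dropped before splitting.
    fields = text.strip().strip("()").split(",")
    version, build, released = (field.strip() for field in fields)
    return VersionInfo(
        tuple(int(number) for number in version.split(".")),
        int(build),
        date.fromisoformat(released),
    )


info = parse_version_output("3.45.0, 6118, 2021-10-07")
print(info.version >= (3, 45, 0))  # True
```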

Speech Engine 3.45 (Public release)

Speech Engine 3.45.7, DB v1701, BSAPI 3.45.8 (2022-05-06)

Fixed: Processing slowdown and high CPU usage on Windows platform – the following technologies/models used multiple threads when they should not:
- STT/KWS – all 6^th generation models
- LID – L4 model
- DIAR, GID, SID4 – XL4 model
- SQE – GENERIC model
- VAD – GENERIC3 model

Speech Engine 3.45.6, DB v1701, BSAPI 3.45.7 (2022-04-14)

Fixed: Licensing subsystem fails to get license when multiple applications run under different OS user accounts

Speech Engine 3.45.5, DB v1701, BSAPI 3.45.6 (2022-02-22)

Fixed: Some STT models may fail to initialize with “BsapiException: SPhxBasicDecoderI(2): <unspecified file>: cannot open file” error
Fixed: “Exception during license acquisition: License system failure (1303)” on Windows with NET or FLS-distributed licenses when license expiration was in year 2038 or later
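
The changelog does not state what caused the failure for licenses expiring in 2038 or later. One well-known limit that falls in that year, given here only as background and not as a confirmed root cause, is the largest value of a signed 32-bit Unix timestamp. A short sketch of where that limit lies (the helper name fits_in_int32_time is hypothetical):

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Last moment representable as a signed 32-bit count of seconds since 1970.
limit = EPOCH + timedelta(seconds=INT32_MAX)
print(limit.isoformat())  # 2038-01-19T03:14:07+00:00


def fits_in_int32_time(moment: datetime) -> bool:
    """Return True if a timezone-aware `moment` fits in a signed 32-bit timestamp."""
    return int(moment.timestamp()) <= INT32_MAX


print(fits_in_int32_time(datetime(2038, 12, 31, tzinfo=timezone.utc)))  # False
```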

Speech Engine 3.45.4, DB v1800, BSAPI 3.45.5 (2022-01-14)

❗❗❗ STT users are strongly encouraged to update ❗❗❗

Fixed: Gradual speed drop and memory leak in STT
Fixed: Words-to-numbers conversion significantly decreases STT performance

Speech Engine 3.45.3, DB v1800, BSAPI 3.45.4 (2021-12-08)

Fixed: KWS fails with “Can not create temporary file” exception

Speech Engine 3.45.2, DB v1800, BSAPI 3.45.3 (2021-11-29)

Fixed: LMC does not work with 6^th generation of Czech and Spanish models
Fixed: More specific error messages when opening file fails
Changed: Spanish 6^th generation of STT/KWS renamed from ES_ES_6 to ES_6
(incorrect name was used in SPE 3.45.1)

Speech Engine 3.45.1, DB v1800, BSAPI 3.45.1 (2021-11-22)

New: Added 6^th generation of EN_US KWS/PHNREC
New: Added 6^th generation of ES_ES STT, KWS and PHNREC
Fixed: Memory leak in STT CS_CZ_6 model
Fixed: STT CS_CZ_6 with preferred phrases reports class words as OOV (out of vocabulary)
Fixed: STT returns error on models which don’t support preferred phrases even if phrases were not specified
Fixed: STT slowdown if “words to be added to language model” are not specified
Fixed: KWS sometimes saves keyword list with minus infinity log probability
Fixed: PESQ score in SQE is not always in range <-0.5, 4.5>
Fixed: DIAR XL4 incorrectly detects various technical signals and noises as a speaker
Fixed: TTS “info” output lists voices twice
Fixed: Output stream does not accept “localhost” as destination address on some OSs

Speech Engine 3.45.0, DB v1800, BSAPI 3.45.0 (2021-10-06)

New: Added 6^th generation of EN_US and EN_US_A STT (KWS/PHNREC will be added in one of the upcoming updates)
New: Added XL4 model for GID (for compatibility with SID4 XL4 voiceprints)
New: STT preferred phrases v2 with ability to dynamically add words to language model (currently in CS_CZ_6 only)
New: Endpoint /technologies/speakerid/clustervpset for clustering voiceprint set
New: Input streams over WebSocket (see GET /input_stream/websocket)
New: SQE: Added enable_pesq switch for Perceptual Evaluation of Speech Quality (PESQ) score estimation (PESQ is turned off by default for performance reasons)
Fixed: Empty “info” in VAD result when recording contains 0 seconds of speech for model GENERIC_3
Fixed: Incorrect timestamps in PHNREC results
Fixed: Segmentation fault when dynamically changing preferred phrases with new STT decoder (new decoder is currently used only in CS_CZ_6)
Fixed: Word separator is considered an invalid grapheme for CZ models in LMC
Improved: RLS-related messages are now logged at “debug” level, not “trace” level
Changed: STT language model customization marked as BETA
Removed: 4^th generation of STT/KWS/PHNREC model for HR_HR
+ all changes included in Feature Preview releases 3.41 and 3.42 (see below)

Speech Engine 3.42

Speech Engine 3.42.0, DB v1701, BSAPI 3.42.1 (2021-08-24)

New: Added /doc endpoint for serving REST API documentation in HTML format
New: New VAD model GENERIC_3 with improved accuracy + new VAD for 6^th generation of CS_CZ STT, KWS and PHNREC
New: Added 6^th generation of VI_VN STT, KWS and PHNREC
Fixed: New decoder does not propagate error messages
Improved: Updated doc/Phonemes_for_STT_and_KWS.pdf document for 6^th generation of VI_VN
Improved: Updated decoder in 6^th generation of CS_CZ STT, which should slightly increase recognition precision

Known issues:

- When using preferred phrases containing some of the class words with 6^th generation of CS_CZ STT, these words are reported as “out of vocabulary” and the phrase is ignored
- New VAD model GENERIC_3 does not work in VAD_STREAM technology

Speech Engine 3.41

Speech Engine 3.41.0, DB v1701, BSAPI 3.41.0 (2021-07-15)

New: STT language model customization (LMC) via REST API (see Usage examples -> Speech To Text -> Create customized model in API documentation)
NOTE: customized model is placed to shared directory, see more info in the SPE directories article.
New: Request ID can be specified in HTTP header X-Request-ID
New: Possibility to set source port for output stream
New: Added SQE technology on stream
New: Added Perceptual Evaluation of Speech Quality (PESQ) score estimation to SQE results
New: Following word classes are transcribed more accurately in 6^th generation of CS_CZ STT
- male/female first name and surname
- municipality
- street
Fixed: LMC may use wrong paths on Windows platform
Improved: Removed + symbol from LMC phrases in STT output
Improved: Updated decoder in 6^th generation of CS_CZ STT, which should slightly increase recognition precision
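
The X-Request-ID header introduced above can be set from any HTTP client. The sketch below only constructs a request object with Python's standard library; the server address, the helper name with_request_id and the use of a UUID as the ID are assumptions, since the changelog does not specify the accepted ID format.

```python
import uuid
from typing import Optional
from urllib.request import Request


def with_request_id(url: str, request_id: Optional[str] = None) -> Request:
    """Create a request carrying an X-Request-ID header."""
    # Assumption: any string is acceptable; a random UUID is used by default.
    request_id = request_id or str(uuid.uuid4())
    return Request(url, headers={"X-Request-ID": request_id})


req = with_request_id("http://localhost:8600/technologies", "job-42")
# urllib stores header names capitalized as "X-request-id";
# HTTP header names are case-insensitive on the wire.
print(req.get_header("X-request-id"))  # job-42
```

Sending the same ID that appears in your own application logs makes it easier to match a client call with the corresponding server-side log entries.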

Known issue: When using preferred phrases containing some of the class words with 6^th generation of CS_CZ STT, these words are reported as “out of vocabulary” and the phrase is ignored.

Speech Engine 3.40 (Public release)

Speech Engine 3.40.10, DB v1701, BSAPI 3.40.11 (2022-02-21)

Fixed: “Exception during license acquisition: License system failure (1303)” on Windows with NET or FLS-distributed licenses when license expiration was in year 2038 or later

Speech Engine 3.40.9, DB v1701, BSAPI 3.40.10 (2022-01-13)

❗❗❗ STT users are strongly encouraged to update ❗❗❗

Fixed: Voices in TTS info output were listed twice
Fixed: Gradual speed drop and memory leak in STT
Fixed: Words-to-numbers conversion significantly decreases STT performance

Speech Engine 3.40.8, DB v1701, BSAPI 3.40.5 (2021-08-18)

Improved: Better audio resampler in player (/utils/player/output_stream) and TTS (/external/technologies/tts/*) for better audio quality output
Fixed: phxadmin2 error when disabling technology and specifying technology name twice
Fixed: Language name is truncated in LID result when name contains space character
Fixed: Fixes and improvements in numeric grammar for STT SK_SK_5 (words not converted to numbers in various cases)

Speech Engine 3.40.7, DB v1701, BSAPI 3.40.4 (2021-06-30)

Fixed: Invalid SQL statement on update of SPE – fixed SQLite update script from v1601 to v1602

Speech Engine 3.40.6, DB v1701, BSAPI 3.40.4 (2021-06-22)

Fixed: Getting information about the language model containing the LPA caused an internal server error
Fixed: Acapela connector works again (was broken in 3.40.4)
Fixed: Fixes from 3.35.8 (MySQL database schema update required)

Speech Engine 3.40.5, DB v1700, BSAPI 3.40.4 (2021-05-09)

Fixed: When trying to register webhook over existing webhook for any stream technology, SPE returns HTTP 400 (1069) error instead of HTTP 500
Fixed: Invalid SQL syntax when overwriting voiceprint in a database

Speech Engine 3.40.4, DB v1700, BSAPI 3.40.4 (2021-05-28)

Fixed: BSAPI 3.40.3 does not include fixes from 3.40.2
Fixed: Different results in LID L4 for waveform and languageprint input
Fixed: “Requested segment is out of waveform range” error in TAE
Fixed: End time may be before start time in STT “one best” transcription
Fixed: When creating a new LID language pack, hash of the file contained in the custom language pack report is incorrectly calculated (occurs mainly in Windows)
Fixed: Items builtin_language_models and custom_language_models in the body of POST /technologies/languageid/languagepacks/{name} are now optional. At least one of them must not be empty.
Fixed: Better server response message when language model was not found during creation of new LID language pack
Fixed: Minor bugs in licensing subsystem

Speech Engine 3.40.3, DB v1700, BSAPI 3.40.3 (2021-05-12)

New: Added 6^th generation of HR_HR, FR_FR, PS, AR_XL and SV_SE of STT, KWS and PHNREC with improved accuracy
Fixed: Various log and error messages fixed
Fixed: Acapela TTS connector puts incorrectly named item languages in output JSON
Improved: Updated doc/Phonemes_for_STT_and_KWS.pdf document with phonemes for 6^th generation of HR_HR, FR_FR, PS, AR_XL and SV_SE

Speech Engine 3.40.2, DB v1700, BSAPI 3.40.2 (2021-04-30)

Fixed: LMC does not work with CS_CZ_6 online (stream) configuration
Fixed: Sample rate in Opus files is incorrect
Fixed: Various “[ERRFMT]” log messages fixes

Speech Engine 3.40.1, DB v1700, BSAPI 3.40.1 (2021-04-16)

Fixed: 6^th generation STT/KWS stream result may start with words from end of previous stream
Fixed: Some licensing error messages are not shown in log
Fixed: Missing file names in log messages in SID and SID4 tasks
Fixed: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used
Fixed: phxadmin2 cannot configure VAD_STREAM technology
Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf

Speech Engine 3.40.0, DB v1700, BSAPI 3.40.0 (2021-03-26)

New: Added 6^th generation of CS_CZ of STT, KWS and PHNREC with improved accuracy
Changed: Using new licensing system under the hood (internal change)
- NOTE: When using SPE with FLS (Floating License Server), you need to upgrade FLS to version 2.x in order to be able to use SPE 3.40+ with FLS.
+ all changes included in Feature Preview releases 3.36, 3.37 and 3.38 (see below)

Known bug: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used. There is no problem when using JSON as input.

Speech Engine 3.38

Speech Engine 3.38.0, DB v1700, BSAPI 3.38.0 (2021-02-25)

New: Training of LID Language Packs (no more need for command line tools… finally!)
New: LID Language Packs allow storing meta-files
New: New entity “LID Language Model” (equivalent of *.lpa LanguagePrint Archive)
Improved: Updated STT model RU_RU_A to version 4.6.0 (updated language model)
Removed: Support for RLS-enforced licenses in command line applications
Removed: FeaturePasterRepeat warning on null/empty repeat vector

Speech Engine 3.37

Speech Engine 3.37.1, DB v1601, BSAPI 3.37.0 (2021-02-18)

Fixed: Missing phxadmin2 tool in the Windows package

Speech Engine 3.37.0, DB v1601, BSAPI 3.37.0 (2021-02-17)

New: New administration tool phxadmin2, which allows performing phxadmin actions non-interactively, e.g. from scripts
New: Added 5^th generation of PS (Pashto) of STT, KWS and PHNREC
Fixed: Internal subsystems are uninitialized in the reverse of the correct order
Fixed: Creation of SID4 audio source profile fails if path parameter is empty
Improved: Better log message when switching to webhook
Improved: Debug log level now shows task start and finish messages

Speech Engine 3.36

Speech Engine 3.36.0, DB v1601, BSAPI 3.35.3 (2020-12-01)

New: Added some useful information to log messages:
- Stream ID in task-related log messages
- Audio length in debug log messages
- Workers and streams info in debug log messages
New: Possibility to obtain information about input RTP connection (see GET /input_stream/rtp/info)
New: Endpoint to get languageprint information (see POST /technologies/languageid/lpinfo)
Improved: Result of languageprint extraction now contains speech length for each languageprint (see GET /technologies/languageid/extractlp)
Improved: Output RTP packet payload size changed from 480 to 160 bytes
Fixed: SSRC in output RTP packet is now set to random 32-bit value
Fixed: RTP packets with payload type >=95 in input RTP streams are now ignored
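
The RTP changes above (160-byte payload, random 32-bit SSRC, ignoring payload types of 95 and higher) can be illustrated with the fixed 12-byte RTP header from RFC 3550. This is a minimal sketch, not SPE's implementation; the function names are hypothetical and the packet carries payload type 0 purely as an example.

```python
import secrets
import struct

RTP_HEADER_SIZE = 12
MAX_ACCEPTED_PAYLOAD_TYPE = 94  # payload types >= 95 are ignored


def build_rtp_packet(payload_type: int, sequence: int, timestamp: int,
                     ssrc: int, payload: bytes) -> bytes:
    """Build an RTP packet with version 2, no padding, extension or CSRC."""
    header = struct.pack("!BBHII", 2 << 6, payload_type & 0x7F,
                         sequence, timestamp, ssrc)
    return header + payload


def accept_packet(packet: bytes) -> bool:
    """Return True if the packet has a full header and payload type below 95."""
    if len(packet) < RTP_HEADER_SIZE:
        return False
    payload_type = packet[1] & 0x7F  # low 7 bits of the second header byte
    return payload_type <= MAX_ACCEPTED_PAYLOAD_TYPE


ssrc = secrets.randbits(32)  # random 32-bit SSRC
packet = build_rtp_packet(0, 1, 160, ssrc, bytes(160))
print(len(packet) - RTP_HEADER_SIZE, accept_packet(packet))  # 160 True
```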

Speech Engine 3.35 (Public release)

Speech Engine 3.35.9, DB v1602, BSAPI 3.35.5 (2021-06-30)

Fixed: Invalid SQL statement on update of SPE – fixed SQLite update script from v1601 to v1602

Speech Engine 3.35.8, DB v1602, BSAPI 3.35.5 (2021-06-21)

Fixed: Race condition in speaker models may lead to inconsistency in database, causing e.g. “Extraction error: value already extracted” exception (MySQL database schema update required)
Fixed: Prevent creating a duplicate speaker model (or calibration set, audio source profile) with a different letter case in the name

Speech Engine 3.35.7, DB v1601, BSAPI 3.35.5 (2021-05-09)

Fixed: Invalid SQL syntax when overwriting voiceprint in a database

Speech Engine 3.35.6, DB v1601, BSAPI 3.35.5 (2021-03-24)

Fixed: One more issue in detection of certain USB license tokens

Speech Engine 3.35.5, DB v1601, BSAPI 3.35.4 (2021-02-22)

Fixed: Creation of SID4 audio source profile fails if path parameter is empty
Improved: Better log message when switching to webhook
Improved: Debug log level now shows task start and finish messages

Speech Engine 3.35.4, DB v1601, BSAPI 3.35.4 (2020-12-14)

Fixed: STT/KWS model AR_XL_5 has incorrect name and does not start
Fixed: Missing KWS model AR_XL_5
Fixed: Processing of some short recordings causes “TwoGmmCalibThreshold is not finite” error
Fixed: STT preferred phrases “out of vocabulary” (OOV) warning message is now more verbose

Speech Engine 3.35.3, DB v1601, BSAPI 3.35.3 (2020-11-24)

New: Internal support for SAMPA phonetic alphabet
New: Updated STT model RU_RU_A to version 4.5.0 (updated language model)
New: Updated STT/KWS/PHNREC model AR_XL to version 5.2.0 (updated language model, changed phonemes notation to X-SAMPA)
Fixed: Cannot create new output stream due to hanging unfinished tasks
Fixed: Task is not removed from pool when result is delivered via Webhook
Fixed: Some log messages contain format placeholder instead of numbers
Fixed: Missing <silence/> label in STT confusion network output
Fixed: STT confusion network contains <silence/> tags with confidence greater than 1.0
Fixed: Diarization crashes during processing
Fixed: Diarization XL4 crashes on file with no speech
Fixed: SID voiceprint extraction on stream is affected by previous run
Fixed: Incorrect number of LID L4 languages in documentation
Improved: Database drop scripts
Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf

Speech Engine 3.35.2, DB v1600, BSAPI 3.35.2 (2020-10-22)

Fixed: Detection of certain USB license tokens

Speech Engine 3.35.1, DB v1600, BSAPI 3.35.1 (2020-10-13)

Fixed: Missing input stream task name in log messages
Fixed: Missing arguments in “word not found” error messages (when using preferred phrases)
Changed: Configurable STT Confusion Network threshold min_word_posterior_probability changed from log probability to normal probability (i.e. the value visible in Confusion Network results)
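
An existing min_word_posterior_probability value written as a log probability has to be converted when updating. A minimal sketch of the conversion follows, assuming the old value was a natural logarithm; the changelog does not state the base, so verify this against your configuration before relying on it. The value 0.01 is only an example.

```python
import math


def log_to_probability(log_probability: float) -> float:
    """Convert a natural-log probability to a normal probability in (0, 1]."""
    # Assumption: natural logarithm; use 10 ** x instead for base-10 values.
    return math.exp(log_probability)


old_threshold = math.log(0.01)  # example old-style value, about -4.605
print(round(log_to_probability(old_threshold), 6))  # 0.01
```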

Speech Engine 3.35.0, DB v1600, BSAPI 3.35.0 (2020-10-01)

New: LID model L4 was promoted to production (LID BETA_L4 renamed to LID L4)
New: Added new language tag documentation (doc/Technology_LID_L4_Language_tags.pdf)
New: Updated STT model CS_CZ_5 to version 5.2.1 (fixes faulty transcription of numbers into Roman format)
New: Added configurable STT Confusion Network threshold (in technology configuration file)
Fixed: STT didn’t work with 4^th and older generation models after introduction of the Preferred phrases feature in SPE 3.32
Fixed: Update from SPE 3.30 causes errors in STT result cache
Fixed: Memory leak in logging system
Fixed: Typo in name of es-XA language in LID model L4 default language pack (es-XA7 -> es-XA)
Fixed: Time Analysis segfaults on audio with 3+ channels
Fixed: vpextract_s_calib.bs config file not working
Fixed: WebSocket reply to PING control frame does not follow the protocol specification
+ all changes included in Feature Preview releases 3.31 and 3.32 (see below)

NOTE: Due to the change in STT results content, all STT results will be removed from cache (database) during update!

Speech Engine 3.32

Speech Engine 3.32.0, DB v1500, BSAPI 3.32.0 (2020-08-28)

New: Added support for Webhooks and WebSockets in stream processing
New: Added support for preferred phrases in 5^th generation of STT (see POST /technologies/stt or POST /technologies/stt/input_stream)
New: Added possibility to get multiple STT result types at once using single request (result_type query parameter now supports multiple values)
New: Added phrase start and end times in STT “n-best” result
New: Added new Diarization model XL4
Fixed: Results of STT stream and SID/SID4 stream voiceprint do not contain task ID, stream ID and task execution time

Speech Engine 3.31

Speech Engine 3.31.2, DB v1500, BSAPI 3.31.0 (2020-08-17)

Fixed: MySQL session is not returned to the session pool if RELOAD privilege is not granted in the database, which exhausts all sessions and causes the server to stop working

Speech Engine 3.31.1, DB v1500, BSAPI 3.31.0 (2020-07-02)

Fixed: SQLite database update from version v1401 fails

Speech Engine 3.31.0, DB v1500, BSAPI 3.31.0 (2020-07-01)

New: SPE now requires CentOS 7 or other Linux based OS with glibc >= 2.17
New: Added instructions for updating SPE (see doc/UPDATE.txt file)
New: Added new LID model BETA_L4
New: Audio Source Profile can now be stored in SPE storage without the need for registration
Fixed: STT 5^th generation confusion network output contains extra legacy _SILENCE_ tokens with incorrect timestamps
Fixed: Stream ID missing in debug log record
Fixed: SID4 cannot use Audio Source Profile created with different number of calibration chunks
Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf
Removed: Removed VBS plugin
Removed: The following STT models are obsolete and are no longer available or supported:
CZ_PROMPT3, CZ_IT1, CZ_TELCO2, CZ2, CZ_ENERGY1, CZ_FIN1, SK_TELCO3, SK1, EN_L1, EN_N1, EN_GB2, EN_S1, ES_AMER1, FR1, RU_A7, RU7
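
The glibc requirement introduced in 3.31.0 (glibc >= 2.17) can be checked on a target machine before updating. The following is a minimal sketch in Python, not part of SPE; the helper names are illustrative, and platform.libc_ver() is only a best-effort detection that may return empty strings on systems that do not use glibc.

```python
import platform

MIN_GLIBC = (2, 17)  # minimum version stated in the 3.31.0 entry


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_sufficient(version_text, minimum=MIN_GLIBC):
    """Return True when version_text is at least the required glibc version."""
    parsed = parse_version(version_text)
    return bool(parsed) and parsed >= minimum


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib != "glibc":
        print("glibc not detected; check the C library manually")
    else:
        verdict = "OK" if glibc_is_sufficient(version) else "too old"
        print(f"glibc {version}: {verdict}")
```

Versions are compared as integer tuples rather than strings, so that 2.5 is correctly treated as older than 2.17.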

Speech Engine 3.30 (Public release)

Speech Engine 3.30.14, DB v1401, BSAPI 3.30.14 (2021-03-24)

Fixed: One more issue in detection of certain USB license tokens

Speech Engine 3.30.13, DB v1401, BSAPI 3.30.13 (2020-09-11)

New: Updated STT and KWS model AR_XL to version 5.1.0

Speech Engine 3.30.12, DB v1401, BSAPI 3.30.11 (2020-08-17)

Fixed: MySQL session is not returned to the session pool if RELOAD privilege is not granted in the database, which eventually exhausts all sessions and stops the server from working

Speech Engine 3.30.11, DB v1401, BSAPI 3.30.11 (2020-08-11)

Fixed: Words with probability lower than 0.01 are now not included in STT Confusion Network output (to remove “irrelevant clutter” from the output)
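
The effect of the 0.01 cut-off can be illustrated with a short Python sketch. The data layout (a list of word/probability pairs per confusion network slot) and the function name are assumptions made for this example; they do not describe the actual SPE result format.

```python
MIN_PROBABILITY = 0.01  # cut-off quoted in the 3.30.11 entry


def prune_slot(alternatives, threshold=MIN_PROBABILITY):
    """Keep only (word, probability) pairs at or above the threshold."""
    return [(word, p) for word, p in alternatives if p >= threshold]


# Hypothetical alternatives for one position in the confusion network.
slot = [("hello", 0.93), ("hollow", 0.064), ("halo", 0.006)]
print(prune_slot(slot))  # [('hello', 0.93), ('hollow', 0.064)]
```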

Speech Engine 3.30.10, DB v1401, BSAPI 3.30.10 (2020-07-29)

New: Updated STT model RU_RU_A to version 4.4.0

Speech Engine 3.30.9, DB v1401, BSAPI 3.30.9 (2020-07-01)

New: Added 5^th generation of HR_HR (Croatian) of STT, KWS and PHNREC
Fixed: SPE crashes due to buffer overflow on corrupted recording

Speech Engine 3.30.8, DB v1401, BSAPI 3.30.8 (2020-06-16)

Fixed: STT failure during text-to-number translation in SK_SK_5 model

Speech Engine 3.30.7, DB v1401, BSAPI 3.30.7 (2020-06-03)

Fixed: Increasing memory consumption of SPE
Fixed: KWS delay for some 5^th generation stream configurations

Speech Engine 3.30.6, DB v1401, BSAPI 3.30.6 (2020-05-22)

Fixed: New stream is counted towards running streams even if stream creation fails
Fixed: Incorrect start timestamps on silence tags in STT output
Fixed: Incorrect start timestamps on null words in STT confusion network output
Fixed: STT n-best output is missing channel info

Speech Engine 3.30.5, DB v1401, BSAPI 3.30.5 (2020-05-14)

New: Added new STT model EN_US_A_5
Fixed: Wrong example data in STT model EN_US_5
Fixed: Segmentation fault in G2P in KWS when no pronunciation was generated

Speech Engine 3.30.3, DB v1401, BSAPI 3.30.3 (2020-04-27)

Fixed: Corrected code to SV_SE for Swedish STT, KWS and PHNREC
Fixed: Invalid SQL statement: no such table error in SPE log when using SQLite after update to database schema v1300
Fixed: When task limit is reached, server now responds with HTTP status 503 Service Unavailable instead of 500 Internal server error
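
Because a reached task limit is now reported as 503 Service Unavailable, a client can treat it as a temporary condition and retry. The sketch below shows one possible approach in Python using exponential backoff; send_request is a hypothetical callable supplied by the client application (it is not an SPE function), and the retry counts and delays are arbitrary example values.

```python
import time


def call_with_retry(send_request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send_request() until it returns a status other than 503.

    send_request is any zero-argument callable returning (status, body).
    The delay doubles after each 503 response. max_attempts must be >= 1.
    """
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body  # still 503 after the last attempt


# Demonstration with a fake server that is busy twice and then accepts the task.
fake_responses = iter([(503, ""), (503, ""), (200, "accepted")])
result = call_with_retry(lambda: next(fake_responses), sleep=lambda seconds: None)
print(result)  # (200, 'accepted')
```

Passing the sleep function as a parameter keeps the helper easy to test without real waiting.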

Speech Engine 3.30.2, DB v1400, BSAPI 3.30.2 (2020-04-23)

New: Added 5^th generation of SE_SV (Swedish) of STT, KWS and PHNREC
Fixed: Playing TTS via output stream may not be smooth
Fixed: RTP output stream produces packets without timestamp which may cause problems with some RTP clients

Speech Engine 3.30.1, DB v1400, BSAPI 3.30.1 (2020-04-08)

Fixed: TTS Acapela connector does not work due to renamed parameters
Fixed: SPE fails to read reformatted but still valid technologies.xml
Fixed: Zero start- and end time stamps for “null” words in STT confusion-network output
Improved: Words in STT confusion-network are now sorted by confidence

Speech Engine 3.30.0, DB v1400, BSAPI 3.30.0 (2020-03-25)

New: Added 5^th generation of FR_FR (French) of STT, KWS and PHNREC
New: Updated and significantly improved phonemes document for STT and KWS (see doc directory)
New: Added n-best output to all 5^th generation STT stream results
New: Added support for native numbers and dates notation in n-best output in 5^th generation CS_CZ and SK_SK STT (in both file- and stream processing)
New: Each request in SPE log gets a unique ID, allowing better request tracing. HTTP status and REST error code are also logged in case of error
New: Updated STT model RU_RU_A to version 4.3.0
Changed: All utterance_length parameters (introduced in 3.24) renamed to speech_length in endpoints returning voiceprint
Changed: Parameters languageCode and languageCodes (introduced in 3.25) renamed to language_code and language_codes in TTS endpoints
Changed: Parameter target (introduced in 3.25) in POST /external/technologies/tts query renamed to path
Improved: Better error message on upload/registering of new file when file cannot be opened
Fixed: Processing long files results in premature end without error message
+ all changes included in Feature Preview releases 3.23 to 3.26 (see below)

NOTE: STT output format has changed in 5^th generation:

_DELETE_ token was changed to <null/>
_SILENCE_ and <sil/> tokens were changed to <silence/>
<s> and </s> tokens were changed to <segment> and </segment> respectively
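
Applications that process results from both older and newer models may want to normalize the special tokens to a single form. The token changes listed above can be expressed as a simple lookup; the following Python sketch is illustrative only, and the function and variable names are not part of SPE.

```python
# Mapping taken from the NOTE above: legacy token -> new token.
LEGACY_TO_NEW_TOKENS = {
    "_DELETE_": "<null/>",
    "_SILENCE_": "<silence/>",
    "<sil/>": "<silence/>",
    "<s>": "<segment>",
    "</s>": "</segment>",
}


def normalize_tokens(tokens):
    """Replace legacy special tokens; ordinary words pass through unchanged."""
    return [LEGACY_TO_NEW_TOKENS.get(token, token) for token in tokens]


legacy = ["<s>", "hello", "_SILENCE_", "world", "_DELETE_", "</s>"]
print(normalize_tokens(legacy))
# ['<segment>', 'hello', '<silence/>', 'world', '<null/>', '</segment>']
```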

Speech Engine 3.26

Speech Engine 3.26.0, DB v1400, BSAPI 3.26.0 (2020-02-28)

New: Added new SID4 XL4 model

Speech Engine 3.25 (Public release)

Speech Engine 3.25.1, DB v1400, BSAPI 3.25.0 (2020-02-07)

New: Improved handling of “Accept” HTTP header for better CORS support
Fixed: TTS saves raw file and returns internal server error
Fixed: TTS connector gets stuck when recoding takes long time

Speech Engine 3.25.0, DB v1400, BSAPI 3.25.0 (2020-01-30)

New: Added input stream statistics to result of DELETE /input_stream/rtp call
New: Added support for CORS (can be enabled by server.cors_enable property)
New: Added Acapela TTS integration, see External Text To Speech (supported only in Linux SPE builds!)

Speech Engine 3.24

Speech Engine 3.24.0, DB v1400, BSAPI 3.24.0 (2019-12-10)

New: Significantly improved 5^th generation STT stream performance
- Added neural network based voice activity detection – improves the end-of-utterance detection
- Decoder is now restarted after each segment – i.e. “word corrections” never go beyond segment boundary
- Added per-segment confidence, computed as an average of all word confidences in a sentence – helps in judging the ‘credibility’ of the results
- Reduced delay of obtaining results in output – allows for faster detection of barge-in, e.g. in voicebot application
New: All 5^th generation STT models now use Minimum Bayes-Risk Decoding for Confusion Network construction
- Confusion Network results now contain precise start- and end times for each individual alternative word
New: KWS confidence value calculation can be modified using confidence_shift and confidence_sharpness values (see KWS results explained article for more details)
New: Added utterance_length to SID/SID4 voiceprint results
New: Added /output_stream and audio file player (/utils/player/output_stream) endpoints
New: Added 5^th generation of AR_XL (Arabic Levantine) (Beta version) of STT, KWS and PHNREC
(combines both North- and South Levantine, hence the custom code AR_XL)
Changed: Endpoints, results and properties using the term ‘stream’ now use ‘input_stream’
- check the SPE REST API documentation for details
Changed: Technology models named DEFAULT are renamed to GENERIC
- stop SPE and then run phxadmin --configure-tech to automatically update affected technologies configuration
- modify SPE REST API calls in your application accordingly, if applicable
Fixed: STT doesn’t work with models customized using LMC
Fixed: Incorrect end times for <segment/> token in STT results
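
The per-segment confidence introduced in this release is described as an average of all word confidences in the segment. As a rough illustration, assuming a plain arithmetic mean over a list of word confidences (the function name and the handling of empty segments are choices made for this sketch, not SPE behavior):

```python
def segment_confidence(word_confidences):
    """Arithmetic mean of the word confidences; None for an empty segment."""
    if not word_confidences:
        return None
    return sum(word_confidences) / len(word_confidences)


# Hypothetical word confidences for one recognized segment.
print(round(segment_confidence([0.9, 0.8, 0.7]), 3))  # 0.8
```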

NOTE: STT output format has changed in 5^th generation:

_DELETE_ token was changed to <null/>
_SILENCE_ and <sil/> tokens were changed to <silence/>
<s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.23

Speech Engine 3.23.0, DB v1300, BSAPI 3.23.0 (2019-11-01)

Changed version to 3.23.0 to synchronize with BSAPI
Fixed: SPE sends IP address in Host: HTTP header instead of hostname
Fixed: SPE sometimes outputs “[ERRFMT]” string to log messages instead of actual value

Speech Engine 3.18 (Public release)

Speech Engine 3.18.3, DB v1300, BSAPI 3.22.2 (2019-12-09)

Fixed: STT on stream may cause assert violation when waiting for stream timeout on no input data
Fixed: SPE sends IP address in Host: HTTP header instead of hostname
Fixed: SPE sometimes outputs “[ERRFMT]” string to log messages instead of actual value

Speech Engine 3.18.2, DB v1300, BSAPI 3.22.1 (2019-10-14)

Fixed: Customized STT model fails on Windows with “Request for next state but ending state reached.” error message

Speech Engine 3.18.1, DB v1300, BSAPI 3.22.0 (2019-10-01)

New: DICTATE technology has been renamed to STT_STREAM (/technologies/dictate -> /technologies/stt/stream)
(for backward compatibility, the /technologies/dictate endpoint is internally redirected)
New: SID/SID4 stream now allows gradually getting voiceprint from the stream (see /technologies/speakerid4/stream/voiceprint)
New: Unicode characters in file names are now supported on Windows platform
New: Added LLR score to GID result (as score_llr value, see /technologies/genderid)
New: Added ‘per_channel’ parameter to Diarization for processing multi-channel recordings
New: Added configuration option to not start SPE if some technology doesn’t start (server.require_all_configured_technologies)
Fixed: Random SIGSEGV crashes in CS_CZ_5 STT
Fixed: KWS CS_CZ_5 ignores keyword thresholds
Fixed: Duplicated output from KWS
Fixed: KWS online configurations for models CS_CZ_5 and NL_NL_5
Fixed: phxadmin increases number of instances in configuration instead of setting it
Fixed: phxclient is streaming slower than expected
Fixed: Redefinition of block in used configuration causes segmentation faults

NOTE: Due to the change in GID results content, all GID results will be removed from cache (database) during update!

Speech Engine 3.17.3 (08/22/2019) – DB v1200, BSAPI 3.21.3

[G_#191] Fixed: KWS getting phonemes/graphemes in specific circumstances returns unknown error
[G_BSAPI#413] Fixed: duplicated output from KWS

Speech Engine 3.17.2 (08/02/2019) – DB v1200, BSAPI 3.21.2

[G_BSAPI#300] Fixed: KWS stream results are displayed with a delay

Speech Engine 3.17.1 (07/22/2019) – DB v1200, BSAPI 3.21.1

Added 5^th generation of ES_ES (Spanish) of STT/Dictate/KWS/PHNREC

NOTE: STT output format has changed in 5^th generation:

_DELETE_ token was changed to <null/>
_SILENCE_ and <sil/> tokens were changed to <silence/>
<s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.17.0 (06/27/2019) – DB v1200, BSAPI 3.21.0

Added L4 model to GID and AGE technologies, i.e. they now support also SID4 L4 voiceprints
[G#183] Added silence detection in Dictate
[G#182] Added support for RLS capacities
[G#137] Added possibility to specify multiple destinations in server.logging.destination option
[G#136] Phonexia Browser configuration files are now included in data collected by phxadmin --report command
[G_BSAPI#401] Fixed inability to define phrases in some KWS 5^th generation models (caused by missing sil phoneme)

Speech Engine 3.16.3 (06/06/2019) – DB v1200, BSAPI 3.20.3

[G#180] Fixed regression from 3.16.2: SID4 voiceprint comparator produces inconsistent results

Speech Engine 3.16.2 (06/03/2019) – DB v1200, BSAPI 3.20.2

[G#178] Added 5^th generation of RU_RU (Russian) and EN_US of STT/Dictate/KWS/PHNREC

NOTE: STT output format has changed in 5^th generation:

_DELETE_ token was changed to <null/>
_SILENCE_ and <sil/> tokens were changed to <silence/>
<s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.16.1 (05/17/2019) – DB v1200, BSAPI 3.20.1

[G#173] Fixed: Symbols with diacritics in file names (and also in speaker model names, group names, etc.) cause errors when using MySQL
[G_BSAPI#397] Fixed: SID4 voiceprint comparator produces inconsistent results

NOTE: Due to issue in SID4 comparator, all SID4 results related to Audio Source Profiles will be deleted!

Speech Engine 3.16.0 (04/26/2019) – DB v1101, BSAPI 3.20.0

[G#146] Default value of server.n_realtime_workers changed from 0 to 8
[G#141] File size limit server.upload_max_filesize is now taken into account also when registering new file
[G#156] Added SID4 streams
[G#157] Added endpoint for updating existing Audio Source Profile
[G#160] SID4 calibration technology renamed: SID4CALIBSET -> SID4CALIB
[G#161] Mean normalization support in Audio Source Profiles
[G#169] Added cache for Audio Source Profiles, see server.audio_source_profiles_cache_size property
[G#170] Added False Acceptance Calibration cache, see server.bsapi_comparator_fa_cache_size
[G#149] Fixed: phxclient prints help if running without parameters
[G#150] Fixed: UTF-8 symbols are not escaped in phxclient output anymore
[G#164] Fixed: names of languages in custom language pack don’t contain r character anymore
[G#166] Fixed: wrong parameter for stopping server in init.d script template

Speech Engine 3.15.6 (03/14/2019) – DB v1101, BSAPI 3.19.2

[BSAPI#370] Added SK_SK (Slovak) 5^th generation of STT, Dictate, KWS and PHNREC

NOTE: STT output format has changed in 5^th generation:

_DELETE_ token was changed to <null/>
_SILENCE_ and <sil/> tokens were changed to <silence/>
<s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.15.5 (03/08/2019) – DB v1101, BSAPI 3.19.1

[#147] Fixed SID4 result cache is not invalidated when speaker model is changed
[#145] Added ‘prioritize’ role to the default ‘admin’ user

Speech Engine 3.15.4 (02/28/2019) – DB v1100, BSAPI 3.19.0

[G#131] Added SID v4 technology
[G#133] Resource lock for language pack didn’t work with MySQL database
Removed SID L2 model

Speech Engine 3.14.3 (01/29/2019) – DB v1000, BSAPI 3.18.0

[#130] Fixed phxadmin exiting with error with some argument combinations

Speech Engine 3.14.2 (12/21/2018) – DB v1000, BSAPI 3.18.0

[#125] Speed up phxadmin technology listing
[#93] Fixed getting of Dictate’s and KWS’s results may sometimes take a long time
[#124] Fixed license error causes all already initialized instances of a technology with the same model to be lost
[#116] Fixed command line options with wrong prefix are not ignored anymore
[BSAPI#225] Added KWS/STT NL_NL (Dutch) 5^th generation
[BSAPI#264] Added KWS/STT CS_CZ (Czech) 5^th generation
[BSAPI#287] Added PHNREC PL_PL (Polish) 5^th generation
[BSAPI#242] Upgraded Time Analysis Extractor Technology (switched to STT 5th gen VAD, set cross talk threshold to 0.5 sec)
[BSAPI#291] Fixed PHNREC segmentation goes beyond recording length
[BSAPI#292] Fixed WAV with no speech causes error
[BSAPI#310] Fixed Spanish and English KWS return incorrect timestamps
[BSAPI#284] Fixed pronunciation of keyword may not be generated

NOTE: STT output format has changed in 5^th generation:

_DELETE_ token was changed to <null/>
_SILENCE_ and <sil/> tokens were changed to <silence/>
<s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.13.3 (11/28/2018) – DB v1000, BSAPI 3.17.0

[G#118] Fixed KWS stream is not reinitialized after usage anymore
[G#115] Fixed stream save data to file without name if parameter path is empty

Speech Engine 3.13.2 (11/19/2018) – DB v1000, BSAPI 3.17.0

[G#110] Loading of plugins is configurable, disabled by default
[G#36] Fixed database query may return old data – only MySQL was affected
[G#105] KWS now supports phrases in keyword list
[G#109] Added endpoint for self-compare voiceprint set (/technologies/speakerid/comparevpset)
[G#57] Support for Phonexia RLS
[G#50] Added prioritization of tasks
[G_BSAPI#106] Added wfilter_speech_signal_length output item into the SQE output

Speech Engine 3.12.2 (09/25/2018) – DB v900, BSAPI 3.16.1

[G#96] Fixed phxclient use websocket instead of polling
[G_BSAPI#219] Fixed bug: some corrupted recordings may lead to crash
[G_BSAPI#101] Fixed bug: silence and voice may overlap in VAD segmentation

Speech Engine 3.12.1 (08/17/2018) – DB v900, BSAPI 3.16.0

[#81] Fixed an apostrophe in a file name may cause server error
[#80] Fixed server may bind to an already bound port on Linux
[#76] Fixed cached result is sent to webhook target
[#70] Added EULA to the production package
[#59] Added Denoiser technology
[#69] Allow comparing voiceprint with speaker model/group
[#41] Fixed /technologies/diarization/split fails if parameter target doesn’t contain wav suffix or if suffix missing
[#67] GID and AGE technologies accept also SID voiceprint as an input
[#60] Getting voiceprints for all speaker models for given speaker group
[#23] Minimum speech length for extracting SID calibration voiceprint is 60s for newly created calibration sets
[#83] Lower case keyword causes error with some models (cs_CZ)
[BSAPI] Added a new STT and KWS PL_PL (Polish) model version 5.0.0 (the first model of 5^th generation)
[BSAPI] Added more accurate G2P (5^th generation only)
[BSAPI#72] Fixed phoneme recognizer doesn’t make phonemes for phnrec_ru_ru.bs
[BSAPI#99] Fixed phoneme recognizer with configuration phnrec_cs_cz.bs doesn’t transcribe short recordings
[BSAPI#82] Fixed missing configuration of phnrec for HR_HR4
[BSAPI#78] Fixed STT segmentation – a segment doesn’t break on a long silence, creates false crosstalks
[BSAPI#148] Phoneme recognizer – all phonemes have channel 0 in multi channel recording in some models (cs_CZ)

NOTE: STT output format has changed in 5^th generation:

_DELETE_ token was changed to <null/>
_SILENCE_ and <sil/> tokens were changed to <silence/>
<s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.11.3 (06/19/2018) – BSAPI 3.15.0

[G#77] Update from SPE 3.9 deletes all files from SID models and calibration sets when using SQLite database

Speech Engine 3.11.2 (06/06/2018) – BSAPI 3.15.0

[G#65] Fixed empty keyword list produced internal server error
[G#71] Better recording format detection
[G#73] Fixed possible server crash on Windows

Speech Engine 3.11.1 (03/15/2018) – BSAPI 3.15.0

[G#43] Fixed SIDCalib and KWS technologies were not reinitialized if error occurs
[G#3] Restart MySQL DB transaction when deadlock occurs
[G#26] Added webhooks for asynchronous requests
[G#46] Changed default log verbosity level to ‘debug’
[G#32] Speaker models and groups can now be prepared with calibration
[G#21] Dictate now supports incremental mode
[G#9] Added resource for compare voiceprint sets
[G#42] Optimized SID speed, use DB cache for calibrated voiceprints of speaker models (removed option server.db.sid_model_calib_vp_cache_size)
[G#56] Fixed data may leak between one RTP stream to another
[G#55] Fixed error when client doesn’t send whole samples to stream
[G#63] Phxadmin now checks immediately that user already exists during adding user
[G#64] Fixed premature access to the result of VBS stream may lead to error
[G#52] Update to BSAPI 3.15.0
[G_BSAPI#53] Added support for 64bit float wav format
[G_BSAPI#3] Fixed BSAPI may crash when recording’s header is invalid
[G_BSAPI#5] Fixed Dictate produces different results on second and next run
[G_BSAPI#4] Fixed Dictate CS_CZ last segment of transcription has negative end time
[G_BSAPI#68] Fixed Phoneme Recognizer with configuration phnrec_pl_pl.bs not working
[G_BSAPI#75] Fixed bug: Dictate EN not working properly with a random input buffer size

Speech Engine 3.10.3 (01/18/2018) – BSAPI 3.14.0

[G#22] Fixed audio converter race condition
[G#4] Added configuration option “server.db.sid_model_calib_vp_cache_size”
[G#27, G#30, G#37, G#40] Documentation and manual update

Speech Engine 3.10.2 (12/06/2017) – BSAPI 3.14.0

[#4981] Saving logs to database (MySQL only)
[#4999] Added generating of reports (phxadmin with parameter ‘report’)
[#5055] Added possibility to prepare only one file in calibration set (see API changes)
[#5035] Speed up SID when calibration is used
[#5161] Use MariaDB connector instead of MySQL connector
[#5178] Updated systemd service template – added dependency on network-online.target
[#5070] Added voice-print merge resource (/technologies/speakerid/vpmerge)
[#5099] Added resource which returns tasks of all users (/tasks)
[#5132] Added version of technology model to resource /technologies
[#5134] Added version of BSAPI to resource /server/info
[#5135] Added groups which speaker model is member of to resource /technologies/speakerid/speakermodels/{name}
[#5133] Login of a user can contain any characters except these: /:*?"<>|
[#5150] Fixed connection to MySQL database may be lost in case of high load
[#5191] Fixed SID Stream requires calibration technology even if parameter ‘calibset’ was not specified
[#5203] Fixed premature access to the result of SID stream may lead to error
[#5192] Update to BSAPI 3.14.0
[Redmine #5130] Renamed PL -> PL_PL models for KWS and STT and updated to version 4.0.0
[GitLab #17] Updated STT RU_RU_A model to version 4.1.0
[GitLab #35] Updated KWS and STT DE_DE models to version 4.0.0
[Redmine #4678] Updated STT CS_CZ model to version 4.1.0
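
The login character restriction from [#5133] can also be checked on the client side before a user is created. A minimal sketch in Python follows. It assumes that the quote character in the list of forbidden characters is the plain double quote, and it additionally rejects empty logins, which is an assumption of this example rather than documented behavior. The function name is illustrative.

```python
# Characters that a login must not contain, as listed in [#5133].
FORBIDDEN_LOGIN_CHARS = set('/:*?"<>|')


def is_valid_login(login):
    """Return True for a non-empty login without any forbidden character."""
    return bool(login) and not (set(login) & FORBIDDEN_LOGIN_CHARS)


print(is_valid_login("jan.novak"))   # True
print(is_valid_login("jan/novak"))   # False
```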

Speech Engine 3.9.3 (10/23/2017) – BSAPI 3.13.0

[#5138] Fixed capital letters in file suffix may cause errors if the file is registered
[#5090] Fixed PHNREC may return error for some audio files
[#5043] Fixed utils resources allow to create file without suffix. Suffix “.wav” is automatically added if the file has no suffix
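
The suffix handling described in [#5043] can be mirrored in client code, for example to predict the resulting file name. The following Python sketch only illustrates the rule; the function name is hypothetical, and treating paths as "/"-separated is an assumption.

```python
import posixpath


def ensure_suffix(path, default=".wav"):
    """Append the default suffix when the file name has no suffix at all."""
    _, ext = posixpath.splitext(path)
    return path if ext else path + default


print(ensure_suffix("calls/recording"))       # calls/recording.wav
print(ensure_suffix("calls/recording.flac"))  # calls/recording.flac
```

Using splitext rather than a plain search for a dot avoids being misled by dots in directory names.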

Speech Engine 3.9.2 (09/08/2017) – BSAPI 3.13.0

[#4899] Fixed possible deadlock in MySQL database when moving files to calibration set
[#4946] Fixed time ranges don’t work properly for multichannel recordings and for FLAC and OPUS
[#4946] Fixed parameter “from_time” may cause corruption of processing data
[#4950] Fixed STT may produce incorrect time stamps in confusion network result for multichannel recordings
[#4985] Fixed Removing recording from Speaker model does not invalidate SID result in cache – only on MySQL
[#4955] Fixed concurrent access may cause errors on MySQL database
[#4993] Fixed typo in VBS resource path “/vbs/watchlists/[name]/verify/stream” (there was “wachlist”)
[#5038] Fixed stream returns error when no data was sent
[#4910] Fixed extraction of calibration voiceprint takes into account only the last channel in multichannel recording
[#4945] Resource “/technologies” doesn’t require authentication anymore
[#4952] Added possibility to distinguish BSAPI errors from SPE errors in response header
[#4971] phxadmin supports generation of hardware profile (parameter “hwgen”) same as hwgen tool
[#4971] phxadmin doesn’t require license anymore
[#4974] Added list of result versions (doc/result_versions.txt)
[#4983] Added STT_TR model
[#4151] Added KWS benchmark
[#4862] Added PHNREC benchmark
[#4533] Benchmark data are versioned
[#4840] Added checking validity of keyword list
[#4896] SID calibration set now allows storing metafiles
[#4909] Added possibility to get calibration voice-print from calibration set
[#4986] Update BSAPI to v3.13.0
[#4679] Lower STT memory consumption
[#4800] Added new STT HR_HR model 4.0.0
[#4805] Added new STT AR_KW model 4.0.0 (replacing old AR model)
[#4900] Updated STT DE_DE model to version 4.0.0
[#4664] Fixed STT may return empty segmentation and crash without error message
[#4799] Updated KWS CS_CZ model to version 4.0.0
[#4800] Added new KWS HR_HR model 4.0.0
[#4987] Added stream KWS NL_NL model
[#4940] Fixed configuration file for PHNREC AR contains wrong IID
[#4942] Fixed unable to initialize PHNREC ZH
[#4970] Fixed PHNREC with model SLOVAK does not work
[#4968] Fixed KWS with model SLOVAK returns invalid pronunciation
[#4966] Fixed wrong IID in configuration of PHNREC PL
[#4571] Updated Dictate CS_CZ model to version 4.0.0
[#4965] Fixed SID stream extractor with model L3, XL3 does not work
[#4994] Fixed SID stream with model L3 / XL3 throw error after processing of multiple streams

Speech Engine 3.8.3 (06/26/2017) – BSAPI 3.12.0

[#4784] Fixed it is possible to create speaker model or calibration set with character that is invalid for file system
[#4783] Fixed removing an RTP stream (created with parameter “path”) without sending any data may stop processing of all RTP streams
[#4781] Fixed server may get stuck during shutdown
[#4778] Fixed unable to initialize MySQL database with init.sql script if database has not set default engine to InnoDB
[#4755] Added new technology Phoneme Recognition (PHNREC) – /technologies/phnrec
[#4605] Added new command line parameter “version” to phxspe
[#4713] Added new RTP payloads 35 (Lin16, 8000Hz, 2ch) and 36 (Lin16, 8000Hz, 1ch)
[#4714] Voice-print extractor and comparator now supports calibration
[#4742] Checking audio-file format during registration
[#4812] Update to BSAPI 3.12.0
[#3699] Added missing configuration for stream mode in SID models L3, XL3
[#4527] Update voice-print format for SID models L2 and S (added i-vector to VP). It is forward and backward compatible with previous version.
[#4568] Added KWS TR_TR and AR_KW models
[#4606] Fixed KWS ZH calibration
[#4564] Updated KWS PS model v1.2.0
[#4720] Updated STT NL_NL model v4.1.0
[#4770] Updated STT CS_CZ_FIN model v4.1.0
[#4705] Fixed STT doesn’t transcribe file with model SK_TELCO3

Speech Engine 3.7.3 (04/21/2017) – BSAPI 3.11.0

[#4661] Removed old models for STT and KWS
[#4662] Fixed SPE 3.7.2 contains wrong version of BSAPI that may cause some errors

Speech Engine 3.7.2 (03/27/2017) – BSAPI 3.11.0

[#4579] Fixed registering VAD stream returns HTTP code 500 if realtime workers limit exceeded
[#2807] RTP streams now support payload 0 (PCMU) and 8 (PCMA)
[#4536] Added new configuration option “stream.http.timeout”
[#4588] Update BSAPI to 3.11.0
[#4529] Added French stream KWS
[#4305] Added new model STT DE_DE 3.0.0
[#4565] Added nonspeech segment to VAD output
[#4531] Fixed STT SK_TELCO returns empty transcription
[#4513] Fixed STT FR transcription of second channel was shifted
[#4543] Fixed KWS Pashto needs Dutch data
[#4378] Fixed STT ES_AMER1 may return empty transcription
[#4377] Updated models STT RU_RU, RU_RU_FIN, RU_RU_A to 4.0.0
[#4306] Updated models STT CS_CZ, CS_CZ_FIN, CS_CZ_ENERGY, CS_CZ_TELCO, CS_CZ_IT to 4.0.0
[#4305] Updated KWS DE_DE model to version 3.0.0
[#4377] Updated KWS RU_RU model to version 4.0.0
[#4306] Updated KWS CS_CZ model to version 3.0.0

Speech Engine 3.6.5 (03/22/2017) – BSAPI 3.10.2

[#4586] All benchmark requests without optional parameter “path” end with error

Speech Engine 3.6.4 (03/10/2017) – BSAPI 3.10.2

[#4516] Processing file with SID with huge calibration set may take a long time

Speech Engine 3.6.3 (02/23/2017) – BSAPI 3.10.2

[#4363] Fixed stream may be deleted by garbage collector immediately after creation
[#4404] Fixed Utils and Benchmarks may cause resource lock error
[#4498] Update BSAPI to 3.10.2
[#4322] Fixed Time analysis extractor sometimes crashes
[#4333, #4347] Fixed STT EN 4.0.0 and NL_NL 4.0.0 returns <s>, <sil/> and “silence” segments
Fixed stream KWS EN configuration

Speech Engine 3.6.2 (01/05/2017) – BSAPI 3.10.1

[#4338] Fixed error handling when using websockets

Speech Engine 3.6.1 (12/14/2016) – BSAPI 3.10.1

[#4290] Fixed unable to remove HTTP stream if stream was configured to store data to a file and no data was sent
[#4295] Fixed unable to find license file if path contains special characters [Windows]
[#4145] Added VAD benchmark
[#4146] Added SQE benchmark
[#4148] Added keyword threshold to keyword list
[#3797] Added stream TAE
[#4199] Fixed websocket may not be correctly closed in some cases
[#4216] Changed result for SQE (see API documentation)
[#4188] CPU information in benchmark results does not contain processor codename anymore (it may be inaccurate)
[#4150] Stream technologies VAD and KWS now support incremental mode (query parameter “result_mode” in POST /technologies/*/stream)
[#4313] Support for logging in separate thread (configuration parameter “server.logging.enable_async”), disabled by default
[#4320] Renamed and updated KWS models: ITALIAN -> IT_IT, DUTCH -> NL_NL
[#4320] Added Dictate model CZ_PROMPT
[#4320] Added STT models: IT_IT, NL_NL (based on DNN), RU_FIN, CZ_PROMPT
[#4320] Updated STT models: AR, CZ, CZ_ENERGY, CZ_FIN, CZ_IT, CZ_TELCO, EN (based on DNN), ZH
[#4320] Updated KWS model ZH
[#4320] Updated VAD model DEFAULT
[#4332] Update BSAPI to 3.10.1
[#4319] New default file logging destination (“log” folder) with daily file rotation and purge after 5 days
[#4319] VBS plugin now supports log file rotation

Speech Engine 3.5.3 (10/25/2016) – BSAPI 3.9.1

Fixed starting several SID tasks at the same time with newly created SID model may cause database inconsistency

Speech Engine 3.5.2 (10/21/2016) – BSAPI 3.9.1

Added French STT
Fixed “is_last” flag was not properly set in results of stream technologies SID, KWS, VAD
Fixed stream VAD used wrong configuration file, which caused the technology not to work
Fixed wrong stream VAD result name (SpeakerIdentificationStreamMultiResult -> VoiceActivityDetectionStreamResult)

Speech Engine 3.5.1 (10/06/2016) – BSAPI 3.9.1

Update BSAPI to 3.9.1

Speech Engine 3.5.0 (10/04/2016) – BSAPI 3.9.0

Added global confidence to one best result in STT
Update BSAPI to 3.9.0

Speech Engine 3.4.4 (09/23/2016) – BSAPI 3.8.0

Fixed server requires old database schema (v100)
Sped up MySQL database requests for file search
Added API changes for version 3.4.x to API documentation

Speech Engine 3.4.3 (09/20/2016) – BSAPI 3.8.0

Fixed server returns error for KWS phoneme request (/technologies/keywordspotting/phonemes) if only KWS or Stream KWS was running

Speech Engine 3.4.2 (09/19/2016) – BSAPI 3.8.0

Added stream VAD (/technologies/vad/stream)
Added stream KWS (/technologies/keywordspotting/stream)
Added technology benchmarks for AGE, DIAR, GID, LID, SID, STT (/technologies/{TECHNOLOGY}/benchmark)
Added request to get voice-print info (/technologies/speakerid/vpinfo)
Added usage examples to API documentation
Added configuration options for TCP connection settings
Added VAD segmentation to Time Analysis technology
Support to acquire and compare language-prints
LID technology was separated to LIDC (comparator) and LIDE (extractor)
Support websockets for pending operations
Added server health check request (GET /status)
Update BSAPI to 3.8.0

Speech Engine 3.3.2 (08/23/2016) – BSAPI 3.6.1

Added configuration option to disable OPUS and FLAC files in storage

Speech Engine 3.3.1 (08/19/2016) – BSAPI 3.6.1

Fixed resource stays locked for some time after task is finished
Minor fixes in documentation

Speech Engine 3.3.0 (07/11/2016) – BSAPI 3.6.1

Phonexia Server renamed to Speech Engine
Fixed some pending operations are not processed until new pending operation is created
Fixed early access to stream SID result may cause server crash
Fixed check if user is active during authentication process
Fixed custom pronunciation in keyword list does not take effect
Added parallel starting of technologies (configuration parameter ‘server.technology_multithread_initialization’) – default is disabled
Added resource locking (configuration parameter ‘server.enable_resource_locker’) – default is enabled
Added request POST /technologies/diarization/split to create multi-channel recording by diarization – each channel corresponds to one speaker
Added request GET /technologies/keywordspotting/phonemes to get supported phonemes
Added log files rotation (configuration parameters ‘server.logging.file.rotation’ and ‘server.logging.file.purge_count’)
Added support for FLAC and OPUS files – it is possible to upload and process these files, but requests which produce new files always produce WAV files
Added request GET /admin/roles to list user roles
Added VBS (Voice Biometry Server) plugin
Added result of GET /server/info contains information about plugins
32-bit architecture (i386) is not supported anymore
Updated BSAPI to 3.6.1
