SPE3 – Releases and Changelogs

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI.
SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x).

This page lists changes in SPE releases.



Version  Release Date  End of Support  Maintained Until  Release type
3.31     2020-07-02    2022-07-02      3.32              Feature
3.30     2020-03-27    2022-09-27      3.35              Public
3.26     2020-03-02    2022-09-02      3.30              Feature
3.25     2020-01-31    2022-07-31      3.26              Feature
3.24     2019-12-18    2022-06-18      3.25              Feature
3.23     2019-11-01    2022-05-01      3.24              Feature
3.18     2019-10-01    2022-04-01      3.19              Public
3.17     2019-06-28    2021-12-28      3.18              Public
3.16     2019-04-26    2021-10-26      3.17              Public
3.15     2019-02-28    2021-08-28      3.16              Public
3.14     2018-12-21    2020-06-21      3.15              Public
3.13     2018-11-19    2020-05-19      3.14              Public
3.12     2018-08-17    2020-02-17      3.13              Public
3.11     2018-03-15    2019-09-15      3.12              Public
3.10     2017-12-06    2019-06-06      3.11              Public
3.9      2017-09-08    2019-03-08      3.10              Public
3.8      2017-06-26    2018-12-26      3.9               Public
3.7      2017-03-27    2018-09-27      3.8               Public
3.6      2016-12-14    2018-06-14      3.7               Public
3.5      2016-10-04    2018-04-04      3.6               Public
3.4      2016-09-19    2018-03-19      3.5               Public
3.3      2016-07-11    2018-02-11      3.4               Public
3.2      2016-04-22    2017-10-22      3.3               Public
3.1      2016-02-15    2017-08-15      3.2               Public
3.0      2016-02-09    2017-08-09      3.1               Public
2.1      2015-09-16    2017-09-16      2017-09-16        Public
2.0      2015-01-06    2016-07-06      2.1               Public



Speech Engine 3.31.1 (07/02/2020) – DB v1500, BSAPI 3.31.0
Non-public Feature Preview release
  • Fixed: SQLite database update from version v1401 fails
Speech Engine 3.31.0 (07/01/2020) – DB v1500, BSAPI 3.31.0
Non-public Feature Preview release
  • New: SPE now requires CentOS 7 or other Linux based OS with glibc >= 2.17
  • New: Added instructions for updating SPE (see doc/UPDATE.txt file)
  • New: Added new LID model BETA_L4
  • New: Audio Source Profile can be now stored in SPE storage without the need for registration
  • Fixed: STT 5th generation confusion network output contains extra legacy _SILENCE_ tokens with weird timestamps
  • Fixed: Stream ID missing in debug log record
  • Fixed: SID4 cannot use Audio Source Profile created with different number of calibration chunks
  • Improved: Updated document doc/Phonemes_for_STT_and_KWS.pdf
  • Removed: Removed VBS plugin
  • Removed: Following STT models are obsolete and not available and supported anymore:

Speech Engine 3.30.9 (07/01/2020) – DB v1401, BSAPI 3.30.9
Public release

  • New: Added 5th generation of HR_HR (Croatian) of STT, KWS and PHNREC
  • Fixed: SPE crashes due to buffer overflow on corrupted recording

Speech Engine 3.30.8 (06/16/2020) – DB v1401, BSAPI 3.30.8
Public release

  • Fixed: STT failure during text-to-number translation in SK_SK_5 model

Speech Engine 3.30.7 (06/03/2020) – DB v1401, BSAPI 3.30.7
Public release

  • Fixed: Increasing memory consumption of SPE
  • Fixed: KWS delay for some 5th generation stream configurations

Speech Engine 3.30.6 (05/22/2020) – DB v1401, BSAPI 3.30.6
Public release

  • Fixed: New stream is counted towards running streams even if stream creation fails
  • Fixed: Incorrect start timestamps on silence tags in STT output
  • Fixed: Incorrect start timestamps on null words in STT confusion network output
  • Fixed: STT n-best output is missing channel info

Speech Engine 3.30.5 (05/14/2020) – DB v1401, BSAPI 3.30.5
Public release

  • New: Added new STT model EN_US_A_5
  • Fixed: Wrong example data in STT model EN_US_5
  • Fixed: Segmentation fault in G2P in KWS when no pronunciation was generated

Speech Engine 3.30.3 (04/27/2020) – DB v1401, BSAPI 3.30.3
Public release

  • Fixed: Corrected code to SV_SE for Swedish STT, KWS and PHNREC
  • Fixed: “Invalid SQL statement: no such table” error in SPE log when using SQLite after update to database schema v1300
  • Fixed: When task limit is reached, server now responds with HTTP status 503 Service Unavailable instead of 500 Internal server error
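
A client that hits the task limit can treat 503 as a temporary condition and retry later. The following is a minimal Python sketch, not part of SPE or phxclient: the function call_with_retry, its parameters and the backoff schedule are hypothetical, and send stands for whatever function performs the actual HTTP request and returns a status code together with the response body.

```python
import time
from typing import Callable, Tuple


def call_with_retry(send: Callable[[], Tuple[int, str]],
                    max_attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send() until it returns a status other than 503 or attempts run out.

    All names and defaults here are illustrative assumptions, not SPE API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            # Success or a non-retryable error: hand it back to the caller.
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    # Still 503 after the last attempt: return the final response as-is.
    return status, body
```

Passing sleep as a parameter keeps the helper easy to test without real delays; a production client would probably also cap the total waiting time.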

Speech Engine 3.30.2 (04/23/2020) – DB v1400, BSAPI 3.30.2
Public release

  • New: Added 5th generation of SE_SV (Swedish) of STT, KWS and PHNREC
  • Fixed: Playing TTS via output stream may not be smooth
  • Fixed: RTP output stream produces packets without timestamp which may cause problems with some RTP clients

Speech Engine 3.30.1 (04/08/2020) – DB v1400, BSAPI 3.30.1
Public release

  • Fixed: TTS Acapela connector does not work due to renamed parameters
  • Fixed: SPE fails to read reformatted but still valid technologies.xml
  • Fixed: Zero start- and end time stamps for “null” words in STT confusion-network output
  • Improved: Words in STT confusion-network are now sorted by confidence

Speech Engine 3.30.0 (03/25/2020) – DB v1400, BSAPI 3.30.0
Public release

  • New: Added 5th generation of FR_FR (French) of STT, KWS and PHNREC
  • New: Updated and significantly improved phonemes document for STT and KWS (see doc directory)
  • New: Added n-best output to all 5th generation STT stream results
  • New: Added support for native numbers and dates notation in n-best output in 5th generation CS_CZ and SK_SK STT (in both file- and stream processing)
  • New: Each request in SPE log gets unique ID, allowing better request tracing. Also HTTP status and REST error code is logged in case of error
  • New: Updated STT model RU_RU_A to version 4.3.0
  • Changed: All utterance_length parameters (introduced in 3.24) renamed to speech_length in endpoints returning voiceprint
  • Changed: Parameters languageCode and languageCodes (introduced in 3.25) renamed to language_code and language_codes in TTS endpoints
  • Changed: Parameter target (introduced in 3.25) in POST /external/technologies/tts query renamed to path
  • Improved: Better error message on upload/registering of new file when file cannot be opened
  • Fixed: Processing long files results in premature end without error message
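
Client code written against 3.24 or 3.25 may still use the old names from the “Changed” entries above. A minimal Python sketch of a rename helper follows. The function migrate_params and both tables are illustrative and not part of SPE, and the sketch assumes the old names appear as top-level keys of a plain dict.

```python
# Renames from the 3.30.0 "Changed" entries. Per the changelog,
# utterance_length applies to endpoints returning a voiceprint and the
# languageCode(s) renames apply to TTS endpoints.
COMMON_RENAMES = {
    "utterance_length": "speech_length",
    "languageCode": "language_code",
    "languageCodes": "language_codes",
}

# Applies only to the POST /external/technologies/tts query.
EXTERNAL_TTS_RENAMES = {"target": "path"}


def migrate_params(params: dict, external_tts: bool = False) -> dict:
    """Return a copy of params with pre-3.30 names replaced by 3.30 names.

    Hypothetical helper: set external_tts=True only for the
    POST /external/technologies/tts query, where target became path.
    """
    renames = dict(COMMON_RENAMES)
    if external_tts:
        renames.update(EXTERNAL_TTS_RENAMES)
    return {renames.get(name, name): value for name, value in params.items()}
```

The external_tts flag exists because other endpoints may legitimately keep a parameter called target, so renaming it everywhere would be wrong.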

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively
Speech Engine 3.26.0 (02/28/2020) – DB v1400, BSAPI 3.26.0
Non-public Feature Preview release
  • New: Added new SID4 XL4 model
Speech Engine 3.25.1 (02/07/2020) – DB v1400, BSAPI 3.25.0
Non-public Feature Preview release
  • New: Improved handling of “Accept” HTTP header for better CORS support
  • Fixed: TTS saves raw file and returns internal server error
  • Fixed: TTS connector gets stuck when recoding takes long time
Speech Engine 3.25.0 (01/30/2020) – DB v1400, BSAPI 3.25.0
Non-public Feature Preview release
  • New: Added input stream statistics to result of DELETE /input_stream/rtp call
  • New: Added support for CORS (can be enabled by server.cors_enable property)
  • New: Added Acapela TTS integration, see External Text To Speech
    • currently supported only in Linux SPE builds!
Speech Engine 3.24.0 (12/10/2019) – DB v1400, BSAPI 3.24.0
Non-public Feature Preview release
  • New: Significantly improved 5th generation STT stream performance
    • Added neural network based voice activity detection – improves the end-of-utterance detection
    • Decoder is now restarted after each segment – i.e. “word corrections” never go beyond segment boundary
    • Added per-segment confidence, computed as an average of all word confidences in a sentence – helps in judging the results ‘credibility’
    • Reduced delay of obtaining results in output – allows for faster detection of barge-in, e.g. in voicebot application
  • New: All 5th generation STT models now use Minimum Bayes-Risk Decoding for Confusion Network construction
    • Confusion Network results now contain precise start- and end times for each individual alternative word
  • New: KWS confidence value calculation can be modified using confidence_shift and confidence_sharpness values (see KWS results explained article for more details)
  • New: Added utterance_length to SID/SID4 voiceprint results
  • New: Added /output_stream and audio file player (/utils/player/output_stream) endpoints
  • New: Added 5th generation of AR_XL (Arabic Levantine) (Beta version) of STT, KWS and PHNREC
    (combines both North- and South Levantine, hence the custom code AR_XL)
  • Changed: Changed endpoints, results and properties using the term ‘stream’ to use ‘input_stream’
  • Changed: Technology models named DEFAULT are renamed to GENERIC
    • stop SPE and then run phxadmin --configure-tech to automatically update affected technologies configuration
    • modify accordingly SPE REST API calls in your application, if applicable
  • Fixed: STT doesn’t work with models customized using LMC
  • Fixed: Incorrect end times for <segment/> token in STT results
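
The per-segment confidence mentioned above is described as an average of the word confidences in the segment. A minimal Python sketch of that arithmetic follows. The function name is hypothetical and the numbers in the usage note are made up, not real SPE output.

```python
from typing import Sequence


def segment_confidence(word_confidences: Sequence[float]) -> float:
    """Arithmetic mean of the word confidences in one segment.

    Illustrative only: SPE computes this value itself and returns it
    in the STT stream results.
    """
    if not word_confidences:
        raise ValueError("a segment needs at least one word confidence")
    return sum(word_confidences) / len(word_confidences)
```

For example, a segment whose three words have confidences 0.9, 0.8 and 0.7 gets a segment confidence of about 0.8.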

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively
Speech Engine 3.23.0 (11/01/2019) – DB v1300, BSAPI 3.23.0
Non-public Feature Preview release
  • Changed version to 3.23.0 to synchronize with BSAPI
  • Fixed: SPE sends IP address in Host: HTTP header instead of hostname
  • Fixed: SPE sometimes outputs “[ERRFMT]” string to log messages instead of actual value

Speech Engine 3.18.3 (12/09/2019) – DB v1300, BSAPI 3.22.2
Public release

  • Fixed: STT on stream may cause assert violation when waiting for stream timeout on no input data
  • Fixed: SPE sends IP address in Host: HTTP header instead of hostname
  • Fixed: SPE sometimes outputs “[ERRFMT]” string to log messages instead of actual value

Speech Engine 3.18.2 (10/14/2019) – DB v1300, BSAPI 3.22.1
Public release

  • Fixed: Customized STT model fails on Windows with “Request for next state but ending state reached.” error message

Speech Engine 3.18.1 (10/01/2019) – DB v1300, BSAPI 3.22.0
Public release

  • New: DICTATE technology has been renamed to STT_STREAM (/technologies/dictate -> /technologies/stt/stream)
    (for backward compatibility, the /technologies/dictate endpoint is internally redirected)
  • New: SID/SID4 stream now allows gradually getting voiceprint from the stream (see /technologies/speakerid4/stream/voiceprint)
  • New: Unicode characters in file names are now supported on Windows platform
  • New: Added LLR score to GID result (as score_llr value, see /technologies/genderid)
  • New: Added ‘per_channel’ parameter to Diarization for processing multi-channel recordings
  • New: Added configuration option to not start SPE if some technology doesn’t start (server.require_all_configured_technologies)
  • Fixed: Random SIGSEGV crashes in CS_CZ_5 STT
  • Fixed: KWS CS_CZ_5 ignores keyword thresholds
  • Fixed: Duplicated output from KWS
  • Fixed: KWS online configurations for models CS_CZ_5 and NL_NL_5
  • Fixed: phxadmin increases number of instances in configuration instead of setting it
  • Fixed: phxclient is streaming slower than expected
  • Fixed: Redefinition of block in used configuration causes segmentation faults

NOTE: Due to the change in GID results content, all GID results will be removed from cache (database) during update!

Speech Engine 3.17.3 (08/22/2019) – DB v1200, BSAPI 3.21.3

  • [G_#191] Fixed: KWS getting phonemes/graphemes in specific circumstances returns unknown error
  • [G_BSAPI#413] Fixed: duplicated output from KWS

Speech Engine 3.17.2 (08/02/2019) – DB v1200, BSAPI 3.21.2

  • [G_BSAPI#300] Fixed: KWS stream results are displayed with a delay

Speech Engine 3.17.1 (07/22/2019) – DB v1200, BSAPI 3.21.1

  • Added 5th generation of ES_ES (Spanish) of STT/Dictate/KWS/PHNREC

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.17.0 (06/27/2019) – DB v1200, BSAPI 3.21.0

  • Added L4 model to GID and AGE technologies, i.e. they now support also SID4 L4 voiceprints
  • [G#183] Added silence detection in Dictate
  • [G#182] Added support for RLS capacities
  • [G#137] Added possibility to specify multiple destinations in server.logging.destination option
  • [G#136] Phonexia Browser configuration files are now included in data collected by phxadmin --report command
  • [G_BSAPI#401] Fixed inability to define phrases in some KWS 5th generation models (caused by missing sil phoneme)

Speech Engine 3.16.3 (06/06/2019) – DB v1200, BSAPI 3.20.3

  • [G#180] Fixed regression from 3.16.2: SID4 voiceprint comparator produces inconsistent results

Speech Engine 3.16.2 (06/03/2019) – DB v1200, BSAPI 3.20.2

  • [G#178] Added 5th generation of RU_RU (Russian) and EN_US of STT/Dictate/KWS/PHNREC

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.16.1 (05/17/2019) – DB v1200, BSAPI 3.20.1

  • [G#173] Fixed: Symbols with diacritics in file names (and also speaker model, group names, etc.) cause errors when using MySQL
  • [G_BSAPI#397] Fixed: SID4 voiceprint comparator produces inconsistent results

NOTE: Due to issue in SID4 comparator, all SID4 results related to Audio Source Profiles will be deleted!

Speech Engine 3.16.0 (04/26/2019) – DB v1101, BSAPI 3.20.0

  • [G#146] Default value of server.n_realtime_workers changed from 0 to 8
  • [G#141] File size limit server.upload_max_filesize is now taken into account also when registering new file
  • [G#156] Added SID4 streams
  • [G#157] Added endpoint for updating existing Audio Source Profile
  • [G#160] SID4 calibration technology renamed: SID4CALIBSET -> SID4CALIB
  • [G#161] Mean normalization support in Audio Source Profiles
  • [G#169] Added cache for Audio Source Profiles, see server.audio_source_profiles_cache_size property
  • [G#170] Added False Acceptance Calibration cache, see server.bsapi_comparator_fa_cache_size
  • [G#149] Fixed: phxclient prints help if running without parameters
  • [G#150] Fixed: UTF-8 symbols are not escaped in phxclient output anymore
  • [G#164] Fixed: names of languages in custom language pack don’t contain \r character anymore
  • [G#166] Fixed: wrong parameter for stopping server in init.d script template

Speech Engine 3.15.6 (03/14/2019) – DB v1101, BSAPI 3.19.2

  • [BSAPI#370] Added SK_SK (Slovak) 5th generation of STT, Dictate, KWS and PHNREC

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.15.5 (03/08/2019) – DB v1101, BSAPI 3.19.1

  • [#147] Fixed SID4 result cache is not invalidated when speaker model is changed
  • [#145] Add ‘prioritize’ role to the default ‘admin’ user

Speech Engine 3.15.4 (02/28/2019) – DB v1100, BSAPI 3.19.0

  • [G#131] Added SID v4 technology
  • [G#133] Resource lock for language pack didn’t work with MySQL database
  • Removed SID L2 model

Speech Engine 3.14.3 (01/29/2019) – DB v1000, BSAPI 3.18.0

  • [#130] Fixed phxadmin exiting with error with some argument combinations

Speech Engine 3.14.2 (12/21/2018) – DB v1000, BSAPI 3.18.0

  • [#125] Speed up phxadmin technology listing
  • [#93] Fixed getting of Dictate’s and KWS’s results may sometimes take a long time
  • [#124] Fixed license error causes all already initialized instances of a technology with the same model to be lost
  • [#116] Fixed command line options with wrong prefix are not ignored anymore
  • [BSAPI#225] Added KWS/STT NL_NL (Dutch) 5th generation
  • [BSAPI#264] Added KWS/STT CS_CZ (Czech) 5th generation
  • [BSAPI#287] Added PHNREC PL_PL (Polish) 5th generation
  • [BSAPI#242] Upgraded Time Analysis Extractor Technology (switched to STT 5th gen VAD, set cross talk threshold to 0.5 sec)
  • [BSAPI#291] Fixed PHNREC segmentation goes beyond recording length
  • [BSAPI#292] Fixed WAV with no speech causes error
  • [BSAPI#310] Fixed Spanish and English KWS returns incorrect timestamps
  • [BSAPI#284] Fixed pronunciation of keyword may not be generated

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively

Speech Engine 3.13.3 (11/28/2018) – DB v1000, BSAPI 3.17.0

  • [G#118] Fixed KWS stream is not reinitialized after usage anymore
  • [G#115] Fixed stream save data to file without name if parameter path is empty

Speech Engine 3.13.2 (11/19/2018) – DB v1000, BSAPI 3.17.0

  • [G#110] Loading of plugins is configurable, disabled by default
  • [G#36] Fixed database query may return old data – only MySQL was affected
  • [G#105] KWS now supports phrases in keyword list
  • [G#109] Added endpoint for self-compare voiceprint set (/technologies/speakerid/comparevpset)
  • [G#57] Support for Phonexia RLS
  • [G#50] Added prioritization of tasks
  • [G_BSAPI#106] Added wfilter_speech_signal_length output item into the SQE output

Speech Engine 3.12.2 (09/25/2018) – DB v900, BSAPI 3.16.1

  • [G#96] Fixed phxclient use websocket instead of polling
  • [G_BSAPI#219] Fixed bug: some corrupted recordings may lead to crash
  • [G_BSAPI#101] Fixed bug: silence and voice may overlap in VAD segmentation

Speech Engine 3.12.1 (08/17/2018) – DB v900, BSAPI 3.16.0

  • [#81] Fixed an apostrophe in a file name may cause server error
  • [#80] Fixed server may bind to an already bound port on Linux
  • [#76] Fixed cached result is sent to webhook target
  • [#70] Added EULA to the production package
  • [#59] Added Denoiser technology
  • [#69] Allow comparing voiceprint with speaker model/group
  • [#41] Fixed /technologies/diarization/split fails if parameter target doesn’t contain wav suffix or if suffix missing
  • [#67] GID and AGE technologies accept also SID voiceprint as an input
  • [#60] Getting voiceprints for all speaker models for given speaker group
  • [#23] Minimum speech length for extracting SID calibration voiceprint is 60s for newly created calibration sets
  • [#83] Lower case keyword causes error with some models (cs_CZ)
  • [BSAPI] Added a new STT and KWS PL_PL (Polish) model version 5.0.0 (the first model of 5th generation)
  • [BSAPI] Added more accurate G2P (5th generation only)
  • [BSAPI#72] Fixed phoneme recognizer doesn’t make phonemes for phnrec_ru_ru.bs
  • [BSAPI#99] Fixed phoneme recognizer with configuration phnrec_cs_cz.bs doesn’t transcribe short recordings
  • [BSAPI#82] Fixed missing configuration of phnrec for HR_HR4
  • [BSAPI#78] Fixed STT segmentation – a segment doesn’t break on a long silence, creates false crosstalks
  • [BSAPI#148] Phoneme recognizer – all phonemes have channel 0 in multi channel recording in some models (cs_CZ)

NOTE: STT output format has changed in 5th generation:

  • _DELETE_ token was changed to <null/>
  • _SILENCE_ and <sil/> tokens were changed to <silence/>
  • <s> and </s> tokens were changed to <segment> and </segment> respectively
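
Code that post-processes STT output and still expects the legacy tokens can normalize them to the 5th generation names listed in the note above. A minimal Python sketch follows. The helper to_gen5_tokens is hypothetical and assumes the output has already been split into a list of token strings; the real result structure returned by SPE is not modeled here.

```python
# Legacy token -> 5th generation token, as listed in the note above.
LEGACY_TO_GEN5 = {
    "_DELETE_": "<null/>",
    "_SILENCE_": "<silence/>",
    "<sil/>": "<silence/>",
    "<s>": "<segment>",
    "</s>": "</segment>",
}


def to_gen5_tokens(tokens):
    """Replace legacy special tokens; ordinary words pass through unchanged."""
    return [LEGACY_TO_GEN5.get(token, token) for token in tokens]
```

Running the same helper on output that is already in the 5th generation format changes nothing, so it can be applied regardless of which model produced the result.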

Speech Engine 3.11.3 (06/19/2018) – BSAPI 3.15.0

  • [G#77] Update from SPE 3.9 deletes all files from SID models and calibration sets when using SQLite database

Speech Engine 3.11.2 (06/06/2018) – BSAPI 3.15.0

  • [G#65] Fixed empty keyword list produced internal server error
  • [G#71] Better recording format detection
  • [G#73] Fixed possible server crash on Windows

Speech Engine 3.11.1 (03/15/2018) – BSAPI 3.15.0

  • [G#43] Fixed SIDCalib and KWS technologies were not reinitialized if error occurs
  • [G#3] Restart MySQL DB transaction when deadlock occurs
  • [G#26] Added webhooks for asynchronous requests
  • [G#46] Changed default log verbosity level to ‘debug’
  • [G#32] Speaker model and group is possible to prepare with calibration
  • [G#21] Dictate now supports incremental mode
  • [G#9] Added resource for compare voiceprint sets
  • [G#42] Optimized SID speed, use DB cache for calibrated voiceprints of speaker models (removed option server.db.sid_model_calib_vp_cache_size)
  • [G#56] Fixed data may leak between one RTP stream to another
  • [G#55] Fixed error when client doesn’t send whole samples to stream
  • [G#63] Phxadmin now checks immediately that user already exists during adding user
  • [G#64] Fixed premature access to the result of VBS stream may lead to error
  • [G#52] Update to BSAPI 3.15.0
  • [G_BSAPI#53] Added support for 64bit float wav format
  • [G_BSAPI#3] Fixed BSAPI may crash when recording’s header is invalid
  • [G_BSAPI#5] Fixed Dictate produces different results on second and next run
  • [G_BSAPI#4] Fixed Dictate CS_CZ last segment of transcription has negative end time
  • [G_BSAPI#68] Fixed Phoneme Recognizer with configuration phnrec_pl_pl.bs not working
  • [G_BSAPI#75] Fixed bug: Dictate EN not working properly with a random input buffer size

Speech Engine 3.10.3 (01/18/2018) – BSAPI 3.14.0

  • [G#22] Fixed audio converter race condition
  • [G#4] Added configuration option “server.db.sid_model_calib_vp_cache_size”
  • [G#27, G#30, G#37, G#40] Documentation and manual update

Speech Engine 3.10.2 (12/06/2017) – BSAPI 3.14.0

  • [#4981] Saving logs to database (MySQL only)
  • [#4999] Added generating of reports (phxadmin with parameter ‘report’)
  • [#5055] Added possibility to prepare only one file in calibration set (see API changes)
  • [#5035] Speed up SID when calibration is used
  • [#5161] Use MariaDB connector instead of MySQL connector
  • [#5178] Updated systemd service template – added dependency on network-online.target
  • [#5070] Added voice-print merge resource (/technologies/speakerid/vpmerge)
  • [#5099] Added resource which returns tasks of all users (/tasks)
  • [#5132] Added version of technology model to resource /technologies
  • [#5134] Added version of BSAPI to resource /server/info
  • [#5135] Added groups which speaker model is member of to resource /technologies/speakerid/speakermodels/{name}
  • [#5133] Login of a user can contain any characters except these: \/:*?"<>|
  • [#5150] Fixed connection to MySQL database may be lost in case of high load
  • [#5191] Fixed SID Stream requires calibration technology even if parameter ‘calibset’ was not specified
  • [#5203] Fixed premature access to the result of SID stream may lead to error
  • [#5192] Update to BSAPI 3.14.0
  • [Redmine #5130] Renamed PL -> PL_PL models for KWS and STT and updated to version 4.0.0
  • [GitLab #17] Updated STT RU_RU_A model to version 4.1.0
  • [GitLab #35] Updated KWS and STT DE_DE models to version 4.0.0
  • [Redmine #4678] Updated STT CS_CZ model to version 4.1.0
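
The login rule from entry #5133 can be checked on the client side before a user is created. A minimal Python sketch follows. The function names are hypothetical, the quote character in the forbidden list is assumed to be the plain ASCII double quote, and rejecting an empty login is an assumption of this sketch rather than something the changelog states.

```python
# Characters that a login must not contain, per entry #5133.
FORBIDDEN_LOGIN_CHARS = set('\\/:*?"<>|')


def invalid_login_chars(login: str) -> list:
    """Return the forbidden characters found in login, sorted for stable output."""
    return sorted(set(login) & FORBIDDEN_LOGIN_CHARS)


def is_valid_login(login: str) -> bool:
    """True if login is non-empty (an assumption) and has no forbidden characters."""
    return bool(login) and not invalid_login_chars(login)
```

The server remains the authority on what it accepts; a check like this only gives earlier and friendlier feedback.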

Speech Engine 3.9.3 (10/23/2017) – BSAPI 3.13.0

  • [#5138] Fixed capital letters in file suffix may cause errors if the file is registered
  • [#5090] Fixed PHNREC may return error for some audio files
  • [#5043] Fixed utils resources allow to create file without suffix. Suffix “.wav” is automatically added if the file has no suffix
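
The suffix rule from entry #5043 can be mirrored on the client so that the path a client remembers matches the file the server creates. A minimal Python sketch follows. The helper name is hypothetical, and treating the result of os.path.splitext as “the suffix” is an assumption about how the server decides whether a suffix is present.

```python
import os


def ensure_wav_suffix(path: str) -> str:
    """Append ".wav" when the file name in path has no suffix at all."""
    _, suffix = os.path.splitext(path)
    return path if suffix else path + ".wav"
```

Paths that already carry any suffix are returned unchanged, and a dot inside a directory name does not count as a suffix of the file.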

Speech Engine 3.9.2 (09/08/2017) – BSAPI 3.13.0

  • [#4899] Fixed possible deadlock in MySQL database when moving files to calibration set
  • [#4946] Fixed time ranges don’t work properly for multichannel recordings and for FLAC and OPUS
  • [#4946] Fixed parameter “from_time” may cause corruption of processing data
  • [#4950] Fixed STT may produce incorrect time stamps in confusion network result for multichannel recordings
  • [#4985] Fixed Removing recording from Speaker model does not invalidate SID result in cache – only on MySQL
  • [#4955] Fixed concurrent access may cause errors on MySQL database
  • [#4993] Fixed typo in VBS resource path “/vbs/watchlists/[name]/verify/stream” (there was “wachlist”)
  • [#5038] Fixed stream returns error when no data was sent
  • [#4910] Fixed extraction of calibration voiceprint takes into account only the last channel in multichannel recording
  • [#4945] Resource “/technologies” doesn’t require authentication anymore
  • [#4952] Added possibility to distinguish BSAPI errors from SPE errors in response header
  • [#4971] phxadmin supports generation of hardware profile (parameter “hwgen”) same as hwgen tool
  • [#4971] phxadmin doesn’t require license anymore
  • [#4974] Added list of result versions (doc/result_versions.txt)
  • [#4983] Added STT_TR model
  • [#4151] Added KWS benchmark
  • [#4862] Added PHNREC benchmark
  • [#4533] Benchmark data are versioned
  • [#4840] Added checking validity of keyword list
  • [#4896] Added storing of metafiles in SID calibration set
  • [#4909] Added possibility to get calibration voice-print from calibration set
  • [#4986] Update BSAPI to v3.13.0
  • [#4679] Lower STT memory consumption
  • [#4800] Added new STT HR_HR model 4.0.0
  • [#4805] Added new STT AR_KW model 4.0.0 (replacing old AR model)
  • [#4900] Updated STT DE_DE model to version 4.0.0
  • [#4664] Fixed STT may return empty segmentation and crash without error message
  • [#4799] Updated KWS CS_CZ model to version 4.0.0
  • [#4800] Added new KWS HR_HR model 4.0.0
  • [#4987] Added stream KWS NL_NL model
  • [#4940] Fixed configuration file for PHNREC AR contains wrong IID
  • [#4942] Fixed unable to initialize PHNREC ZH
  • [#4970] Fixed PHNREC with model SLOVAK does not work
  • [#4968] Fixed KWS with model SLOVAK returns invalid pronunciation
  • [#4966] Fixed wrong IID in configuration of PHNREC PL
  • [#4571] Updated Dictate CS_CZ model to version 4.0.0
  • [#4965] Fixed SID stream extractor with model L3, XL3 does not work
  • [#4994] Fixed SID stream with model L3 / XL3 throw error after processing of multiple streams

Speech Engine 3.8.3 (06/26/2017) – BSAPI 3.12.0

  • [#4784] Fixed it is possible to create speaker model or calibration set with character that is invalid for file system
  • [#4783] Fixed removing an RTP stream (created with parameter “path”) without sending any data may stop processing of all RTP streams
  • [#4781] Fixed server may get stuck during shutdown
  • [#4778] Fixed unable to initialize MySQL database with init.sql script if database has not set default engine to InnoDB
  • [#4755] Added new technology Phoneme Recognition (PHNREC) – /technologies/phnrec
  • [#4605] Added new command line parameter “version” to phxspe
  • [#4713] Added new RTP payloads 35 (Lin16, 8000Hz, 2ch) and 36 (Lin16, 8000Hz, 1ch)
  • [#4714] Voice-print extractor and comparator now supports calibration
  • [#4742] Checking audio-file format during registration
  • [#4812] Update to BSAPI 3.12.0
  • [#3699] Add missing configuration for stream mode in SID models L3, XL3
  • [#4527] Update voice-print format for SID models L2 and S (added i-vector to VP). It is forward and backward compatible with previous version.
  • [#4568] Added KWS TR_TR and AR_KW models
  • [#4606] Fixed KWS ZH calibration
  • [#4564] Updated KWS PS model v1.2.0
  • [#4720] Updated STT NL_NL model v4.1.0
  • [#4770] Updated STT CS_CZ_FIN model v4.1.0
  • [#4705] Fixed STT doesn’t transcribe file with model SK_TELCO3
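
The two RTP payloads added in entry #4713 can be kept in a small lookup table on the sending side. A minimal Python sketch follows. All names are hypothetical, and reading “Lin16” as 16-bit linear PCM (2 bytes per sample) is an assumption of this sketch.

```python
from typing import NamedTuple


class PayloadFormat(NamedTuple):
    encoding: str
    sample_rate_hz: int
    channels: int


# Payload types from entry #4713 of this release.
RTP_PAYLOADS = {
    35: PayloadFormat("Lin16", 8000, 2),
    36: PayloadFormat("Lin16", 8000, 1),
}

BYTES_PER_SAMPLE = 2  # assumption: Lin16 means 16-bit linear PCM


def bytes_per_second(payload_type: int) -> int:
    """Raw audio data rate for a payload type, ignoring RTP header overhead."""
    fmt = RTP_PAYLOADS[payload_type]
    return fmt.sample_rate_hz * fmt.channels * BYTES_PER_SAMPLE
```

Under that assumption the stereo payload 35 carries 32000 bytes of audio per second and the mono payload 36 carries 16000, which helps when sizing packets or pacing a sender.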

Speech Engine 3.7.3 (04/21/2017) – BSAPI 3.11.0

  • [#4661] Remove old models for STT and KWS
  • [#4662] Fixed SPE 3.7.2 contains wrong version of BSAPI that may cause some errors

Speech Engine 3.7.2 (03/27/2017) – BSAPI 3.11.0

  • [#4579] Fixed registering VAD stream returns HTTP code 500 if realtime workers limit exceeded
  • [#2807] RTP streams now support payload 0 (PCMU) and 8 (PCMA)
  • [#4536] Added new configuration option “stream.http.timeout”
  • [#4588] Update BSAPI to 3.11.0
  • [#4529] Added French stream KWS
  • [#4305] Added new model STT DE_DE 3.0.0
  • [#4565] Added nonspeech segment to VAD output
  • [#4531] Fixed STT SK_TELCO returns empty transcription
  • [#4513] Fixed STT FR transcription of second channel was shifted
  • [#4543] Fixed KWS Pashto needs Dutch data
  • [#4378] Fixed STT ES_AMER1 may return empty transcription
  • [#4377] Updated models STT RU_RU, RU_RU_FIN, RU_RU_A to 4.0.0
  • [#4306] Updated models STT CS_CZ, CS_CZ_FIN, CS_CZ_ENERGY, CS_CZ_TELCO, CS_CZ_IT to 4.0.0
  • [#4305] Updated KWS DE_DE model to version 3.0.0
  • [#4377] Updated KWS RU_RU model to version 4.0.0
  • [#4306] Updated KWS CS_CZ model to version 3.0.0

Speech Engine 3.6.5 (03/22/2017) – BSAPI 3.10.2

  • [#4586] All benchmark requests without optional parameter “path” end with error

Speech Engine 3.6.4 (03/10/2017) – BSAPI 3.10.2

  • [#4516] Processing file with SID with huge calibration set may take a long time

Speech Engine 3.6.3 (02/23/2017) – BSAPI 3.10.2

  • [#4363] Fixed stream may be deleted by garbage collector immediately after creation
  • [#4404] Fixed Utils and Benchmarks may cause resource lock error
  • [#4498] Update BSAPI to 3.10.2
  • [#4322] Fixed Time analysis extractor sometimes crash
  • [#4333, #4347] Fixed STT EN 4.0.0 and NL_NL 4.0.0 returns <s>, <sil/> and “silence” segments
  • Fixed stream KWS EN configuration

Speech Engine 3.6.2 (01/05/2017) – BSAPI 3.10.1

  • [#4338] Fixed error handling when using websockets

Speech Engine 3.6.1 (12/14/2016) – BSAPI 3.10.1

  • [#4290] Fixed unable to remove HTTP stream if stream was configured to store data to a file and no data was sent
  • [#4295] Fixed unable to find license file if path contains special characters [Windows]
  • [#4145] Added VAD benchmark
  • [#4146] Added SQE benchmark
  • [#4148] Added keyword threshold to keyword list
  • [#3797] Added stream TAE
  • [#4199] Fixed websocket may not be correctly closed in some cases
  • [#4216] Changed result for SQE (see API documentation)
  • [#4188] CPU information in benchmark results does not contain processor codename anymore (it may be inaccurate)
  • [#4150] Stream technologies VAD and KWS now support incremental mode (query parameter “result_mode” in POST /technologies/*/stream)
  • [#4313] Support for logging in separate thread (configuration parameter “server.logging.enable_async”), disabled by default
  • [#4320] Renamed and updated KWS models: ITALIAN -> IT_IT, DUTCH -> NL_NL
  • [#4320] Added Dictate model CZ_PROMPT
  • [#4320] Added STT models: IT_IT, NL_NL (based on DNN), RU_FIN, CZ_PROMPT
  • [#4320] Updated STT models: AR, CZ, CZ_ENERGY, CZ_FIN, CZ_IT, CZ_TELCO, EN (based on DNN), ZH
  • [#4320] Updated KWS model ZH
  • [#4320] Updated VAD model DEFAULT
  • [#4332] Update BSAPI to 3.10.1
  • [#4319] New default file logging destination (“log” folder) with daily file rotation and purge after 5 days
  • [#4319] VBS plugin now supports log file rotation

Speech Engine 3.5.3 (10/25/2016) – BSAPI 3.9.1

  • Fixed starting several SID tasks at the same time with newly created SID model may cause database inconsistency

Speech Engine 3.5.2 (10/21/2016) – BSAPI 3.9.1

  • Added French STT
  • Fixed “is_last” flag was not properly set in results of stream technologies SID, KWS, VAD
  • Fixed stream VAD used wrong configuration file, which caused the technology not to work
  • Fixed wrong stream VAD result name (SpeakerIdentificationStreamMultiResult -> VoiceActivityDetectionStreamResult)

Speech Engine 3.5.1 (10/06/2016) – BSAPI 3.9.1

  • Update BSAPI to 3.9.1

Speech Engine 3.5.0 (10/04/2016) – BSAPI 3.9.0

  • Added global confidence to one best result in STT
  • Update BSAPI to 3.9.0

Speech Engine 3.4.4 (09/23/2016) – BSAPI 3.8.0

  • Fixed server requires old database schema (v100)
  • Sped up MySQL database requests for file search
  • Added API changes for version 3.4.x to API documentation

Speech Engine 3.4.3 (09/20/2016) – BSAPI 3.8.0

  • Fixed server returns error for KWS phoneme request (/technologies/keywordspotting/phonemes) if only KWS or Stream KWS was running

Speech Engine 3.4.2 (09/19/2016) – BSAPI 3.8.0

  • Added stream VAD (/technologies/vad/stream)
  • Added stream KWS (/technologies/keywordspotting/stream)
  • Added technology benchmarks for AGE, DIAR, GID, LID, SID, STT (/technologies/{TECHNOLOGY}/benchmark)
  • Added request to get voice-print info (/technologies/speakerid/vpinfo)
  • Added usage examples to API documentation
  • Add configuration options for TCP connection settings
  • Added VAD segmentation to Time Analysis technology
  • Support to acquire and compare language-prints
  • LID technology was separated to LIDC (comparator) and LIDE (extractor)
  • Support websockets for pending operations
  • Added server health check request (GET /status)
  • Update BSAPI to 3.8.0

Speech Engine 3.3.2 (08/23/2016) – BSAPI 3.6.1

  • Added configuration option to disable OPUS and FLAC files in storage

Speech Engine 3.3.1 (08/19/2016) – BSAPI 3.6.1

  • Fixed resource stays locked for some time after task is finished
  • Minor fixes in documentation

Speech Engine 3.3.0 (07/11/2016) – BSAPI 3.6.1

  • Phonexia Server renamed to Speech Engine
  • Fixed some pending operations are not processed until new pending operation is created
  • Fixed early access to stream SID result may cause server crash
  • Fixed check if user is active during authentication process
  • Fixed custom pronunciation in keyword list does not take effect
  • Added parallel starting of technologies (configuration parameter ‘server.technology_multithread_initialization’) – default is disabled
  • Added resource locking (configuration parameter ‘server.enable_resource_locker’) – default is enabled
  • Added request POST /technologies/diarization/split to create multi-channel recording by diarization – each channel corresponds to one speaker
  • Added request GET /technologies/keywordspotting/phonemes to get supported phonemes
  • Added log files rotation (configuration parameters ‘server.logging.file.rotation’ and ‘server.logging.file.purge_count’)
  • Added support for FLAC and OPUS files – it is possible to upload and process these files, but requests which produce new files always produce WAV files
  • Added request GET /admin/roles to list user roles
  • Added VBS (Voice Biometry Server) plugin
  • Added result of GET /server/info contains information about plugins
  • 32-bit architecture (i386) is not supported anymore
  • Updated BSAPI to 3.6.1