SPE3 – Releases and Changelogs

Phonexia Speech Engine (SPE) is a RESTful API built on top of Phonexia BSAPI.
SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x).

This page lists changes in SPE releases.

Releases

Version  Release Date  End of Support  Maintained Until
3.16     2019-04-26    2021-10-26      3.17
3.15     2019-02-28    2021-08-28      3.16
3.14     2018-12-21    2020-06-21      3.15
3.13     2018-11-19    2020-05-19      3.14
3.12     2018-08-17    2020-02-17      3.13
3.11     2018-03-15    2019-09-15      3.12
3.10     2017-12-06    2019-06-06      3.11
3.9      2017-09-08    2019-03-08      3.10
3.8      2017-06-26    2018-12-26      3.9
3.7      2017-03-27    2018-09-27      3.8
3.6      2016-12-14    2018-06-14      3.7
3.5      2016-10-04    2018-04-04      3.6
3.4      2016-09-19    2018-03-19      3.5
3.3      2016-07-11    2018-02-11      3.4
3.2      2016-04-22    2017-10-22      3.3
3.1      2016-02-15    2017-08-15      3.2
3.0      2016-02-09    2017-08-09      3.1
2.1      2015-09-16    2017-09-16      2017-09-16
2.0      2015-01-06    2016-07-06      2.1


Changelogs

== SPE v3.16.x ==

Phonexia Speech Engine 3.16.1 (05/17/2019) – DB v1200, BSAPI 3.20.1

* [G#173] Fixed: symbols with diacritics in file names (and in speaker model names, group names, etc.) caused errors when using MySQL
* [G_BSAPI#397] Fixed: SID4 voiceprint comparator produces inconsistent results

NOTE: Due to the issue in the SID4 comparator, all SID4 results related to Audio Source Profiles will be deleted!

Phonexia Speech Engine 3.16.0 (04/26/2019) – DB v1101, BSAPI 3.20.0

* [G#146] Default value of server.n_realtime_workers changed from 0 to 8
* [G#141] File size limit server.upload_max_filesize is now also applied when registering a new file
* [G#156] Added SID4 streams
* [G#157] Added endpoint for updating existing Audio Source Profile
* [G#160] SID4 calibration technology renamed: SID4CALIBSET -> SID4CALIB
* [G#161] Mean normalization support in Audio Source Profiles
* [G#169] Added cache for Audio Source Profiles, see server.audio_source_profiles_cache_size property
* [G#170] Added False Acceptance Calibration cache, see server.bsapi_comparator_fa_cache_size
* [G#149] Fixed: phxclient prints help when run without parameters
* [G#150] Fixed: UTF-8 symbols are not escaped in phxclient output anymore
* [G#164] Fixed: names of languages in custom language pack no longer contain the \r character
* [G#166] Fixed: wrong parameter for stopping server in init.d script template

== SPE v3.15.x ==

Phonexia Speech Engine 3.15.6 (03/14/2019) – DB v1101, BSAPI 3.19.2

* [BSAPI#370] Added SK_SK 5th generation of STT, Dictate, KWS and PHNREC

NOTE:
STT output format has changed in 5th generation:
* _DELETE_ token was changed to <null/>
* _SILENCE_ and <sil/> tokens were changed to <silence/>
* <s> and </s> tokens were changed to <segment> and </segment> respectively
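Client code that parses transcripts from both 4th- and 5th-generation models may want to normalize the older tokens so downstream processing sees one vocabulary. The Python sketch below applies the replacements listed in this note. It assumes tokens appear whitespace-separated in a plain transcript string (an assumption, not the SPE result format), and the function name `to_5th_gen_tokens` is hypothetical.

```python
# Token replacements for 5th-generation STT, as listed in the release note.
# Assumption: the transcript is a plain string with whitespace-separated tokens.
TOKEN_MAP = {
    "_DELETE_": "<null/>",
    "_SILENCE_": "<silence/>",
    "<sil/>": "<silence/>",
    "<s>": "<segment>",
    "</s>": "</segment>",
}


def to_5th_gen_tokens(transcript: str) -> str:
    """Rewrite 4th-generation STT tokens to their 5th-generation equivalents.

    Tokens not in TOKEN_MAP (ordinary words, tokens that are already
    5th-generation) are passed through unchanged. Runs of whitespace are
    collapsed to single spaces.
    """
    return " ".join(TOKEN_MAP.get(tok, tok) for tok in transcript.split())
```

Because `_SILENCE_` and `<sil/>` both map to `<silence/>`, the mapping cannot be reversed unambiguously, so normalizing toward the 5th-generation tokens is the safer direction.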

Phonexia Speech Engine 3.15.5 (03/08/2019) – DB v1101, BSAPI 3.19.1

* [#147] Fixed SID4 result cache is not invalidated when speaker model is changed
* [#145] Added ‘prioritize’ role to the default ‘admin’ user

Phonexia Speech Engine 3.15.4 (02/28/2019) – DB v1100, BSAPI 3.19.0

* [G#131] Added SID v4 technology
* [G#133] Resource lock for language pack didn’t work with MySQL database
* Removed SID L2 model


== SPE v3.14.x ==

Phonexia Speech Engine 3.14.3 (01/29/2019) – DB v1000, BSAPI 3.18.0

* [#130] Fixed phxadmin exiting with error with some argument combinations

Phonexia Speech Engine 3.14.2 (12/21/2018) – DB v1000, BSAPI 3.18.0

* [#125] Speed up phxadmin technology listing
* [#93] Fixed retrieving Dictate and KWS results may sometimes take a long time
* [#124] Fixed a license error causes all already initialized instances of a technology with the same model to be lost
* [#116] Fixed command line options with wrong prefix are not ignored anymore
* [BSAPI#225] Added KWS/STT NL_NL 5th generation
* [BSAPI#264] Added KWS/STT CS_CZ 5th generation
* [BSAPI#287] Added PHNREC PL_PL 5th generation
* [BSAPI#242] Upgraded Time Analysis Extractor Technology (switched to STT 5th gen VAD, set cross talk threshold to 0.5 sec)
* [BSAPI#291] Fixed PHNREC segmentation goes beyond recording length
* [BSAPI#292] Fixed WAV with no speech causes an error
* [BSAPI#310] Fixed Spanish and English KWS return incorrect timestamps
* [BSAPI#284] Fixed pronunciation of keyword may not be generated

NOTE:
STT output format has changed in 5th generation:
* _DELETE_ token was changed to <null/>
* _SILENCE_ and <sil/> tokens were changed to <silence/>
* <s> and </s> tokens were changed to <segment> and </segment> respectively


== SPE v3.13.x ==

Phonexia Speech Engine 3.13.3 (11/28/2018) – DB v1000, BSAPI 3.17.0

* [G#118] Fixed KWS stream is not reinitialized after usage anymore
* [G#115] Fixed stream saves data to a file without a name if the parameter path is empty

Phonexia Speech Engine 3.13.2 (11/19/2018) – DB v1000, BSAPI 3.17.0

* [G#110] Loading of plugins is configurable, disabled by default
* [G#36] Fixed database query may return old data – only MySQL was affected
* [G#105] KWS now supports phrases in keyword list
* [G#109] Added endpoint for self-compare voiceprint set (/technologies/speakerid/comparevpset)
* [G#57] Support for Phonexia RLS
* [G#50] Added prioritization of tasks
* [G_BSAPI#106] Added wfilter_speech_signal_length output item into the SQE output


== SPE v3.12.x ==

Phonexia Speech Engine 3.12.2 (09/25/2018) – DB v900, BSAPI 3.16.1

* [G#96] Fixed phxclient use websocket instead of polling
* [G_BSAPI#219] Fixed bug: some corrupted recordings may lead to crash
* [G_BSAPI#101] Fixed bug: silence and voice may overlap in VAD segmentation

Phonexia Speech Engine 3.12.1 (08/17/2018) – DB v900, BSAPI 3.16.0

* [#81] Fixed an apostrophe in a file name may cause server error
* [#80] Fixed server may bind to an already bound port on Linux
* [#76] Fixed cached result is sent to webhook target
* [#70] Added EULA to the production package
* [#59] Added Denoiser technology
* [#69] Allow comparing voiceprint with speaker model/group
* [#41] Fixed /technologies/diarization/split fails if the target parameter has a suffix other than .wav or has no suffix at all
* [#67] GID and AGE technologies accept also SID voiceprint as an input
* [#60] Getting voiceprints for all speaker models for given speaker group
* [#23] Minimum speech length for extracting SID calibration voiceprint is 60s for newly created calibration sets
* [#83] Lower case keyword causes error with some models (cs_CZ)
* [BSAPI] Added a new STT and KWS PL_PL model version 5.0.0 (the first model of 5th generation)
* [BSAPI] Added more accurate G2P (5th generation only)
* [BSAPI#72] Fixed phoneme recognizer doesn’t produce phonemes for phnrec_ru_ru.bs
* [BSAPI#99] Fixed phoneme recognizer with configuration phnrec_cs_cz.bs doesn’t transcribe short recordings
* [BSAPI#82] Fixed missing configuration of phnrec for HR_HR4
* [BSAPI#78] Fixed STT segmentation – a segment doesn’t break on a long silence, creates false crosstalks
* [BSAPI#148] Phoneme recognizer – all phonemes have channel 0 in multi-channel recording in some models (cs_CZ)

NOTE:
STT output format has changed in 5th generation:
* _DELETE_ token was changed to <null/>
* _SILENCE_ and <sil/> tokens were changed to <silence/>
* <s> and </s> tokens were changed to <segment> and </segment> respectively


== SPE v3.11.x ==

Phonexia Speech Engine 3.11.3 (06/19/2018) – BSAPI 3.15.0

* [G#77] Update from SPE 3.9 deletes all files from SID models and calibration sets when using SQLite database

Phonexia Speech Engine 3.11.2 (06/06/2018) – BSAPI 3.15.0

* [G#65] Fixed empty keyword list produced internal server error
* [G#71] Better recording format detection
* [G#73] Fixed possible server crash on Windows

Phonexia Speech Engine 3.11.1 (03/15/2018) – BSAPI 3.15.0

* [G#43] Fixed SIDCalib and KWS technologies were not reinitialized if error occurs
* [G#3] Restart MySQL DB transaction when deadlock occurs
* [G#26] Added webhooks for asynchronous requests
* [G#46] Changed default log verbosity level to ‘debug’
* [G#32] Speaker models and groups can now be prepared with calibration
* [G#21] Dictate now supports incremental mode
* [G#9] Added resource for comparing voiceprint sets
* [G#42] Optimized SID speed, use DB cache for calibrated voiceprints of speaker models (removed option server.db.sid_model_calib_vp_cache_size)
* [G#56] Fixed data may leak from one RTP stream to another
* [G#55] Fixed error when client doesn’t send whole samples to stream
* [G#63] phxadmin now immediately checks whether the user already exists when adding a user
* [G#64] Fixed premature access to the result of VBS stream may lead to error
* [G#52] Update to BSAPI 3.15.0
* [G_BSAPI#53] Added support for 64bit float wav format
* [G_BSAPI#3] Fixed BSAPI may crash when recording’s header is invalid
* [G_BSAPI#5] Fixed Dictate produces different results on the second and subsequent runs
* [G_BSAPI#4] Fixed Dictate CS_CZ last segment of transcription has negative end time
* [G_BSAPI#68] Fixed Phoneme Recognizer with configuration phnrec_pl_pl.bs not working
* [G_BSAPI#75] Fixed bug: Dictate EN not working properly with a random input buffer size


== SPE v3.10.x ==

Phonexia Speech Engine 3.10.3 (01/18/2018) – BSAPI 3.14.0

* [G#22] Fixed audio converter race condition
* [G#4] Added configuration option “server.db.sid_model_calib_vp_cache_size”
* [G#27, G#30, G#37, G#40] Documentation and manual update

Phonexia Speech Engine 3.10.2 (12/06/2017) – BSAPI 3.14.0

* [#4981] Saving logs to database (MySQL only)
* [#4999] Added generating of reports (phxadmin with parameter ‘report’)
* [#5055] Added possibility to prepare only one file in calibration set (see API changes)
* [#5035] Speed up SID when calibration is used
* [#5161] Use MariaDB connector instead of MySQL connector
* [#5178] Updated systemd service template – added dependency on network-online.target
* [#5070] Added voice-print merge resource (/technologies/speakerid/vpmerge)
* [#5099] Added resource which returns tasks of all users (/tasks)
* [#5132] Added version of technology model to resource /technologies
* [#5134] Added version of BSAPI to resource /server/info
* [#5135] Added groups which speaker model is member of to resource /technologies/speakerid/speakermodels/{name}
* [#5133] Login of a user can contain any characters except these: \/:*?"<>|
* [#5150] Fixed connection to MySQL database may be lost in case of high load
* [#5191] Fixed SID Stream requires calibration technology even if parameter ‘calibset’ was not specified
* [#5203] Fixed premature access to the result of SID stream may lead to error
* [#5192] Update to BSAPI 3.14.0
* [Redmine #5130] Renamed PL -> PL_PL models for KWS and STT and updated to version 4.0.0
* [GitLab #17] Updated STT RU_RU_A model to version 4.1.0
* [GitLab #35] Updated KWS and STT DE_DE models to version 4.0.0
* [Redmine #4678] Updated STT CS_CZ model to version 4.1.0
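A client that creates users may want to reject invalid logins before sending them to SPE. The Python sketch below encodes only the rule from [#5133]: a login must not contain any of the characters \ / : * ? " < > | (the same set Windows reserves in file names). The names are hypothetical, and the check does not cover any other constraints SPE might enforce.

```python
# Characters disallowed in user logins since SPE 3.10.2 ([#5133]).
FORBIDDEN_LOGIN_CHARS = frozenset('\\/:*?"<>|')


def is_valid_login(login: str) -> bool:
    """Return True if the login contains none of the forbidden characters.

    Any other character, including non-ASCII letters and spaces, is accepted.
    """
    return not any(ch in FORBIDDEN_LOGIN_CHARS for ch in login)
```

Using a `frozenset` keeps each membership test constant-time and makes the forbidden set immutable.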


== SPE v3.9.x ==

Phonexia Speech Engine 3.9.3 (10/23/2017) – BSAPI 3.13.0

* [#5138] Fixed capital letters in file suffix may cause errors if the file is registered
* [#5090] Fixed PHNREC may return error for some audio files
* [#5043] Fixed utils resources allow creating a file without a suffix. The suffix “.wav” is now added automatically if the file has no suffix
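The suffix rule from [#5043] can be sketched in Python as follows. This is an illustration under assumptions: the helper name `normalize_target` is hypothetical, and "suffix" is taken to mean the extension of the last path component as computed by `pathlib`, so a dot in a directory name (for example `calls.v2/rec01`) does not count as a suffix.

```python
from pathlib import PurePosixPath


def normalize_target(name: str) -> str:
    """Append ".wav" when the last path component has no suffix.

    Assumption: only the final component is inspected, so dots in
    directory names are ignored. Existing suffixes are kept as-is.
    """
    if PurePosixPath(name).suffix == "":
        return name + ".wav"
    return name
```

Relying on `PurePosixPath.suffix` rather than a plain `"." in name` test is what keeps dotted directory names from being mistaken for file extensions.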

Phonexia Speech Engine 3.9.2 (09/08/2017) – BSAPI 3.13.0

* [#4899] Fixed possible deadlock in MySQL database when moving files to calibration set
* [#4946] Fixed time ranges don’t work properly for multichannel recordings and for FLAC and OPUS
* [#4946] Fixed parameter “from_time” may cause corruption of processing data
* [#4950] Fixed STT may produce incorrect time stamps in confusion network result for multichannel recordings
* [#4985] Fixed removing a recording from a speaker model does not invalidate the SID result in cache – only on MySQL
* [#4955] Fixed concurrent access may cause errors on MySQL database
* [#4993] Fixed typo in VBS resource path “/vbs/watchlists/[name]/verify/stream” (there was “wachlist”)
* [#5038] Fixed stream returns error when no data was sent
* [#4910] Fixed extraction of calibration voiceprint takes into account only the last channel in multichannel recording
* [#4945] Resource “/technologies” doesn’t require authentication anymore
* [#4952] Added possibility to distinguish BSAPI errors from SPE errors in response header
* [#4971] phxadmin supports generation of hardware profile (parameter “hwgen”) same as hwgen tool
* [#4971] phxadmin doesn’t require license anymore
* [#4974] Added list of result versions (doc/result_versions.txt)
* [#4983] Added STT_TR model
* [#4151] Added KWS benchmark
* [#4862] Added PHNREC benchmark
* [#4533] Benchmark data are versioned
* [#4840] Added checking validity of keyword list
* [#4896] SID calibration set now allows storing metafiles
* [#4909] Added possibility to get calibration voice-print from calibration set
* [#4986] Update BSAPI to v3.13.0
* [#4679] Lower STT memory consumption
* [#4800] Added new STT HR_HR model 4.0.0
* [#4805] Added new STT AR_KW model 4.0.0 (replacing old AR model)
* [#4900] Updated STT DE_DE model to version 4.0.0
* [#4664] Fixed STT may return empty segmentation and crash without error message
* [#4799] Updated KWS CS_CZ model to version 4.0.0
* [#4800] Added new KWS HR_HR model 4.0.0
* [#4987] Added stream KWS NL_NL model
* [#4940] Fixed configuration file for PHNREC AR contains wrong IID
* [#4942] Fixed unable to initialize PHNREC ZH
* [#4970] Fixed PHNREC with model SLOVAK does not work
* [#4968] Fixed KWS with model SLOVAK returns invalid pronunciation
* [#4966] Fixed wrong IID in configuration of PHNREC PL
* [#4571] Updated Dictate CS_CZ model to version 4.0.0
* [#4965] Fixed SID stream extractor with model L3, XL3 does not work
* [#4994] Fixed SID stream with model L3 / XL3 throws an error after processing of multiple streams


== SPE v3.8.x ==

Phonexia Speech Engine 3.8.3 (06/26/2017) – BSAPI 3.12.0

* [#4784] Fixed it is possible to create speaker model or calibration set with character that is invalid for file system
* [#4783] Fixed removing an RTP stream (created with parameter “path”) without sending any data may stop processing of all RTP streams
* [#4781] Fixed server may get stuck during shutdown
* [#4778] Fixed unable to initialize MySQL database with init.sql script if the database default engine is not set to InnoDB
* [#4755] Added new technology Phoneme Recognition (PHNREC) – /technologies/phnrec
* [#4605] Added new command line parameter “version” to phxspe
* [#4713] Added new RTP payloads 35 (Lin16, 8000Hz, 2ch) and 36 (Lin16, 8000Hz, 1ch)
* [#4714] Voice-print extractor and comparator now supports calibration
* [#4742] Checking audio-file format during registration
* [#4812] Update to BSAPI 3.12.0
* [#3699] Added missing configuration for stream mode in SID models L3, XL3
* [#4527] Update voice-print format for SID models L2 and S (added i-vector to VP). It is forward and backward compatible with previous version.
* [#4568] Added KWS TR_TR and AR_KW models
* [#4606] Fixed KWS ZH calibration
* [#4564] Updated KWS PS model v1.2.0
* [#4720] Updated STT NL_NL model v4.1.0
* [#4770] Updated STT CS_CZ_FIN model v4.1.0
* [#4705] Fixed STT doesn’t transcribe file with model SK_TELCO3


== SPE v3.7.x ==

Phonexia Speech Engine 3.7.3 (04/21/2017) – BSAPI 3.11.0

* [#4661] Removed old models for STT and KWS
* [#4662] Fixed SPE 3.7.2 contains wrong version of BSAPI that may cause some errors

Phonexia Speech Engine 3.7.2 (03/27/2017) – BSAPI 3.11.0

* [#4579] Fixed registering VAD stream returns HTTP code 500 if realtime workers limit exceeded
* [#2807] RTP streams now support payload 0 (PCMU) and 8 (PCMA)
* [#4536] Added new configuration option “stream.http.timeout”
* [#4588] Update BSAPI to 3.11.0
* [#4529] Added French stream KWS
* [#4305] Added new model STT DE_DE 3.0.0
* [#4565] Added nonspeech segment to VAD output
* [#4531] Fixed STT SK_TELCO returns empty transcription
* [#4513] Fixed STT FR transcription of second channel was shifted
* [#4543] Fixed KWS Pashto needs Dutch data
* [#4378] Fixed STT ES_AMER1 may return empty transcription
* [#4377] Updated models STT RU_RU, RU_RU_FIN, RU_RU_A to 4.0.0
* [#4306] Updated models STT CS_CZ, CS_CZ_FIN, CS_CZ_ENERGY, CS_CZ_TELCO, CS_CZ_IT to 4.0.0
* [#4305] Updated KWS DE_DE model to version 3.0.0
* [#4377] Updated KWS RU_RU model to version 4.0.0
* [#4306] Updated KWS CS_CZ model to version 3.0.0
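This changelog mentions four RTP payload types accepted by SPE: 0 (PCMU) and 8 (PCMA) since 3.7.2, and 35 (Lin16, 8000 Hz, 2 channels) and 36 (Lin16, 8000 Hz, 1 channel) since 3.8.3. The Python sketch below records them and computes the audio payload size of one packet, which can help when sizing an RTP sender. Assumptions: PCMU and PCMA use the 8000 Hz, mono, 8-bit encoding defined for these static payload types in RFC 3551; Lin16 is 16-bit linear PCM (2 bytes per sample per channel); the 20 ms packet duration is a common default, not an SPE requirement; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RtpFormat:
    codec: str
    sample_rate_hz: int
    channels: int
    bytes_per_sample: int  # per channel


# Payload types listed in this changelog (0 and 8 since 3.7.2, 35 and 36 since 3.8.3).
# Assumption: PCMU/PCMA parameters follow RFC 3551 (8000 Hz, mono, 1 byte per sample).
SPE_RTP_PAYLOADS = {
    0: RtpFormat("PCMU", 8000, 1, 1),
    8: RtpFormat("PCMA", 8000, 1, 1),
    35: RtpFormat("Lin16", 8000, 2, 2),
    36: RtpFormat("Lin16", 8000, 1, 2),
}


def payload_bytes(payload_type: int, ptime_ms: int = 20) -> int:
    """Audio bytes in one RTP packet carrying ptime_ms of audio (RTP header excluded)."""
    fmt = SPE_RTP_PAYLOADS[payload_type]
    samples_per_channel = fmt.sample_rate_hz * ptime_ms // 1000
    return samples_per_channel * fmt.channels * fmt.bytes_per_sample
```

For example, a 20 ms packet of payload type 36 carries 160 samples of 2 bytes each, that is 320 bytes of audio; payload type 35 doubles that to 640 bytes because it has two channels.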


== SPE v3.6.x ==

Phonexia Speech Engine 3.6.5 (03/22/2017) – BSAPI 3.10.2

* [#4586] All benchmark requests without optional parameter “path” end with an error

Phonexia Speech Engine 3.6.4 (03/10/2017) – BSAPI 3.10.2

* [#4516] Processing file with SID with huge calibration set may take a long time

Phonexia Speech Engine 3.6.3 (02/23/2017) – BSAPI 3.10.2

* [#4363] Fixed stream may be deleted by garbage collector immediately after creation
* [#4404] Fixed Utils and Benchmarks may cause resource lock error
* [#4498] Update BSAPI to 3.10.2
* [#4322] Fixed Time analysis extractor sometimes crashes
* [#4333, #4347] Fixed STT EN 4.0.0 and NL_NL 4.0.0 return <s>, <sil/> and “silence” segments
* Fixed stream KWS EN configuration

Phonexia Speech Engine 3.6.2 (01/05/2017) – BSAPI 3.10.1

* [#4338] Fixed error handling when using websockets

Phonexia Speech Engine 3.6.1 (12/14/2016) – BSAPI 3.10.1

* [#4290] Fixed unable to remove HTTP stream if stream was configured to store data to a file and no data was sent
* [#4295] Fixed unable to find license file if path contains special characters [Windows]
* [#4145] Added VAD benchmark
* [#4146] Added SQE benchmark
* [#4148] Added keyword threshold to keyword list
* [#3797] Added stream TAE
* [#4199] Fixed websocket may not be correctly closed in some cases
* [#4216] Changed result for SQE (see API documentation)
* [#4188] CPU information in benchmark results no longer contains the processor codename (it may be inaccurate)
* [#4150] Stream technologies VAD and KWS now support incremental mode (query parameter “result_mode” in POST /technologies/*/stream)
* [#4313] Support for logging in separate thread (configuration parameter “server.logging.enable_async”), disabled by default
* [#4320] Renamed and updated KWS models: ITALIAN -> IT_IT, DUTCH -> NL_NL
* [#4320] Added Dictate model CZ_PROMPT
* [#4320] Added STT models: IT_IT, NL_NL (based on DNN), RU_FIN, CZ_PROMPT
* [#4320] Updated STT models: AR, CZ, CZ_ENERGY, CZ_FIN, CZ_IT, CZ_TELCO, EN (based on DNN), ZH
* [#4320] Updated KWS model ZH
* [#4320] Updated VAD model DEFAULT
* [#4332] Update BSAPI to 3.10.1
* [#4319] New default file logging destination (“log” folder) with daily file rotation and purge after 5 days
* [#4319] VBS plugin now supports log file rotation


== SPE v3.5.x ==

Phonexia Speech Engine 3.5.3 (10/25/2016) – BSAPI 3.9.1

* Fixed starting several SID tasks at the same time with newly created SID model may cause database inconsistency

Phonexia Speech Engine 3.5.2 (10/21/2016) – BSAPI 3.9.1

* Added French STT
* Fixed “is_last” flag was not properly set in results of stream technologies SID, KWS, VAD
* Fixed stream VAD used wrong configuration file, which caused the technology not to work
* Fixed wrong stream VAD result name (SpeakerIdentificationStreamMultiResult -> VoiceActivityDetectionStreamResult)

Phonexia Speech Engine 3.5.1 (10/06/2016) – BSAPI 3.9.1

* Update BSAPI to 3.9.1

Phonexia Speech Engine 3.5.0 (10/04/2016) – BSAPI 3.9.0

* Added global confidence to one best result in STT
* Update BSAPI to 3.9.0


== SPE v3.4.x ==

Phonexia Speech Engine 3.4.4 (09/23/2016) – BSAPI 3.8.0

* Fixed server requires old database schema (v100)
* Sped up MySQL database requests for file search
* Added API changes for version 3.4.x to API documentation

Phonexia Speech Engine 3.4.3 (09/20/2016) – BSAPI 3.8.0

* Fixed server returns error for KWS phoneme request (/technologies/keywordspotting/phonemes) if only KWS or Stream KWS was running

Phonexia Speech Engine 3.4.2 (09/19/2016) – BSAPI 3.8.0

* Added stream VAD (/technologies/vad/stream)
* Added stream KWS (/technologies/keywordspotting/stream)
* Added technology benchmarks for AGE, DIAR, GID, LID, SID, STT (/technologies/{TECHNOLOGY}/benchmark)
* Added request to get voice-print info (/technologies/speakerid/vpinfo)
* Added usage examples to API documentation
* Added configuration options for TCP connection settings
* Added VAD segmentation to Time Analysis technology
* Support to acquire and compare language-prints
* LID technology was separated to LIDC (comparator) and LIDE (extractor)
* Support websockets for pending operations
* Added server health check request (GET /status)
* Update BSAPI to 3.8.0


== SPE v3.3.x ==

Phonexia Speech Engine 3.3.2 (08/23/2016) – BSAPI 3.6.1

* Added configuration option to disable OPUS and FLAC files in storage

Phonexia Speech Engine 3.3.1 (08/19/2016) – BSAPI 3.6.1

* Fixed resource stays locked for some time after task is finished
* Minor fixes in documentation

Phonexia Speech Engine 3.3.0 (07/11/2016) – BSAPI 3.6.1

* Phonexia Server renamed to Phonexia Speech Engine
* Fixed some pending operations are not processed until new pending operation is created
* Fixed early access to stream SID result may cause server crash
* Fixed check if user is active during authentication process
* Fixed custom pronunciation in keyword list does not take effect
* Added parallel starting of technologies (configuration parameter ‘server.technology_multithread_initialization’) – default is disabled
* Added resource locking (configuration parameter ‘server.enable_resource_locker’) – default is enabled
* Added request POST /technologies/diarization/split to create multi-channel recording by diarization – each channel corresponds to one speaker
* Added request GET /technologies/keywordspotting/phonemes to get supported phonemes
* Added log files rotation (configuration parameters ‘server.logging.file.rotation’ and ‘server.logging.file.purge_count’)
* Added support for FLAC and OPUS files – it is possible to upload and process these files, but requests which produce new files always produce WAV files
* Added request GET /admin/roles to list user roles
* Added VBS (Voice Biometry Server) plugin
* Added information about plugins to the result of GET /server/info
* 32-bit architecture (i386) is not supported anymore
* Updated BSAPI to 3.6.1
