Search: STT results

20 results

Releases and Changelogs (SPE)

…results content, all STT results will be removed from cache (database) during update! Speech Engine 3.32 Speech Engine 3.32.0, DB v1500, BSAPI 3.32.0 (2020-08-28) New: Added support for Webhooks and WebSockets in stream processing New: Added support for preferred phrases in 5th generation of STT (see POST /technologies/stt or POST /technologies/stt/input_stream) New: Added possibility to get multiple STT result types…

Release Notes

…5th generation (RU_RU_5) of STT/KWS. STT word accuracy (WAcc) is increased up to 90.8 % (up to 7.1 p.p. improvement). Polish (Poland) new-generation PL_PL_6 model: It is an upgrade of the previous 5th generation (PL_PL_5) of STT/KWS. STT word accuracy (WAcc) is increased up to 85.3 % (up to 18.7 p.p. improvement). Italian (Italy) new-generation IT_IT_6 model: It is an upgrade…

STT: Language Model Customization tutorial

…STT model, put its name in the model parameter, like this: GET /technologies/stt?path=foobar.wav&model=<customized_model_name> Using customized STT model in command line STT To use customized STT model in command line STT, simply specify the new configuration file belonging to the customized STT model in the -config parameter. For example, assuming that original pl_pl_5 model was customized, specifying updated as the model…
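The request shown in the snippet above can be assembled programmatically. The following is a minimal sketch that only builds the URL; the base address is a placeholder and the helper name `build_stt_url` is hypothetical, not part of any Phonexia client library. Authentication and the actual HTTP call are left out.

```python
from urllib.parse import urlencode


def build_stt_url(base_url, audio_path, model_name):
    """Build a GET /technologies/stt URL that selects a customized model.

    base_url is an assumed server address; only the path and the two query
    parameters (path, model) come from the snippet above.
    """
    # urlencode percent-escapes special characters in the values,
    # including slashes in the audio path.
    query = urlencode({"path": audio_path, "model": model_name})
    return f"{base_url.rstrip('/')}/technologies/stt?{query}"


# Example with placeholder values.
url = build_stt_url("http://spe.example", "foobar.wav", "my_customized_model")
print(url)
```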

Arabic dialects in Phonexia LID and STT

…code ar-XL, where the XL means “cross-Levantine” 😉 NOTE: To get the best STT results, use the model that corresponds to the given dialect. The AR_XL_* model is best suited to Levantine dialect recordings. When using the AR_XL_* model for a neighboring dialect, e.g. Iraqi, the results will be much worse… and for e.g. Maghrebi, the results will most probably be completely unusable….

Understand SPE database

…results JSON data rest_result_gid GID processing results – file, used technology model, results JSON data rest_result_kws KWS processing results – file, used technology model, used keyword list, results JSON data rest_result_lid LID processing results – file, used technology model, used language pack, results JSON data rest_result_phnrec PHNREC processing results – file, used technology model, results JSON data rest_result_sid SID processing…

STT: Results explained

…transcription is started using the result_mode=incremental parameter in the request. In this mode, each request for transcription results returns only the changes since the last request for results. In incremental mode, the received results may correct results received previously, e.g. when one request was sent in the middle of a word, the next request contains a correction, i.e. the correct entire word….

What is User configuration file and how to use it

…sentence. So, following the How to configure STT realtime stream word detection parameters article, we create a stt_cs_cz_5_online.bs.usr text file alongside the original stt_cs_cz_5_online.bs configuration file in the <SPE directory>/bsapi/stt/settings directory and put the following lines in it (changing the forward extension parameter from the default 750 to 1500): [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] forward_extensions_length_ms=1500 Then after restarting SPE – and optionally checking in SPE log…

Releases and Changelogs (Browser)

…column in Results pane New: Added “Minimum confidence to display” setting for Keyword Spotting in Settings dialog -> Scoring tab (affects number of hits displayed in Results pane) Phonexia Browser 3.51 Phonexia Browser 3.51.0, BSAPI 3.51.0 (2022-06-14) New: Compatibility with SPE 3.51 (e.g. option to set number of workers automatically) Changed: Show also numbers for AGE results Phonexia Browser 3.50…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds, which extend the part…
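To illustrate what the extension parameters might mean in practice, here is a minimal sketch. It parses only the default fragment quoted above and applies the two extensions to a detected speech interval. The snippet is truncated, so the assumption that the extensions widen the detected speech part (backward before its start, forward after its end) is mine, and `load_extensions` and `extend_segment` are hypothetical helpers, not Speech Engine functions.

```python
import configparser

# Default segmenter settings as quoted in the snippet above.
DEFAULTS = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"


def load_extensions(text):
    """Return (backward_ms, forward_ms) read from the quoted fragment."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[SECTION]
    return (
        section.getint("backward_extensions_length_ms"),
        section.getint("forward_extensions_length_ms"),
    )


def extend_segment(start_ms, end_ms, backward_ms, forward_ms):
    """Widen a detected speech interval (assumed behavior, see lead-in)."""
    # Clamp at zero so the segment never starts before the stream does.
    return max(0, start_ms - backward_ms), end_ms + forward_ms


backward_ms, forward_ms = load_extensions(DEFAULTS)
print(extend_segment(1000, 2000, backward_ms, forward_ms))
```

Under this reading, raising the forward extension makes the segmenter wait longer after speech ends before closing a segment.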

STT: What is Words-To-Numbers feature and how to use it

…numbers conversion is based on a set of grammar rules, describing how the conversion should work. Conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory inside the STT model directory. For example: in Czech 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm in Spanish 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_es_6/grm Can it be extended or tuned? You can edit…

Speech to Text (STT)

About STT Phonexia Speech to Text (STT) converts speech in audio signals into plain text. The technology works with acoustics as well as a dictionary of words, an acoustic model and pronunciation. This makes it dependent on language and dictionary – only a certain set of words can be transcribed. As an input, an audio file or stream is needed, together with selection of…

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Therefore many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…
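As a rough illustration of the general idea only, the sketch below picks the highest-scoring alternative in each time slot. It is not the recommended procedure from the article, whose details are truncated here. The data layout (a list of slots, each a list of dicts with `word` and `score` keys) and the `<null>` placeholder for "no word" are hypothetical and do not reflect the actual Speech Engine result format.

```python
def one_best(slots, null_token="<null>"):
    """Naive one-best: take the top-scoring alternative of every slot.

    slots: list of time slots, each a list of {"word": str, "score": float}.
    null_token: assumed marker meaning "no word spoken in this slot".
    """
    words = []
    for slot in slots:
        if not slot:
            continue  # skip empty slots
        best = max(slot, key=lambda alt: alt["score"])
        if best["word"] != null_token:
            words.append(best["word"])
    return " ".join(words)


# Hypothetical confusion network with three time slots.
network = [
    [{"word": "hello", "score": 0.9}, {"word": "hollow", "score": 0.1}],
    [{"word": "<null>", "score": 0.6}, {"word": "a", "score": 0.4}],
    [{"word": "world", "score": 0.7}],
]
print(one_best(network))
```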

FAQs (PSP)

…Q: What is the difference between on-the-fly and off-line type of speech to text transcription (STT)? A: Like a human, the ASR (STT) engine adapts to the acoustic channel, environment and speaker. The ASR (STT) engine also learns more about the content over time, which is used to improve recognition….

Phonexia Speech Engine

…✓ Voice Activity Detection (VAD) ✓ ✓ Time Analysis Extraction (TAE) ✓ ✓ Speech Quality Estimation (SQE) ✓ ✓ Language Identification (LID) ✓ Gender Identification (GID) ✓ Age Estimation (AGE) ✓ Speaker Diarization (DIAR) ✓ Results caching Processing results can be optionally stored in results cache database to speed up eventual re-processing of the same recordings by the same technology…

FAQs (Browser)

…details, see KWS technology documentation. Q: What languages are supported by STT? A: Please see List of supported STT Languages. For more details, see STT technology documentation. Q: I am getting an SPE-related error after starting the Browser (e.g. SPE server crashed, Error Downloading…,…