Skip to content Skip to main navigation Skip to footer

Search: STT Stream

19 results

Releases and Changelogs (SPE)

…results content, all STT results will be removed from cache (database) during update! Speech Engine 3.32 Speech Engine 3.32.0, DB v1500, BSAPI 3.32.0 (2020-08-28) New: Added support for Webhooks and WebSockets in stream processing New: Added support for preferred phrases in 5th generation of STT (see POST /technologies/stt or POST /technologies/stt/input_stream) New: Added possibility to get multiple STT result types…

Release Notes

…5th generation (RU_RU_5) of STT/KWS. STT word accuracy (WAcc) is increased up to 90,8 % (up to 7.1 p.p. improvement). Polish (Poland) new-generation PL_PL_6 model: It is an upgrade of previous 5th generation (PL_PL_5) of STT/KWS. STT word accuracy (WAcc) is increased up to 85.3 % (up to 18.7 p.p. improvement). Italian (Italy) new-generation IT_IT_6 model: It is an upgrade…

Understand SPE technologies configuration file

SQE_STREAM Speech Quality Estimation Stream STT Speech To Text STT_STREAM Speech To Text Stream TAE Time Analysis Extraction TAE_STREAM Time Analysis Extraction Stream VAD Voice Activity Detection VAD_STREAM Voice Activity Detection Stream SIDC Speaker Identification Voiceprint Comparator (legacy) SIDC_STREAM Speaker Identification Voiceprint Stream Comparator (legacy) SIDCALIBSET Speaker Identification VoicePrint Calibration (legacy) SIDCALIBSET_STREAM Speaker Identification VoicePrint Stream Calibration (legacy) SIDE Speaker…

Understand SPE workers configuration

…CPU cores in the server. Example: Czech STT on stream is approx. 4 times faster than realtime, i.e. 1 CPU core can process 4 realtime streams simultaneously. So a server with 8 CPU cores running only STT stream can be configured as follows: keep 1 core dedicated for operating system and SPE remaining 7 cores can handle 28 realtime workers…

STT: What is Preferred Phrases feature and how to use it

…to use preferred phrases containing such ‘unknown words’, it’s necessary to add these words to the language model first, using LMC – see STT Language Model Customization tutorial then perform the transcription using the customized STT model, specifying the preferred phrases in the POST /technologies/stt or POST /technologies/stt/input_stream REST call. Note: The REST call body does not allow specifying custom…

STT: Adding words to language model on the fly

Adding words to STT language model on-the-fly is possible in SPE 3.45 or newer as part of preferred phrases feature. The POST /technologies/stt or POST /technologies/stt/input_stream API calls actually serve two purposes: specify the actual preferred phrases (in the phrases part) specify words to be added to STT language model (in the dictionary part) Each part can be used independently,…

STT: What is Words-To-Numbers feature and how to use it

…numbers conversion is based on set of grammar rules, describing how the conversion should work. Conversion rules are stored in numeric.pegjs file, located in grm subdirectory inside the STT model directory. For example: in Czech 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm in Spanish 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_es_6/grm Can it be extended or tuned? You can edit…

Understand SPE executable files

…SID4C (SID4 extractor and SID4 comparator) with both L4 and XL4 models, depending on actual availability of the technologies/models in that SPE installation. Due to the “…single character” pattern definition, the list won’t include SID4E_STREAM, SID4C_STREAM and SID4CALIB technologies. phxadmin2: example 3 ./phxadmin2 technology enable sid?_stream:*l?=3 sid4?_stream:*l?=1 enable 3 instances of technologies with names matching “sid followed by single character,…

STT: Results explained

…outputs The outputs can contain the following special tokens: Token (5th STT generation and newer) Token (legacy STT generations) Meaning <segment> <s> start of utterance </segment> </s> end of utterance <silence/> _SILENCE_ or <sil/> silent part (or no speech detected) <null/> _DELETE_ time slot should not go to one-best output Realtime stream processing output modes NOTE: Only single-channel (mono) audio…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is neural-network based VAD, used for word- and segment detection. This article describes the segmenter configuration parameters and how they are affecting the realtime stream STT results. The default segmenter parametrs are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward- and forward extension are intervals in miliseconds, which extend the part…

Speech to Text (STT)

About STT Phonexia Speech to Text (STT) converts speech in audio signals into plain text. Technology works with both acoustics as well as dictionary of words, acoustic model and pronunciation. This makes it dependent on language and dictionary – only some set of words can be transcribed. As an input, audio file or stream is needed, together with selection of…

What is User configuration file and how to use it

…sentence. So, following the How to configure STT realtime stream word detection parameters article, we create a stt_cs_cz_5_online.bs.usr text file along the original stt_cs_cz_5_online.bs configuration file in <SPE directory>/bsapi/stt/settings directory and put the following lines in it (changing the forward extension parameter from default 750 to 1500): [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] forward_extensions_length_ms=1500 Then after restarting SPE – and optionally checking in SPE log…

Phonexia Speech Engine

Phonexia Speech Engine (SPE) is main part of Phonexia Speech Platform. SPE is a server application for 64-bit Linux or Windows, providing REST API to entire portfolio of Phonexia speech technologies. SPE capabilities overview: Audio files and stream processing Audio files RTP / HTTP streams Speaker Identification (SID) ✓ ✓ Speech To Text (STT) ✓ ✓ Keyword Spotting (KWS) ✓…

Key Features (PSP)

…in the Languages Available section. Speech To Text (STT) and Keyword Spotting (KWS) languages Language Identification (LID) languages Supported Audio input The Speech Engine server supports various audio formats as listed in API reference > Audio requirements. It also supports the RTP/HTTP stream processing as listed in API reference > RTP/HTTP streams. The Speech Engine allows the usage of some…

Understand SPE database

…unregistering files after processing (if using the files registering technique instead of uploading the audio files – see the Understanding SPE home directory article). This makes the files information AND the cached processing results to be kept in database. Or, you may be saving stream data to file, but not deleting the created stream audio files using the REST API…