Search: STT vs. STT_STREAM

58 results

Releases and Changelogs (SPE)

…results content, all STT results will be removed from cache (database) during update! Speech Engine 3.32 Speech Engine 3.32.0, DB v1500, BSAPI 3.32.0 (2020-08-28) New: Added support for Webhooks and WebSockets in stream processing New: Added support for preferred phrases in 5th generation of STT (see POST /technologies/stt or POST /technologies/stt/input_stream) New: Added possibility to get multiple STT result types…

Understand SPE configuration file

…server.bsapi_comparator_fa_cache_size Runtime server.enable_authentication_token server.enable_resource_locker server.upload_max_filesize server.max_metadata_size server.tcp.queue server.tcp.threads server.cors_enable Tasks server.n_workers, server.n_realtime_workers server.n_task_limit server.task_priorities_enable server.task_default_priority server.finished_task_timeout Streams stream.http.enable input_stream.http.timeout stream.websocket.enable input_stream.websocket.max_payload_size stream.rtp.enable stream.rtp.bind_ip stream.rtp.min_port, stream.rtp.max_port input_stream.rtp.stream_limit input_stream.rtp.timeout output_stream.rtp.timeout Audio formats server.audio_formats.opus.enabled server.audio_formats.flac.enabled audio_converter.enabled audio_converter.command Reporting reporting.urls reporting.ssl.enabled reporting.ssl.ca_file reporting.ssl.certificate_file reporting.ssl.private_key_file reporting.ssl.private_key_password reporting.ssl.cipher_list External external.technologies.tts_connectors Generic settings server.bind_ip, server.port # IP address and port for server listening server.bind_ip = 0.0.0.0 server.port…

Release Notes

…5th generation (RU_RU_5) of STT/KWS. STT word accuracy (WAcc) is increased up to 90.8 % (up to 7.1 p.p. improvement). Polish (Poland) new-generation PL_PL_6 model: It is an upgrade of previous 5th generation (PL_PL_5) of STT/KWS. STT word accuracy (WAcc) is increased up to 85.3 % (up to 18.7 p.p. improvement). Italian (Italy) new-generation IT_IT_6 model: It is an upgrade…

STT: Language Model Customization tutorial

…STT model, put its name in the model parameter, like this: GET /technologies/stt?path=foobar.wav&model=<customized_model_name> Using customized STT model in command line STT To use customized STT model in command line STT, simply specify the new configuration file belonging to the customized STT model in the -config parameter. For example, assuming that original pl_pl_5 model was customized, specifying updated as the model…
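As a minimal sketch of the request shown in the snippet above, the following Python helper builds the GET /technologies/stt URL with the path and model query parameters. The base URL (host and port) and the model name are placeholders chosen for illustration, not values taken from the documentation.

```python
from urllib.parse import urlencode


def build_stt_url(base_url: str, path: str, model: str) -> str:
    """Build a GET /technologies/stt URL with 'path' and 'model' query parameters.

    base_url is an assumption (any reachable SPE server address);
    only the endpoint and the two parameter names come from the snippet.
    """
    query = urlencode({"path": path, "model": model})
    return f"{base_url.rstrip('/')}/technologies/stt?{query}"


# Hypothetical server address and customized model name, for illustration only.
url = build_stt_url("http://localhost:8600", "foobar.wav", "my_customized_model")
print(url)
```

Using urlencode keeps the query string valid if the file path or model name contains characters that need escaping.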

Understand SPE technologies configuration file

…SQE_STREAM Speech Quality Estimation Stream STT Speech To Text STT_STREAM Speech To Text Stream TAE Time Analysis Extraction TAE_STREAM Time Analysis Extraction Stream VAD Voice Activity Detection VAD_STREAM Voice Activity Detection Stream SIDC Speaker Identification Voiceprint Comparator (legacy) SIDC_STREAM Speaker Identification Voiceprint Stream Comparator (legacy) SIDCALIBSET Speaker Identification VoicePrint Calibration (legacy) SIDCALIBSET_STREAM Speaker Identification VoicePrint Stream Calibration (legacy) SIDE Speaker…

STT: What is Preferred Phrases feature and how to use it

…to use preferred phrases containing such ‘unknown words’, it’s necessary to add these words to the language model first, using LMC (see STT Language Model Customization tutorial), then perform the transcription using the customized STT model, specifying the preferred phrases in the POST /technologies/stt or POST /technologies/stt/input_stream REST call. Note: The REST call body does not allow specifying custom…

STT: Adding words to language model on the fly

Adding words to STT language model on-the-fly is possible in SPE 3.45 or newer as part of preferred phrases feature. The POST /technologies/stt or POST /technologies/stt/input_stream API calls actually serve two purposes: specify the actual preferred phrases (in the phrases part); specify words to be added to STT language model (in the dictionary part). Each part can be used independently,…

STT: What is Words-To-Numbers feature and how to use it

…numbers conversion is based on a set of grammar rules describing how the conversion should work. Conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory inside the STT model directory. For example: in Czech 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm; in Spanish 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_es_6/grm. Can it be extended or tuned? You can edit…

Understand SPE executable files

…SID4C (SID4 extractor and SID4 comparator) with both L4 and XL4 models, depending on actual availability of the technologies/models in that SPE installation. Due to the “…single character” pattern definition, the list won’t include SID4E_STREAM, SID4C_STREAM and SID4CALIB technologies. phxadmin2: example 3 ./phxadmin2 technology enable sid?_stream:*l?=3 sid4?_stream:*l?=1 enable 3 instances of technologies with names matching “sid followed by single character,…

STT: Results explained

…outputs The outputs can contain the following special tokens:

Token (5th STT generation and newer) | Token (legacy STT generations) | Meaning
<segment> | <s> | start of utterance
</segment> | </s> | end of utterance
<silence/> | _SILENCE_ or <sil/> | silent part (or no speech detected)
<null/> | _DELETE_ | time slot should not go to one-best output

Realtime stream processing output modes NOTE: Only single-channel (mono) audio…

Understand SPE workers configuration

…CPU cores in the server. Example: Czech STT on stream is approx. 4 times faster than realtime, i.e. 1 CPU core can process 4 realtime streams simultaneously. So a server with 8 CPU cores running only STT stream can be configured as follows: keep 1 core dedicated to the operating system and SPE; the remaining 7 cores can handle 28 realtime workers…
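The arithmetic in the example above can be sketched in a few lines of Python. The function name and parameters are illustrative only and are not part of SPE; the speed factor of 4 and the single reserved core are the figures quoted in the snippet.

```python
def realtime_worker_capacity(total_cores: int,
                             reserved_cores: int = 1,
                             speed_factor: int = 4) -> int:
    """Estimate how many realtime stream workers a server can run.

    total_cores    -- CPU cores in the server
    reserved_cores -- cores kept for the operating system and SPE
    speed_factor   -- realtime streams one core can process simultaneously
                      (approx. 4 for Czech STT on stream, per the example)
    """
    usable_cores = max(total_cores - reserved_cores, 0)
    return usable_cores * speed_factor


# 8 cores, 1 reserved, 4 streams per core: (8 - 1) * 4 = 28
print(realtime_worker_capacity(8))
```

The speed factor differs between languages and models, so it should be measured for the actual model before the result is used to size a server.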

Arabic dialects in Phonexia LID and STT

…TEXT (used for STT language model training) MSA is used in all formal writing such as official correspondence, literature, newspapers, webpages so there is no problem to accumulate loads of texts, but it will be more formal and far from spontaneous speech Support for MSA in Phonexia products Name LID L4 STT Description Arabic (MSA) arb — Modern Standard Arabic,…

Understand SPE configuration

…timeout for HTTP stream in seconds. # If stream doesn’t receive any data for given time, then stream is closed. stream.http.timeout = 30 # Enable RTP stream subsystem stream.rtp.enable = true # IP address for create rtp sessions stream.rtp.bind_ip = 0.0.0.0 # Sets starting port for creating RTP sessions stream.rtp.min_port = 10000 stream.rtp.max_port = 11000 # Number of max opened…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is neural-network based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds, which extend the part…
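To illustrate the idea of backward and forward extensions, here is a small Python sketch. It assumes that the extensions simply widen a detected speech interval before its start and after its end; the snippet is truncated, so this interpretation, the function name, and the clamping at zero are assumptions rather than the documented SPE behavior. Only the default values 150 ms and 750 ms come from the configuration shown above.

```python
from typing import Tuple

# Default values quoted in the segmenter configuration above.
BACKWARD_EXTENSION_MS = 150
FORWARD_EXTENSION_MS = 750


def extend_segment(start_ms: int, end_ms: int,
                   backward_ms: int = BACKWARD_EXTENSION_MS,
                   forward_ms: int = FORWARD_EXTENSION_MS) -> Tuple[int, int]:
    """Widen a detected speech interval by the backward and forward extensions.

    Assumption: the extension is applied symmetrically around the detected
    interval and the start is clamped so it never becomes negative.
    """
    if end_ms < start_ms:
        raise ValueError("end_ms must not be smaller than start_ms")
    return max(start_ms - backward_ms, 0), end_ms + forward_ms


# Speech detected from 1000 ms to 2000 ms becomes 850 ms to 2750 ms.
print(extend_segment(1000, 2000))
```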

SPE and Browser installation: standalone SPE

…Quality Estimation Stream [disabled] 17) Speech To Text [disabled] 18) Speech To Text Input Stream [disabled] 19) Time Analysis [disabled] 20) Time Analysis Stream [disabled] 21) Voice Activity Detection [disabled] 22) Voice Activity Detector Stream Technology [disabled] 23) Enable all 24) Disable all 0) Quit Choose technology to configure [0]:23 Select the option to Enable all technologies (usually the option…