Skip to content Skip to main navigation Skip to footer

Search: STT_STREAM

19 results

Video – Getting started with SPE

MODULE 1: Getting started with Speech Engine (19 min) Installation Technologies configuration Server and database configuration Users configuration Files processing Synchronous and asynchronous requests, results polling Stream processing https://youtu.be/4qrB-GfFdWY…

Waveform Denoiser (DENOISER)

…Speech Engine documentation); stream not supported, technology model name to be used for processing. Output: audio file (WAV or RAW), together with xml/json report (in SPE only). Fig.: Comparison of original recording (david_noisy.wav, top half of image) and same recording processed by Denoiser (david_denoised.wav, bottom half of the image). Typical Questions Q: What do you recommend for deploying this technology?…

Phonexia technologies introduction

…technologies 11:43 Speech Transcription (STT) 12:30 Keyword Spotting (KWS) 13:32 Phoneme Recognition (PHNREC) 13:54 Time Analysis Extraction (TAE) 14:22 Speech Platform architecture; Speech Engine, Phonexia Browser, Phonexia Voice Inspector brief 18:52 HW and SW requirements, typical deployment topologies 21:34 Supported file- and stream formats, typical implementations and data flows 27:29 Licensing technical options 32:24 Summary, recommended next steps   https://youtu.be/DDu0Y1rgQ6k…

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams and can provide results in several output formats. Phoneme…

Understand SPE database

…unregistering files after processing (if using the files registering technique instead of uploading the audio files – see the Understanding SPE home directory article). This makes the files information AND the cached processing results to be kept in database. Or, you may be saving stream data to file, but not deleting the created stream audio files using the REST API…

Key Features (PSP)

…in the Languages Available section. Speech To Text (STT) and Keyword Spotting (KWS) languages Language Identification (LID) languages Supported Audio input The Speech Engine server supports various audio formats as listed in API reference > Audio requirements. It also supports the RTP/HTTP stream processing as listed in API reference > RTP/HTTP streams. The Speech Engine allows the usage of some…

Phonexia Speech Engine

Phonexia Speech Engine (SPE) is main part of Phonexia Speech Platform. SPE is a server application for 64-bit Linux or Windows, providing REST API to entire portfolio of Phonexia speech technologies. SPE capabilities overview: Audio files and stream processing Audio files RTP / HTTP streams Speaker Identification (SID) ✓ ✓ Speech To Text (STT) ✓ ✓ Keyword Spotting (KWS) ✓…

What is User configuration file and how to use it

…sentence. So, following the How to configure STT realtime stream word detection parameters article, we create a stt_cs_cz_5_online.bs.usr text file along the original stt_cs_cz_5_online.bs configuration file in <SPE directory>/bsapi/stt/settings directory and put the following lines in it (changing the forward extension parameter from default 750 to 1500): [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] forward_extensions_length_ms=1500 Then after restarting SPE – and optionally checking in SPE log…

Speech to Text (STT)

About STT Phonexia Speech to Text (STT) converts speech in audio signals into plain text. Technology works with both acoustics as well as dictionary of words, acoustic model and pronunciation. This makes it dependent on language and dictionary – only some set of words can be transcribed. As an input, audio file or stream is needed, together with selection of…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is neural-network based VAD, used for word- and segment detection. This article describes the segmenter configuration parameters and how they are affecting the realtime stream STT results. The default segmenter parametrs are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward- and forward extension are intervals in miliseconds, which extend the part…

STT: Results explained

…outputs The outputs can contain the following special tokens: Token (5th STT generation and newer) Token (legacy STT generations) Meaning <segment> <s> start of utterance </segment> </s> end of utterance <silence/> _SILENCE_ or <sil/> silent part (or no speech detected) <null/> _DELETE_ time slot should not go to one-best output Realtime stream processing output modes NOTE: Only single-channel (mono) audio…

STT: Adding words to language model on the fly

Adding words to STT language model on-the-fly is possible in SPE 3.45 or newer as part of preferred phrases feature. The POST /technologies/stt or POST /technologies/stt/input_stream API calls actually serve two purposes: specify the actual preferred phrases (in the phrases part) specify words to be added to STT language model (in the dictionary part) Each part can be used independently,…

STT: What is Words-To-Numbers feature and how to use it

…numbers conversion is based on set of grammar rules, describing how the conversion should work. Conversion rules are stored in numeric.pegjs file, located in grm subdirectory inside the STT model directory. For example: in Czech 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm in Spanish 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_es_6/grm Can it be extended or tuned? You can edit…

Understand SPE executable files

…SID4C (SID4 extractor and SID4 comparator) with both L4 and XL4 models, depending on actual availability of the technologies/models in that SPE installation. Due to the “…single character” pattern definition, the list won’t include SID4E_STREAM, SID4C_STREAM and SID4CALIB technologies. phxadmin2: example 3 ./phxadmin2 technology enable sid?_stream:*l?=3 sid4?_stream:*l?=1 enable 3 instances of technologies with names matching “sid followed by single character,…

STT: What is Preferred Phrases feature and how to use it

…to use preferred phrases containing such ‘unknown words’, it’s necessary to add these words to the language model first, using LMC – see STT Language Model Customization tutorial then perform the transcription using the customized STT model, specifying the preferred phrases in the POST /technologies/stt or POST /technologies/stt/input_stream REST call. Note: The REST call body does not allow specifying custom…