
Search: STT results

21 results

Video – Getting started with SPE

MODULE 1: Getting started with Speech Engine (19 min): Installation; Technologies configuration; Server and database configuration; Users configuration; Files processing; Synchronous and asynchronous requests, results polling; Stream processing. https://youtu.be/4qrB-GfFdWY…

Waveform Denoiser (DENOISER)

…software cannot remove unwanted speech or music in the background. Denoiser is used to remove noise from the recording and at the same time to amplify the speech signal for: better intelligibility for human listeners (recommended use); better results with automatic speech recognition technologies (necessary to test on customer data first). Input: audio file (format details – see…

Understand SPE multithreaded technologies initialization

…of single-threaded initialization is that it may take longer to fully initialize the whole system, depending on the actual technologies configuration (number of initialized technologies and instances). In a multi-threaded configuration, instances of each technology are initialized in multiple parallel threads, one separate thread for each technology–model combination. This generally results in faster initialization of the whole system. On…

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme…

Download Semantic Search demo

…/home/data/document_0003.txt /home/data/document_0004.txt Each document may also have metadata associated with it. Metadata is textual and is specified after a space in <document_list>. Example content of a document list with metadata: /home/data/meeting_ashari_bago.txt STT transcript of meeting between bosses Ashari and Bago from 17.3.2021 /home/data/doc_twitter.1234.txt Twitter posts related to eventful event In this case metadata (e.g. STT transcript of meeting between bosses Ashari…
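The document list format described in this snippet (a path, optionally followed by a space and free-text metadata) can be read with a few lines of Python. This is a minimal sketch, not part of the Semantic Search demo: the function name `parse_document_list` is hypothetical, and it assumes that document paths contain no spaces, so everything after the first space is treated as metadata.

```python
def parse_document_list(text):
    """Split each non-empty line into (path, metadata) at the first space.

    Assumption: paths contain no spaces, so the first space on a line
    separates the path from the optional metadata text.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, _, metadata = line.partition(" ")
        entries.append((path, metadata))
    return entries


# Sample lines taken from the snippet above, plus one entry without metadata.
sample = (
    "/home/data/meeting_ashari_bago.txt STT transcript of meeting "
    "between bosses Ashari and Bago from 17.3.2021\n"
    "/home/data/doc_twitter.1234.txt Twitter posts related to eventful event\n"
    "/home/data/document_0003.txt\n"
)

entries = parse_document_list(sample)
for path, metadata in entries:
    print(path, "|", metadata or "(no metadata)")
```

Entries without metadata come back with an empty metadata string, so callers can treat both kinds of line uniformly.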

Orbis 1.4.0 Release Notes

…displayed items in network map. Solution: If the current time range is too wide to display all relations, you can narrow it down to show all results. Only telephony recordings and assets are visualized on a graph. Limited duration of input audio file: for Orbis editions without STT the limit is 240 minutes, and for editions with STT it is 120 minutes. Limited maximum allowed upload…

Understand SPE executable files

…See POST /audiofile endpoint documentation for details. phxclient: example 2 phxclient /login=admin /password=phonexia /method=GET /uri="127.0.0.1:8600/technologies/stt/?path=/myfile.wav&model=en_us_6&result_type=one_best,n_best&cache_disable=true" ./phxclient --login=admin --password=phonexia --method=GET --uri="127.0.0.1:8600/technologies/stt/?path=/myfile.wav&model=en_us_6&result_type=one_best,n_best&cache_disable=true" Process the myfile.wav file stored in the root of SPE internal storage – e.g. uploaded using the previous example – using the Speech To Text (STT) technology model EN_US_6 (6th generation English), returning one_best and n_best result types, and disabling any…
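The request URI used in this example is made of a host, the STT endpoint path, and four query parameters. The following Python sketch shows how that string can be assembled from its parts. It only builds the URI and does not contact SPE; the helper name `build_stt_uri` is hypothetical, and the host, port and parameter values are simply those quoted in the snippet.

```python
from urllib.parse import urlencode


def build_stt_uri(host, path, model, result_types, cache_disable=True):
    """Assemble the STT request URI from the parts shown in the example."""
    params = {
        "path": path,
        "model": model,
        "result_type": ",".join(result_types),
        "cache_disable": "true" if cache_disable else "false",
    }
    # Keep "/" and "," unescaped so the result matches the documented form.
    query = urlencode(params, safe="/,")
    return f"{host}/technologies/stt/?{query}"


uri = build_stt_uri(
    host="127.0.0.1:8600",
    path="/myfile.wav",
    model="en_us_6",
    result_types=["one_best", "n_best"],
)
print(uri)
```

The resulting string can be passed as the uri argument of phxclient, or to any HTTP client once a scheme is added.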

FAQs (PSP)

…Browser, FAQ Speech Platform Permalink Q: What is the difference between on-the-fly and off-line speech to text transcription (STT)? A: Like a human, the ASR (STT) engine adapts to the acoustic channel, environment and speaker. The ASR (STT) engine also learns more about the content over time, and this information is used to improve recognition….

Phonexia Speech Engine

…✓ Voice Activity Detection (VAD) ✓ ✓ Time Analysis Extraction (TAE) ✓ ✓ Speech Quality Estimation (SQE) ✓ ✓ Language Identification (LID) ✓ Gender Identification (GID) ✓ Age Estimation (AGE) ✓ Speaker Diarization (DIAR) ✓ Results caching Processing results can optionally be stored in a results cache database to speed up any later re-processing of the same recordings by the same technology…

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Therefore many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…

Speech to Text (STT)

About STT: Phonexia Speech to Text (STT) converts speech in audio signals into plain text. The technology works with acoustics as well as a dictionary of words, an acoustic model and pronunciation. This makes it dependent on language and dictionary – only a limited set of words can be transcribed. As input, an audio file or stream is needed, together with a selection of…

STT: What is Words-To-Numbers feature and how to use it

…numbers conversion is based on a set of grammar rules describing how the conversion should work. Conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory inside the STT model directory. For example: in Czech 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm; in Spanish 6th generation STT it’s located in {SPE_directory}/bsapi/stt/data/models_es_6/grm. Can it be extended or tuned? You can edit…
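The two locations quoted in this snippet suggest a common layout for the grammar file. The Python sketch below builds the path to numeric.pegjs for a given model name. The pattern `models_<model>/grm` is an assumption inferred from the Czech and Spanish examples only, the function name `numeric_grammar_path` is hypothetical, and `/opt/phonexia/spe` is a placeholder for the actual SPE directory.

```python
from pathlib import PurePosixPath


def numeric_grammar_path(spe_directory, model):
    """Return the assumed location of numeric.pegjs for an STT model.

    Assumption: every model follows the layout seen in the two documented
    examples, i.e. <SPE_directory>/bsapi/stt/data/models_<model>/grm.
    """
    return (
        PurePosixPath(spe_directory)
        / "bsapi" / "stt" / "data"
        / f"models_{model}"
        / "grm"
        / "numeric.pegjs"
    )


# "/opt/phonexia/spe" is a hypothetical install directory.
czech = numeric_grammar_path("/opt/phonexia/spe", "cs_cz_6")
spanish = numeric_grammar_path("/opt/phonexia/spe", "es_6")
print(czech)
print(spanish)
```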

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds, which extend the part…

What is User configuration file and how to use it

…sentence. So, following the How to configure STT realtime stream word detection parameters article, we create a stt_cs_cz_5_online.bs.usr text file alongside the original stt_cs_cz_5_online.bs configuration file in the <SPE directory>/bsapi/stt/settings directory and put the following lines in it (changing the forward extension parameter from the default 750 to 1500): [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] forward_extensions_length_ms=1500 Then after restarting SPE – and optionally checking in SPE log…
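The effect of such a user configuration file can be illustrated with Python's standard configparser module. This is only a sketch of the override idea, not a description of how SPE itself merges the files: it assumes the original .bs file contains the default segmenter section quoted in the word detection article, and the function name `effective_settings` is hypothetical.

```python
import configparser

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"

# Default segmenter section; assumed here to stand in for the original .bs file.
BASE_BS = """\
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

# Content of the .bs.usr file from the snippet above.
USER_BS_USR = """\
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
forward_extensions_length_ms=1500
"""


def effective_settings(base_text, user_text):
    """Read the base configuration, then let the user file override it."""
    parser = configparser.ConfigParser()
    parser.read_string(base_text)
    parser.read_string(user_text)  # values read later replace earlier ones
    return dict(parser[SECTION])


settings = effective_settings(BASE_BS, USER_BS_USR)
for key, value in settings.items():
    print(f"{key}={value}")
```

Only the parameter present in the user file changes; the other two keep their default values.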

Releases and Changelogs (Browser)

…column in Results pane New: Added “Minimum confidence to display” setting for Keyword Spotting in Settings dialog -> Scoring tab (affects number of hits displayed in Results pane) Phonexia Browser 3.51 Phonexia Browser 3.51.0, BSAPI 3.51.0 (2022-06-14) New: Compatibility with SPE 3.51 (e.g. option to set number of workers automatically) Changed: Show also numbers for AGE results Phonexia Browser 3.50…