
Search: time unit

9 results

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds, which extend the part…

Understand SPE directory structure

…on which technologies are included in the particular SPE installation. For testing and first-time evaluation we usually include the full set of technologies; other installations may contain only a limited subset. The location of the bsapi directory can be modified using the bsapi.path option in the SPE configuration file. This might be useful in a complex network infrastructure, for sharing technologies between multiple SPEs, and similar…

KWS: Results explained

…the detected pronunciation. Start and end times are in HTK units. 1 HTK unit is 100 nanoseconds, so dividing a time by 10000 gives its value in milliseconds. Score is a log-likelihood ratio from the (-inf, +inf) interval. Confidence is a probability from the [0, 1] interval. To convert it to a percentage, multiply the confidence value by 100. Example This example of Keyword Spotting…
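
The two conversions described in this snippet can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names htk_to_ms and confidence_to_percent are hypothetical and are not part of any Phonexia API.

```python
# 1 HTK unit = 100 ns, so 10,000 HTK units = 1 ms.
HTK_UNITS_PER_MS = 10_000


def htk_to_ms(htk_units: int) -> float:
    """Convert a time in HTK units (100 ns each) to milliseconds."""
    return htk_units / HTK_UNITS_PER_MS


def confidence_to_percent(confidence: float) -> float:
    """Convert a confidence from the [0, 1] interval to a percentage."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the [0, 1] interval")
    return confidence * 100.0


if __name__ == "__main__":
    print(htk_to_ms(10_000))           # one millisecond expressed in HTK units
    print(confidence_to_percent(0.5))  # half confidence as a percentage
```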

Release Notes

…which can be edited by users. Speech Engine: Speaker Identification (SID4) New “floating window” feature for realtime stream processing (since 3.60.0) The new floating_window parameter allows identifying the speaker or extracting a voiceprint from only the last X seconds (default 5) of speech in the realtime stream… as opposed to using speech from the entire stream audio when this parameter is not used. Speech Engine:…

Understand SPE database

…configuration (users, roles, etc.). SQLite database updates are also handled automatically by SPE: from time to time, as we add new features or improve existing functionality, the internal database structure may be updated in newer SPE versions. When using SQLite, if a new SPE version detects that the database needs an update, the update is done fully automatically behind the scenes. [1] If…

General

…gathering more NET speech by calling the endpoint multiple times. Each time, the amount of NET speech is returned. There is a hard limit of 60 seconds of NET speech after which the voiceprint cannot be improved and it is advised to end the enrollment process.   WebHooks Enrollment process using WebHooks is described here. Audio Recordings Requirements for audio…

STT: Results explained

…}, { "time_slot" : 19, "start_time" : 16850000, "end_time" : 17550000, "word" : "at", "posterior_probability" : 0.000000000000003186612474256685, "channel" : }, { "time_slot" : 19, "start_time" : 16850000, "end_time" : 17550000, "word" : "to", "posterior_probability" : 6.2535191806661645e-24, "channel" : }, { "time_slot" : 20, "start_time" : 17550000, "end_time" : 17850000, "word" : "time", "posterior_probability" : 0.9984032848949292, "channel" : }, { "time_slot"…
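
The excerpt above lists several word candidates that share one time_slot. As a rough sketch of how such output might be post-processed, the Python below keeps the candidate with the highest posterior_probability in each slot. The helper best_word_per_slot is hypothetical, the sample entries are copied from the excerpt, and the channel field is left out because its value is not shown there.

```python
# Entries copied from the excerpt; "channel" is omitted because its value is elided.
words = [
    {"time_slot": 19, "start_time": 16850000, "end_time": 17550000,
     "word": "at", "posterior_probability": 0.000000000000003186612474256685},
    {"time_slot": 19, "start_time": 16850000, "end_time": 17550000,
     "word": "to", "posterior_probability": 6.2535191806661645e-24},
    {"time_slot": 20, "start_time": 17550000, "end_time": 17850000,
     "word": "time", "posterior_probability": 0.9984032848949292},
]


def best_word_per_slot(entries):
    """Return one entry per time_slot: the one with the highest posterior."""
    best = {}
    for entry in entries:
        slot = entry["time_slot"]
        current = best.get(slot)
        if current is None or entry["posterior_probability"] > current["posterior_probability"]:
            best[slot] = entry
    # Return the winners ordered by slot number.
    return [best[slot] for slot in sorted(best)]


if __name__ == "__main__":
    for entry in best_word_per_slot(words):
        print(entry["time_slot"], entry["word"], entry["posterior_probability"])
```

Picking the top candidate per slot is only one possible reading of the output; an application may instead want to keep all alternatives and filter them by a probability threshold.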