MODULE 1: Getting started with Speech Engine (19 min) Installation Technologies configuration Server and database configuration Users configuration Files processing Synchronous and asynchronous requests, results polling Stream processing https://youtu.be/4qrB-GfFdWY…
Search: VAD
24 results
MODULE 4: Speech Analytics technologies (23 min) Common generic rules for CLI, REST and GUI Speech To Text (STT) in CLI, REST and GUI Keyword Spotting (KWS) in CLI, REST and GUI Phoneme Recognizer (PHNREC) in CLI, REST and GUI Time Analysis Extraction (TAE) in CLI, REST and GUI Summary https://www.youtube.com/watch?v=-FAoRywqv7U…
MODULE 3: Voice Biometrics technologies (23 min) Common generic rules for CLI, REST and GUI Speaker Identification (SID) in CLI, REST and GUI Language Identification (LID) in CLI, REST and GUI Gender Identification (GID) in CLI, REST and GUI Summary https://www.youtube.com/watch?v=AyEoPfYVel8…
…XL5 Diarization (DIAR) – model XL4 Language Identification (LID) – model L4 Gender Identification (GID) – model XL5 Age Estimation (AGE) ) – model XL5 Voice Activity Detection (VAD) – model GENERIC_3 and SID4_XL5 Speech Quality Estimation (SQE) Time Analysis Extraction (TAE) Waveform Denoiser (DENOISER) Phonexia Browser example audio (in ./BROWSER/example/ and ./SPE/bsapi/{technology}/example/) Step #2 – First start To get…
…sentence. So, following the How to configure STT realtime stream word detection parameters article, we create a stt_cs_cz_5_online.bs.usr text file along the original stt_cs_cz_5_online.bs configuration file in <SPE directory>/bsapi/stt/settings directory and put the following lines in it (changing the forward extension parameter from default 750 to 1500): [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] forward_extensions_length_ms=1500 Then after restarting SPE – and optionally checking in SPE log…
…models VIN application (graphical user interface, GUI) with the following technologies in-build Speaker Identification (SID4_XL5) Speaker Diarization (DIAR) Voice Activity Detection (VAD) Speech Quality Estimator (SQE) Phoneme Recogniser (PHNREC) example population sets and audio (in ./examples/) and example report templates (in ./templates/) Hardware requirements minimum – CPU: Intel® Core™ i5, RAM: 4 GB, Required HDD space: 0.5 GB for software…
…✓ Voice Activity Detection (VAD) ✓ ✓ Time Analysis Extraction (TAE) ✓ ✓ Speech Quality Estimation (SQE) ✓ ✓ Language Identification (LID) ✓ Gender Identification (GID) ✓ Age Estimation (AGE) ✓ Speaker Diarization (DIAR) ✓ Results caching Processing results can be optionally stored in results cache database to speed up eventual re-processing of the same recordings by the same technology…
…and their usages Filtering and supporting technologies 04:32 Speech Quality Estimation (SQE) 05:27 Voice Activity Detection (VAD) 06:37 Diarization (DIAR) 07:41 Age Estimation (AGE) 08:14 Waveform Denoiser Voice Biometrics technologies 08:56 Speaker Identification (SID) 10:18 Language Identification (LID) 11:10 Gender Identification (GID) Speech Analytics technologies 11:43 Speech Transcription (STT) 12:30 Keyword Spotting (KWS) 13:32 Phoneme Recognition (PHNREC) 13:54 Time Analysis…
…recording, Speech to Text (STT) – several languages supported – converts speech into plain text (words or sentences) automatically, Keyword Spotting (KWS) – several languages supported – detects specific keywords/phrases automatically without conversion to text, Gender identification (GID) – identifies whether a speaker is male or female, Age Estimation (AGE) – estimates the speaker´s age group, Voice Activity Detection (VAD)…
…technologies setup. If we assume that the whole machine is dedicated as a “speech computing unit” then, in general, we can calculate it as follows: file: phxspe.properties server.n_workers = <#_of_core> file: technologies.xml (no. of threads per technology, can be also set up by the phxadmin tool) SQE: <#_of_cores>/4 VAD: <#_of_cores>/2 other technologies: <#_of_cores> RAM: 8 cores = 32 GB 16…
…physical server, configure your technologies.xml to the following number of instances: SQE: <#_cpu_cores>/4 VAD: <#_cpu_cores>/2 any other technology: <#_cpu_cores> (Note: your license should also be configured properly. Ask our Sales department for cooperation in case of hot-load evaluation tests. The production license will be configured with our assistance, of course) Optimal RAM recommendation: 4 cores: 16 GB RAM 8 cores:…
One of the improvements implemented since Speech Engine 3.24 is neural-network based VAD, used for word- and segment detection. This article describes the segmenter configuration parameters and how they are affecting the realtime stream STT results. The default segmenter parametrs are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward- and forward extension are intervals in miliseconds, which extend the part…
…technology model, results JSON data rest_result_tae TAE processing results – file, used technology model, results JSON data rest_result_vad VAD processing results – file, used technology model, results JSON data SPE logging to database Storing SPE logs to database is available only for MariaDB / MySQL. This is mainly for performance reasons – SQLite is not designed for high concurrency, i.e….
MODULE 2: Filtering and supporting technologies (22 min) Common generic rules for CLI, REST and GUI Filtering, sorting, pre-/post-processing overview Speech Quality Estimation (SQE) in CLI, REST and GUI Voice Activity Detection (VAD) in CLI, REST and GUI Diarization (DIAR) in CLI, REST and GUI Age Estimation (AGE) in CLI, REST and GUI Denoiser (DENOISER) in CLI, REST and GUI…
…{SPE_installation_directory} ├── bsapi │ ├── age │ │ ├── data │ │ ├── example . . └── settings . . . . │ └── vad │ ├── data │ ├── example │ └── settings ├── data │ ├── benchmark │ └── database │ ├── MariaDB │ ├── SQLite │ └── MySQL – obsolete ├── doc ├── EULA ├── external │…