Search: speech engine

121 results

SID: TUTORIAL: Speaker Identification – How to Do a Basic Test

…to download for commercial/research purposes under a Creative Commons 4.0 license. The data originates from the Oxford VGG VoxCeleb dataset, whose detailed license can be found here. SpeakerID Example Data Set v1.0 (83.89 MB). Publications: J. S. Chung, A. Nagrani, A. Zisserman, VoxCeleb2: Deep Speaker Recognition, INTERSPEECH, 2018. A. Nagrani, J. S. Chung, A. Zisserman, VoxCeleb: a large-scale speaker identification dataset, INTERSPEECH, 2017.…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the real-time stream STT results. The default segmenter parameters are shown below:
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
Backward and forward extensions are intervals in milliseconds, which extend the part…
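As a rough illustration of how such parameters could interact, the sketch below thresholds per-frame speech probabilities and then widens each detected segment backward and forward. This is not the Speech Engine implementation: the function name `segment_speech`, the 10 ms frame length, the `>=` comparison against `speech_threshold`, and the merging of overlapping segments are all assumptions made for the example. Only the three parameter names and their default values come from the excerpt above.

```python
from typing import List, Sequence, Tuple


def segment_speech(
    probs: Sequence[float],
    frame_ms: int = 10,  # assumed frame length, not taken from the article
    speech_threshold: float = 0.5,
    backward_extensions_length_ms: int = 150,
    forward_extensions_length_ms: int = 750,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame probabilities."""
    raw: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        is_speech = p >= speech_threshold  # assumed comparison
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            raw.append((start * frame_ms, i * frame_ms))
            start = None
    total_ms = len(probs) * frame_ms
    if start is not None:
        raw.append((start * frame_ms, total_ms))

    # Extend every segment, clip to the signal, and merge overlaps.
    merged: List[Tuple[int, int]] = []
    for seg_start, seg_end in raw:
        seg_start = max(0, seg_start - backward_extensions_length_ms)
        seg_end = min(total_ms, seg_end + forward_extensions_length_ms)
        if merged and seg_start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], seg_end))
        else:
            merged.append((seg_start, seg_end))
    return merged


if __name__ == "__main__":
    # 0.5 s of silence, 0.2 s of speech, 2 s of silence
    frames = [0.1] * 50 + [0.9] * 20 + [0.1] * 200
    print(segment_speech(frames))
```

In this toy model, a larger forward extension keeps a segment open longer after the last speech frame, so short pauses between words are more likely to end up inside one segment.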

Waveform Denoiser (DENOISER)

…software cannot remove unwanted speech or music in the background. Denoiser is used to remove noise from the recording and, at the same time, to amplify the speech signal for: better intelligibility for human listeners (recommended use); better results with automatic speech recognition technologies (this must be tested on customer data first). Input: audio file (format details – see…

Adding new language or technology model (Browser)

This article explains how to add a new technology model to the current Speech Engine (SPE) instance when using Phonexia Browser. Prerequisites: To proceed, you need an existing installation of SPE. If you do not have one, check other articles, especially: Download Speech Platform; Installation of Phonexia Browser; Documentation of Phonexia Browser; Installation package with new language models. Note:…

Video – Speech Analytics technologies

MODULE 4: Speech Analytics technologies (23 min) Common generic rules for CLI, REST and GUI Speech To Text (STT) in CLI, REST and GUI Keyword Spotting (KWS) in CLI, REST and GUI Phoneme Recognizer (PHNREC) in CLI, REST and GUI Time Analysis Extraction (TAE) in CLI, REST and GUI Summary https://www.youtube.com/watch?v=-FAoRywqv7U…

Input audio quality

The quality of the audio is extremely important for satisfactory results from any speech processing technology, be it simple voice activity detection, speech transcription, voice biometry, or other. There are two main aspects of audio quality: the technical quality of the audio data (format, codec, bitrate, SNR, …) and the sound quality of the actual content (background noise, reverberation, …). Technical quality: Using inappropriate…

SPE and Browser installation: embedded SPE

In this post, we break down the complexities of the initial installation process. By the end of the guide, you will be able to start processing your recordings with Phonexia Speech Technologies. 1. Download Evaluation package: Download the Phonexia Evaluation package from https://partner.phonexia.com/kb/sp/speech-platform/evaluation-package/ Create a new directory in your desired location and unzip the package into it, e.g. C:/Phonexia/…

Understand SPE configuration

…text-based, well commented and human-readable. Read these comments carefully, as there are some useful tips and tricks hidden inside. Let’s begin; pay attention to the comment about the variable notation format in the configuration preamble:
# This is the default properties file for Phonexia Speech Engine
#
# Variables:
# ${application.dir} path to application directory
# ${system.env.<NAME>} system environment…
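To make the `${...}` notation concrete, here is a minimal sketch of how such placeholders could be expanded. It is an illustration only, not how SPE resolves its configuration: the function `expand_variables`, the choice to leave unknown placeholders untouched, and the sample values are assumptions. Only the two variable forms `${application.dir}` and `${system.env.<NAME>}` come from the preamble quoted above.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${...} placeholders such as ${application.dir}
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")
_ENV_PREFIX = "system.env."


def expand_variables(
    value: str,
    application_dir: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Expand the two variable forms named in the configuration preamble."""
    environment = os.environ if env is None else env

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name == "application.dir":
            return application_dir
        if name.startswith(_ENV_PREFIX):
            # Assumption: an unset environment variable is left as written.
            return environment.get(name[len(_ENV_PREFIX):], match.group(0))
        # Assumption: unknown placeholders are left untouched.
        return match.group(0)

    return _PLACEHOLDER.sub(replace, value)


if __name__ == "__main__":
    # Hypothetical values, used only to show the substitution.
    print(expand_variables("${application.dir}/data", "/opt/spe"))
    print(expand_variables("${system.env.TMP_DIR}/cache", "/opt/spe", {"TMP_DIR": "/tmp"}))
```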

Voice Activity Detection (VAD)

Voice Activity Detection is a language-, domain- and channel-independent technology that distinguishes the parts of an audio recording that contain speech from those that do not. It creates labels for speech and other signals in the recording; these can then serve as a decision point for whether to process the recording with other technologies. VAD is usually part of a rapid filtration process in…

SID: Speaker Identification: Results Enhancement

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system…

Speech To Text / Keyword Spotting supported languages

Languages supported by Speech To Text and Keyword Spotting
Standard = Maintained until a newer generation is released or end of support is reached. The language generation is specified by the number in “Model name”.
Language (region) | Model name | Released | End of support | Maintenance
Arabic (Gulf, Kuwait) | AR_KW_6 | 2022-04 | 8th gen. | Standard
Arabic (Levantine) | AR_XL_6 | 2021-05 | 8th gen. | Standard
AR_XL_5 | 2020-08 | 7th…
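Since the generation is encoded as the number in the model name, it can be read off programmatically. The sketch below assumes the names follow a `LANGUAGE_REGION_GENERATION` pattern, which is inferred only from the three names visible in this excerpt (`AR_KW_6`, `AR_XL_6`, `AR_XL_5`); the function `parse_model_name` and the field names are hypothetical.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    language: str    # e.g. "AR" (field name is an assumption)
    region: str      # e.g. "KW" or "XL" (field name is an assumption)
    generation: int  # the number in the model name


# Assumed pattern, inferred from the names shown in the excerpt.
_MODEL_NAME = re.compile(r"^([A-Z]+)_([A-Z]+)_(\d+)$")


def parse_model_name(name: str) -> ModelName:
    """Split a model name such as 'AR_XL_6' into its parts."""
    match = _MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    return ModelName(match.group(1), match.group(2), int(match.group(3)))


if __name__ == "__main__":
    names = ["AR_XL_5", "AR_KW_6", "AR_XL_6"]
    # Pick the newest generation among models of the same language and region.
    newest = max(
        (parse_model_name(n) for n in names if n.startswith("AR_XL_")),
        key=lambda m: m.generation,
    )
    print(newest)
```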

Support Lifecycle Policy (PSP)

The general lifecycle of Phonexia products is governed by the Phonexia Product Support and Lifecycle Policy (valid from Q3/2019). That document also defines the content of our support and our software versioning approach. Specific versions of our products and languages are supported and maintained according to the following tables. Phonexia Speech Engine Version Release Date End of Support Maintained Until Release type…

Recommended OS and HW (PSP)

Recommended operating systems:
Windows 64-bit – Windows Server 2019 (*), latest version of Windows 10 (*)
Linux 64-bit – latest version of RHEL/CentOS 7 (*)
Compatible operating systems (**):
64-bit Windows 8.1, Windows Server 2016, and newer
64-bit Linux with glibc >= 2.17, e.g. Ubuntu 20.04, Mint 19.3, RHEL/CentOS 8.2, …
(*) Speech Platform components (e.g. Speech Engine) are…
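The glibc requirement can be checked on a Linux host before installation. The following is a minimal sketch using Python's standard `platform.libc_ver()`; the helper names `parse_version`, `glibc_is_compatible` and `detect_glibc_version` are made up for this example, and the check covers only the glibc >= 2.17 condition, not the other requirements listed above.

```python
import platform
from typing import Optional, Tuple

MINIMUM_GLIBC = (2, 17)  # from the compatibility note above


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.31' into (2, 31)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_compatible(version: str, minimum: Tuple[int, ...] = MINIMUM_GLIBC) -> bool:
    """Return True when the given glibc version meets the minimum."""
    return parse_version(version) >= minimum


def detect_glibc_version() -> Optional[str]:
    """Return the glibc version of the running interpreter, or None if not glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None


if __name__ == "__main__":
    found = detect_glibc_version()
    if found is None:
        print("glibc not detected (non-Linux or non-glibc system)")
    else:
        print(f"glibc {found}: compatible = {glibc_is_compatible(found)}")
```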