Search: cti audio stream

77 results

Arabic dialects in Phonexia LID and STT

…a bit, but you won’t understand Moroccan Data acquisition AUDIO (used for LID and STT training) MSA is used in formal speaking situations such as sermons, lectures, news broadcasts, and speeches so it is pretty difficult/impossible to find recordings of spontaneous phone conversations in MSA available MSA recordings are usually from broadcasting (microphone) or rather formal scripted speeches (also microphone)…

Key Features (VIN)

…Speaker Identification, Speaker Diarization, Phoneme Recognizer, Voice Activity Detection, Speech Quality Estimation A search for repetitive sound patterns across all recordings in audio due to the automatic phonemic transcription Input: Questioned recordings (a minimum of 1 recording) Suspected speaker recordings (a minimum of 1 recording) The Population set (a technical minimum of 10 speakers, and a recommended minimum of 50…

Q: What are the requirements for SID evaluation dataset?

…in each recording (i.e. usually 2+ minutes recording length) only one speaker in each recording wide variety of gender and age is recommended recordings should be as similar to the target use case as possible (device, channel, distance from mic, languages distribution) audio files should be mono, lin16 format, 8 kHz+ sample rate *Note: splitting single recording into multiple shorter…
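The format requirements in this snippet (mono, lin16, 8 kHz or higher) can be checked before an evaluation run. The following is a minimal sketch using Python's standard `wave` module, not a Phonexia tool. It assumes the recordings are WAV files and that "lin16" means 16-bit linear PCM. The function name `check_wav` and the `min_rate` parameter are hypothetical.

```python
import wave


def check_wav(path, min_rate=8000):
    """Return a list of format problems; an empty list means the file passes.

    Assumption: "lin16" is read as 16-bit linear PCM (2 bytes per sample).
    """
    problems = []
    # wave.open raises wave.Error for non-PCM (compressed) WAV files.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() < min_rate:
            problems.append(f"sample rate {wav.getframerate()} Hz is below {min_rate} Hz")
    return problems
```

Calling `check_wav("recording.wav")` on each file gives a quick report of which recordings need conversion. The other listed requirements (one speaker per recording, variety of gender and age) cannot be verified this way.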

SPE and Browser installation: embedded SPE

…multimedia converter By default, the Speech Engine will accept only a limited list of audio formats. In order to process the non-native formats, install a multimedia converter. The recommended SW for this is FFmpeg. FFmpeg on Windows Download the latest version from https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip After unzipping the package, move the ffmpeg.exe executable to the /SPE/ directory. You can delete the rest…

Q: While trying to install SPE3, I get the error for loading libasound.so.2 libraries

Currently I’m trying to install the provided binaries for Linux, but I do get the following when running phxadmin: ./phxadmin: error while loading shared libraries: libasound.so.2: cannot open shared object file: No such file or directory I’m trying to run this under CentOS 7. A: Please install the right libraries required for manipulation with audio files from official repository into…

Q: How can we test Phonexia technologies?

We can prepare a testing package for you with full functionality of all technologies. The license is valid for 90 days to allow you to test the technologies. Note: by default a NET license is provided for testing. This license needs an active Internet connection to a Phonexia licensing server in order to function. Rest assured no data – audio,…

Download Speech Platform

…(DIAR) – model XL4 Language Identification (LID) – model L4 Gender Identification (GID) – model XL5 Age Estimation (AGE) – model XL5 Voice Activity Detection (VAD) – model GENERIC_3 and SID4_XL5 Speech Quality Estimation (SQE) Time Analysis Extraction (TAE) Waveform Denoiser (DENOISER) Phonexia Browser example audio (in ./BROWSER/example/ and ./SPE/bsapi/{technology}/example/) Step #2 – First start To get started, please…

Voice Activity Detection (VAD)

Voice Activity Detection is a language-, domain- and channel-independent technology that identifies parts of audio recordings with speech content vs. non-speech content. It creates labels for speech and other signals in the recording; this can then serve as a decision point whether to process the recording by other technologies or not. VAD is usually part of rapid filtration process in…

Phonexia End User License Agreement

…downloading any necessary materials or software, even if Phonexia has been advised of the possibility of such damages. 4.2 Phonexia recognizes and agrees that the Client remains the sole owner of the title to any data provided to Phonexia while using the Web demo license, including audio recordings, transcripts, personal information, or any intellectual property rights contained therein (the “Provided…

STT: Language Model Customization tutorial

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. Put simply, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio

SID4 performance on Intel® Xeon® Platinum 8124M

…enforcement agencies might use different methods of gathering recordings, but the principle is very similar. Based on data measured on the data set described above, we can draw this conclusion for Intel® Xeon® Platinum 8124M: Phonexia SID4 using the L4 model can perform up to 180 FTRT using 1 physical CPU core when processing audio data containing 44% of speech Optimal system performance…
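To put the 180 FTRT figure in concrete terms, here is a small arithmetic sketch. It assumes FTRT means "faster than real time", so that 180 FTRT corresponds to processing 180 seconds of audio per second of computation on one core. The function name `processing_seconds` is hypothetical, and real throughput depends on the speech content and hardware described in the full article.

```python
def processing_seconds(audio_seconds, ftrt):
    """Estimate single-core processing time for a given amount of audio.

    Assumption: FTRT is the ratio of audio duration to processing time.
    """
    if ftrt <= 0:
        raise ValueError("ftrt must be positive")
    return audio_seconds / ftrt


# One hour of audio at the quoted peak of 180 FTRT on one physical core.
print(processing_seconds(3600, 180))  # 20.0
```

Under that reading, one hour of audio with 44% speech would take roughly 20 seconds on a single physical core at the quoted peak rate.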