Search: Wav

32 results

Q: How do you calculate SNR in Speech Quality Estimation?

…frequencies in the waveform of speech has a Gamma distribution. In contrast, noise has a Gaussian distribution. So we can estimate the SNR by looking at the frequency distribution in individual frames. This approach to SNR estimation is based on the article “Robust Signal-to-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis” by Chanwoo Kim and Richard M. Stern, Interspeech 2008….

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…noise, technical signals like ringing, DTMF tones, etc.). This metric is useful for finding performance on actual audio data coming into the audio processing pipeline. Regular recording with Voice and Silence segments in waveform Net Speech based FtRT is a conservative, purely technical number. It is calculated from only spoken speech data, i.e. with all non-speech parts (silence, noise, DTMF tones, etc.)…
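
To make the difference between the two figures concrete, here is a minimal Python sketch. The snippet above does not give the formula, so this assumes FtRT is the duration of the audio divided by the wall-clock processing time; the function name `ftrt` and all numbers are hypothetical.

```python
def ftrt(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio handled per second of processing time.

    Assumption: FtRT = audio duration / processing time.
    The source snippet does not state the formula explicitly.
    """
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


# Hypothetical measurements for one recording.
recording_seconds = 600.0    # whole file, including silence, noise, DTMF tones
net_speech_seconds = 240.0   # spoken speech only
processing_seconds = 12.0    # wall-clock time spent processing the file

whole_recording_ftrt = ftrt(recording_seconds, processing_seconds)
net_speech_ftrt = ftrt(net_speech_seconds, processing_seconds)

print(f"FtRT on the whole recording: {whole_recording_ftrt:.1f}x")
print(f"FtRT on net speech only:     {net_speech_ftrt:.1f}x")
```

Under this assumption the net-speech figure can never exceed the whole-recording figure for the same file, which is consistent with it being described as the conservative number.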

Understand SPE configuration file

…defined time, the stream is automatically closed. Default value is 60 seconds. Output RTP streams are used e.g. for text-to-speech output, or for audio files playback. Audio formats server.audio_formats.opus.enabled # Enable or disable native support for OPUS audio format (Default: true) # When disabled, audio file will be converted to WAV server.audio_formats.opus.enabled = true Controls whether the OPUS audio format…

Input audio quality

…TIP: Tools like MediaInfo can easily give you technical information about your audio files. DO’S / DON’TS Set your PBX, media server or recording device to one of these formats (in the order of preference): uncompressed WAV (16-bit, 8 kHz or more); A-law or μ-law (8-bit, 8 kHz) in WAV; lossless formats like FLAC; OPUS format (lossy, but developed…
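
As a rough illustration of the first recommendation (uncompressed WAV, 16-bit, 8 kHz or more), the following Python sketch inspects a file with the standard-library `wave` module. The function name `check_wav` and its thresholds are illustrative assumptions, not part of any Phonexia tool. Note that `wave` reads only uncompressed PCM, so an A-law or μ-law WAV is reported here as unreadable rather than checked.

```python
import wave


def check_wav(path, min_rate_hz=8000, required_sample_bytes=2):
    """Return a list of problems; an empty list means the file matches.

    Hypothetical helper: checks only sample width and sampling rate.
    """
    try:
        with wave.open(path, "rb") as wav:
            sample_bytes = wav.getsampwidth()
            rate_hz = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised for files that are not PCM WAV (for example compressed WAV).
        return [f"not readable as uncompressed PCM WAV: {exc}"]

    problems = []
    if sample_bytes != required_sample_bytes:
        problems.append(
            f"sample width is {sample_bytes * 8}-bit, "
            f"expected {required_sample_bytes * 8}-bit"
        )
    if rate_hz < min_rate_hz:
        problems.append(
            f"sampling rate is {rate_hz} Hz, expected {min_rate_hz} Hz or more"
        )
    return problems
```

Calling `check_wav("recording.wav")` on a hypothetical file returns an empty list when both conditions hold, and one message per failed condition otherwise.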

Get better support

…Phonexia’s hw-gen for generating basic HW print Windows 64bit http://download.phonexia.com/utils/hw-gen64.exe GNU/Linux 64bit http://download.phonexia.com/utils/hw-gen64 System Information Windows OS: msinfo32.exe Linux OS: sudo lshw -short or similar utility The database used and the version The versions of technology models (data) and product (SPE build, SAL build) WAV file, iVector, or technology output file that might be causing the failure (if possible) …

Q: What to do with the ApplicationStartup: Unhandled exception: BsapiException error?

When running SPE, the following error occurs: [Error] ApplicationStartup: Unhandled exception: BsapiException: SWaveformSegmenterI(/mnt/phxspe/home/phx/storage/dfs/a1cabcf7-c761-49f1-a9bc-0a8209a09fd9.opus Requested segment (78056, 102056) is out of waveform range (0,91840). A: It means that this OPUS file was created improperly and its header declares much more audio than the file actually contains. Please check that your audio source/originator works properly. Or use ffmpeg / sox…

Q: What are the recommendations for LID adaptation set?

…pack 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% of speech) Only 1 language per record For adapting the existing language model (discriminative training) 10+ hours of audio for each language May be done on customer site. May be done in Phonexia using anonymized data (= language-prints extracted from a .wav audio)…

Download Speech Platform

…XL5 Diarization (DIAR) – model XL4 Language Identification (LID) – model L4 Gender Identification (GID) – model XL5 Age Estimation (AGE) – model XL5 Voice Activity Detection (VAD) – model GENERIC_3 and SID4_XL5 Speech Quality Estimation (SQE) Time Analysis Extraction (TAE) Waveform Denoiser (DENOISER) Phonexia Browser example audio (in ./BROWSER/example/ and ./SPE/bsapi/{technology}/example/) Step #2 – First start To get…

Understand SPE configuration

…for use with the SPE using 3rd party tools. The settings below determine which native codecs will be enabled and how the SPE should handle other audio formats. # Enable or disable native support for OPUS audio format (Default: true) # When disabled, audio file will be converted to WAV server.audio_formats.opus.enabled = true # Enable or disable native support for…

Open Source Acknowledgement

…Apache-2.0 WITH LLVM-exception path MIT start-server-and-test MIT url-loader MIT wavesurfer.js BSD-3-Clause webpack MIT webpack-cli MIT webpack-dev-server MIT Phonexia Commons dependencies (Java library with general functionality used across Java projects) Name License Apache Commons IO Apache License, Version 2.0 Awaitility Apache License, Version 2.0 Guava Apache License, Version 2.0 REST Assured Apache License, Version 2.0 elasticsearch-rest-high-level Apache License, Version 2.0 SnakeYAML…

Q: How to fix Error 1007: Unsupported audio format?

…%2 is for output file ffmpeg example: audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2 # sox example: # audio_converter.command = sox %1 %2 Important note: By design, and to save computing resources, the ‘audio converter’ is not used if the INPUT file ends with the extension .wav. In that case you must pre-process the audio recording before uploading it to the…
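
The placeholder scheme in the snippet above can be sketched in Python as follows. This is only an illustration of how a `%1`/`%2` template might expand into a command line, not how SPE implements it; `build_converter_command` and `uses_converter` are hypothetical names, and treating `%1` as the input file is inferred from the ffmpeg example (`-i %1 %2`).

```python
import re
import shlex


def build_converter_command(template, input_path, output_path):
    """Expand %1 (input file) and %2 (output file) into an argument list."""
    mapping = {"%1": input_path, "%2": output_path}
    # Split first, substitute second, so paths with spaces stay one argument.
    return [
        re.sub(r"%[12]", lambda m: mapping[m.group(0)], token)
        for token in shlex.split(template)
    ]


def uses_converter(input_path):
    """Mirror the note above: files ending in .wav bypass the converter.

    Assumption: the match is case-sensitive; the source does not say.
    """
    return not input_path.endswith(".wav")


command = build_converter_command(
    "ffmpeg -loglevel warning -y -i %1 %2", "call.mp3", "call.wav"
)
print(command)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed. A file that already ends in `.wav` would skip this step entirely, which is why an unsupported WAV has to be converted before upload.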

Phonexia technologies introduction

…and their usages Filtering and supporting technologies 04:32 Speech Quality Estimation (SQE) 05:27 Voice Activity Detection (VAD) 06:37 Diarization (DIAR) 07:41 Age Estimation (AGE) 08:14 Waveform Denoiser Voice Biometrics technologies 08:56 Speaker Identification (SID) 10:18 Language Identification (LID) 11:10 Gender Identification (GID) Speech Analytics technologies 11:43 Speech Transcription (STT) 12:30 Keyword Spotting (KWS) 13:32 Phoneme Recognition (PHNREC) 13:54 Time Analysis…

STT: Language Model Customization tutorial

…STT model, put its name in the model parameter, like this: GET /technologies/stt?path=foobar.wav&model=<customized_model_name> Using customized STT model in command line STT To use customized STT model in command line STT, simply specify the new configuration file belonging to the customized STT model in the -config parameter. For example, assuming that original pl_pl_5 model was customized, specifying updated as the model…
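
A small Python sketch of building the request shown in the snippet above, using only the standard library. The endpoint and the `path` and `model` parameters come from the snippet; the function name `stt_request_path` and the model name `my_custom_model` are placeholders invented for illustration.

```python
from urllib.parse import urlencode


def stt_request_path(audio_path, model_name):
    """Build the GET path and query string for an STT request on a chosen model."""
    query = urlencode({"path": audio_path, "model": model_name})
    return f"/technologies/stt?{query}"


# "my_custom_model" is a placeholder for the name of your customized model.
request_path = stt_request_path("foobar.wav", "my_custom_model")
print(request_path)
```

Using `urlencode` rather than string concatenation keeps the query valid when a file path or model name contains characters that need escaping.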

Key Features (VIN)

…speakers) Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, Mu-law; Sampling frequency 8kHz or higher Output: A scoring table with the results of comparisons in a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio The graphical presentation of results in the form of a Probability Density Function plot and a…

Key Features (PSP)

…– detects the audio part that contains voice, Speech Quality Estimation (SQE) – measures the quality of speech, Phoneme Recognizer (PHNREC) – several languages supported – converts speech into phonemes (written characters representing pronunciation), Waveform Denoiser (DENOISER) – automatically improves the audibility of speech for human listeners. Supported Languages The LID, STT and KWS technologies support various languages as listed…