
Search: Wav

36 results

Download Speech Platform

…XL5 Diarization (DIAR) – model XL4 Language Identification (LID) – model L4 Gender Identification (GID) – model XL5 Age Estimation (AGE) – model XL5 Voice Activity Detection (VAD) – model GENERIC_3 and SID4_XL5 Speech Quality Estimation (SQE) Time Analysis Extraction (TAE) Waveform Denoiser (DENOISER) Phonexia Browser example audio (in ./BROWSER/example/ and ./SPE/bsapi/{technology}/example/) Step #2 – First start To get…

Q: How to fix Error 1007: Unsupported audio format?

…%2 is for output file
ffmpeg example:
audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2
# sox example:
# audio_converter.command = sox %1 %2
Important note: By design, to save computing resources, the audio converter is not used if the input file name ends with the .wav extension. In that case you must pre-process the audio recording before uploading it to the…
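The behaviour described in this snippet can be sketched in Python. This is only an illustration of the documented rule, not SPE's actual implementation: the function names are hypothetical, and the case-insensitive extension check is an assumption (the snippet only says the converter is skipped for names ending in .wav).

```python
import shlex
from pathlib import Path


def needs_conversion(input_path: str) -> bool:
    """Return True when the external converter would be invoked.

    Assumption: the extension check is case-insensitive. The source only
    states that files ending in ".wav" bypass the converter.
    """
    return Path(input_path).suffix.lower() != ".wav"


def build_converter_command(template: str, input_path: str, output_path: str) -> list[str]:
    """Substitute %1 (input file) and %2 (output file) into a command template."""
    return [
        input_path if token == "%1" else output_path if token == "%2" else token
        for token in shlex.split(template)
    ]


# Template copied from the ffmpeg example in the snippet above.
template = "ffmpeg -loglevel warning -y -i %1 %2"

print(needs_conversion("call.mp3"))
print(needs_conversion("call.wav"))
print(build_converter_command(template, "call.mp3", "call_converted.wav"))
```

Because a .wav input skips the converter, a WAV file in an unsupported encoding has to be converted before upload, for example by running the same command by hand.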

Phoneme Recogniser (PHNREC)

…Input: „Hi, this is Lewis.“ (WAV file containing speech) Output: sil hh ay dh ow s ih s l uw uw th sil (plain-text or XML/JSON output) Note: The outputs can contain the following special tokens: sil – silent part (or no speech detected). The list of phonemes is available in the document phonemes_for_stt_and_kws.pdf (delivered as part of manuals in SPE…

STT: Language Model Customization tutorial

…STT model, put its name in the model parameter, like this: GET /technologies/stt?path=foobar.wav&model=<customized_model_name> Using customized STT model in command line STT To use a customized STT model in command-line STT, simply specify the new configuration file belonging to the customized STT model in the -config parameter. For example, assuming that the original pl_pl_5 model was customized, specifying updated as the model…
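A minimal sketch of building the request path shown above, using only the Python standard library. The helper name and the example model name are hypothetical; only the /technologies/stt endpoint and the path and model parameters come from the snippet.

```python
from urllib.parse import urlencode


def stt_request_path(audio_path: str, model: str) -> str:
    """Build the GET path for an STT request with an explicit model.

    urlencode percent-encodes reserved characters in the values, so model
    names or file paths with spaces or slashes stay valid in the query.
    """
    query = urlencode({"path": audio_path, "model": model})
    return f"/technologies/stt?{query}"


# "my_customized_model" is a placeholder, not a real model name.
print(stt_request_path("foobar.wav", "my_customized_model"))
```

The returned path would then be appended to the base URL of your own SPE server.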

Get better support

…Phonexia’s hw-gen for generating basic HW print Windows 64bit http://download.phonexia.com/utils/hw-gen64.exe GNU/Linux 64bit http://download.phonexia.com/utils/hw-gen64 System Information Windows OS: msinfo32.exe Linux OS: sudo lshw -short or similar utility The database used and the version The versions of technology models (data) and product (SPE build, SAL build) WAV file, iVector, or technology output file that might be causing the failure (if possible)  …

Q: What to do with the ApplicationStartup: Unhandled exception: BsapiException error?

When running SPE, the following error occurs: [Error] ApplicationStartup: Unhandled exception: BsapiException: SWaveformSegmenterI(/mnt/phxspe/home/phx/storage/dfs/a1cabcf7-c761-49f1 -a9bc-0a8209a09fd9.opus Requested segment (78056, 102056) is out of waveform range (0,91840). A: It means that this opus file was created improperly and its header declares much more audio than the file actually contains. Please check that your audio source/originator works properly. Or use ffmpeg / sox…
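To make the error message concrete, the sketch below reproduces the kind of bounds check it implies, using the numbers from the log. It is an illustration only: the function names are hypothetical, the unit of the values is not stated in the source, and treating the upper bound as inclusive is an assumption.

```python
def segment_in_range(start: int, end: int, waveform_length: int) -> bool:
    """Return True if the segment [start, end] fits inside [0, waveform_length]."""
    return 0 <= start <= end <= waveform_length


def overshoot(end: int, waveform_length: int) -> int:
    """Return how far the requested end lies beyond the available audio."""
    return max(0, end - waveform_length)


# Values taken from the error message above.
requested_start, requested_end = 78056, 102056
available_length = 91840

print(segment_in_range(requested_start, requested_end, available_length))
print(overshoot(requested_end, available_length))
```

The check fails because the requested end lies past the audio that was actually decoded, which matches a header that declares more audio than the file contains.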

Q: What are the recommendations for LID adaptation set?

…pack 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% of speech) Only 1 language per record For adapting the existing language model (discriminative training) 10+ hours of audio for each language May be done on customer site. May be done in Phonexia using anonymized data (= language-prints extracted from a .wav audio)…

STT: Adding words to language model on the fly

…using the input example shown above. The added parts are highlighted. { "result": { "version": 5, "name": "SpeechRecognitionResult", "file": "/test.wav", "model": "EN_US_6", . . . "phrases": [ { "phrase": "this is preferred phrase" }, { "phrase": "and some other phrase" } ], "dictionary": [ { "word": "preferred", "pronunciations": [ { "phonemes": "p r ih f er d", "out_of_vocabulary": false, "class":…

Phonexia technologies introduction

…and their usages Filtering and supporting technologies 04:32 Speech Quality Estimation (SQE) 05:27 Voice Activity Detection (VAD) 06:37 Diarization (DIAR) 07:41 Age Estimation (AGE) 08:14 Waveform Denoiser Voice Biometrics technologies 08:56 Speaker Identification (SID) 10:18 Language Identification (LID) 11:10 Gender Identification (GID) Speech Analytics technologies 11:43 Speech Transcription (STT) 12:30 Keyword Spotting (KWS) 13:32 Phoneme Recognition (PHNREC) 13:54 Time Analysis…

Key Features (PSP)

…– detects the audio part that contains voice, Speech Quality Estimation (SQE) – measures the quality of speech, Phoneme Recognizer (PHNREC) – several languages supported – converts speech into phonemes (written characters representing pronunciation), Waveform Denoiser (DENOISER) – automatically improves the audibility of speech for human listeners. Supported Languages The LID, STT and KWS technologies support various languages as listed…

Key Features (VIN)

…speakers) Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, Mu-law; Sampling frequency 8kHz or higher Output: A scoring table with the results of comparisons in a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio The graphical presentation of results in the form of a Probability Density Function plot and a…
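The relation between a likelihood ratio and its decimal and natural logarithms, mentioned above, can be sketched as follows. This is plain arithmetic, not the product's scoring code; the function name is hypothetical, and the Verbal Ratio is left out because the source does not define its scale.

```python
import math


def log_likelihood_ratios(likelihood_ratio: float) -> tuple[float, float]:
    """Return (decimal LLR, natural LLR) for a positive likelihood ratio."""
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log10(likelihood_ratio), math.log(likelihood_ratio)


# A likelihood ratio of 100 is an arbitrary illustrative value.
decimal_llr, natural_llr = log_likelihood_ratios(100.0)
print(decimal_llr, natural_llr)
```

A ratio above 1 gives a positive logarithm and a ratio below 1 gives a negative one, in either base.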

Understand SPE configuration

…for use with the SPE using 3rd party tools. The settings below determine which native codecs will be enabled and how the SPE should handle other audio formats. # Enable or disable native support for OPUS audio format (Default: true) # When disabled, audio file will be converted to WAV server.audio_formats.opus.enabled = true # Enable or disable native support for…

Input audio quality

…TIP: Tools like MediaInfo can easily give you technical information about your audio files. DO’S / DON’TS: Set your PBX, media server or recording device to one of these formats (in the order of preference): uncompressed WAV (16-bit, 8 kHz or more); A-law or μ-law (8-bit, 8 kHz) in WAV; lossless formats like FLAC; OPUS format (lossy, but developed…
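As a rough illustration of the first recommendation (uncompressed WAV, 16-bit, 8 kHz or more), the sketch below inspects a file with Python's standard wave module. The function name is hypothetical. Note the limitation: the wave module reads only uncompressed PCM, so A-law or μ-law WAV files, which the list above also accepts, are reported as not matching this particular check.

```python
import wave


def is_recommended_pcm_wav(path: str) -> bool:
    """Return True for an uncompressed PCM WAV that is 16-bit and at least 8 kHz.

    Files the wave module cannot parse (non-WAV data, or compressed WAV
    such as A-law and mu-law) yield False rather than an exception.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            sample_width_bytes = wav_file.getsampwidth()
            sample_rate_hz = wav_file.getframerate()
    except (wave.Error, EOFError):
        return False
    return sample_width_bytes == 2 and sample_rate_hz >= 8000
```

For example, `is_recommended_pcm_wav("call.wav")` could be run on each recording before upload; a missing file still raises FileNotFoundError so that path mistakes are not silently hidden.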

Q: How do you calculate SNR in Speech Quality Estimation?

…frequencies in the waveform of speech has Gamma distribution. In contrast, noise has a Gaussian distribution. So we can estimate the SNR by looking at the frequency distribution in individual frames. This approach to SNR estimation is based on the article by Chanwoo Kim and Richard M. Stern, “Robust Signal-to-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis”, Interspeech 2008….