Search: Audio Source Profile

72 results

Understand SPE configuration file

…support is enabled or disabled. By default it’s enabled. server.audio_formats.flac.enabled # Enable or disable native support for FLAC audio format (Default: true) server.audio_formats.flac.enabled = true Controls whether the FLAC audio format support is enabled or disabled. By default it’s enabled. audio_converter.enabled # Enable or disable audio converter audio_converter.enabled = false Controls whether support for automatic audio format conversion is enabled…

FAQs (PSP)

…Browser, FAQ Speech Platform Q: How to fix Error 1007: Unsupported audio format? The Phonexia Browser application may return the error “1007: Unsupported audio format” when uploading an audio file. Please check whether your audio files are in one of the formats listed in Q: What are the supported audio formats? If you need to use audio recordings in other formats as input, you can configure SPE…

Understand SPE audio converter

…tool, you can upload essentially any audio or even video file to SPE and it will be automatically converted to an audio format supported natively by SPE. ⓘ NOTE: The automatic conversion is done only when uploading audio files to SPE; it is not done when registering files! For more info about uploading/registering audio files, see the Understanding SPE home directory article. Converter installation As a…

Releases and Changelogs (SPE)

…[G#157] Added endpoint for updating existing Audio Source Profile [G#160] SID4 calibration technology renamed: SID4CALIBSET -> SID4CALIB [G#161] Mean normalization support in Audio Source Profiles [G#169] Added cache for Audio Source Profiles, see server.audio_source_profiles_cache_size property [G#170] Added False Acceptance Calibration cache, see server.bsapi_comparator_fa_cache_size [G#149] Fixed: phxclient prints help if running without parameters [G#150] Fixed: UTF-8 symbols are not escaped in…

Input audio quality

…audio codec, heavy compression, too low bitrate, etc. can damage or even completely destroy essential parts of the audio signal required by speech technologies. Commonly used audio compression methods exploit the perceptual limitations of human hearing and can remove frequencies that are masked by other frequencies, etc. Therefore, to get satisfactory results from speech technologies, use an appropriate audio format. ⓘ…

LID: Terminology and adaptation

…20 hours of audio is required, see requirements below Enhancing existing language model by adding your own audio files to existing built-in language at least 5 hours of audio is required, see requirements below Creating custom language pack consisting of your chosen set of languages, both pre-trained or created from your audio files Audio recordings requirements Format: WAV, FLAC, RAW…

FAQs (Browser)

…audio format” when uploading an audio file. Please check whether your audio files are in one of the formats listed in Q: What are the supported audio formats? If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. The recommended tool is the ffmpeg utility, which is powerful and well documented. Please find…

Release Notes

…and fixes Speech Engine: General Reduced RAM consumption (since 3.58.0) RAM consumption can be up to several gigabytes lower, depending on technologies configuration and processed audio. This is mainly visible in Speech To Text when processing many audios or longer audios (or both). The effect may be less visible in other technologies. Fixed issues with non-ASCII / Unicode file names…

Q: How to fix Error 1007: Unsupported audio format?

The Phonexia Browser application may return the error “1007: Unsupported audio format” when uploading an audio file. Please check whether your audio files are in one of the formats listed in Q: What are the supported audio formats? If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. The recommended tool is…

Understand SPE configuration

…FLAC audio format (Default: true) server.audio_formats.flac.enabled = true # Enable or disable audio converter audio_converter.enabled = true # Set converter command # %1 is for input file # %2 is for output file # ffmpeg example: # audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2 # sox example: # audio_converter.command = sox %1 %2 audio_converter.command = ffmpeg -loglevel warning…
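
To illustrate what the %1 and %2 placeholders in audio_converter.command stand for, here is a minimal Python sketch that expands such a template into an argument list. It is not SPE's actual implementation; how SPE performs the substitution internally is not stated in the snippet above. The function name build_converter_argv and the file names in.mp3 and out.wav are hypothetical, chosen only for this example.

```python
import shlex


def build_converter_argv(template, input_path, output_path):
    """Expand a converter command template into an argv list.

    Assumption (not confirmed by the SPE docs): %1 and %2 appear as
    standalone tokens and are replaced by the input and output paths.
    """
    argv = []
    for token in shlex.split(template):
        if token == "%1":
            argv.append(input_path)   # %1 is for input file
        elif token == "%2":
            argv.append(output_path)  # %2 is for output file
        else:
            argv.append(token)
    return argv


if __name__ == "__main__":
    template = "ffmpeg -loglevel warning -y -i %1 %2"
    print(build_converter_argv(template, "in.mp3", "out.wav"))
```

Building an argument list rather than a single shell string keeps paths containing spaces intact if the result is later passed to something like subprocess.run.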

SID: Speaker Identification: Results Enhancement

…is robust in such factors, several result enhancement procedures can provide even better results and stronger evidence. Audio Source Profile An Audio Source Profile is a representation of the speech source, e.g., device, acoustic channel, distance from microphone, language, gender, etc. Technically, an Audio Source Profile is an entity that contains all information required for any system calibration or result…

Understand SPE benchmark

…the audio, and the amount of actual speech in the audio affect the processing speed… because the non-speech parts are stripped from the audio before processing. The processing speed is then calculated as follows: FtRT = sum_of_speech_lengths_in_all_recordings ÷ sum_of_processing_times_of_all_recordings When using the option with your specified file, only that single recording is used… so, to account for various audio…
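
The FtRT formula quoted in this snippet can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the benchmark tool's code; the function name ftrt and the sample durations are hypothetical, and both inputs are assumed to be in seconds.

```python
def ftrt(speech_lengths_s, processing_times_s):
    """Faster-than-Real-Time factor.

    FtRT = sum of speech lengths in all recordings
           / sum of processing times of all recordings
    Both arguments are iterables of durations in seconds (assumed unit).
    """
    total_speech = sum(speech_lengths_s)
    total_processing = sum(processing_times_s)
    if total_processing <= 0:
        raise ValueError("total processing time must be positive")
    return total_speech / total_processing


if __name__ == "__main__":
    # Hypothetical example: two recordings with 60 s and 120 s of speech,
    # processed in 2 s and 4 s respectively.
    print(ftrt([60.0, 120.0], [2.0, 4.0]))
```

Note that the sums are taken over all recordings before dividing, so long recordings weigh more than short ones; averaging per-recording ratios would give a different number.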

Understand SPE database

SPE database serves multiple purposes: stores SPE internal data stores various information about SPE entities created by SPE user audio files metadata speaker models and their voiceprints speaker groups and their voiceprints calibration sets keyword lists language packs audio source profiles stores cached processing results (ON by default, can be set in SPE configuration file) optionally also stores SPE log…

Understand SPE connectors for external TTS

…little-endian mono audio data. In SPE 3.46 and newer, the audio sampling frequency must be set to the naturalSampleRateHertz value provided in the TTS service capabilities information. In SPE 3.45 and older, the audio sampling frequency must be fixed to 8000 Hz. SPE then reads the audio and writes it either to a file, or to an output realtime stream,…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…computing performance is better by ~17% compared with Intel® Xeon® E5 2860 v4 FtRTaudio shows that real requirements for HW and its computing power are approx. 62% lower than traditional approach using FtRTnet_speech for audio dataset with similar ratio between speech and non-speech (silence) and it is proven by measuring it. Best practices Use FtRTaudio when calculating hardware sizing and…