Search: Audio Source Profile

72 results

Phonexia Speech Engine

…audio manipulation: SPE has built-in basic audio file manipulation functionality, such as separating individual channels from stereo recordings, cutting one audio file into several files, saving audio from an incoming stream to a file, and others. Stream audio player: To support voicebot scenarios, SPE can play audio files directly to the output RTP stream. External Text-to-speech (TTS) integration: Easy integration with external TTS…
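The excerpt does not show how SPE separates channels. As a rough illustration of the operation itself, the following Python sketch uses only the standard library to split a 16-bit PCM stereo WAV file into two mono files. It is not SPE's implementation, and the file paths you pass in are hypothetical.

```python
import array
import sys
import wave


def split_stereo(src_path: str, left_path: str, right_path: str) -> None:
    """Split a 16-bit PCM stereo WAV file into two mono WAV files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM stereo input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Interleaved samples: L0, R0, L1, R1, ...
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    channels = (samples[0::2], samples[1::2])

    for path, channel in zip((left_path, right_path), channels):
        if sys.byteorder == "big":
            channel.byteswap()  # back to little-endian before writing
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

Usage: split_stereo("call.wav", "call_left.wav", "call_right.wav") writes one file per channel with the original sample rate.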

Key Features (PSP)

…in the Languages Available section: Speech To Text (STT) and Keyword Spotting (KWS) languages; Language Identification (LID) languages. Supported Audio input: The Speech Engine server supports various audio formats, as listed in API reference > Audio requirements. It also supports RTP/HTTP stream processing, as listed in API reference > RTP/HTTP streams. The Speech Engine allows the usage of some…

Q: What are the supported audio formats?

…configured to do this conversion automatically in the background; see the Understand SPE audio converter article. Great tools for converting unsupported formats to supported ones are FFmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are multiplatform tools for Microsoft Windows, Linux and Apple OS X. Example of usage (FFmpeg):
ffmpeg -i <source_audio_file_name> <output_audio_base_name>.wav
This command converts an audio file in any supported format/codec to normalized…
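When FFmpeg is called from a script, passing the arguments as a list avoids shell-quoting problems with spaces in file names. The sketch below builds the ffmpeg -i command for a .wav output. The optional -ar (sample rate) and -ac (channel count) flags are standard FFmpeg options, but which values SPE expects is an assumption you should check against API reference > Audio requirements.

```python
from typing import List, Optional


def ffmpeg_to_wav_cmd(source: str, output_base: str,
                      sample_rate: Optional[int] = None,
                      channels: Optional[int] = None) -> List[str]:
    """Build: ffmpeg -i <source> [-ar RATE] [-ac N] <output_base>.wav"""
    cmd = ["ffmpeg", "-i", source]
    # Output options go after the input and before the output file name.
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]
    if channels is not None:
        cmd += ["-ac", str(channels)]
    cmd.append(f"{output_base}.wav")
    return cmd
```

To actually convert, run the list with subprocess.run(cmd, check=True); this requires ffmpeg on the PATH.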

Releases and Changelogs (Browser)

…(phxspe.browser.log located in the SPE log directory)
Phonexia Browser v3.16.1, BSAPI 3.20.1 – May 17 2019
[G#112] Fixed Denoiser, which created duplicate recordings under specific circumstances
[G#127] Fixed comparison of SID Evaluation sets using Audio Source Profile
Phonexia Browser v3.16.0, BSAPI 3.20.0 – Apr 26 2019
Support for Audio Source Profiles
SID Evaluation wizard supports SID4
Phonexia Browser v3.15.0, BSAPI 3.19.1…

Understand SPE directory structure

…advanced configurations.
bsapi
├── age
├── denoiser
├── diar
├── gid
├── kws
├── lid
├── sid4
├── sqe
├── stt
├── tae
└── vad
Each individual technology directory typically contains three main subdirectories:
data: Technology data, in separate directories for individual technology- or language-specific models
example: Audio files for quick testing, in some cases also in separate directories…

Speech to Text (STT)

About STT: Phonexia Speech to Text (STT) converts speech in audio signals into plain text. The technology works with both acoustics and a dictionary of words, using an acoustic model and pronunciations. This makes it dependent on the language and the dictionary: only a limited set of words can be transcribed. As input, an audio file or stream is needed, together with a selection of…

Q: Do the language-prints (LPs) extracted from audio sources depend on the currently available language pack?

A: The language-prints do not depend on the language pack currently in use. You may use them both for training a new language pack and for testing/comparing against an existing language pack. The language-prints only need to be compatible with the LID model used for language-print extraction…

Speaker Identification (SID)

…are monitoring a large number of audio recordings or streams and we are looking for the occurrence of a specific speaker(s). Speaker spotting can be deployed for the purpose of Fraud Alert. Speaker Verification is the case when we are asking “Is this Peter Smith’s voice?”, such as when a person calls the bank and says, “Hello, this is Peter…

Understand SPE user accounts

…data/
└── storage/
    └── audio -> /shared_recordings/
In the above example, we created a directory named audio in each SPE account's storage and symlinked it to a completely different directory, /shared_recordings. If any of the SPE accounts uploads a file to its audio directory, the file will be accessible to the other SPE accounts. NOTE: When using such trickery, all "the…
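The same layout can be created with a script. The Python sketch below assumes you already know each account's storage directory (the full SPE paths are not shown in this excerpt, so treat them as hypothetical) and a POSIX system; on Windows, creating symlinks may require extra privileges.

```python
from pathlib import Path
from typing import Iterable, List


def link_shared_audio(storage_dirs: Iterable[Path], shared_dir: Path,
                      link_name: str = "audio") -> List[Path]:
    """Create <storage>/<link_name> -> shared_dir in each account's storage.

    Returns the links created; existing entries with that name are left untouched.
    """
    shared_dir = Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir.resolve()  # absolute target, independent of link location
    created = []
    for storage in storage_dirs:
        link = Path(storage) / link_name
        if link.exists() or link.is_symlink():
            continue
        link.symlink_to(target, target_is_directory=True)
        created.append(link)
    return created
```

Skipping existing entries makes the function safe to re-run without clobbering a real audio directory.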

SPE and Browser installation: standalone SPE

…change the following lines to enable the FFmpeg converter. Change:
# Enable or disable audio converter
audio_converter.enabled = false
to:
# Enable or disable audio converter
audio_converter.enabled = true
6. Start Speech Engine
To start the Speech Engine, run the SPE executable called phxspe. On Windows, type cmd in the Address bar to open the…
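If this change has to be scripted (for example in an automated deployment), a line-anchored text substitution is enough. The Python sketch below flips audio_converter.enabled from false to true while keeping the comment line and the original line endings intact. Reading and writing the actual settings file is left out, because the file name is not shown in this excerpt.

```python
import re

# Match only the exact "audio_converter.enabled = false" line; keep trailing \r (CRLF files).
_FALSE = re.compile(r"^([ \t]*audio_converter\.enabled[ \t]*=[ \t]*)false([ \t\r]*)$",
                    re.MULTILINE)
_TRUE = re.compile(r"^[ \t]*audio_converter\.enabled[ \t]*=[ \t]*true[ \t\r]*$",
                   re.MULTILINE)


def enable_audio_converter(config_text: str) -> str:
    """Return config_text with audio_converter.enabled set to true."""
    new_text, count = _FALSE.subn(r"\1true\2", config_text)
    if count == 0 and not _TRUE.search(config_text):
        raise ValueError("audio_converter.enabled line not found")
    return new_text
```

Raising when the key is missing avoids silently "succeeding" on the wrong file.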

Understand SPE executable files

…to URL (e.g. "http://server:port")
priority=number – Set request priority (see Understanding SPE processing priority for more details)
phxclient: example 1
phxclient /login=admin /password=phonexia /method=POST /uri="127.0.0.1:8600/audiofile?path=/myfile.wav" /data="c:\audio files\example recording.wav"
Uploads the example recording.wav file from the c:\audio files folder to the SPE running on this machine (i.e. at IP address 127.0.0.1) and stores it in the root of the SPE internal storage under the name myfile.wav…
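For scripting outside phxclient, the same upload can be expressed as an HTTP request. The Python sketch below only builds the request object from phxclient example 1 and does not send it. It assumes the request body is the raw file content, and it deliberately omits authentication: how phxclient's /login and /password map to HTTP is not shown in this excerpt, so check the SPE API reference before sending.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_upload_request(host: str, port: int, storage_path: str, audio: bytes) -> Request:
    """Build (without sending) a POST /audiofile request mirroring phxclient example 1.

    Assumption: the body is the raw file content. No authentication is added.
    """
    # safe="/" keeps the storage path readable in the query: path=/myfile.wav
    query = urlencode({"path": storage_path}, safe="/")
    url = f"http://{host}:{port}/audiofile?{query}"
    return Request(url, data=audio, method="POST")
```

Usage: read the file with open(r"c:\audio files\example recording.wav", "rb") and pass its bytes as audio, together with "127.0.0.1", 8600 and "/myfile.wav".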

Keyword Spotting (KWS)

…to reveal (or "transcribe") pronunciation directly from an actual audio recording. Phoneme Recognizer: Phoneme Recognizer (PHNREC) reveals the phoneme transcription of a specified audio recording, or of part of it. This can be used to get the pronunciation of a keyword or phrase as it is actually spoken in the audio recording. This pronunciation can then be used in a keyword list for…

Language Identification (LID)

…Routing particular calls (languages) to human operators (language experts)
Scoring and results
The LID language pack defines a set of recognizable languages (each represented by a language model). When identifying the language in an audio recording (or languageprint), LID does the following:
creates a languageprint of the recording (if the input is an audio recording)
compares that languageprint with each language model in a…
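The comparison step can be pictured as scoring one languageprint against every model and ranking the results. In the Python sketch below, languageprints and models are plain vectors and cosine similarity is a stand-in chosen for illustration; the excerpt does not say how LID actually computes its scores, and the language codes are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_languages(languageprint: Sequence[float],
                    models: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Compare the languageprint with each language model; best match first."""
    scores = [(lang, cosine(languageprint, model)) for lang, model in models.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Usage: score_languages(lp, {"en": en_model, "cs": cs_model})[0] gives the best-scoring language under this toy metric.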

Q: What to do with the ApplicationStartup: Unhandled exception: BsapiException error?

…range (0,91840). A: It means that this Opus file was created improperly: its header declares much more audio than is actually available in the file. Please check that your audio source/originator works properly. Alternatively, use the ffmpeg or sox utility to preprocess the audio and normalize it by self-conversion (Opus to Opus) before the recordings are processed by SPE…
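The self-conversion described above re-encodes the audio, so FFmpeg should write a fresh header that matches the real content. FFmpeg cannot safely overwrite its own input, so the output must go to a new file. The Python sketch below builds such a command; the .normalized file name is a hypothetical convention, and -c:a libopus assumes an FFmpeg build with libopus support.

```python
import os
from typing import List, Tuple


def opus_self_convert_cmd(src: str) -> Tuple[List[str], str]:
    """Build an FFmpeg Opus-to-Opus re-encode command; return (command, output path)."""
    base, ext = os.path.splitext(src)
    # Write to a new file; replace the original only after FFmpeg succeeds.
    out = f"{base}.normalized{ext}"
    # -c:a libopus assumes an FFmpeg build with libopus support.
    return ["ffmpeg", "-i", src, "-c:a", "libopus", out], out
```

Run the command with subprocess.run(cmd, check=True), then submit the output file to SPE instead of the original.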

Q: What are the recommendations for LID adaptation set?

…pack:
20+ hours of audio for each new language model (or 25+ hours of audio containing 80% speech)
Only 1 language per recording
For adapting an existing language model (discriminative training):
10+ hours of audio for each language
May be done at the customer site.
May be done at Phonexia using anonymized data (= language-prints extracted from .wav audio)…
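These thresholds can be checked before a data set is handed over. The Python sketch below reads "20+ hours (or 25+ hours with 80% speech)" as a requirement of about 20 hours of net speech, since 25 h × 0.8 = 20 h; that reading is an interpretation, not a stated rule. The Recording fields are hypothetical metadata you would fill in yourself, and each function checks the recordings of one language.

```python
from dataclasses import dataclass
from typing import Iterable

EPS = 1e-9                     # tolerance for floating-point rounding
NEW_MODEL_SPEECH_HOURS = 20.0  # interpreted as net speech (25 h x 80% = 20 h)
ADAPTATION_HOURS = 10.0        # 10+ hours of audio per language


@dataclass
class Recording:
    hours: float         # total duration of the recording
    speech_ratio: float  # fraction of the recording that is speech, 0..1
    languages: int       # number of languages spoken in the recording


def ok_for_new_model(recordings: Iterable[Recording]) -> bool:
    """Check one language's recordings against the new-language-model guideline."""
    recs = list(recordings)
    if any(r.languages != 1 for r in recs):
        return False  # only 1 language per recording
    speech_hours = sum(r.hours * r.speech_ratio for r in recs)
    return speech_hours >= NEW_MODEL_SPEECH_HOURS - EPS


def ok_for_adaptation(recordings: Iterable[Recording]) -> bool:
    """Check one language's recordings against the adaptation guideline (10+ hours)."""
    return sum(r.hours for r in recordings) >= ADAPTATION_HOURS - EPS
```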