Search: audio format

Phonexia Speech Engine

…audio manipulation
SPE has built-in basic audio file manipulation functionality, such as separating individual channels from stereo recordings, cutting one audio file into several files, saving audio from an incoming stream to a file, and others.
Stream audio player
To support voicebot scenarios, SPE can play audio files directly to the output RTP stream.
External Text-to-speech (TTS) integration
Easy integration with external TTS…
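Channel separation is built into SPE. The sketch below only illustrates what the operation does, using Python's standard `wave` module. It assumes an uncompressed linear PCM WAV input held in memory. `split_stereo_wav` is a hypothetical helper name and is not part of SPE.

```python
import io
import wave


def split_stereo_wav(data: bytes) -> list[bytes]:
    """Split an interleaved multichannel PCM WAV into one mono WAV per channel.

    Hypothetical helper for illustration only; SPE has its own built-in tool.
    """
    with wave.open(io.BytesIO(data), "rb") as src:
        nchannels = src.getnchannels()
        sampwidth = src.getsampwidth()
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())

    frame_size = nchannels * sampwidth
    outputs = []
    for ch in range(nchannels):
        # Take this channel's sample bytes out of every interleaved frame.
        samples = b"".join(
            frames[i + ch * sampwidth : i + (ch + 1) * sampwidth]
            for i in range(0, len(frames), frame_size)
        )
        buf = io.BytesIO()
        with wave.open(buf, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(sampwidth)
            dst.setframerate(framerate)
            dst.writeframes(samples)
        outputs.append(buf.getvalue())
    return outputs
```

For a stereo call recording, the function returns two mono WAV byte strings: left channel first, then right. The sample width and sampling rate stay unchanged.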

Understand SPE directory structure

…advanced configurations.

bsapi
├── age
├── denoiser
├── diar
├── gid
├── kws
├── lid
├── sid4
├── sqe
├── stt
├── tae
└── vad

Each individual technology directory typically contains three main subdirectories:
data: technology data, in separate directories for individual technology- or language-specific models
example: audio files for quick testing, in some cases also in separate directories…

Understand SPE executable files

…to URL (e.g. "http://server:port")
priority=number – Set request priority (see Understanding SPE processing priority for more details)

phxclient: example 1
phxclient /login=admin /password=phonexia /method=POST /uri="127.0.0.1:8600/audiofile?path=/myfile.wav" /data="c:\audio files\example recording.wav"
Uploads the example recording.wav file from the c:\audio files folder to SPE running on this machine (i.e. with IP address 127.0.0.1) and stores it in the root of SPE internal storage under the name myfile.wav.…
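If you want to script the same upload without phxclient, you can build it as a raw HTTP request with Python's standard library. This is a sketch under stated assumptions. It assumes that SPE accepts HTTP Basic authentication carrying the /login and /password values, and that the request body is the raw file content. It also assumes `application/octet-stream` is an acceptable Content-Type. `build_upload_request` is a hypothetical helper. Check these points against the SPE API documentation for your version.

```python
import base64
import urllib.parse
import urllib.request


def build_upload_request(server: str, spe_path: str, audio: bytes,
                         login: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request uploading audio to SPE storage.

    ASSUMPTIONS: HTTP Basic auth and a raw octet-stream body; this is not a
    documented SPE client, just a sketch mirroring the phxclient example.
    """
    # Keep "/" unescaped so the query matches the phxclient example URI.
    query = urllib.parse.quote(spe_path, safe="/")
    url = f"http://{server}/audiofile?path={query}"
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/octet-stream",  # assumed content type
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")
```

To reproduce example 1, read `c:\audio files\example recording.wav` in binary mode and call `build_upload_request("127.0.0.1:8600", "/myfile.wav", data, "admin", "phonexia")`. Then pass the result to `urllib.request.urlopen` to actually send it.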

Key Features (VIN)

…speakers) Supported audio formats: MS WAVE or RAW with linear coding (8 or 16 bits), A-law, or Mu-law; sampling frequency 8 kHz or higher. Output: a scoring table with the results of comparisons as a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio; a graphical presentation of results in the form of a Probability Density Function plot and a…

Releases and Changelogs (Browser)

…(phxspe.browser.log located in SPE log directory)
Phonexia Browser v3.16.1, BSAPI 3.20.1 – May 17 2019
[G#112] Fixed Denoiser which created duplicate recordings under specific circumstances
[G#127] Fixed comparison of SID Evaluation sets using Audio Source Profile
Phonexia Browser v3.16.0, BSAPI 3.20.0 – Apr 26 2019
Support for Audio Source Profiles
SID Evaluation wizard supports SID4
Phonexia Browser v3.15.0, BSAPI 3.19.1…

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme…

Speech to Text (STT)

About STT: Phonexia Speech to Text (STT) converts speech in audio signals into plain text. The technology works with acoustics as well as with a dictionary of words, an acoustic model, and pronunciation. This makes it dependent on language and dictionary: only words from a given set can be transcribed. As input, an audio file or stream is needed, together with a selection of…

Speaker Identification (SID)

…are monitoring a large number of audio recordings or streams and are looking for occurrences of specific speaker(s). Speaker spotting can be deployed for Fraud Alert purposes. Speaker Verification is the case when we ask “Is this Peter Smith’s voice?”, such as when a person calls the bank and says, “Hello, this is Peter…

Keyword Spotting (KWS)

…to reveal (or “transcribe”) pronunciation directly from the actual audio recording. Phoneme Recognizer: Phoneme Recognizer (PHNREC) reveals the phoneme transcription of a specified audio recording, or of a part of it. This can be used to get the pronunciation of a keyword or phrase as it is actually spoken in the audio recording. This pronunciation can then be used in a keyword list for…

Understand SPE metafiles

Certain SPE entities (SID speaker models, SID audio source profiles, LID language packs) can have additional information associated with them in the form of “metafiles”. This article explains the intended usage of metafiles. In general, SPE is intended as an under-the-hood engine that focuses purely on speech-related audio processing. Any additional functionality should be implemented in the application layer,…

Understand SPE technologies configuration file

…to be enabled in your SPE installation – typically, you may want to test various models during initial testing to see how they perform on your audio, or you may want to enable additional technologies while developing your application, etc. To select the technologies/models to be enabled in your SPE, you can use one of the SPE administration tools, phxadmin…

Speech Quality Estimation (SQE)

…channels. The statistics of all channels include figures for many aspects of recording quality, as well as the overall global score. Technology: The technology is language-, accent-, text-, and channel-independent. It is compatible with the widest possible range of audio sources (it applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input: Input format for processing: WAV or RAW (8 or 16 bits…

Waveform Denoiser (DENOISER)

…software cannot remove unwanted speech or music in the background. The Denoiser removes noise from the recording while amplifying the speech signal, for: better intelligibility when people listen to the recording (recommended use); better results with automatic speech recognition technologies (this must be tested on customer data first). Input: audio file (format details – see…

Q: What are the requirements for SID evaluation dataset?

…in each recording (i.e. usually 2+ minutes of recording length)
only one speaker in each recording
a wide variety of gender and age is recommended
recordings should be as similar to the target use case as possible (device, channel, distance from mic, language distribution)
audio files should be mono, lin16 format, 8 kHz+ sample rate
*Note: splitting a single recording into multiple shorter…
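You can check the audio format requirement (mono, lin16, 8 kHz+) before you assemble the dataset. Below is a minimal sketch using Python's standard `wave` module. That module only reads uncompressed PCM WAV, so A-law, Mu-law and RAW files are reported as unreadable rather than analysed. `check_sid_eval_audio` is a hypothetical name. The sketch does not check recording length, speaker count or the other items in the list.

```python
import io
import wave


def check_sid_eval_audio(data: bytes) -> list[str]:
    """Return a list of format problems; an empty list means the file looks OK.

    Hypothetical helper: checks only mono / 16-bit linear PCM / >= 8 kHz.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            channels = w.getnchannels()
            width = w.getsampwidth()
            rate = w.getframerate()
    except (wave.Error, EOFError) as exc:
        # The wave module rejects non-PCM (e.g. A-law) and non-WAV input.
        return [f"not a linear PCM WAV file: {exc}"]

    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if width != 2:
        problems.append(f"expected 16-bit samples (lin16), got {8 * width}-bit")
    if rate < 8000:
        problems.append(f"expected sample rate >= 8000 Hz, got {rate} Hz")
    return problems
```

A stereo file, for example, yields a single "expected mono" message. You could split it first, assuming each channel then contains only one speaker.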

STT: Results explained

…These can be recognized by a recording-level confidence value of -1.
"one_best_result": {
  "confidence": -1,
  "segmentation": [ …

N-best output
{
  "phrase": "can you hear me okay i wanted to",
  "channel": 0,
  "score": 509.71384,
  "confidence": 0.33733934,
  "start": 1500000,
  "end": 28200000
}
Analytical applications can use this format to further process the alternatives. It can also be useful when…
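To illustrate how an analytical application might post-process N-best output, the sketch below groups alternatives by channel and orders them by confidence. It assumes the alternatives arrive as a JSON array of objects with the fields shown in the N-best example. The surrounding response structure is elided in this excerpt. `rank_alternatives` is a hypothetical helper. `start` and `end` are passed through unchanged because their time unit is not stated here.

```python
import json
from collections import defaultdict


def rank_alternatives(nbest_json: str) -> dict[int, list[dict]]:
    """Group N-best phrase alternatives by channel, highest confidence first.

    ASSUMPTION: input is a JSON array of objects with "phrase", "channel",
    "score", "confidence", "start" and "end" keys.
    """
    entries = json.loads(nbest_json)
    by_channel: dict[int, list[dict]] = defaultdict(list)
    for entry in entries:
        by_channel[entry["channel"]].append(entry)
    for alternatives in by_channel.values():
        # Best-supported alternative first; ties keep their original order.
        alternatives.sort(key=lambda e: e["confidence"], reverse=True)
    return dict(by_channel)
```

Ranking by `confidence` rather than `score` is a design choice here. The excerpt does not say how the two values relate, so use whichever your application has validated.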