Search Results for: open audio stream failed

SPE3 – Releases and Changelogs

Relevance: 100%      Posted on: 2020-09-12

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). This page lists changes in SPE releases. Releases Changelogs Speech Engine 3.30.13 (09/11/2020) - DB v1401, BSAPI 3.30.13 Public release New: Updated STT and KWS model AR_XL to version 5.1.0 Speech Engine 3.32.0 (08/28/2020) - DB v1500, BSAPI 3.32.0 Non-public Feature Preview release New: Added support for Webhooks and WebSockets in stream processing New: Added support for preferred phrases in 5th generation of STT (see POST /technologies/stt or POST /technologies/stt/input_stream) New:…

Error 1007: Unsupported audio format

Relevance: 47%      Posted on: 2018-12-10

The Phonexia Browser application may return the error "1007: Unsupported audio format" while uploading an audio file. Please check whether your audio files are in a supported format. If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. The recommended tool is the ffmpeg utility, which is powerful and well documented. Find the ffmpeg package for your distribution, then continue as described below: Using Phonexia Browser with embedded SPE Open the Browser configuration dialog by clicking the "Settings" button located in the tool ribbon. Select the "Speech Engine" tab and configure SPE as described…

Supported audio formats

Relevance: 38%      Posted on: 2018-12-10

Supported audio formats are: WAVE (*.wav) container including any of: unsigned 8-bit PCM (u8), unsigned 16-bit PCM (u16le), IEEE float 32-bit (f32le), A-law (alaw), µ-law (mulaw), ADPCM; FLAC codec inside FLAC (*.flac) container; OPUS codec inside OGG (*.opus) container. Other audio formats must be converted using external tools. The SPE server can be configured to support automated conversion in the background, see SPE configuration hints. Good tools for converting unsupported formats to supported ones are ffmpeg and SoX. Both are multiplatform software tools for MS Windows, Linux and Apple OS X. Example of usage (ffmpeg): ffmpeg -i <source_audio_file_name>…
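
The conversion step described in this snippet could be scripted roughly as follows. This is a minimal sketch, not Phonexia's own tooling: the helper names needs_conversion and build_ffmpeg_command are hypothetical, the check looks only at the file extension (it does not inspect the codec inside a WAV container), and FLAC is chosen as the target simply because it appears in the supported list. Only the basic "ffmpeg -i <input> <output>" form is used, where ffmpeg infers the output format from the extension.

```python
from __future__ import annotations

from pathlib import Path

# Container extensions listed as supported in the snippet above.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".opus"}


def needs_conversion(path: str) -> bool:
    """Return True when the file extension is not in the supported list.

    Assumption: the extension alone is a good enough first check.
    """
    return Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS


def build_ffmpeg_command(source: str) -> list[str]:
    """Build an ffmpeg command that converts `source` to a FLAC file.

    The output name is the source name with a .flac extension (hypothetical
    naming convention chosen for this example).
    """
    target = str(Path(source).with_suffix(".flac"))
    return ["ffmpeg", "-i", source, target]


if __name__ == "__main__":
    for name in ["call.mp3", "call.wav"]:
        if needs_conversion(name):
            print(" ".join(build_ffmpeg_command(name)))
        else:
            print(f"{name}: no conversion needed")
```

The command is returned as a list so it can be passed to subprocess.run without shell quoting issues.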

SPE configuration

Relevance: 37%      Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of for beginners.

How to configure STT realtime stream word detection parameters

Relevance: 35%      Posted on: 2020-03-28

One of the improvements implemented since Speech Engine 3.24 is a neural-network based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 The backward and forward extensions are intervals in milliseconds that extend the part of the signal going to the decoder. The decoder is a component that determines what a particular part of the signal contains (speech, silence, etc.). Based on that, the decoder also decides whether a segment has finished or not. Unlike in file processing…
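
To make the default values more concrete, the sketch below reads the section quoted in this snippet and applies the two extensions to a detected speech interval. It rests on assumptions: that the configuration can be parsed as INI text with Python's configparser, and that "extension" means widening the interval by the given number of milliseconds on each side. The function names load_segmenter_params and extend_segment are hypothetical and are not part of SPE or BSAPI.

```python
import configparser

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"

# Default values as quoted in the snippet above.
DEFAULT_CONFIG = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""


def load_segmenter_params(text=DEFAULT_CONFIG):
    """Parse the segmenter section, assuming INI-style syntax."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[SECTION]
    return {
        "backward_ms": section.getint("backward_extensions_length_ms"),
        "forward_ms": section.getint("forward_extensions_length_ms"),
        "speech_threshold": section.getfloat("speech_threshold"),
    }


def extend_segment(start_ms, end_ms, params):
    """Widen a detected speech interval by the backward/forward extensions.

    Assumption: the start is moved earlier (never below 0 ms) and the end
    is moved later by the configured amounts.
    """
    start = max(0, start_ms - params["backward_ms"])
    end = end_ms + params["forward_ms"]
    return start, end


if __name__ == "__main__":
    params = load_segmenter_params()
    print(extend_segment(1000, 2000, params))
```

Under these assumptions, a larger forward extension keeps more trailing audio after the last detected speech before the interval ends.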

Open Source Acknowledgement

Relevance: 29%      Posted on: 2018-04-06

This page collects information about Open Source code and licenses. You may want to ask your Phonexia contact which part of the page is relevant to your project. Phonexia Voice Verify dependencies

Name                  Version    License
Django                2.1.11     BSD
Jinja2                2.11.2     BSD-3-Clause
MarkupSafe            1.1.1      BSD-3-Clause
Pygments              2.6.1      BSD License
beautifulsoup4        4.9.1      MIT
behave                1.2.6      BSD
behave-django         1.4.0      MIT
certifi               2020.6.20  MPL-2.0
chardet               3.0.4      LGPL
coreapi               2.3.3      BSD
coreschema            0.0.4      BSD
defusedxml            0.6.0      PSFL
django-allauth        0.39.1     MIT
django-constance      2.7.0      BSD
django-cors-headers   3.4.0      MIT License
django-environ        0.4.5      MIT
django-extra-fields   2.0.5      Apache-2.0
django-picklefield    3.0.1      MIT
django-rest-auth      0.9.3      MIT
djangorestframework   3.9.1      BSD
docker                4.2.2      …

Voice Activity Detection – Essential

Relevance: 22%      Posted on: 2018-04-04

Phonexia Voice Activity Detection (VAD) identifies parts of audio recordings with speech content vs. nonspeech content. Technology: Trained with emphasis on spontaneous telephony conversation. The technology is language-, accent-, text-, and channel-independent. Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input: Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling. Output: Log file with processed information (speech vs. nonspeech segments). Segmentation: The section Segmentation describes the results of VAD, which are segments of detected voice and silence. Segments are…


Relevance: 19%      Posted on: 2017-06-15

A document that briefly describes processes and relations in Phonexia technologies, with attention to correct word usage. SID - Speaker Identification technology (about SID technology), which recognizes the speaker in the audio based on the input data (usually a database of voiceprints). XL3, L3, L2, S2 - Technology models of SID. Speaker enrollment - The process in which the speaker model is created (usually a new record in the voiceprint database). Speaker model: 1/ should reach recommended minimums (net speech, audio quality), 2/ should be made with more net speech and thus be more robust. The test recordings (payload) are then compared to the model (see…

Speaker Identification: Results Enhancement

Relevance: 17%      Posted on: 2019-05-29

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system is robust to such factors, several result enhancement procedures can provide even better results and stronger evidence. Audio Source Profile: An Audio Source Profile is a representation of the speech source, e.g., device, acoustic channel, distance from microphone, language, gender,…