Search Results for: task

Results 1 - 13 of 13 (Page 1 of 1)

Speech Engine configuration file explained

Relevance: 100%      Posted on: 2021-02-19

In this article we explain the details of the Speech Engine configuration file phxspe.properties, located in the settings subdirectory of the SPE installation location. Settings in this configuration file affect Speech Engine behavior and performance. The configuration file is usually created after SPE installation: on first use of phxadmin, a default configuration file phxspe.properties is created in the settings directory. The file is loaded during SPE startup, i.e. you need to restart SPE to apply any changes made in the file. If Speech Engine is used together with Phonexia Browser in so-called "embedded" mode (see details about "embedded SPE" mode in Browser…

SPE3 – Releases and Changelogs

Relevance: 53%      Posted on: 2021-04-16

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). Releases Changelogs Speech Engine 3.40.1, DB v1700, BSAPI 3.40.1 (2021-04-16) Public release. Fixed: 6th generation STT/KWS stream result may start with words from end of previous stream. Fixed: Some licensing error messages are not shown in log. Fixed: Missing file names in log messages in SID and SID4 tasks. Fixed: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used. Fixed: phxdamin2…

SPE configuration

Relevance: 12%      Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.

Time Analysis

Relevance: 6%      Posted on: 2018-04-15

Time Analysis Extraction (TAE) by Phonexia extracts base information from dialogue in a recording, providing essential knowledge about conversation flow. That makes it easy to identify long reaction times, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As the result of the TAE technology, SPE returns a JSON/XML file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue: Describes the statistics of a recording related to one channel. channel…

How do you calculate SNR in Speech Quality Estimation?

Relevance: 6%      Posted on: 2019-07-01

Signal-to-Noise Ratio (SNR) is an important indicator of whether a recording is worth further processing by other speech technologies, so it is part of our Speech Quality Estimation. However, calculating SNR automatically is not a trivial task. We use the fact that the statistical distribution of the frequencies in the waveform of speech follows a Gamma distribution, whereas noise follows a Gaussian distribution. So we can estimate the SNR by looking at the frequency distribution in individual frames. This approach to SNR estimation is based on the article by Chanwoo Kim and Richard M. Stern called "Robust Signal-to-Noise Ratio Estimation Based…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

Relevance: 6%      Posted on: 2019-10-30

Faster Than Real Time (FTRT) is a metric developed to define a reference point for software performance. Using this metric you can collect "benchmark" data on the real processing speed of the reviewed software, which should be measured, and be reproducible, on exactly defined HW. Then, by comparing the results of various benchmarks, you can compare the performance of the specified software and its parts on different HW configurations. And vice versa: using the same metric you can compare software from different vendors on the same HW configuration and for the same processing task. We recognize two measurable metrics: Recording based FTRT is calculated from real…

Speech Engine 3.35.1

Relevance: 6%      Posted on: 2020-10-13

Speech Engine 3.35.1, DB v1600, BSAPI 3.35.1 (2020-10-13). Fixed: Missing input stream task name in log messages; Missing arguments in "word not found" error messages (when using preferred phrases).

Speech Engine 3.35.3

Relevance: 6%      Posted on: 2020-11-24

Speech Engine 3.35.3, DB v1601, BSAPI 3.35.3 (2020-11-24). New: Internal support for SAMPA phonetic alphabet; Updated STT model RU_RU_A to version 4.5.0 (updated language model); Updated STT/KWS/PHNREC model AR_XL to version 5.2.0 (updated language model, changed phonemes notation to X-SAMPA). Fixed: Cannot create new output stream due to hanging unfinished tasks; Task is not removed from pool when result is delivered via Webhook; Some log messages contain format placeholder instead of numbers; Missing <silence/> label in STT confusion network output; STT confusion network contains <silence/> tags with confidence greater than 1.0; Diarization crashes during processing; Diarization XL4 crashes on…

Terminology

Relevance: 6%      Posted on: 2017-06-15

A document that briefly describes processes and relations in Phonexia technologies, with attention to correct word usage. SID - Speaker Identification: technology (about SID technology) which recognizes the speaker in the audio based on the input data (usually a database of voiceprints). XL3, L3, L2, S2 - Technology models of SID. Speaker enrollment - Process in which the speaker model is created (usually a new record in the voiceprint database). Speaker model: 1/ should reach recommended minimums (net speech, audio quality), 2/ should be made with more net speech and thus be more robust. The test recordings (payload) are then compared to the model (see…

Time Analysis (TAE)

Relevance: 6%      Posted on: 2017-05-18

Technology description: Time Analysis Extraction (TAE) by Phonexia extracts base information from dialogue in a recording, providing essential knowledge about conversation flow. That makes it easy to identify long reaction times, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As the result of the TAE technology, SPE returns a JSON/XML file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue: Describes the statistics of a recording related to one…

Q: Please describe how to get the results for a pending operation.

Relevance: 6%      Posted on: 2017-06-27

A: If the server responds to a pending request with status 200 - OK, the body of the response contains the result (the server already has the result in cache memory and there is no need to process the file again). If the server responds with status 202 - Accepted, the server creates a task and begins to process the file. The "Location" parameter of the HTTP response header contains the path to the pending resource, and the body contains the ID of the pending operation. Polling: The client queries the pending resource (e.g. "GET /pending/{ID}"). Server will answer with…
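The 200/202 flow described in this answer can be sketched in Python as follows. This is only a minimal illustration, not the SPE client API: the function name `get_result`, the `(status, headers, body)` response shape, and the injected `request` callable are all hypothetical. Because the snippet is cut off before it says what the server answers while polling, the sketch assumes that the pending resource returns 202 while the task is still running and 200 with the result once it is finished.

```python
import time
from typing import Callable, Dict, Tuple

# A response is modelled as (status_code, headers, body). This shape is an
# assumption made for the sketch, not part of the SPE API.
Response = Tuple[int, Dict[str, str], str]


def get_result(request: Callable[[str], Response], path: str,
               interval: float = 1.0, max_polls: int = 30) -> str:
    """Send a GET request and, on 202, poll the pending resource for the result."""
    status, headers, body = request(path)
    if status == 200:
        # The result was already cached on the server; it is in the body.
        return body
    if status != 202:
        raise RuntimeError(f"unexpected status {status}")

    # 202 Accepted: the Location header holds the path of the pending resource.
    pending_path = headers["Location"]
    for _ in range(max_polls):
        time.sleep(interval)
        status, headers, body = request(pending_path)
        # ASSUMPTION: 200 means the task finished, 202 means still processing.
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status} while polling")
    raise TimeoutError(f"no result after {max_polls} polls")
```

Passing the HTTP call in as a `request` callable keeps the sketch independent of any particular HTTP library; in practice it would wrap whatever client is used to send GET requests to the server.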

Speaker Diarization

Relevance: 6%      Posted on: 2018-04-02

Speaker Diarization labels segments of the same voice(s) in one mono channel audio record based on the individual speaker's voice. It is a language-, domain- and channel-independent technology. It performs not only the segmentation of speakers, but of technical signals and silence as well. The output of the technology can be a log file with labels and/or split audio files or one new multichannel audio file. Correct speaker diarization is still an open research task. Typical use cases: Preprocessing for other speech recognition technologies, labeling the parts of the utterance according to the speakers, splitting telephone conversation recorded in mono into several…

Voice Activity Detection – Essential

Relevance: 6%      Posted on: 2018-04-04

Phonexia Voice Activity Detection (VAD) identifies parts of audio recordings with speech content vs. nonspeech content. Technology: Trained with emphasis on spontaneous telephony conversation. The technology is language-, accent-, text-, and channel-independent. Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input: Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling. Output: Log file with processed information (speech vs. nonspeech segments). Segmentation: The section Segmentation describes the results of VAD, which are segments of detected voice and silence. Segments are…