Search Results for: score transformation

Browser3 – Releases and Changelogs

Relevance: 1%      Posted on: 2020-07-24

Phonexia Browser v3 (Browser3) is developed as a client on top of Phonexia Speech Engine v3. Phonexia Browser is a successor of Phonexia Speech Intelligence Resolver v1 (SIR1). This page lists changes in Browser releases. Releases Changelogs Phonexia Browser v3.31.2, BSAPI 3.31.0 - Jul 24 2020 Non-public Feature Preview release Fixed: STT result version mismatch Phonexia Browser v3.31.1, BSAPI 3.31.0 - Jul 08 2020 Non-public Feature Preview release New: Browser now requires CentOS 7 or other Linux based OS with glibc >= 2.17 Version 3.31.0 was skipped Phonexia Browser v3.30.8, BSAPI 3.30.8 - Jun 29 2020 Public release Fixed: SID Evaluator…

Speech Quality Estimator – Essential

Relevance: 1%      Posted on: 2018-04-04

Phonexia’s Speech Quality Estimator quantifies the acoustic quality of recordings. This helps the user to quickly determine whether the acoustic quality of a recording is good enough for processing with other speech technologies. As an answer for SQE, the SPE returns a json/xml file. This file includes general information about the technology and statistics of all (one or two) channels. The statistics of all channels include the numbers for many aspects of recording quality, and the overall global score. Technology: The technology is language-, accent-, text-, and channel-independent. Compatibility with the widest range of audio sources possible (applies…

Time Analysis (TAE)

Relevance: 1%      Posted on: 2017-05-18

Technology description. Technology: Time Analysis Extraction by Phonexia extracts base information from dialogue in a recording, providing essential knowledge about conversation flow. That makes it easy to identify long reaction time, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As an answer to the TAE technology, SPE returns a json/xml file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue: Describes the statistics of a recording related to one…

Language Identification (LID)

Relevance: 1%      Posted on: 2020-07-09

Phonexia Language Identification (LID) will help you distinguish the spoken language or dialect. It will enable your system to automatically route valuable calls to your experts in the given language or to send them to other software for analysis. Phonexia uses state-of-the-art language identification (LID) technology based on iVectors that were introduced by NIST (National Institute of Standards and Technology, USA) during the 2010 evaluations. The technology is independent of text and channel. This highly accurate technology uses the power of voice biometrics to automatically recognize spoken language. Application areas: Preselecting multilingual sources and routing audio streams/files to language-dependent…

Keyword Spotting

Relevance: 1%      Posted on: 2019-06-03

Phonexia Keyword Spotting (KWS) identifies occurrences of keywords and/or keyphrases in audio recordings. It can help you to get valuable information from huge quantities of speech recordings. You only need to specify the keywords or phrases you wish to find. This technology identifies all recordings with keyword occurrences and allows you to automatically route important recordings or calls to your experts. Typical use cases: Call centers: increase operator and supervisor efficiency by searching calls; identify inappropriate expressions from operators; check marketing campaigns with automatic script-compliance control. Mass media and web search servers: index and search multimedia by keyword; route multimedia…

Voice Activity Detection – Essential

Relevance: 1%      Posted on: 2018-04-04

Phonexia Voice Activity Detection (VAD) identifies parts of audio recordings with speech content vs. nonspeech content. Technology: Trained with emphasis on spontaneous telephony conversation. The technology is language-, accent-, text-, and channel-independent. Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input: Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling. Output: Log file with processed information (speech vs. nonspeech segments). Segmentation: The section Segmentation describes the results of VAD, which are segments of detected voice and silence. Segments are…

VIN – Releases and Changelogs

Relevance: 1%      Posted on: 2018-04-08

Phonexia Voice Inspector (VIN) is developed as a desktop application for forensic speaker comparison. This page lists changes in VIN releases. Releases Changelogs Voice Inspector v4.0.0, BSAPI 3.23.0 - Dec 11 2019 - VIN is available with L4 technology model - Other technology models (S2, L2, L3, XL3) are no longer supported - Added Diarization Technology (available in waveform editor) - Population Sets structure changed - Reworked dialog for population set management - Added possibility to set type of estimation of the Target distribution - Using population set to estimate Target distribution allows 1:1 comparison - Bug fixes Voice Inspector…

Time Analysis

Relevance: 1%      Posted on: 2018-04-15

Time Analysis Extraction (TAE) by Phonexia extracts base information from dialogue in a recording, providing essential knowledge about conversation flow. That makes it easy to identify long reaction time, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As an answer to the TAE technology, SPE returns a json/xml file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue: Describes the statistics of a recording related to one channel. channel…

How to convert STT confusion network results to one-best

Relevance: 1%      Posted on: 2020-04-06

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual timeslots of the processed speech signal. Therefore many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and word alternatives: The recommended algorithm for converting Confusion Network (CN) to One-best is as follows: (1) loop through all CN timeslots from start to end; (2) in each timeslot, get the input alternative with the highest score and, if it's not <null/> or…
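
The conversion described in this snippet can be sketched roughly as follows. This is a minimal illustration, not the Speech Engine's own code: the in-memory shape (a list of timeslots, each a list of `(word, score)` pairs) and the name `cn_to_one_best` are assumptions made for the example, and the real STT result format is not reproduced here. The snippet is cut off after "if it's not <null/> or", so only the `<null/>` check is shown; whatever further condition the article names is left out rather than guessed.

```python
from typing import List, Tuple

# Hypothetical in-memory shape: one list of (word, score) alternatives per timeslot.
Timeslot = List[Tuple[str, float]]

NULL_TOKEN = "<null/>"  # token the snippet names as an alternative to skip


def cn_to_one_best(timeslots: List[Timeslot]) -> List[str]:
    """Pick the highest-scoring alternative in each timeslot, skipping <null/>."""
    one_best = []
    for alternatives in timeslots:  # timeslots are visited from start to end
        if not alternatives:
            continue  # nothing to choose from in an empty timeslot
        word, _score = max(alternatives, key=lambda alt: alt[1])
        if word != NULL_TOKEN:  # further skip conditions are elided in the source
            one_best.append(word)
    return one_best


example = [
    [("hello", 0.9), ("hollow", 0.1)],
    [("<null/>", 0.7), ("a", 0.3)],
    [("world", 0.6), ("word", 0.4)],
]
print(" ".join(cn_to_one_best(example)))
```

In this sketch, a timeslot whose best alternative is `<null/>` contributes no word at all; it does not fall back to the second-best alternative.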

Speech Quality Estimation

Relevance: 0%      Posted on: 2018-04-02

Speech Quality Estimation (SQE) is a language-, domain- and channel-independent technology that quantifies the quality of an audio recording. The two most important statistics used in the calculation of the SQE score are the SNR (signal-to-noise ratio) and the bitrate of the recording. SQE is usually part of the rapid filtration process in deployments. SQE also measures over 20 other properties of the recording, all of which can be found in the output file and further processed. See the description in the SPE documentation. Typical use cases are: verification of recording quality on the input, searching based on quality of the recording, noise of…