Search Results for: confusion network


How to convert STT confusion network results to one-best

Relevance: 100%      Posted on: 2020-04-06

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual timeslots of the processed speech signal. Therefore, many applications want to use it as the main source of speech transcription and perform any necessary conversion to less verbose output formats internally. This article describes the recommended way to do the conversion. Time slots and word alternatives. The recommended algorithm for converting a Confusion Network (CN) to one-best is as follows: loop through all CN timeslots from start to end; in each timeslot, get the input alternative with the highest score and, if it's not <null/> or…
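The conversion steps above can be sketched in Python. This is a minimal sketch, not Phonexia's implementation: the `Alternative`, `Timeslot` and `OneBestWord` classes, their fields (`word`, `score`, `start`, `end`) and the function names are hypothetical stand-ins for however your client parses the SPE confusion network output. The sketch covers only the part of the rule visible in the excerpt (take the highest-scoring alternative in each timeslot and skip it if it is `<null/>`); the further skip conditions that follow "or…" are truncated in the excerpt and are deliberately not guessed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Marker for the empty alternative ("no word in this slot"), as named in the article.
NULL_WORD = "<null/>"


@dataclass
class Alternative:
    """One word hypothesis in a timeslot (hypothetical structure)."""
    word: str
    score: float


@dataclass
class Timeslot:
    """One CN timeslot with its competing alternatives (hypothetical structure)."""
    start: float  # slot start time in seconds
    end: float    # slot end time in seconds
    alternatives: List[Alternative]


@dataclass
class OneBestWord:
    word: str
    start: float
    end: float
    score: float


def best_alternative(slot: Timeslot) -> Optional[Alternative]:
    """Return the highest-scoring alternative, or None for an empty slot."""
    if not slot.alternatives:
        return None
    return max(slot.alternatives, key=lambda alt: alt.score)


def cn_to_one_best(cn: List[Timeslot]) -> List[OneBestWord]:
    """Convert a confusion network to a one-best word sequence."""
    result: List[OneBestWord] = []
    # Walk the timeslots from start to end.
    for slot in sorted(cn, key=lambda s: s.start):
        best = best_alternative(slot)
        # Skip empty slots and slots whose winner is the empty word.
        # The article lists further skip conditions after <null/>; they are
        # truncated in the excerpt and intentionally not implemented here.
        if best is None or best.word == NULL_WORD:
            continue
        result.append(OneBestWord(best.word, slot.start, slot.end, best.score))
    return result


if __name__ == "__main__":
    # Made-up example data, echoing the "eight tea machines" vs. "eighty machines" ambiguity.
    cn = [
        Timeslot(0.00, 0.40, [Alternative("eighty", 0.6), Alternative("eight", 0.4)]),
        Timeslot(0.40, 0.55, [Alternative(NULL_WORD, 0.7), Alternative("tea", 0.3)]),
        Timeslot(0.55, 1.20, [Alternative("machines", 0.9), Alternative("machine", 0.1)]),
    ]
    print(" ".join(w.word for w in cn_to_one_best(cn)))  # eighty machines
```

With the made-up data, `<null/>` wins the middle slot and is dropped, so the one-best result is "eighty machines" with the timing of the two remaining slots preserved.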

SPE3 – Releases and Changelogs

Relevance: 43%      Posted on: 2020-12-14

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). This page lists changes in SPE releases. Changelog: Speech Engine 3.35.4, DB v1601, BSAPI 3.35.4 (2020-12-14), public release. Fixed: STT/KWS model AR_XL_5 has an incorrect name and does not start. Fixed: missing KWS model AR_XL_5. Fixed: processing of some short recordings causes a "TwoGmmCalibThreshold is not finite" error. Fixed: the STT preferred phrases "out of vocabulary" (OOV) warning message is now more verbose. Speech Engine 3.36.0, DB v1601, BSAPI 3.35.3 (2020-12-01), non-public. Feature…

Speech To Text results explained

Relevance: 16%      Posted on: 2019-05-27

This article aims to give more details about Speech To Text outputs, along with hints on how to tailor Speech To Text to best suit your needs. In the process of transcribing speech, the Speech To Text technology usually identifies multiple alternatives for individual speech segments, as multiple phrases can have similar pronunciations, possibly with different word boundaries, e.g. "eight tea machines" vs. "eighty machines". The technology provides various output types, which show either a single transcription alternative or multiple ones. For processing realtime streams, two result modes are supported: one mode provides the complete transcription, the other provides incremental results. Output types…

SPE configuration

Relevance: 6%      Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.

Software Vetting (Best Practice)

Relevance: 5%      Posted on: 2017-06-15

The purpose of this document is to help clients satisfy their high security standards when integrating Phonexia software into their critical infrastructure. The vetting ensures that Phonexia software is not dangerous to the client's infrastructure in any way: there are no backdoors, viruses, worms, Trojan horses, spyware, adware, critical bugs, or unwanted functionality, and no information is sent outside the client's infrastructure. Vetting context: speech technology is a very dynamic area with very fast development. For example, the speaker identification error rate halves between successive evaluations organized by the National Institute of Standards and Technology,…

Software Vetting

Relevance: 5%      Posted on: 2018-04-06

The purpose of this document is to help clients satisfy their high security standards when integrating Phonexia software into their critical infrastructure. The vetting ensures that Phonexia software is not dangerous to the client's infrastructure in any way: there are no backdoors, viruses, worms, Trojan horses, spyware, adware, critical bugs, or unwanted functionality, and no information is sent outside the client's infrastructure. Vetting context: speech technology is a very dynamic area with very fast development. For example, the speaker identification error rate halves between successive evaluations organized by the National Institute of Standards and Technology,…

Terminology

Relevance: 4%      Posted on: 2017-06-15

A document that briefly describes processes and relations in Phonexia technologies, with attention to correct word usage. SID: Speaker Identification technology (about SID technology), which recognizes the speaker in the audio based on the input data (usually a database of voiceprints). XL3, L3, L2, S2: technology models of SID. Speaker enrollment: the process in which the speaker model is created (usually a new record in the voiceprint database). Speaker model: 1/ should reach the recommended minimums (net speech, audio quality); 2/ should be made with more net speech, and is thus more robust. The test recordings (payload) are then compared to the model (see…

Browser3 – Releases and Changelogs

Relevance: 4%      Posted on: 2020-10-23

Phonexia Browser v3 (Browser3) is developed as a client on top of Phonexia Speech Engine v3. Phonexia Browser is the successor of Phonexia Speech Intelligence Resolver v1 (SIR1). This page lists changes in Browser releases. Changelog: Phonexia Browser v3.35.2, BSAPI 3.35.2 (Oct 21 2020), public release. Fixed: the speaker identification dialog in WaveEditor, which did not work for SID4. Fixed: detection of certain USB license tokens. Phonexia Browser v3.35.0, BSAPI 3.35.0 (Oct 02 2020), public release. New: compatibility with SPE 3.35. Phonexia Browser v3.30.12, BSAPI 3.30.11 (Aug 20 2020), public release. Fixed: transcription results intermittently display words in wrong…

What are STT preferred phrases and how to use them

Relevance: 3%      Posted on: 2020-11-26

Speech Engine version 3.32 and later includes a new STT feature called Preferred phrases. This article explains what the feature is good for and how it works internally, and gives some tips for practical implementation. What are preferred phrases: in speech transcription tasks, there may be situations where similar-sounding words get confused, e.g. "WiFi" vs. "HiFi", "route" vs. "root", "cell" vs. "sell", etc. Normally, the language model part of Speech To Text does its job here and, in the context of a longer phrase or an entire sentence, prefers the correct word: × I'm going to cell my car. Hmm, such…

Licensing (technical details)

Relevance: 3%      Posted on: 2018-03-02

This document describes all licensing types for Phonexia products available to our partners and customers. Each partner or customer can choose the licensing variant that best fits the current project or infrastructure. The document does not describe the business conditions of Phonexia licensing. What is the License? The License is a formal agreement regarding "The Product Usage Rights" between Phonexia s.r.o. and a user of any Phonexia technology or Phonexia product. Licenses are issued by the Business Department for all speech technologies and products, and may be required in order to use utilities and tools developed by Phonexia or partners. For technical…