… S1 SID XL3 2015-11 5th gen. SID L3 2015-11 5th gen. SID L2 2012-11 4th gen. SID S2 2012-11 4th gen. SID LID L2 2012-11 4th gen. LID S2 2012-11 4th gen. LID Phonexia Browser Version Release Date End of Support Maintained Until Release type 3.60 2023-12-05 2025-06-01 n/a Public…
Search: BROWSER-D1-gui-3.16.0_EVAL-win64
81 results
Formats supported directly and natively are: WAVE (*.wav) container including any of: unsigned 8-bit PCM (u8), signed 16-bit PCM (s16le), IEEE float 32-bit (f32le), A-law (alaw), µ-law (mulaw), ADPCM; FLAC codec inside FLAC (*.flac) container; OPUS codec inside OGG (*.opus) container. Other audio formats must be converted to one of those natively supported using external tools. SPE server can be…
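The conversion step mentioned above can be sketched as follows. This is a minimal illustration, assuming ffmpeg is the external tool (the source does not name one) and that IEEE float 32-bit WAVE is an acceptable target. The helper names `needs_conversion` and `ffmpeg_to_wav_command` are hypothetical, and the check looks only at the file extension, so a *.wav file holding an unsupported codec would not be detected.

```python
from pathlib import Path

# Container extensions from the list of natively supported formats.
# Assumption: anything else has to be converted by an external tool first.
NATIVE_EXTENSIONS = {".wav", ".flac", ".opus"}


def needs_conversion(path: str) -> bool:
    """Return True if the file extension is not one of the native containers."""
    return Path(path).suffix.lower() not in NATIVE_EXTENSIONS


def ffmpeg_to_wav_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command line that writes IEEE float 32-bit WAVE.

    Only standard ffmpeg options are used: -i (input) and -c:a (audio codec).
    The command is returned, not executed, so it can be inspected first.
    """
    return ["ffmpeg", "-i", src, "-c:a", "pcm_f32le", dst]


if __name__ == "__main__":
    source = "call.mp3"  # hypothetical input file
    if needs_conversion(source):
        print(" ".join(ffmpeg_to_wav_command(source, "call.wav")))
```

The resulting list can be passed to `subprocess.run` once the command has been checked.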
…technologies 11:43 Speech Transcription (STT) 12:30 Keyword Spotting (KWS) 13:32 Phoneme Recognition (PHNREC) 13:54 Time Analysis Extraction (TAE) 14:22 Speech Platform architecture; Speech Engine, Phonexia Browser, Phonexia Voice Inspector brief 18:52 HW and SW requirements, typical deployment topologies 21:34 Supported file- and stream formats, typical implementations and data flows 27:29 Licensing technical options 32:24 Summary, recommended next steps https://youtu.be/DDu0Y1rgQ6k…
MODULE 1: Getting started with Speech Engine (19 min) Installation Technologies configuration Server and database configuration Users configuration Files processing Synchronous and asynchronous requests, results polling Stream processing https://youtu.be/4qrB-GfFdWY…
…our Service Desk. Q: I am getting the error message “Your license is not for this application.” A: Check your license file (license.dat) by opening it in Notepad. Make sure the license contains records for all required modules. See the Licensing article for additional information.
Application of the Code It is the policy of Phonexia, s.r.o. (“Phonexia”, “we”) to maintain the highest level of ethical standards in the conduct of our business affairs. Our values guide our actions in all cases. The actions and conduct of our officers, directors and employees (collectively, “Phonexia personnel”), as well as others acting on our behalf, are essential to…
Phonexia package versioning follows the bugfix / update / upgrade approach. Bugfix changes only the last version number (e.g. 3.45.x to x+1) and includes fixes of known problems, without changing components or technology models. Update changes the middle version number (e.g. 3.x.y to x+1), changes or enhances the functionality, and may change the API; it can include bugfixes, or changes in component or…
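The two rules visible above can be sketched as a small classifier. This is only an illustration: the function name `classify_release` is hypothetical, version strings are assumed to have exactly three numeric parts, and because the description of the upgrade rule is cut off in this excerpt, any other change is reported as "other" rather than guessed.

```python
def classify_release(old: str, new: str) -> str:
    """Classify a version change as 'bugfix', 'update' or 'other'.

    Assumes versions of the form major.middle.last, e.g. "3.45.2".
    """
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions with three numeric parts")

    same_major = old_parts[0] == new_parts[0]
    same_middle = old_parts[1] == new_parts[1]

    # Bugfix: only the last number increases.
    if same_major and same_middle and new_parts[2] > old_parts[2]:
        return "bugfix"
    # Update: the middle number increases within the same major version.
    if same_major and new_parts[1] > old_parts[1]:
        return "update"
    # Anything else is not covered by the rules shown in the excerpt.
    return "other"


if __name__ == "__main__":
    print(classify_release("3.45.1", "3.45.2"))
    print(classify_release("3.45.2", "3.46.0"))
```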
The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. In a simplified way, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio…
This article aims to give more details about Speech To Text outputs and hints on how to tailor Speech To Text to best suit your needs. In the process of transcribing speech, the Speech To Text technology usually identifies multiple alternatives for individual speech segments, as multiple phrases can have similar pronunciations, possibly with different word boundaries, e.g. “eight tea…
This article aims to give more details about Keyword Spotting outputs and hints on how to tailor Keyword Spotting to best suit your needs. Scoring Keyword Spotting works by calculating the likelihoods that a keyword, or just any other speech, occurs at a given spot, and comparing those two likelihoods as a likelihood ratio (LR). The following scheme shows Background model for anything…
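The comparison described above can be sketched as a log-likelihood ratio. This is an assumption for illustration only: the excerpt is cut off before any formula, so the exact score used by the product is not shown here, and the function name `log_likelihood_ratio` is hypothetical.

```python
import math


def log_likelihood_ratio(l_keyword: float, l_background: float) -> float:
    """Compare the keyword likelihood with the background-speech likelihood.

    A positive value means the keyword model explains the audio better than
    the background model; a negative value means the opposite.
    """
    if l_keyword <= 0 or l_background <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log(l_keyword / l_background)


if __name__ == "__main__":
    # Hypothetical likelihoods for one spot in the audio.
    print(log_likelihood_ratio(0.02, 0.005))
```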
One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds, which extend the part…
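The default parameters quoted above can be read and applied as in the sketch below. The section name and the three keys come from the snippet; everything else is an assumption: the helpers `load_segmenter` and `extend_segment` are hypothetical, and the extension is assumed to widen a detected speech interval backward and forward in time, since the excerpt is cut off before it explains this.

```python
import configparser

# Default values as quoted in the article snippet, one setting per line.
SEGMENTER_INI = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"


def load_segmenter(text: str) -> dict:
    """Parse the segmenter section into typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[SECTION]
    return {
        "backward_ms": section.getint("backward_extensions_length_ms"),
        "forward_ms": section.getint("forward_extensions_length_ms"),
        "speech_threshold": section.getfloat("speech_threshold"),
    }


def extend_segment(start_ms: int, end_ms: int, cfg: dict) -> tuple:
    """Widen a detected speech interval (assumed behavior, see lead-in)."""
    return max(0, start_ms - cfg["backward_ms"]), end_ms + cfg["forward_ms"]


if __name__ == "__main__":
    cfg = load_segmenter(SEGMENTER_INI)
    print(cfg)
    print(extend_segment(1000, 2000, cfg))
```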
Advanced users with appropriate knowledge (gained e.g. by taking the Phonexia Academy Advanced Training) may want to fine-tune the behavior of the technologies to adapt to the nature of their audio data. Modifying original BSAPI configuration files directly can be dangerous – inappropriate changes may cause unpredictable behavior, and without a backup of the unmodified file it’s difficult to restore…
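The backup concern raised above can be addressed with a few lines of generic file handling. This is a minimal sketch, not a procedure taken from the article: the helper name `backup_once`, the ".orig" suffix, and the example file path are all hypothetical.

```python
import shutil
from pathlib import Path


def backup_once(config_path: str, suffix: str = ".orig") -> Path:
    """Copy a configuration file next to itself before the first edit.

    An existing backup is never overwritten, so it always holds the
    unmodified original even after several rounds of editing.
    """
    src = Path(config_path)
    backup = src.with_name(src.name + suffix)
    if not backup.exists():
        shutil.copy2(src, backup)  # copy2 also preserves timestamps
    return backup


if __name__ == "__main__":
    # Hypothetical path; replace with the configuration file being edited.
    example = Path("example_config.bs")
    if example.exists():
        print(backup_once(str(example)))
```

Restoring is then a plain copy of the backup file over the modified one.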
Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Therefore, many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…
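To make the idea of converting a confusion network to a less verbose output concrete, the sketch below picks the most probable alternative in every time slot. This is a naive best-alternative pick, not necessarily the recommended procedure, which is cut off in this excerpt. The data layout (a list of slots, each a list of word and posterior pairs) and the `<null>` marker for an empty alternative are hypothetical and do not reflect the actual Speech Engine result format.

```python
from typing import List, Tuple

# Hypothetical layout: each time slot is a list of (word, posterior) pairs.
Slot = List[Tuple[str, float]]

# Hypothetical marker for "no word in this slot".
NULL_WORD = "<null>"


def one_best(slots: List[Slot]) -> str:
    """Keep the highest-posterior alternative of every time slot."""
    words = []
    for slot in slots:
        if not slot:
            continue  # nothing recognized in this slot
        word, _posterior = max(slot, key=lambda alt: alt[1])
        if word != NULL_WORD:
            words.append(word)
    return " ".join(words)


if __name__ == "__main__":
    network = [
        [("eight", 0.6), ("a", 0.4)],
        [(NULL_WORD, 0.7), ("uh", 0.3)],
        [("tea", 0.55), ("tee", 0.45)],
    ]
    print(one_best(network))
```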
Preferred phrases is a feature available for 5th-generation or newer STT models and Speech Engine 3.32 or later. This article explains what the feature is good for, how it works internally, and gives some tips for practical implementation. What are preferred phrases In speech transcription tasks, there may be situations where similar-sounding words get confused,…
The Arabic language has (a) one standardised variety, and (b) many non-standard varieties (dialects). In this article, our linguistic team explains the differences between Modern Standard Arabic and Arabic dialects in the context of Phonexia Arabic models. Standard variety: Modern Standard Arabic (MSA) All Arabs learn it at school (not from their parents, so we cannot say it is their native variety)…