Search Results for: engine channels

Results 1 - 20 of 57 | Page 1 of 3

SPE3 – Releases and Changelogs

Relevance: 100%      Posted on: 2021-04-16

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). Releases and changelogs: Speech Engine 3.40.1, DB v1700, BSAPI 3.40.1 (2021-04-16), public release. Fixed: 6th generation STT/KWS stream result may start with words from the end of the previous stream. Fixed: Some licensing error messages are not shown in the log. Fixed: Missing file names in log messages in SID and SID4 tasks. Fixed: Keyword list may not work if XML is used as input and the optional fields threshold or pronunciations are used. Fixed: phxdamin2…

Speech Engine and technologies, instances, workers… explained

Relevance: 36%      Posted on: 2020-11-19

Configuring Speech Engine to effectively utilize the full power of the underlying hardware can get challenging, and one can easily get lost in all the strange terms like technologies, instances, slots, or workers. This article should shed some light on them. Speech Engine is like a post office: Thinking about Speech Engine, there is actually a very nice analogy with a post office (or bank branch). A post office is a place providing different kinds of services: one can go there to send letters, send or pick up packages, get a PO box, get some financial services, insurance, etc. Speech Engine has various…

Speech Engine configuration file explained

Relevance: 35%      Posted on: 2021-02-19

In this article we explain the details of the Speech Engine configuration file phxspe.properties, located in the settings subdirectory of the SPE installation location. Settings in this configuration file affect Speech Engine behavior and performance. The configuration file is usually created after SPE installation: on first use of phxadmin, a default configuration file phxspe.properties is created in the settings directory. The file is loaded during SPE startup, i.e. you need to restart SPE to apply any changes made in the file. If Speech Engine is used together with Phonexia Browser in so-called "embedded" mode (see details about "embedded SPE" mode in Browser…

Phonexia Speech Engine

Relevance: 24%      Posted on: 2020-11-19

About: Phonexia Speech Engine v3 (SPE3) is the main executive part of the Phonexia Speech Platform. It is a server application with a REST API interface through which you can access all available speech technologies. Both 64-bit Linux and 64-bit Windows operating systems are supported. Phonexia Speech Engine (SPE3) is an adjustable server component which houses all speech technologies. SPE3 provides a RESTful application programming interface to access the various technologies. Aside from the technologies themselves, SPE implements various other functionality supporting work with speech technologies, recordings and streams, and others. Features: The main purpose of SPE is to work as a processing unit for…

Speech Engine 3.35.0

Relevance: 23%      Posted on: 2020-10-01

Speech Engine 3.35.0, DB v1600, BSAPI 3.35.0 (2020-10-01). New: LID model L4 was promoted to production (LID BETA_L4 renamed to LID L4); added new language tag documentation (doc/Technology_LID_L4_Language_tags.pdf); updated STT model CS_CZ_5 to version 5.2.1 (fixes faulty transcription of numbers into Roman format); added configurable STT Confusion Network threshold (in technology configuration file). Fixed: STT didn't work with 4th and older generation models after introduction of the Preferred phrases feature in SPE 3.32; update from SPE 3.30 causes errors in STT result cache; memory leak in logging system; typo in name of es-XA language in LID model L4 default language…

How to configure Speech Engine workers

Relevance: 22%      Posted on: 2020-03-28

A worker is a thread that performs the actual processing of files or realtime streams in Speech Engine. This article helps you understand Speech Engine workers and explains how to configure them for optimal performance and server utilization. The default workers configuration in settings/phxspe.properties is shown below: 8 workers for file processing and 8 workers for realtime stream processing. These numbers are the maximum number of simultaneously running tasks.

    # Multithread settings
    server.n_workers = 8
    server.n_realtime_workers = 8

Requests for additional file processing tasks are put in a queue and processed according to their order and priorities. Requests for…
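The two worker settings quoted in this snippet can be inspected programmatically before changing them. The following is only a minimal sketch: `parse_properties` is a hypothetical helper written for this illustration (it is not part of SPE), and it assumes the file consists of plain `key = value` lines with `#` comments, as in the excerpt. The sample text is inlined rather than read from a real installation.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; assumes the plain layout
    shown in the phxspe.properties excerpt.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            # Not a key = value line; ignore it in this sketch.
            continue
        settings[key.strip()] = value.strip()
    return settings


# Default values as quoted in the snippet above.
SAMPLE = """\
# Multithread settings
server.n_workers = 8
server.n_realtime_workers = 8
"""

settings = parse_properties(SAMPLE)
file_workers = int(settings["server.n_workers"])
realtime_workers = int(settings["server.n_realtime_workers"])
print(f"file workers: {file_workers}, realtime workers: {realtime_workers}")
```

To check a real installation, the same function could be applied to the contents of settings/phxspe.properties; keep in mind that, per the configuration article, changes to that file only take effect after SPE is restarted.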

Speech Engine 3.35.1

Relevance: 21%      Posted on: 2020-10-13

Speech Engine 3.35.1, DB v1600, BSAPI 3.35.1 (2020-10-13). Fixed: missing input stream task name in log messages; missing arguments in "word not found" error messages (when using preferred phrases).

Speech Engine 3.35.2

Relevance: 21%      Posted on: 2020-10-22

Speech Engine 3.35.2, DB v1600, BSAPI 3.35.2 (2020-10-22). Fixed: detection of certain USB license tokens.

Speech Engine 3.35.3

Relevance: 21%      Posted on: 2020-11-24

Speech Engine 3.35.3, DB v1601, BSAPI 3.35.3 (2020-11-24). New: internal support for SAMPA phonetic alphabet; updated STT model RU_RU_A to version 4.5.0 (updated language model); updated STT/KWS/PHNREC model AR_XL to version 5.2.0 (updated language model, changed phonemes notation to X-SAMPA). Fixed: cannot create new output stream due to hanging unfinished tasks; task is not removed from pool when result is delivered via Webhook; some log messages contain format placeholder instead of numbers; missing <silence/> label in STT confusion network output; STT confusion network contains <silence/> tags with confidence greater than 1.0; Diarization crashes during processing; Diarization XL4 crashes on…

Speech Engine 3.35.4

Relevance: 21%      Posted on: 2020-12-14

Speech Engine 3.35.4, DB v1601, BSAPI 3.35.4 (2020-12-14). Fixed: STT/KWS model AR_XL_5 has incorrect name and does not start; missing KWS model AR_XL_5; processing of some short recordings causes "TwoGmmCalibThreshold is not finite" error; STT preferred phrases "out of vocabulary" (OOV) warning message is now more verbose.

Phonexia Speech Engine EoL

Relevance: 20%      Posted on: 2018-06-19

Information about release dates, support and maintenance periods of Phonexia Speech Engine (software End of Life - EoL).

LID adaptation

Relevance: 15%      Posted on: 2021-03-02

This article describes various ways of Language Identification adaptation. Basic terminology: Languageprint (*.lp file): numeric representation of the audio, extracted from an audio file for language identification purposes (similar to “voiceprint”, but representing the spoken language, not the speaking person). Languageprint archive (*.lpa file): multiple languageprints combined into a single archive. Creation of languageprint archives is not supported by SPE; these are supported as input only. Language model: digital characteristics of a specific language. A language model can be trained from languageprints (*.lp), languageprint archives (*.lpa), or from a combination of both. LID language model should not be…

STT Language Model Customization tutorial

Relevance: 9%      Posted on: 2019-04-24

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. In a simplified way, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio signals into the proper text equivalents. Due to the general diversity of spoken language, the default generic language model may not acknowledge the importance of certain words over other words in certain situations. Language model customization is a way to inform…

Time Analysis (TAE)

Relevance: 7%      Posted on: 2017-05-18

Technology description: Time Analysis Extraction (TAE) by Phonexia extracts basic information from the dialogue in a recording, providing essential knowledge about the conversation flow. That makes it easy to identify long reaction times, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As the result of the TAE technology, SPE returns a JSON/XML file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue: describes the statistics of a recording related to one…

Time Analysis

Relevance: 7%      Posted on: 2018-04-15

Time Analysis Extraction (TAE) by Phonexia extracts basic information from the dialogue in a recording, providing essential knowledge about the conversation flow. That makes it easy to identify long reaction times, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As the result of the TAE technology, SPE returns a JSON/XML file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue: describes the statistics of a recording related to one channel. channel…

Difference between on-the-fly and off-line type of transcription (STT)

Relevance: 6%      Posted on: 2017-12-11

Like a human, the ASR (STT) engine adapts to the acoustic channel, environment, and speaker. The ASR (STT) engine also learns more about the content over time, and this information is used to improve recognition. The dictate engine, also known as on-the-fly transcription, does not look into the future and has information about just a few seconds of speech at the beginning of recordings. As the output is requested immediately during processing of the audio, the engine can't predict what will come in the next seconds of the speech. When access to the whole recording is granted during off-line transcription…

Phonexia Speech Platform

Relevance: 6%      Posted on: 2017-05-18

Phonexia Speech Platform (Speech Platform) provides partners with a complete portfolio of speech technologies with an easy-to-use design. The platform allows users to design and deploy a wide range of speech processing systems in a short time and without extensive knowledge of the underlying technologies. Products: On top of Speech Platform, several products are provided. For the commercial market: Phonexia Speech Analytics, Phonexia Voice Biometrics. For the government market: Phonexia Speech Analytics GOV, Phonexia Voice Biometrics GOV. Characteristics: Completeness: all speech technologies in one place. Simple to use: RESTful API for rapid development. Modularity: build your own specific process workflow…

Speech Quality Estimator – Essential

Relevance: 4%      Posted on: 2018-04-04

Phonexia’s Speech Quality Estimator (SQE) quantifies the acoustic quality of recordings. This helps the user to quickly determine whether the acoustic quality of a recording is good enough for processing with other speech technologies. As the result of SQE, SPE returns a JSON/XML file. This file includes general information about the technology and statistics of all (one or two) channels. The statistics of all channels include the numbers for many aspects of recording quality, and the overall global score. Technology: The technology is language-, accent-, text-, and channel-independent. Compatibility with the widest range of audio sources possible (applies…

SPE configuration

Relevance: 3%      Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.

Keyword Spotting

Relevance: 3%      Posted on: 2019-06-03

Phonexia Keyword Spotting (KWS) identifies occurrences of keywords and/or keyphrases in audio recordings. It can help you to get valuable information from huge quantities of speech recordings. You only need to specify the keywords or phrases you wish to find. This technology identifies all recordings with keyword occurrences and allows you to automatically route important recordings or calls to your experts. Typical use cases: Call centers: increase operator and supervisor efficiency by searching calls; identify inappropriate expressions from operators; check marketing campaigns with automatic script-compliance control. Mass media and web search servers: index and search multimedia by keyword; route multimedia…