
Search: process

63 results

Key Features (PSP)

Phonexia Speech Platform is provided as a set of several components: The Speech Engine (SPE) component is a REST API that includes technologies for the automated processing of audio files and audio streams. This component is usually provided in a specific configuration that meets the customer’s use case. The Phonexia Browser component is an expert-level application (on the top of…

Understand SPE executable files

…(in octal format, e.g. 027)
pidfile=<path> – Write the application’s process ID (PID) to the specified file
Windows-specific
registerService – Register the application as a Windows service
displayName=<text> – Specify the service's friendly name (valid only with registerService)
description=<text> – Specify the service description (valid only with registerService)
startup=automatic|manual – Specify the service startup mode (valid only with registerService)
unregisterService – Unregister the previously…
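As a rough illustration of what a PID file is for, the sketch below writes the current process ID to a file and reads it back, the way a monitoring script might. It is not SPE code: the helper names `write_pidfile` and `read_pidfile` and the file name `example.pid` are hypothetical, and the exact file format SPE writes is an assumption (one decimal PID per file).

```python
import os
import tempfile
from pathlib import Path


def write_pidfile(path: str) -> None:
    """Write the current process ID to `path` as a single decimal line (assumed format)."""
    Path(path).write_text(f"{os.getpid()}\n")


def read_pidfile(path: str) -> int:
    """Read a PID back from a file written in the assumed one-line format."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    # Use a temporary directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        pid_path = os.path.join(tmp, "example.pid")  # hypothetical file name
        write_pidfile(pid_path)
        print("PID stored in file:", read_pidfile(pid_path))
```

A supervising script can compare the stored PID with the list of running processes to decide whether the application is still alive.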

Keyword Spotting (KWS)

…more keywords and an optional threshold value and/or pronunciation variants of the keyword. The number of keywords and pronunciations is not limited. To clarify this statement: the performance drop affects only processing that uses keyword lists without explicitly defined pronunciations. In such cases, the technology must create the pronunciations internally in the background before starting the processing (see the Pronunciations section below), which…

STT: What is Words-To-Numbers feature and how to use it

…point zero three ⇒ 1586.03
sixty four million seven hundred thousand ninety ⇒ 64700090
This should help to simplify the processing of transcribed texts by text-analytics layers or NLP (Natural Language Processing) engines, e.g. in voicebot applications. Where is the converted output available? The words-to-numbers conversion is available only in n-best output (i.e. where the entire sentence…
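To make the idea of a words-to-numbers conversion concrete, here is a minimal sketch for English number words. It is an independent illustration, not the algorithm the STT technology uses; the function name `words_to_number` and the supported vocabulary are assumptions made for this example.

```python
# Illustrative only: a small English words-to-number converter.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(text: str) -> str:
    """Convert spelled-out English number words to a digit string."""
    words = text.lower().replace("-", " ").split()
    if "point" in words:
        split_at = words.index("point")
        int_words, frac_words = words[:split_at], words[split_at + 1:]
    else:
        int_words, frac_words = words, []

    total = 0    # sum of completed groups (thousands, millions, ...)
    current = 0  # value of the group being read, below the next scale word
    for word in int_words:
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    result = str(total + current)

    if frac_words:
        digits = []
        for word in frac_words:
            # After "point", each word must be a single digit.
            if word not in UNITS or UNITS[word] > 9:
                raise ValueError(f"not a digit word: {word!r}")
            digits.append(str(UNITS[word]))
        result += "." + "".join(digits)
    return result


if __name__ == "__main__":
    print(words_to_number("sixty four million seven hundred thousand ninety"))
```

Run on the complete example from the text, the sketch prints 64700090. Returning a string rather than a float keeps leading zeros after the decimal point, such as the "03" in a value like 1586.03.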

FAQs (PSP)

…In that case you must pre-process the audio recording before uploading it to the Phonexia SPE or using it in the Phonexia Browser.

Q: What languages do you offer? A: It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT)…

Q: How do I get results for a pending operation?

A: If the server responds to a pending request with status 200 (OK), the body of the response contains the result (the server already has the result in its cache memory, so there is no need to process the file again). If the server responds to a pending request with status 202 (Accepted), the server will create a task and will begin to…
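The 200/202 distinction above suggests a simple polling loop on the client side. The sketch below is a generic illustration and not the SPE client API: the `fetch` callable stands in for whatever HTTP call your application makes, and the names `poll_pending`, `interval_s` and `max_attempts` are hypothetical. It also assumes that a 202 response means "not finished yet, ask again later".

```python
import time
from typing import Callable, Tuple


def poll_pending(
    fetch: Callable[[], Tuple[int, str]],
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> str:
    """Call `fetch` until it reports status 200 and return the response body.

    `fetch` must return a (status_code, body) pair. Status 202 is treated as
    "still processing" (an assumption); any other status raises an error.
    """
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body  # the result is in the response body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"no result after {max_attempts} attempts")


if __name__ == "__main__":
    # Fake server: answers 202 twice, then 200 with a result.
    responses = iter([(202, ""), (202, ""), (200, "<result/>")])
    print(poll_pending(lambda: next(responses), interval_s=0.01))
```

Passing the request as a callable keeps the loop independent of any particular HTTP library and makes it easy to exercise with a fake server, as in the usage example.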

Speech Engine update

…technology models configuration
usually introduces new features or major fixes, which may change communication between server and client, or other changes which may affect customer processes
can also include new technology models; with such an update you can add only the new technology, without SPE installation
Upgrade changes the first version number (e.g. x.y.z to x+1) and is a major change…

Video – Getting started with SPE

MODULE 1: Getting started with Speech Engine (19 min)
Installation
Technologies configuration
Server and database configuration
Users configuration
Files processing
Synchronous and asynchronous requests, results polling
Stream processing
https://youtu.be/4qrB-GfFdWY…

Waveform Denoiser (DENOISER)

…Speech Engine documentation); stream not supported, technology model name to be used for processing. Output: audio file (WAV or RAW), together with an XML/JSON report (in SPE only). Fig.: Comparison of the original recording (david_noisy.wav, top half of the image) and the same recording processed by Denoiser (david_denoised.wav, bottom half of the image). Typical Questions Q: What do you recommend for deploying this technology?…

Phoneme Recogniser (PHNREC)

Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme…

Understand SPE technologies configuration file

…technologies.xml file containing the following setup:
STT (Speech To Text) with 8 instances of SK_SK_5 model
STT_STREAM (Speech To Text for stream processing) with 2 instances of CS_CZ_6 model
SID4E (Speaker Identification 4 Voiceprint Extractor) with 2 instances of L4 model, 3 instances of XL4 model
SID4C (Speaker Identification 4 Voiceprint Comparator) with 2 instances of L4 model, 3 instances…

Understand SPE metafiles

Certain SPE entities – SID Speaker models, SID Audio source profiles, LID Language packs – can have additional information associated with them in the form of “metafiles”. This article explains the intended usage of metafiles. In general, SPE is intended as an under-the-hood engine that focuses purely on speech-related audio processing. Any additional functionality should be done on the application layer,…

Time Analysis Extraction (TAE)

…dialogue. This can be used to improve calls between operators and callers or to indicate potential stress points in phone calls, for example, a change of speech speed during the conversation).
Input
TAE can process both audio files and streams (for format details see Speech Engine documentation). By its nature, TAE is usable mainly on two-channel phone call recordings, where…

Input audio quality

The quality of the audio is extremely important for satisfactory results from any speech processing technology, be it simple voice activity detection, speech transcription, voice biometry, or another. There are two main aspects of audio quality: the technical quality of the audio data (format, codec, bitrate, SNR, …) and the sound quality of the actual content (background noise, reverberations, …).
Technical quality
Using inappropriate…