Search Results for: spe

Results 1 - 100 of 113 Page 1 of 2

Browser3 – Releases and Changelogs

     Posted on: 2019-10-09

Phonexia Browser v3 (Browser3) is developed as a client on top of Phonexia Speech Engine v3. Phonexia Browser is the successor to Phonexia Speech Intelligence Resolver v1 (SIR1). This page lists changes in Browser releases. Releases Changelogs Phonexia Browser v3.18.0, BSAPI 3.22.0 - Oct 03 2019 New: Waveform editor can now process stereo files by Diarization in per-channel mode New: Added Gender balance and Score sharpness in Settings -> Scoring New: Multiple columns in Result pane can be turned on/off at once using context menu New: Minimum speech length changed to 7 seconds Fixed: LID results information chart is not updated…

TUTORIAL: Speaker Identification – How to Do a Basic Test

     Posted on: 2019-10-08

Phonexia Speaker Identification is a voice biometry tool for recognition of speakers by their voice. In this video, we will show you how to start using this technology! You will learn how to create a "Speaker Model" to identify a speaker in a set of data. Ready to test it? Start with our video: What else is needed? 1. Phonexia Evaluation Package The evaluation package (download page) consists of Phonexia Browser and Phonexia Speech Engine, including all necessary technologies. 2. Data We have prepared a dataset for your testing. The package contains data for both speaker model creation and speaker spotting. The…

Workflow – Releases and Changelogs

     Posted on: 2019-10-07

Phonexia Workflow is a set of tools complementing Phonexia Speech Engine (SPE), which allow users to chain speech technologies into scenarios and process audio recordings automatically using these scenarios. This page lists changes in Workflow releases. Changelogs == Phonexia Workflow v1 == Phonexia Workflow 1.4.1 (10/07/2019) - SPE 3.16 - 3.17 Support for IPv4 only (since SPE does not support IPv6) Configurable application webhook address in both Workflow Runner and Data Discovery Tool This address is auto-detected when no value is supplied - default In some cases like network specific configuration it might be necessary to configure it manually Rapid…

SPE3 – Releases and Changelogs

     Posted on: 2019-10-02

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). This page lists changes in SPE releases. Releases Changelogs == SPE v3.18.x == Speech Engine 3.18.2 (10/14/2019) - DB v1300, BSAPI 3.22.1 Fixed: Customized STT model fails on Windows with "Request for next state but ending state reached." error message Speech Engine 3.18.1 (10/01/2019) - DB v1300, BSAPI 3.22.0 New: DICTATE technology has been renamed to STT_STREAM (/technologies/dictate -> /technologies/stt/stream) (for backward compatibility, the /technologies/dictate endpoint is internally redirected) New: SID/SID4…

Technical Training Essentials

     Posted on: 2019-09-27

Core objective: Understanding technical essentials of using Phonexia technologies and products Duration: ~94 minutes (7 + 19 + 22 + 23 + 23 min chapters) intended for product architects or developers assumes you have already watched Phonexia technologies introduction video assumes understanding of working in command line REST API principles processing JSON or XML Introduction (7 min) technologies recap CLI, REST and GUI interfaces overview https://youtu.be/xzrHyyIl01s MODULE 1: Getting started with Speech Engine (19 min) Installation Technologies configuration Server and database configuration Users configuration Files processing Synchronous and asynchronous requests, results polling Stream processing https://youtu.be/4qrB-GfFdWY MODULE 2: Filtering and supporting…

Phonexia Workflow

     Posted on: 2019-08-06

About Phonexia Workflow is a set of tools complementing Phonexia Speech Engine (SPE), which allow users to chain speech technologies into scenarios and process audio recordings automatically using these scenarios. Scenarios are programmed using a uniform API which provides an abstraction over the Phonexia Speech Engine application. Provided Phonexia Workflow scenarios: SalEssentials - Speech Analytics Essentials filters out low-quality audio files and provides demographic information, age estimation and speech-to-text processing; VbsEssentials - Voice Biometrics Essentials filters out low-quality audio files and provides gender identification, age estimation and speaker identification. The scenario is a tiny Java application which interacts with…

How do you calculate SNR in Speech Quality Estimation?

     Posted on: 2019-07-01

Signal-to-Noise Ratio (SNR) is an important indicator of whether a recording is worth further processing by other speech technologies, so it is part of our Speech Quality Estimation. However, calculating SNR automatically is not a trivial task. We use the fact that the statistical distribution of the frequencies in the waveform of speech follows a Gamma distribution. In contrast, noise follows a Gaussian distribution. So we can estimate the SNR by looking at the frequency distribution in individual frames. This approach to SNR estimation is based on the article by Chanwoo Kim and Richard M. Stern called "Robust Signal-to-Noise Ratio Estimation Based…
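
For orientation, an SNR value is conventionally expressed in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. The minimal Python sketch below computes only that quantity, and it assumes the speech and noise samples are already available separately. It is not the distribution-based estimator described above; the function names `mean_power` and `snr_db` and the toy sample values are assumptions made for illustration.

```python
import math

def mean_power(samples):
    """Average power (mean of squared amplitudes) of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

# Toy data (assumed): signal power is 1.0, noise power is 0.01.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
print(round(snr_db(signal, noise), 1))  # prints 20.0
```

A power ratio of 100 corresponds to 20 dB, which is why the toy data above yields that value.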

Voice Inspector – supporting technologies

     Posted on: 2019-06-28

Automatic Speaker Identification (SID) is the most important but not the only Phonexia technology that is implemented in Voice Inspector (VIN). Apart from SID, forensic experts, users of VIN, can benefit from automatic Signal-to-Noise Ratio calculation, Voice Activity detection, Phoneme search, and a Wave editor which incorporates the waveform, spectrum and power panel. Let's have a look at how to utilize individual technologies. Signal-to-Noise Ratio Recording quality can strongly influence the reliability of SID results and so the outcome of a forensic case. Therefore, VIN uses a module of Phonexia Speech Quality Estimation (SQE) to calculate the Signal-to-Noise Ratio (SNR)…

Voice Inspector – Interpretation of results

     Posted on: 2019-06-24

Introduction Phonexia Voice Inspector (VIN) is a tool for forensic automatic speaker identification, compliant with the Methodological Guidelines for Best Practice in Forensic Semiautomatic and Automatic Speaker Recognition, published by the European Network of Forensic Science Institutes.  This post explains individual SID score types and ways to visualize the results in a speaker identification case implemented in Voice Inspector. Evidence In VIN, the term evidence has two meanings. In general, it refers to any SID score that the system calculates for any pair of recordings in the case. These scores are the output of the Phonexia SID technology which runs…

Speaker Identification (SID)

     Posted on: 2019-06-13

Phonexia Speaker Identification uses the power of voice biometry to recognize speakers by their voice, i.e. to decide whether the voice in two recordings belongs to the same person or two different people. The high accuracy of Speaker Identification, Phonexia's flagship technology, has been validated in NIST Speaker Recognition Evaluations. Basic use cases and application areas The technology can be used for various speaker recognition tasks. One basic distinction is based on the kind of question we want to answer. Speaker Identification is the case when we are asking "Whose voice is this?", such as in fake emergency calls.…

Keyword Spotting results explained

     Posted on: 2019-06-12

This article aims to give more details about Keyword Spotting outputs and hints on how to tailor Keyword Spotting to best suit your needs. Scoring and results explanation Keyword Spotting works by calculating the likelihoods that a keyword, or just any other speech, occurs at a given spot, and comparing those two likelihoods. The following scheme shows the Background model for anything before the keyword (1), the Keyword model (2) and a Background model of any speech parallel with the keyword model (3). Models 2 and 3 produce two likelihoods – Lkw and Lbg (any speech = background). Raw score is calculated…

Keyword Spotting

     Posted on: 2019-06-03

Phonexia Keyword Spotting (KWS) identifies occurrences of keywords and/or keyphrases in audio recordings. It can help you to get valuable information from huge quantities of speech recordings. You only need to specify the keywords or phrases you wish to find. This technology identifies all recordings with keyword occurrences and allows you to automatically route important recordings or calls to your experts. Typical use cases Call centers increase operator and supervisor efficiency by searching calls identify inappropriate expressions from operators check marketing campaigns with automatic script-compliance control Mass media and web search servers index and search multimedia by keyword route multimedia…

Speaker Identification: Results Enhancement

     Posted on: 2019-05-29

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved through the use of Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system is robust to such factors, several result enhancement procedures can provide even better results and stronger evidence. Audio Source Profile An Audio Source Profile is a representation of the speech source, e.g., device, acoustic channel, distance from microphone, language, gender,…

Speech To Text results explained

     Posted on: 2019-05-27

This article aims to give more details about Speech To Text outputs and hints on how to tailor Speech To Text to best suit your needs. In the process of transcribing speech, the Speech To Text technology usually identifies multiple alternatives for individual speech segments, as multiple phrases can have similar pronunciations, possibly with different word boundaries, e.g. “eight tea machines” vs. “eighty machines”. The technology provides several types of output to show only one or more transcription alternatives. One-best output 1-best output provides transcription containing only the highest-scoring words. Each segment provides information about the transcribed word itself, the…

Speech To Text

     Posted on: 2019-05-27

Phonexia Speech To Text – also known as voice-to-text or speech recognition – converts speech signals into plain text. After the conversion, text can be easily read, edited, searched, processed by text-based data mining tools or archived. Phonexia Speech To Text is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams and can provide results in several output formats. Typical use cases look for specific information in large call archives (e.g., claims inspection) get additional value by advanced analysis of call traffic (e.g., topic detection) maintain short reaction times by routing calls…

Language Identification (LID)

     Posted on: 2019-05-20

Phonexia Language Identification (LID) will help you distinguish the spoken language or dialect. It will enable your system to automatically route valuable calls to your experts in the given language or to send them to other software for analysis. Phonexia uses state-of-the-art language identification (LID) technology based on iVectors that were introduced by NIST (National Institute of Standards and Technology, USA) during the 2010 evaluations. The technology is independent of any text, language, dialect, or channel. This highly accurate technology uses the power of voice biometrics to automatically recognize spoken language. Application areas Preselecting multilingual sources and routing audio streams/files…

STT Language Model Customization tutorial

     Posted on: 2019-04-24

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. Put simply, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio signals into the proper text equivalents. Due to the general diversity of speech, the default generic language model may not reflect the importance of certain words over other words in certain situations. Language model customization is a way to inform the…

Phonexia End User License Agreement

     Posted on: 2019-02-27

Please read the terms and conditions of this End User License Agreement (the “Agreement”) carefully before you use the Phonexia proprietary software providing speech solutions, technologies and accompanying services (the “Software”) delivered and marketed by Phonexia s.r.o.

Phonexia technologies introduction

     Posted on: 2019-01-25

Core objective: Basic understanding of Phonexia speech technologies and products; typical use cases, implementations and deployment topologies Duration: 35 minutes intended for idea makers and product designers assumes generic knowledge of Phonexia and speech technologies in general Content 00:00 Introduction What information can we get from speech? Overview of basic use cases Phonexia Speech Platform brief 4:21 Phonexia technologies overview and their usages Filtering and supporting technologies 04:32 Speech Quality Estimation (SQE) 05:27 Voice Activity Detection (VAD) 06:37 Diarization (DIAR) 07:41 Age Estimation (AGE) 08:14 Waveform Denoiser Voice Biometrics technologies 08:56 Speaker Identification (SID) 10:18 Language Identification (LID) 11:10 Gender…

Error 1007: Unsupported audio format

     Posted on: 2018-12-10

The Phonexia Browser application may return the error "1007: Unsupported audio format" while uploading an audio file. Please check whether your audio files are in a supported format. If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. We recommend the ffmpeg utility, which is powerful and well documented. Please find your distribution package at http://ffmpeg.org. Then continue as described below: Using Phonexia Browser with embedded SPE Open the Browser configuration dialog by clicking the "Settings" button located in the tool ribbon. Select the "Speech Engine" tab and configure SPE as described…

Supported audio formats

     Posted on: 2018-12-10

Supported audio formats are: WAVE (*.wav) container including any of: unsigned 8-bit PCM (u8), unsigned 16-bit PCM (u16le), IEEE float 32-bit (f32le), A-law (alaw), µ-law (mulaw), ADPCM; FLAC codec inside FLAC (*.flac) container; OPUS codec inside OGG (*.opus) container. Other audio formats must be converted using external tools. The SPE server can be configured to support automated conversion in the background, see SPE configuration hints. Good tools for converting unsupported formats to supported ones are ffmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are multiplatform software tools for MS Windows, Linux and Apple OS X. Example of usage: ffmpeg ffmpeg -i <source_audio_file_name>…
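
As a rough illustration of such an external conversion, the Python sketch below only assembles an ffmpeg command line that re-encodes an arbitrary input as A-law audio in a WAVE container, one of the formats listed above. The function name `build_ffmpeg_command`, the file names, and the 8000 Hz mono defaults are assumptions for this example; the full command shown in the original article is truncated here and is not reproduced.

```python
import shlex

def build_ffmpeg_command(source, target, sample_rate=8000, channels=1):
    """Return an ffmpeg argument list converting `source` to A-law WAVE audio."""
    return [
        "ffmpeg",
        "-i", source,              # input file in any format ffmpeg can read
        "-ar", str(sample_rate),   # output sampling rate in Hz (assumed default)
        "-ac", str(channels),      # number of output channels (assumed default)
        "-c:a", "pcm_alaw",        # A-law audio codec
        target,                    # output file; .wav selects the WAVE container
    ]

cmd = build_ffmpeg_command("input.mp3", "output.wav")
print(shlex.join(cmd))
# prints: ffmpeg -i input.mp3 -ar 8000 -ac 1 -c:a pcm_alaw output.wav
```

If ffmpeg is installed, the list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion; building the arguments as a list avoids shell-quoting problems with unusual file names.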

Error 1013: Unsupported: Server does not support authentication with token

     Posted on: 2018-12-10

Please check the SPE subdirectory ./settings for configuration files. If only phxspe.browser.properties exists, then your Browser uses SPE as an embedded component and this directive is set inside the file: "server.enable_authentication_token = false". In that case you can still use SPE with Basic HTTP authentication, as described in the documentation, section "Basic authentication". If you would like to try a "pure" daemon installation, then the phxspe.properties file should exist in the ./settings subdirectory. The phxspe.properties file is created by the phxadmin utility or can be created from the ./data/phxspe.properties.default template file. Copy the template file to the ./settings directory. Rename it to phxspe.properties. Check for the server.enable_authentication_token directive and set it as…
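
To check the directive without opening the file by hand, something like the following Python sketch could be used. It assumes the configuration file consists of simple `key = value` lines with `#` or `!` comment lines, and it treats a missing directive as "disabled"; both are assumptions of this example, not documented SPE behavior. The helper names `parse_properties` and `token_auth_enabled` are hypothetical.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def token_auth_enabled(text):
    """True only if server.enable_authentication_token is explicitly 'true'."""
    value = parse_properties(text).get("server.enable_authentication_token", "")
    return value.lower() == "true"

# Sample file content (assumed), mirroring the directive quoted above.
sample = "# embedded SPE settings\nserver.enable_authentication_token = false\n"
print(token_auth_enabled(sample))  # prints False
```

In practice the text would come from reading the properties file found in the ./settings subdirectory.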

Phonexia Voice Inspector EoL

     Posted on: 2018-07-19

Information about release dates, support and maintenance periods of Phonexia Voice Inspector.

Phonexia technology models EoL

     Posted on: 2018-07-11

Information about release dates, support and maintenance periods of Phonexia technology models.

Phonexia Speech Engine EoL

     Posted on: 2018-06-19

Information about release dates, support and maintenance periods of Phonexia Speech Engine (software End of Life - EoL).

SPE3 – Quick Start Guide

     Posted on: 2018-04-16

Do you want to run SPE3 for the first time? This post can help you. Distribution, installation and configuration SPE is distributed by Phonexia in .zip archives. These are downloaded from the Phonexia package manager using a link provided by a Phonexia employee. Installation is done by simply unzipping the content of the downloaded .zip archive into the SPE installation folder. Configuration of SPE is done in two places. The first is the executable file ./phxadmin or .\phxadmin.exe, which serves to: set the paths to configuration and license files; configure speech technologies; configure user accounts; set up a few other settings. Running the ./phxadmin or .\phxadmin.exe command…

Gender Identification

     Posted on: 2018-04-16

Gender Identification is a language-, domain- and channel-independent technology that uses the acoustic characteristics of the recording to determine the gender of the speaker in question. This technology is able to distinguish between two genders: Male (M) and Female (F). Minimum of speech signal for identification: 9+ sec recommended Output scoring: likelihood ratio and percentage metric (0-100%) Typical use cases: filtering calls by gender, playing advertisement focused on specific gender, getting quick demographic analysis of the recordings. The speed of Gender Identification is up to 150 FtRT (depending on the model).

SPE3 – Administration and Backup

     Posted on: 2018-04-15

Each Partner has its own administration and backup policy. Here, we highlight the most important SPE3 components to be administered and backed up. Administration It is strongly recommended to describe your own administration approach for the following components: SPE users (accounts) - the Partner should maintain a list of SPE users (accounts). Only a few people should have the “admin” role. All others should have the “user” role (cannot see the content of other “user” accounts) and/or the “vbs” role (enables/disables use of the VoiceBiometry plugin); the SPE database and/or VBSplugin database administration – where the (temporary) results are stored; user.home - where the…

Designing and Developing Application

     Posted on: 2018-04-15

Before designing and developing the application, we encourage the Partner to find clear answers to the following questions: Customer requirements: Do my customers need file processing (audio) or stream processing in real time? How much manpower does the customer have to analyze the results? How many minutes per day or streams in parallel does my customer need to process? What are the real benefits for the customer (finding the needle in a haystack, accessing new information, processing only a small amount of data with the highest possible accuracy)? How does the solution match the current processes and infrastructure of the customer? How many false alarms are acceptable…

Packages, Updates vs. Upgrades

     Posted on: 2018-04-15

Our packages follow the bugfix / update / upgrade approach. Some packages are distributed with a limited set of speech technologies or without speech technologies. Packages Our software is distributed as a ZIP file. The installation procedure is a matter of unzipping the archive, reconfiguring and starting the software. The SPE and VIN packages contain speech technologies (note: SPE might contain only selected technologies). PhxBrowser does not contain speech technologies and needs to be combined with SPE. The software is activated by a license file. Updates vs. Upgrades Bugfix By bugfix we understand a fix of a known problem without changing components or technology models. A bugfix changes only…

Time Analysis

     Posted on: 2018-04-15

Time Analysis Extraction (TAE) by Phonexia extracts base information from dialogue in a recording, providing essential knowledge about conversation flow. That makes it easy to identify long reaction time, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As an answer to the TAE technology, SPE returns a json/xml file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue Describes the statistics of a recording related to one channel. channel…

Age Estimation

     Posted on: 2018-04-12

Phonexia Age Estimation (AGE) estimates the age of a speaker from audio recording. The process of voiceprint extraction is similar to the extraction of SID, but as a result different features get extracted; therefore, the voiceprints extracted from AGE and SID are not mutually compatible. Technology Trained with emphasis on spontaneous telephony conversation The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling…

VIN3 – Releases and Changelogs

     Posted on: 2018-04-08

Phonexia Voice Inspector v3 (VIN) is developed as a desktop application on top of Phonexia BSAPI. This page lists changes in VIN releases. Releases Changelogs Voice Inspector v3.2.2, BSAPI 3.15.0 - Jun 5 2018 - Fixed possible application crash on Windows - Added phoneme type 'affricate' and fixed phoneme types: * phoneme 'C' changed from 'fricative' to 'affricate' * phoneme 'D' changed from 'fricative' to 'plosive' * phoneme 'T' changed from 'fricative' to 'plosive' * phoneme 'c' changed from 'plosive' to 'affricate' Voice Inspector v3.2.1, BSAPI 3.15.0 - Mar 16 2018 - Export of Speakers/Populations allows export only voiceprints -…

Voice Biometrics

     Posted on: 2018-04-07

Overview Phonexia Voice Biometrics is a special edition of Phonexia Speech Platform which allows you to understand the nature of audio without having to listen to it. The product helps people to utilize the power of voice biometrics to verify speakers or identify crimes. The technologies automatically reveal WHO is speaking, of what GENDER, in what LANGUAGE, and much other metadata. Voice Biometrics - Typical Use-Cases The Speaker Verification use case is tailored to banks, insurance companies, money lending companies and others that need to confirm whether the caller or the voice in an audio file is the same person who is known to the customer. For this use…

Speech Analytics

     Posted on: 2018-04-06

Overview Phonexia Speech Analytics allows you to understand the content of audio without having to listen to it. The results help both commercial entities and security/defense forces make immediate, precise decisions and responses. The technologies automatically reveal WHAT content, TOPIC and KEY PHRASES are spoken, and much other metadata. Speech Analytics - Typical Use-Cases Speech transcription is used in various applications. Knowing the content of a whole call brings business value to the customer, compared to having an analyst or supervisor listen to the audio files. Reading the text is also faster than listening to the audio. Speech Analytics output is often…

Software Vetting

     Posted on: 2018-04-06

The purpose of this document is to help clients satisfy their high security standards when integrating Phonexia software into their critical infrastructure. The vetting ensures that Phonexia software is not dangerous to the client’s infrastructure in any way. This means there are no backdoors, viruses, worms, Trojan horses, spyware, adware, critical bugs or unwanted functionality, and no information is sent outside the client’s infrastructure. Vetting context Speech technology is a very dynamic area with very fast development. For example, the speaker identification error rate decreases by half between every two consecutive evaluations organized by the National Institute of Standards and Technology,…

Open Source Acknowledgement

     Posted on: 2018-04-06

This page collects information about Open Source code and licenses. You may want to ask your Phonexia contact which parts of the page are relevant to your project. BSAPI 3 dependencies Name Version License Link type antrl 3c-3.4 BSD license static boost 1.55 Boost License static botan 1.10.9 Simplified BSD static FLAC 1.2.1 BSD license static Open Fst 1.3.4 Apache license static OpenGrm NGram 1.1.0 Apache license static ogg 1.3.2 BSD license static opus 1.1 New BSD License static libogg 1.3.2 BSD license static speex 1.2rc1 BSD license static stdlibc++, libgcc - GNU GPL with GCC Runtime Library Exception…

Voice Activity Detection – Essential

     Posted on: 2018-04-04

Phonexia Voice Activity Detection (VAD) identifies parts of audio recordings with speech content vs. nonspeech content. Technology Trained with emphasis on spontaneous telephony conversation The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling Output Log file with processed information (speech vs. nonspeech segments) Segmentation The section Segmentation describes the results of VAD, which are segments of detected voice and silence. Segments are…

Speech Quality Estimator – Essential

     Posted on: 2018-04-04

Phonexia’s Speech Quality Estimator quantifies the acoustic quality of recordings. This helps the user to quickly determine whether the acoustic quality of a recording is good for processing with other speech technologies or not. As an answer for SQE, the SPE returns a json/xml file. This file includes general information about the technology and statistics of all (one or two) channels. The statistics of all channels include the numbers for many aspects of recording quality, and the overall global score. Technology The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies…

Keyword

     Posted on: 2018-04-04

A word or phrase that a user searches for (defined by the user as an input for the KWS technology). Phonexia does not limit the number of keywords in the keyword list. A higher number of keywords (500+) decreases processing speed.

Speaker Diarization

     Posted on: 2018-04-02

Speaker Diarization labels segments of the same voice(s) in a single mono-channel audio recording according to the individual speaker's voice. It is a language-, domain- and channel-independent technology. It performs not only the segmentation of speakers, but of technical signals and silence as well. The output of the technology can be a log file with labels and/or split audio files or one new multichannel audio file. Accurate speaker diarization is still an open research task. Typical use cases: preprocessing for other speech recognition technologies, labeling the parts of the utterance according to the speakers, splitting a telephone conversation recorded in mono into several…

Speech Quality Estimation

     Posted on: 2018-04-02

Speech Quality Estimation is a language-, domain- and channel-independent technology that serves to quantify the quality of an audio recording. The two most important statistics on which it bases its score are the SNR (Signal-to-Noise Ratio) and the bitrate of the recording. SQE is usually part of a rapid filtration process in deployment. SQE also measures over 20 other properties of the recording, all of which can be found in the output file and further processed. See the description in the SPE documentation. Typical use cases are: verification of recordings' quality on the input, searching based on quality of the recording, noise of environment or speaker's…

Voice Activity Detection

     Posted on: 2018-04-02

Voice Activity Detection is a language-, domain- and channel-independent technology that identifies parts of audio recordings with speech content vs. non-speech content. It creates labels for speech and other signals in the recording; this can then serve as a decision point whether to process the recording by other technologies or not. VAD is usually part of rapid filtration process in deployment. Typical use cases are: detection of present or absent human speech for voice processing, filtering non-speech parts of the recording, filtering out recordings with not enough net speech to be processed by other technologies voice activated process, etc. The…

Product Portfolio

     Posted on: 2018-04-02

Phonexia Speech Platform is an umbrella concept for all of Phonexia’s products and services related to speech technologies. It gives us the ability to customize various products to a wide range of customer needs. A Platform Edition is an encapsulation of a specific setup of speech technologies, modules, applications, utilities and services designed for a specific market segment. We distinguish Speech Analytics (SAL) and Voice Biometrics (VBS) as the most common domains of usage. It is also a tool for marketing and sales. Voice Biometrics is focused more on identifying the speaker, gender, spoken language and more. Speech Analytics focuses on gathering information about content…

Phonexia Voice Inspector v3

     Posted on: 2018-04-02

About Phonexia Voice Inspector v3 (VIN3) provides police forces and forensic experts with a highly accurate speaker identification tool during investigation of criminal matters. It uses the power of voice biometry to automatically recognize speakers by their voice. Main features of the VIN3 application: Automatic speaker identification tool to strengthen results of the standard linguistics- and phonetics-based approach Scoring in Likelihood Ratio (LR) – result from a statistical test for a comparison of two hypotheses. The system returns a number from the interval <0, +∞>, which expresses how many times more likely the data are under one hypothesis than the…
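
The Likelihood Ratio mentioned above can be illustrated with a few lines of Python. This is only a generic sketch of the statistical definition (the likelihood of the data under one hypothesis divided by its likelihood under the other); the function name `likelihood_ratio`, the labels H1 and H2, and the numeric values are assumptions for illustration and do not describe how VIN3 computes its scores.

```python
def likelihood_ratio(p_data_given_h1, p_data_given_h2):
    """How many times more likely the data are under H1 than under H2."""
    if p_data_given_h1 < 0 or p_data_given_h2 <= 0:
        raise ValueError("likelihoods must be non-negative and H2's must be positive")
    return p_data_given_h1 / p_data_given_h2

# Assumed toy likelihoods: the data are 4 times more likely under H1.
print(likelihood_ratio(0.5, 0.125))  # prints 4.0
```

Values above 1 favor the first hypothesis, values below 1 favor the second, and a value of exactly 1 means the data do not distinguish between them.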

Phonexia Ethical Code

     Posted on: 2018-03-24

Application of the Code It is the policy of Phonexia, s.r.o. (“Phonexia”, “we”) to maintain the highest level of ethical standards in the conduct of our business affairs. Our values guide our actions in all cases. The actions and conduct of our officers, directors and employees (collectively, “Phonexia personnel”), as well as others acting on our behalf, are essential to maintain these standards and promote the highly ethical reputation of Phonexia. To that end, all our personnel including agents, consultants and contractors as well as distribution partners involved in Phonexia's international business activities must read, become familiar and comply with this…

Terms of Service

     Posted on: 2018-03-24

Description of the Services provided by Phonexia s.r.o. 1. Acceptance of Terms of Service (Terms as a Contract) 1.1. PHONEXIA-User Relationship. These Terms of Service (hereinafter referred to as "Agreement" or „Terms of Service“) and the PHONEXIA Privacy Policy govern the relationship between Phonexia s.r.o. (ID No.: 27680258, VAT No.: CZ27680258, registered seat at: Chaloupkova 3002/1a, 61200 Brno, registered by the County Court in Brno under file C, insert 5124), provider of the PHONEXIA technology (hereinafter referred to as "PHONEXIA") and you ("you", "your", „user“ or "Member"), and your use of and access to the website, PHONEXIA services or any…

Privacy Policy

     Posted on: 2018-03-24

Phonexia s.r.o. with registered seat at Chaloupkova 3002/1a, 612 00 Brno, Czech Republic, is a developer and provider of speech technology software products and related services. We appreciate your visit to our websites and we are pleased that you are interested in our software products and related services. We conform our data use to the European Union’s (“EU”) General Data Protection Regulation (“GDPR”). This Privacy Policy should help you to understand how we as a data controller gather, use and protect your personal information. 1. COLLECTING PERSONAL INFORMATION When you sign up for a Phonexia Account to allow you using…

Prefiltering

     Posted on: 2018-03-23

Prefiltering is an important part of almost any speech technology architecture. It relies on two technologies: Speech Quality Estimation (SQE) and Voice Activity Detection (VAD). Both are very fast and, by sorting out files with unacceptable quality or not enough net speech, can significantly decrease the load on the technologies that follow and increase their precision (the exact gain depends on the type of your data).
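The prefiltering idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Recording fields, the quality verdict and the net speech threshold are hypothetical and are not taken from the SQE or VAD output formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recording:
    path: str
    net_speech_s: float  # hypothetical: seconds of net speech reported by VAD
    quality_ok: bool     # hypothetical: pass/fail verdict derived from SQE


def prefilter(recordings: List[Recording], min_net_speech_s: float) -> List[Recording]:
    """Keep only recordings worth sending to the slower technologies."""
    return [
        r for r in recordings
        if r.quality_ok and r.net_speech_s >= min_net_speech_s
    ]


# Hypothetical sample data and threshold.
batch = [
    Recording("call_001.wav", net_speech_s=42.0, quality_ok=True),
    Recording("call_002.wav", net_speech_s=3.5, quality_ok=True),    # too little speech
    Recording("call_003.wav", net_speech_s=60.0, quality_ok=False),  # poor quality
]
accepted = prefilter(batch, min_net_speech_s=10.0)
print([r.path for r in accepted])
```

Only the recordings that pass both checks would be forwarded to the following technologies; the rest are dropped before any expensive processing starts.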

Phonexia – introduction

     Posted on: 2018-03-14

What we believe in At Phonexia, we find joy in pushing the boundaries of innovation in the field of speech technology by automating and simplifying solutions for many of today’s complex communication and security-strategic challenges. By providing our partners and customers with state-of-the-art speech-technology software, we leverage the power, and data, in their voices. Who we are Phonexia is the only speech technology software manufacturer that reveals and leverages the most data in speech for enterprising trailblazers across the globe who want to discover and develop powerful new skills in a knowledge-based economy. We have more than 19 years…

Licensing (technical details)

     Posted on: 2018-03-02

This document describes all licensing types for Phonexia product licensing available to our partners and customers. Each partner/customer can choose the licensing variant which best fits the current project or infrastructure. The document does not describe business conditions of Phonexia licensing. What is the License? The License is a formal agreement regarding “The Product Usage Rights” between Phonexia s.r.o. and a user of any Phonexia technology or Phonexia product. Licenses are issued by the Business Department for all speech technologies and products, and may be required in order to use utilities and tools developed by Phonexia or partners. For technical…

SPE configuration

     Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.

Sizing of the computing units for speech technologies

     Posted on: 2018-02-02

Good sizing of Phonexia technologies depends on a few facts: Intensive work with large data sets requires good performance and bandwidth between RAM and CPU. It all depends on the size of the technology model files, which are usually loaded into RAM and used intensively for computing operations. Always count only the physical cores of the CPU (HT and VT features do not help performance). Also look for CPUs with a large L3 cache; the better CPUs are those with a higher l3_cache_size/#_of_physical_CPU_cores ratio. We currently assume that CPUs from the current Intel Xeon Family in the 4th generation…
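The l3_cache_size/#_of_physical_CPU_cores ratio mentioned above is easy to compare across candidate CPUs. The following is a minimal sketch; the CPU names and figures are made up for illustration and do not describe real products.

```python
def l3_per_core_mb(l3_cache_mb: float, physical_cores: int) -> float:
    """L3 cache available per physical core, in MB (HT threads are not counted)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return l3_cache_mb / physical_cores


# Hypothetical candidates: name -> (L3 cache in MB, number of physical cores).
candidates = {
    "cpu_a": (30.0, 12),
    "cpu_b": (20.0, 4),
}

# Rank candidates from the highest to the lowest L3-per-core ratio.
ranked = sorted(
    candidates,
    key=lambda name: l3_per_core_mb(*candidates[name]),
    reverse=True,
)
print(ranked)
```

In this made-up comparison the CPU with fewer cores ranks first, because each physical core has more L3 cache to itself.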

Browser

     Posted on: 2018-02-01

Phonexia Browser is a powerful GUI tool for the Phonexia Speech Platform. It is designed for processing speech archives over Phonexia SPE.

VP

     Posted on: 2018-02-01

Voice Print – output of the SID extraction process applied to spoken speech. A unique mathematical representation of a specific speaker. It is created from iVectors.

VIN

     Posted on: 2018-02-01

Voice Inspector – Phonexia GUI application for forensic analysis

STT

     Posted on: 2018-02-01

Phonexia Speech To Text, sometimes also called Speech Transcription Technology (LVCSR-based ASR technology)

SQE

     Posted on: 2018-02-01

Phonexia Speech Quality Estimator

SPE

     Posted on: 2018-02-01

Phonexia Speech Engine (RESTful API)

SIR

     Posted on: 2018-02-01

Speech Intelligence Resolver – predecessor of Phonexia Browser

SID

     Posted on: 2018-02-01

Phonexia Speaker Identification; multiple generations are available, marked by version, e.g. SIDv2 or SIDv3

SAL

     Posted on: 2018-02-01

Phonexia Speech Analytics (formerly named SPAS)

PHR

     Posted on: 2018-02-01

Phoneme recognizer – currently part of the Keyword Spotting technology in Phonexia Speech Engine

LVCSR

     Posted on: 2018-02-01

Large Vocabulary Continuous Speech Recognition

LDC

     Posted on: 2018-02-01

Linguistic Data Consortium (Fisher English etc.) – a group of libraries, universities, etc. working with speech data.

iVector

     Posted on: 2018-02-01

iVector – a unique value representing a specific group of voice tracks. It consists of 600 values, usually reduced to a low-dimensional vector (200 values).

FtRT

     Posted on: 2018-02-01

Faster than Real Time – processing speed of 1 instance of a speech technology. It can also be described as "X min of audio archive processed in Y min of real time".
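The "X min of audio in Y min of real time" description can be turned into a simple speed factor. This is a minimal sketch of that arithmetic; the function names and sample figures are illustrative assumptions, not measured Phonexia values.

```python
def ftrt_factor(audio_minutes: float, processing_minutes: float) -> float:
    """How many times faster than real time one instance processes audio."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


def processing_minutes(audio_minutes: float, factor: float, instances: int = 1) -> float:
    """Estimated wall-clock time, assuming the load splits evenly across instances."""
    if factor <= 0 or instances <= 0:
        raise ValueError("factor and instances must be positive")
    return audio_minutes / (factor * instances)


# Hypothetical example: 60 min of audio processed in 3 min of real time.
factor = ftrt_factor(60, 3)
print(factor)
print(processing_minutes(1000, factor))
print(processing_minutes(1000, factor, 2))
```

The even split across instances is a simplifying assumption; real throughput also depends on file lengths and hardware.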

DIAR

     Posted on: 2018-02-01

Phonexia Speaker Diarization

BSAPI

     Posted on: 2018-02-01

Brno Speech Application Programming Interface

Broadcasting

     Posted on: 2018-02-01

Distribution of audio and video content to a dispersed audience via any audio or visual mass communications medium, but usually one using electromagnetic radiation (radio waves)

ASR

     Posted on: 2018-02-01

Automatic Speech Recognition (several technologies possible; see LVCSR, STT or KWS)

Difference between on-the-fly and off-line type of transcription (STT)

     Posted on: 2017-12-11

Like a human, the ASR (STT) engine adapts to the acoustic channel, environment and speaker. The engine also learns more about the content over time, and this information is used to improve recognition. The dictate engine, also known as on-the-fly transcription, does not look into the future and, at the beginning of a recording, has information about just a few seconds of speech. Because the output is requested immediately while the audio is being processed, the engine can't predict what will come in the next seconds of speech. When access to the whole recording is granted during off-line transcription…

Q: What languages do you offer?

     Posted on: 2017-09-07

It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 30+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) are available for 10+ languages, including English, French, German, Russian and American Spanish.

Q: What types of integration do you offer?

     Posted on: 2017-08-07

In general: - SDK: API for C++, C# - command line interface - REST interface - Graphical user interface (GUI) for evaluation https://download.phonexia.com/docs/spe/ https://download.phonexia.com/docs/bsapi/

Q: I can’t manage to run Phonexia Browser software. I always get an error.

     Posted on: 2017-06-27

I always get the same error messages: unable to connect to the SPE unable to start the localhost: giving up and kill the localhost. A: It might be because the initialization of the SPE engine takes too long. Phonexia Browser treats this as an initialization failure and kills the server. You can proceed as follows: Increase the timeout in Settings > Speech Engine tab > First connection timeout. Use fewer instances of technologies. Use smaller models of technologies.

Q: I am getting SPE related error after starting the Browser (e.g. SPE server crashed, Error Downloading…, unable to connect to the SPE server, unable to start the localhost…)

     Posted on: 2017-06-27

A: Windows: Open a terminal in the folder where sir.exe is located (hold Shift, right-click in free space in Windows Explorer and select “open command window here”). Run the PhxBrowser software with the command:         PhxBrowser.exe /spe-debug /spe-output The SIR software will start with an “SPE output” tab which shows the debug output of SPE. Linux: Run the PhxBrowser software in a terminal with the command:         ./PhxBrowser --spe-debug --spe-output The PhxBrowser software will start with an “SPE output” tab which shows the debug output of SPE.

Q: Which authentication options are allowed by the server and how does it work?

     Posted on: 2017-06-27

A: The following options are supported: HTTP basic authorization - The client asks for a session through the resource “POST /login”, with HTTP basic authorization in the request header. If the server responds with error 405, it does not support authorization by sessions and basic authorization must be used. Authorization by session - Done by adding the parameter “X-SessionID” to the HTTP header of each request. Basic authorization follows the HTTP standard and is sent in the header of each request to the server. You can set this in ./settings/phxspe.properties
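The two header styles described above can be sketched as follows. This is a minimal illustration that only builds the HTTP headers; the helper names are hypothetical, and how the session ID is returned by “/login” is not covered here because the answer above does not specify it.

```python
import base64
from typing import Dict, Optional


def basic_auth_headers(username: str, password: str) -> Dict[str, str]:
    """Standard HTTP basic authorization header (base64 of 'user:password')."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def session_headers(session_id: str) -> Dict[str, str]:
    """Header for authorization by session, sent with every request."""
    return {"X-SessionID": session_id}


def headers_for_request(username: str, password: str,
                        session_id: Optional[str] = None) -> Dict[str, str]:
    """Use the session when one is available, otherwise fall back to basic auth.

    The fallback mirrors the case where the server answers the login
    request with error 405 and sessions are therefore not supported.
    """
    if session_id is not None:
        return session_headers(session_id)
    return basic_auth_headers(username, password)


print(headers_for_request("user", "pass"))
print(headers_for_request("user", "pass", session_id="example-session-id"))
```

A client would attach the returned dictionary to each request made with its HTTP library of choice.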

Q: Can I add words into dictionary?

     Posted on: 2017-06-27

A: Yes, it is possible to add words to the dictionary. This is a service provided by Phonexia as part of support, once per quarter for each customer. For best results, the customer should provide our transcription together with their corrections and multiple pronunciation variants for each specific word.

Q: Please give me a recommendation for LID adaptation set.

     Posted on: 2017-06-27

A: The following is recommended: For adding a new language to the language pack: 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% of speech). Only 1 language per record. For adapting an existing language model (discriminative training): 10+ hours of audio for each language. May be done on the customer site. May be done in Phonexia using anonymized data (= language-prints extracted from a .wav audio).

Q: I found the following error: ApplicationStartup: Unhandled exception: BsapiException. What does it mean?

     Posted on: 2017-06-27

[Error] ApplicationStartup: Unhandled exception: BsapiException: SWaveformSegmenterI(/mnt/phxspe/home/phx/storage/dfs/a1cabcf7-c761-49f1-a9bc-0a8209a09fd9.opus Requested segment (78056, 102056) is out of waveform range (0,91840). Any ideas what this means? A: It means that this opus file was created improperly: its header declares much more audio than the file actually contains. Please check that your audio source/originator works properly. Alternatively, use the ffmpeg or sox utility as a preprocessor and normalize the audio by converting it from opus to opus before the recordings are processed through SPE.
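The opus-to-opus self-conversion suggested above could be scripted roughly like this. It is a sketch only: it assumes ffmpeg is installed with the libopus encoder, the file names are placeholders, and default encoder settings are used because the answer does not recommend specific ones.

```python
import subprocess
from pathlib import Path
from typing import List


def build_reencode_command(src: Path, dst: Path) -> List[str]:
    """ffmpeg command that decodes an opus file and re-encodes it to opus.

    Re-encoding writes a fresh container header that matches the audio
    actually present in the file.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-c:a", "libopus", str(dst)]


def reencode(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_reencode_command(src, dst), check=True)


# Placeholder file names; call reencode(...) to actually run ffmpeg.
command = build_reencode_command(Path("input.opus"), Path("normalized.opus"))
print(" ".join(command))
```

The normalized file, rather than the original, would then be uploaded to SPE.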

Q: While trying to install SPE3, I get the error for loading libasound.so.2 libraries.

     Posted on: 2017-06-27

Currently I’m trying to install the provided binaries for Linux, but I do get the following when running phxadmin: ./phxadmin: error while loading shared libraries: libasound.so.2: cannot open shared object file: No such file or directory I’m trying to run this under CentOS 7. A: Please install sound libraries required for manipulation with audio files from official repository into your OS. For CentOS you may use: sudo yum install alsa-utils alsa-lib Hint: Great utility for finding subsequent Redhat/Fedora/CentOS libraries is https://www.rpmfind.net/linux/RPM/index.html

Speaker Diarization (DIAR)

     Posted on: 2017-06-26

About DIAR Phonexia Speaker Diarization (DIAR) enables segmentation of voices in one monochannel audio record. Technology Trained with emphasis on spontaneous telephony conversation The technology is language-, accent-, text-, and channel-independent Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling Output Log file with processed information (segmentation of speech, silence, and technical signals – i.e. elimination of phone line beeps, DTMF tones, music, pauses, etc.) Audio file extracted for each…

Site Map

     Posted on: 2017-06-23

Phonexia Speech Platform Phonexia Speech Platform for Enterprise Phonexia Speech Analytics (SAL) Phonexia Voice Biometrics (VBS) Phonexia Speech Platform for Government Phonexia Speech Analytics GOV (SAL.gov) Phonexia Voice Biometrics GOV (VBS.gov) Components and Tools Phonexia Speech Engine v3 Speech technologies available Phonexia Browser v3 Phonexia Voice Inspector v3 Speech Intelligence Resolver v1 End of Life Components & Tools Phonexia Voice Inspector v1 Knowledge Base Blog Case Studies Demos Frequently Asked Questions (FAQ) How To… Lifetime Support Policies Manuals Presale Whitepapers and Presentations Product Briefs Developer Corner Code Examples Hints for App Design Hints for App Development List of Resources Phonexia…

Save Your Time

     Posted on: 2017-06-22

If you are just starting, the following posts might be interesting for you: Phonexia Speech Platform is defined as an umbrella concept for all our products and services related to speech technologies. Main packages are Voice Biometrics and Speech Analytics. Phonexia Browser (PhxBrowser) - an application for quick tests and visualization of speech technology results. Speech Engine (SPE3) - RESTful API - an adjustable server component which houses all speech technologies. Other "good to start" pages: Academy helps partners understand the market and Phonexia’s products and technologies. Manuals Glossary

Software Vetting (Best Practice)

     Posted on: 2017-06-15

The purpose of this document is to help clients satisfy their high security standards during integration of Phonexia software into their critical infrastructure. The vetting ensures that Phonexia software is not dangerous to the client’s infrastructure in any way. It means there are no backdoors, viruses, worms, Trojan horses, spyware, adware, critical bugs or unwanted functionality, and no information is sent outside the client’s infrastructure. Vetting context Speech technology is a very dynamic area with very fast development. For example, the speaker identification error rate halves between every two evaluations organized by National Institute of Standards and Technology,…

Glossary

     Posted on: 2017-06-15

Glossary terms are automatically propagated through Partner portal content and shown as tool-tip over specific term. Examples: NIST, SID3, REST ... Available Glossary Categories:

Terminology

     Posted on: 2017-06-15

Document which briefly describes processes and relations in Phonexia Technologies with attention to correct word usage. SID - Speaker Identification Technology (about SID technology), which recognizes the speaker in the audio based on the input data (usually a database of voiceprints). XL3, L3, L2, S2 - Technology models of SID. Speaker enrollment - Process in which the speaker model is created (usually a new record in the voiceprint database). Speaker model: 1/ should reach recommended minimums (net speech, audio quality), 2/ should be made with more net speech and thus be more robust. The test recordings (payload) are then compared to the model (see…

Get better support

     Posted on: 2017-05-19

This page highlights advice based on previous experience. If you have any suggestions or corrections regarding the improvement of the tech support, please let us know. The Frequently asked questions and Lifetime Support Policies sections prepared for you on Partner Portal could also be of interest to you. Any errors should be tested on the latest version of the product. Please ask your Phonexia contact for a link to download the latest version.   Before submitting issue/ticket... Any errors should be tested on the latest version of the product. Please ask your Phonexia contact for a link to download…

Speech Analytics Course (technical training)

     Posted on: 2017-05-18

The Speech Analytics course consists of the following modules. Please ask your Phonexia contact for a detailed description. (YES = this part of the course is obligatory)

SAL course | Required time [h] | Block name | Block description
YES | 0,5 | Intro & Phonexia Portfolio | Intro & Phonexia Portfolio
YES | 0,5 | Project focus – Explain basic needs | Discussion of partner project focused mainly on finalizing the training topics and agenda.
YES | 0,75 | Application Design & Development – Licensing | Presentation of types of licensing, and how to use the license file.
YES | 0,75 | Technologies – Data gathering and Quality measurement – basic | Description of…

Voice Biometrics Course (technical training)

     Posted on: 2017-05-18

The Voice Biometrics course consists of the following modules. Please ask your Phonexia contact for a detailed description. (YES = this part of the course is mandatory)

VBS course | Required time [h] | Block name | Block description
YES | 0,5 | Intro & Phonexia Portfolio | Intro & Phonexia Portfolio
YES | 0,5 | Project focus - Explain basic needs | Discussion of the partner project, focused mainly on finalizing training topics and agenda
YES | 0,75 | Apps Designing and Developing - Licensing | Gives the trainee knowledge about types of licensing, and how to use the license file
YES | 0,75 | Technologies - Data gathering and Quality measurement - basic | Data gathering…

How to prepare for course?

     Posted on: 2017-05-18

The Partner is encouraged to ask their Phonexia contact person to send the Training Preparation Questionnaire, which will help the Partner (and Phonexia) adjust the content of the technical training (see available courses here), and to provide a download link for the Phonexia products together with the evaluation license. The Partner (and Phonexia) can manage expectations together based on the following questions: 1. Training expectations What are your expectations regarding this training? What content do you expect? Looking at the schedule - what are the top priority topics? What format do you expect? (ppt, hands-on, discussion) Do you prefer paper copies for…

Manuals

     Posted on: 2017-05-18

This section collects links or locations of manuals for specific Phonexia Speech Platform components. API Phonexia Speech Engine REST API - SPE - latest version manual online (api_reference.html for your version is located in doc subdirectory in SPE folder or distribution ZIP) Brno Speech Application Interface v3 - BSAPI3 – latest version manual online Applications and Tools Phonexia Browser - PhxBrowser_manual.pdf is located in the root folder of the Browser application or distribution ZIP Voice Inspector v3 - VIN-manual.pdf is located in the root folder of the Voice Inspector application or distribution ZIP End of Life Products & Tools Speech…

Frequently Asked Questions (FAQ)

     Posted on: 2017-05-18

You can browse the FAQ by topic or tags: FAQ - complete list of posts tagged: Speech Engine v3 (SPE3) related tagged: Voice Biometrics (VBS) related tagged: Speech Analytics (SAL) related   Please leave us a comment if you find anything incomplete or need more details.