Search Results for: ROM

Results 1 - 50 of 68 Page 1 of 2

Performance of the Speaker Identification 4th generation (SID4): Intel® Xeon® Platinum 8124M

     Posted on: 2019-10-30

Benchmark goals Find realistic performance using total recording length Find FTRT based exactly on net_speech (engineering sizing data) Find system performance using all physical cores Find system performance using all logical cores Infrastructure setup Intel® Xeon® Platinum 8124M is used in virtual machine with 8 physical cores reserved exclusively for this VM, Hyper Threading is enabled [16 logical cores available], 32GB RAM, 30GB SSD based storage, 1000 I/O.s-1  reserved per core Benchmark data setup Data set statistic: Number of files: 32 [300 seconds each] RAW recordings length ∑: 9600 [sec] Net speech length ∑: 4224.77 [sec] In the data set…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

     Posted on: 2019-10-30

Faster Than Real Time (FTRT) is a metric developed to define a reference point for software performance. Using this metric, you can collect "benchmark" data on the real processing speed of the reviewed software, which should be found, and reproduced, on exactly defined HW. Then, by comparing various benchmark results, you can compare the performance of the specified software and its parts on different HW configurations. And vice versa, using the same metric you can compare software from different vendors on the same HW configuration and for the same processing task. We recognize two measurable metrics: Recording based FTRT is calculated from real recordings…
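The excerpt above is cut off before it gives the exact formula. As a rough illustration only, the sketch below assumes the common reading of FTRT as audio duration divided by processing time. The two durations are the data set totals quoted in the SID4 benchmark entry above; the processing time and the function name `ftrt` are hypothetical.

```python
def ftrt(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many times faster than real time the audio was processed.

    Assumed definition: audio duration / wall-clock processing time.
    """
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Data set totals quoted in the SID4 benchmark excerpt.
RAW_LENGTH_S = 9600.0     # total recording length
NET_SPEECH_S = 4224.77    # net speech length

# Hypothetical measured processing time, for illustration only.
PROCESSING_S = 120.0

recording_ftrt = ftrt(RAW_LENGTH_S, PROCESSING_S)
net_speech_ftrt = ftrt(NET_SPEECH_S, PROCESSING_S)

print(f"Recording-based FTRT:  {recording_ftrt:.2f}")
print(f"Net-speech-based FTRT: {net_speech_ftrt:.2f}")
```

Computing both figures side by side shows why the two variants should not be mixed: the same run yields a much lower value when only net speech is counted.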

Browser3 – Releases and Changelogs

     Posted on: 2019-10-09

Phonexia Browser v3 (Browser3) is developed as client on top of Phonexia Speech Engine v3. Phonexia Browser is a successor of Phonexia Speech Intelligence Resolver v1 (SIR1). This page lists changes in Browser releases. Releases Changelogs Phonexia Browser v3.18.0, BSAPI 3.22.0 - Oct 03 2019 New: Waveform editor can now process stereo file by Diarization in per-channel mode New: Added Gender balance and Score sharpness in Settings -> Scoring New: Multiple columns in Result pane can be turned on/off at once using context menu New: Minimum speech length changed to 7 seconds Fixed: LID results information chart is not updated…

TUTORIAL: Speaker Identification – How to Do a Basic Test

     Posted on: 2019-10-08

Phonexia Speaker Identification is a voice biometry tool for recognition of speakers by their voice. In this video, we will show you how to start using this technology! You will learn how to create a "Speaker Model" to identify a speaker in a set of data. Ready to test it? Start with our video: What else is needed? 1. Phonexia Evaluation Package The evaluation package (download page) consists of Phonexia Browser and Phonexia Speech Engine, including all necessary technologies. 2. Data We prepared a dataset for your testing. The package contains data for both speaker model creation and speaker spotting. The…

Workflow – Releases and Changelogs

     Posted on: 2019-10-07

Phonexia Workflow is a set of tools complementing Phonexia Speech Engine (SPE), which allow users to chain speech technologies into scenarios and process audio recordings automatically using these scenarios. This page lists changes in Workflow releases. Changelogs == Phonexia Workflow v1 == Phonexia Workflow 1.4.1 (10/07/2019) - SPE 3.16 - 3.17 Support for IPv4 only (since SPE does not support IPv6) Configurable application webhook address in both Workflow Runner and Data Discovery Tool This address is auto-detected when no value is supplied - default In some cases like network specific configuration it might be necessary to configure it manually Rapid…

SPE3 – Releases and Changelogs

     Posted on: 2019-10-02

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). This page lists changes in SPE releases. Releases Changelogs == SPE v3.18.x == Speech Engine 3.18.2 (10/14/2019) - DB v1300, BSAPI 3.22.1 Fixed: Customized STT model fails on Windows with Request for next state but ending state reached. error message Speech Engine 3.18.1 (10/01/2019) - DB v1300, BSAPI 3.22.0 New: DICTATE technology has been renamed to STT_STREAM (/technologies/dictate -> /technologies/stt/stream) (for backward compatibility, the /technologies/dictate endpoint is internally redirected) New: SID/SID4…

Phonexia Workflow

     Posted on: 2019-08-06

About Phonexia Workflow is a set of tools complementing Phonexia Speech Engine (SPE), which allow users to chain speech technologies into scenarios and process audio recordings automatically using these scenarios. Scenarios are programmed using a uniform API which provides an abstraction over the Phonexia Speech Engine application. Provided Phonexia Workflow scenarios: SalEssentials - Speech Analytics Essentials filters out low quality audio files and provides demographic information, age estimation and speech to text processing VbsEssentials - Voice Biometrics Essentials filters out low quality audio files and provides gender identification, age estimation and speaker identification The scenario is a tiny Java application which interacts with…

Voice Inspector – supporting technologies

     Posted on: 2019-06-28

Automatic Speaker Identification (SID) is the most important but not the only Phonexia technology that is implemented in Voice Inspector (VIN). Apart from SID, forensic experts, users of VIN, can benefit from automatic Signal-to-Noise Ratio calculation, Voice Activity detection, Phoneme search, and a Wave editor which incorporates the waveform, spectrum and power panel. Let's have a look at how to utilize individual technologies. Signal-to-Noise Ratio Recording quality can strongly influence the reliability of SID results and so the outcome of a forensic case. Therefore, VIN uses a module of Phonexia Speech Quality Estimation (SQE) to calculate the Signal-to-Noise Ratio (SNR)…

Voice Inspector – Interpretation of results

     Posted on: 2019-06-24

Introduction Phonexia Voice Inspector (VIN) is a tool for forensic automatic speaker identification, compliant with the Methodological Guidelines for Best Practice in Forensic Semiautomatic and Automatic Speaker Recognition, published by the European Network of Forensic Science Institutes.  This post explains individual SID score types and ways to visualize the results in a speaker identification case implemented in Voice Inspector. Evidence In VIN, the term evidence has two meanings. In general, it refers to any SID score that the system calculates for any pair of recordings in the case. These scores are the output of the Phonexia SID technology which runs…

Speaker Identification (SID)

     Posted on: 2019-06-13

Phonexia Speaker Identification uses the power of voice biometry to recognize speakers by their voice, i.e. to decide whether the voice in two recordings belongs to the same person or two different people. The high accuracy of Speaker Identification, Phonexia's flagship technology, has been validated in the NIST Speaker Recognition Evaluations. Basic use cases and application areas The technology can be used for various speaker recognition tasks. One basic distinction is based on the kind of question we want to answer. Speaker Identification is the case when we are asking "Whose voice is this?", such as in fake emergency calls.…

Keyword Spotting results explained

     Posted on: 2019-06-12

This article aims to give more details about Keyword Spotting outputs and hints on how to tailor Keyword Spotting to best suit your needs. Scoring and results explanation Keyword Spotting works by calculating the likelihoods that a given spot contains a keyword or just any other speech, and comparing those two likelihoods. The following scheme shows the Background model for anything before the keyword (1), the Keyword model (2) and a Background model of any speech parallel with the keyword model (3). Models 2 and 3 produce two likelihoods – Lkw and Lbg (any speech = background). Raw score is calculated…

Keyword Spotting

     Posted on: 2019-06-03

Phonexia Keyword Spotting (KWS) identifies occurrences of keywords and/or keyphrases in audio recordings. It can help you to get valuable information from huge quantities of speech recordings. You only need to specify the keywords or phrases you wish to find. This technology identifies all recordings with keyword occurrences and allows you to automatically route important recordings or calls to your experts. Typical use cases Call centers increase operator and supervisor efficiency by searching calls identify inappropriate expressions from operators check marketing campaigns with automatic script-compliance control Mass media and web search servers index and search multimedia by keyword route multimedia…

Speaker Identification: Results Enhancement

     Posted on: 2019-05-29

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by use of Audio Source Profiles, that represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system is robust in such factors, several result enhancement procedures can provide even better results and stronger evidence. Audio Source Profile An Audio Source Profile is a representation of the speech source, e.g., device, acoustic channel, distance from microphone, language, gender,…

Speech To Text results explained

     Posted on: 2019-05-27

This article aims to give more details about Speech To Text outputs and hints on how to tailor Speech To Text to best suit your needs. In the process of transcribing speech, the Speech To Text technology usually identifies multiple alternatives for individual speech segments, as multiple phrases can have similar pronunciations, possibly with different word boundaries, e.g. “eight tea machines” vs. “eighty machines”. The technology provides several types of output to show only one or more transcription alternatives. One-best output 1-best output provides transcription containing only the highest-scoring words. Each segment provides information about the transcribed word itself, the…

Speech To Text

     Posted on: 2019-05-27

Phonexia Speech To Text – also known as voice-to-text or speech recognition – converts speech signals into plain text. After the conversion, text can be easily read, edited, searched, processed by text-based data mining tools or archived. Phonexia Speech To Text is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams and can provide results in several output formats. Typical use cases look for specific information in large call archives (e.g., claims inspection) get additional value by advanced analysis of call traffic (e.g., topic detection) maintain short reaction times by routing calls…

Language Identification (LID)

     Posted on: 2019-05-20

Phonexia Language Identification (LID) will help you distinguish the spoken language or dialect. It will enable your system to automatically route valuable calls to your experts in the given language or to send them to other software for analysis. Phonexia uses state-of-the-art language identification (LID) technology based on iVectors that were introduced by NIST (National Institute of Standards and Technology, USA) during the 2010 evaluations. The technology is independent of any text, language, dialect, or channel. This highly accurate technology uses the power of voice biometrics to automatically recognize spoken language. Application areas Preselecting multilingual sources and routing audio streams/files…

Language Identification results explained

     Posted on: 2019-05-20

This article aims to give more details about Language Identification scoring and hints on how to tailor Language Identification to best suit your needs. Scoring and results explanation When Phonexia Language Identification identifies a language in an audio recording (or languageprint) using a language pack, it creates a languageprint of the recording (if the input is an audio recording), compares that languageprint with each language in the language pack, and calculates the probability that the two languages are the same. The final scores are returned as logarithms of these individual probabilities – i.e. as values from the (-inf, 0] interval – for each language in the language pack.…
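Because the scores are described as logarithms of probabilities, they can be mapped back to probabilities by exponentiation. The sketch below assumes natural logarithms (the excerpt does not state the base). The language codes, score values, and function name are made up for illustration and are not taken from any Phonexia API.

```python
import math


def log_scores_to_probabilities(log_scores: dict[str, float]) -> dict[str, float]:
    """Convert per-language log-probability scores back to probabilities.

    Assumes natural logarithms; a score of -inf maps to probability 0.0.
    """
    for language, score in log_scores.items():
        if score > 0:
            raise ValueError(f"log score for {language!r} must be <= 0")
    return {language: math.exp(score) for language, score in log_scores.items()}


# Hypothetical scores for a three-language pack.
scores = {"en": -0.105, "cs": -2.303, "de": float("-inf")}

probabilities = log_scores_to_probabilities(scores)
best_language = max(probabilities, key=probabilities.get)

for language, probability in probabilities.items():
    print(f"{language}: {probability:.3f}")
print("Most likely language:", best_language)
```

Since the exponential function is monotonic, ranking languages by raw log score gives the same order as ranking them by probability. The conversion matters only when a human-readable value is needed.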

STT Language Model Customization tutorial

     Posted on: 2019-04-24

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. Put simply, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio signals into the proper text equivalents. Due to the general diversity of speech, the default generic language model may not acknowledge the importance of certain words over other words in certain situations. Language model customization is a way to inform the…

Phonexia End User License Agreement

     Posted on: 2019-02-27

Please read the terms and conditions of this End User License Agreement (the “Agreement”) carefully before you use the Phonexia proprietary software providing speech solutions, technologies and accompanying services (the “Software”) delivered and marketed by Phonexia s.r.o.

Phonexia technologies introduction

     Posted on: 2019-01-25

Core objective: Basic understanding of Phonexia speech technologies and products; typical use cases, implementations and deployment topologies Duration: 35 minutes intended for idea makers and product designers assumes generic knowledge of Phonexia and speech technologies in general Content 00:00 Introduction What information can we get from speech? Overview of basic use cases Phonexia Speech Platform brief 4:21 Phonexia technologies overview and their usages Filtering and supporting technologies 04:32 Speech Quality Estimation (SQE) 05:27 Voice Activity Detection (VAD) 06:37 Diarization (DIAR) 07:41 Age Estimation (AGE) 08:14 Waveform Denoiser Voice Biometrics technologies 08:56 Speaker Identification (SID) 10:18 Language Identification (LID) 11:10 Gender…

Error 1013: Unsupported: Server does not support authentication with token

     Posted on: 2018-12-10

Please check the SPE subdirectory ./settings for configuration files. If only phxspe.browser.properties exists, then your Browser uses SPE as an embedded component and the file sets this directive: server.enable_authentication_token = false In that case you can still use SPE with Basic HTTP authentication, as described in the documentation, section "Basic authentication". If you would like to play with a "pure" daemon installation, then the phxspe.properties file should exist in the ./settings subdirectory. The phxspe.properties file is created by the phxadmin utility or can be created from the ./data/phxspe.properties.default template file. Copy the template file to the ./settings directory. Rename it to phxspe.properties. Check for the server.enable_authentication_token directive and set it up as…
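For readers who fall back to Basic HTTP authentication, the following is a minimal sketch of how a client can attach the standard `Authorization: Basic` header using only the Python standard library. The server URL, endpoint path, and credentials are placeholders, not values from the SPE documentation, and no request is actually sent.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request carrying Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical server address and credentials, for illustration only.
request = build_request("http://spe.example.local/some/endpoint", "user", "pass")
print(request.get_header("Authorization"))

# To actually send it, something like the following would be used:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Basic authentication sends credentials with every request in an easily decoded form, so it is normally paired with a trusted network or TLS.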

Phonexia technology models EoL

     Posted on: 2018-07-11

Information about release dates, support and maintenance periods of Phonexia technology models.

SPE3 – Quick Start Guide

     Posted on: 2018-04-16

Do you want to run SPE3 for the first time? This post can help you. Distribution, installation and configuration SPE is distributed by Phonexia in .zip archives. These are downloaded from the Phonexia package manager using a link provided by a Phonexia employee. Installation is done by simply unzipping the content of the downloaded .zip archive to the SPE installation folder. Configuration of SPE is done in two places. The first is the executable file ./phxadmin or .\phxadmin.exe, which serves to: set the path to configuration and license files; configure speech technologies; configure user accounts; set up a few other settings. Running the ./phxadmin or .\phxadmin.exe command…

Designing and Developing Application

     Posted on: 2018-04-15

Before designing and developing the application, we encourage the Partner to find clear answers to the following questions: Customer requirements: Do my customers need file processing (audio) or stream processing in real time? How much manpower does the customer have to analyze the results? How many minutes per day or streams in parallel does my customer need to process? What are the real benefits for the customer (finding the needle in a haystack, gaining new information, processing only a small amount of data with the highest possible accuracy)? How does the solution match the current processes and infrastructure of the customer? How many false alarms are acceptable…

Time Analysis

     Posted on: 2018-04-15

Time Analysis Extraction (TAE) by Phonexia extracts base information from dialogue in a recording, providing essential knowledge about conversation flow. That makes it easy to identify long reaction time, crosstalk, or responses of speakers in both channels. This technology is only meaningful when used on recordings with 2 channels. As an answer to the TAE technology, SPE returns a json/xml file. This file includes general information about the technology and details of the time analysis. The technology can work either with a closed recording or with a stream. Monologue Describes the statistics of a recording related to one channel. channel…

Age Estimation

     Posted on: 2018-04-12

Phonexia Age Estimation (AGE) estimates the age of a speaker from audio recording. The process of voiceprint extraction is similar to the extraction of SID, but as a result different features get extracted; therefore, the voiceprints extracted from AGE and SID are not mutually compatible. Technology Trained with emphasis on spontaneous telephony conversation The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling…

VIN3 – Releases and Changelogs

     Posted on: 2018-04-08

Phonexia Voice Inspector v3 (VIN) is developed as a desktop application on top of Phonexia BSAPI. This page lists changes in VIN releases. Releases Changelogs Voice Inspector v3.2.2, BSAPI 3.15.0 - Jun 5 2018 - Fixed possible application crash on Windows - Added phoneme type 'affricate' and fixed phoneme types: * phoneme 'C' changed from 'fricative' to 'affricate' * phoneme 'D' changed from 'fricative' to 'plosive' * phoneme 'T' changed from 'fricative' to 'plosive' * phoneme 'c' changed from 'plosive' to 'affricate' Voice Inspector v3.2.1, BSAPI 3.15.0 - Mar 16 2018 - Export of Speakers/Populations allows export only voiceprints -…

Speech Analytics

     Posted on: 2018-04-06

Overview Phonexia Speech Analytics allows you to understand the content of audio without having to listen to it. The results help both commercial entities and security/defense forces make immediate, precise decisions and responses. The technologies automatically reveal WHAT content, TOPIC and KEY PHRASES are spoken, along with much other metadata. Speech Analytics - Typical Use-Cases Speech transcription is used in various applications. Knowing the content of a whole call brings business value to the customer, compared to having an analyst or supervisor listen to the audio files. Reading the text is also faster than listening to the audio. Speech Analytics output is often…

Software Vetting

     Posted on: 2018-04-06

The purpose of this document is to help client to satisfy their high security standards during integration of Phonexia software to their critical infrastructure. The vetting ensures that Phonexia software is not dangerous to the client’s infrastructure in any way. It means there are no backdoors, viruses, worms, Trojan horses, spyware, adware, critical bugs, unwanted functionality, no information is sent outside the client’s infrastructure. Vetting context Speech technology is a very dynamic area with a very fast development. For example the speaker identification error rate decreases to half between each two evaluations organized by National Institute of Standards and Technology,…

Voice Activity Detection – Essential

     Posted on: 2018-04-04

Phonexia Voice Activity Detection (VAD) identifies parts of audio recordings with speech content vs. nonspeech content. Technology Trained with emphasis on spontaneous telephony conversation The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling Output Log file with processed information (speech vs. nonspeech segments) Segmentation The section Segmentation describes the results of VAD, which are segments of detected voice and silence. Segments are…

Speech Quality Estimator – Essential

     Posted on: 2018-04-04

Phonexia’s Speech Quality Estimator quantifies the acoustic quality of recordings. This helps the user to quickly determine whether the acoustic quality of a recording is good for processing with other speech technologies or not. As an answer for SQE, the SPE returns a json/xml file. This file includes general information about the technology and statistics of all (one or two) channels. The statistics of all channels include the numbers for many aspects of recording quality, and the overall global score. Technology The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies…

Keyword pronunciation

     Posted on: 2018-04-04

Pronunciation of the keyword(s) is generated automatically (G2P, grapheme to phoneme), produced from the lexicon of known words (“lexicon”), or converted from audio (phoneme transcription). It can be edited manually for each word (Phonexia does not limit the number of pronunciations per keyword/phrase).

Product Portfolio

     Posted on: 2018-04-02

Phonexia Speech Platform is an umbrella concept for all Phonexia’s products and services related to speech technologies. It gives us the ability to customize various products to a wide range of customer needs. Platform Edition is an encapsulation of specific setup of speech technologies, modules, applications, utilities and services designed for a specific market segment. We distinguish Speech Analytics (SAL) and Voice Biometrics (VBS) as most common domain of usage. It is also a tool for marketing and sales. Voice Biometrics is focused more on identifying speaker, gender, language spoken and more. Speech Analytics focuses on gathering information about content…

Phonexia Voice Inspector v3

     Posted on: 2018-04-02

About Phonexia Voice Inspector v3 (VIN3) provides police forces and forensic experts with a highly accurate speaker identification tool during investigation of criminal matters. It uses the power of voice biometry to automatically recognize speakers by their voice. Main features of the VIN3 application: Automatic speaker identification tool to strengthen results of the standard linguistics- and phonetics-based approach Scoring in Likelihood Ratio (LR) – result from a statistical test for a comparison of two hypotheses. The system returns a number from the interval <0, +∞>, which expresses how many times more likely the data are under one hypothesis than the…

Phonexia Ethical Code

     Posted on: 2018-03-24

Application of the Code It is the policy of Phonexia, s.r.o. (“Phonexia”, “we”) to maintain the highest level of ethical standards in the conduct of our business affairs. Our values guide our actions in all cases. The actions and conduct of our officers, directors and employees (collectively, “Phonexia personnel”), as well as others acting on our behalf, are essential to maintain these standards and promote highly ethical reputation of Phonexia. To that end, all our personnel including agents, consultants and contractors as well as distribution partners involved in Phonexia´s international business activities must read, become familiar and comply with this…

Terms of Service

     Posted on: 2018-03-24

Description of the Services provided by Phonexia s.r.o. 1. Acceptance of Terms of Service (Terms as a Contract) 1.1. PHONEXIA-User Relationship. These Terms of Service (hereinafter referred to as "Agreement" or „Terms of Service“) and the PHONEXIA Privacy Policy govern the relationship between Phonexia s.r.o. (ID No.: 27680258, VAT No.: CZ27680258, registered seat at: Chaloupkova 3002/1a, 61200 Brno, registered by the County Court in Brno under file C, insert 5124), provider of the PHONEXIA technology (hereinafter referred to as "PHONEXIA") and you ("you", "your", „user“ or "Member"), and your use of and access to the website, PHONEXIA services or any…

Privacy Policy

     Posted on: 2018-03-24

Phonexia s.r.o. with registered seat at Chaloupkova 3002/1a, 612 00 Brno, Czech Republic, is a developer and provider of speech technologies software products and related services. We appreciate your visit on our websites and we are pleased that you are interested in our software products and related services. We conform our data use to the European Union’s (“EU”) General Data Protection Regulation (“GDPR”). This Privacy Policy should help you to understand how we as a data controller gather, use and protect your personal information. 1. COLLECTING PERSONAL INFORMATION When you sign up for a Phonexia Account to allow you using…

Account

     Posted on: 2018-03-21

General rules: Registration for the Phonexia Partner Portal is free. However, various user access levels apply to the articles; some of them are available only to Phonexia Partners and Certified members. You may request a higher access level by contacting business support at info@phonexia.com. Legal documents: By registering for, logging in to, and using this website, you agree with the Privacy Policy and Terms of Service.

Phonexia – introduction

     Posted on: 2018-03-14

What we believe in At Phonexia, we find joy in pushing the boundaries of innovation in the field of speech technology by automating and simplifying solutions for many of today’s complex communication and security-strategic challenges. By providing our partners and customers with state-of-the-art speech-technology software, we leverage the power, and data, in their voices. Who we are Phonexia is the only speech technology software manufacturer that reveals and leverages the most data in speech for enterprising trailblazers across the globe who want to discover and develop powerful new skills in a knowledge-based economy. We have more than 19 years…

Licensing (technical details)

     Posted on: 2018-03-02

This document describes all licensing types for Phonexia product licensing available to our partners and customers. Each partner/customer can choose the licensing variant which best fits the current project or infrastructure. The document does not describe business conditions of Phonexia licensing. What is the License? The License is a formal agreement regarding “The Product Usage Rights” between Phonexia s.r.o. and a user of any Phonexia technology or Phonexia product. Licenses are issued by the Business Department for all speech technologies and products, and may be required in order to use utilities and tools developed by Phonexia or partners. For technical…

SPE configuration

     Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.

Sizing of the computing units for speech technologies

     Posted on: 2018-02-02

Best practices for good sizing of Phonexia technologies depend on a few facts: Intense work with large data sets requires good performance and bandwidth between RAM and CPU. It all depends on the size of the files with technology model data, which are usually loaded into RAM and used intensively for computing operations. Always count only the physical CPU cores (HT and VT features do not help performance). Also look for CPUs with a large L3 cache; the better CPUs are those with a higher l3_cache_size/#_of_physical_CPU_cores ratio. We currently assume that CPUs from the current Intel Xeon Family in the 4th generation…
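The L3-cache-per-physical-core ratio mentioned above is easy to compute when comparing candidate CPUs. The sketch below is only an illustration: the CPU names and figures are invented, and the `Cpu` class is a hypothetical helper rather than part of any Phonexia tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    l3_cache_mb: float
    physical_cores: int  # physical cores only, logical (HT) cores are ignored

    @property
    def l3_per_core_mb(self) -> float:
        """L3 cache size divided by the number of physical cores."""
        return self.l3_cache_mb / self.physical_cores


# Invented candidates, for illustration only.
candidates = [
    Cpu(name="cpu-a", l3_cache_mb=24.75, physical_cores=18),
    Cpu(name="cpu-b", l3_cache_mb=16.5, physical_cores=8),
]

# Prefer the CPU with the highest L3-per-physical-core ratio.
best = max(candidates, key=lambda cpu: cpu.l3_per_core_mb)

for cpu in candidates:
    print(f"{cpu.name}: {cpu.l3_per_core_mb:.3f} MB of L3 per physical core")
print("Preferred by this criterion:", best.name)
```

Here the CPU with the smaller total cache wins, because it has fewer cores sharing it. The ratio is only one of the criteria listed, so it should be weighed together with memory bandwidth and core count.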

VP

     Posted on: 2018-02-01

Voice Print – output of the SID speech extraction process. A unique mathematical representation of the specific speaker or recording, created in the form of an iVector (for SID generation 3) or xVector (Deep Embeddings for SID generation 4).

Median

     Posted on: 2018-02-01

Median – The value separating the higher half of a data sample from the lower half.

LR

     Posted on: 2018-02-01

Likelihood Ratio – Result of a statistical test comparing two models. It returns a number which expresses how many times more likely the data are under one model than under the other. LR takes values in the interval <0;+∞>.
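As a small worked illustration of the definition, the sketch below divides the likelihood of the data under one model by its likelihood under the other. A ratio of likelihoods is non-negative by construction, while its logarithm can be any real number. The likelihood values and function names are hypothetical and not taken from any Phonexia product.

```python
import math


def likelihood_ratio(likelihood_model_1: float, likelihood_model_2: float) -> float:
    """Return how many times more likely the data are under model 1 than model 2."""
    if likelihood_model_1 < 0 or likelihood_model_2 <= 0:
        raise ValueError("likelihoods must be non-negative, and the denominator positive")
    return likelihood_model_1 / likelihood_model_2


def log10_likelihood_ratio(likelihood_model_1: float, likelihood_model_2: float) -> float:
    """Return the base-10 logarithm of the likelihood ratio (-inf when LR is 0)."""
    ratio = likelihood_ratio(likelihood_model_1, likelihood_model_2)
    return math.log10(ratio) if ratio > 0 else float("-inf")


# Hypothetical likelihoods of the same data under two competing models.
lr = likelihood_ratio(0.5, 0.125)
llr = log10_likelihood_ratio(0.5, 0.125)

print(f"LR = {lr:.1f}")
print(f"log10(LR) = {llr:.3f}")
```

An LR above 1 favours the first model and an LR below 1 favours the second. On the log scale the same split falls at 0.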

LPA

     Posted on: 2018-02-01

Language Print Archive - pack of language prints from the recordings spoken in the same language/dialect. Used for the language identification in LID comparison.

LP

     Posted on: 2018-02-01

Language Print - output data from LID technology

FEA

     Posted on: 2018-02-01

Features – FEA is an optional output of the KWS technology. Looking for keywords in FEA is faster than in the original recording.

Broadcasting

     Posted on: 2018-02-01

Distribution of audio and video content to a dispersed audience via any audio or visual mass communications medium, but usually one using electromagnetic radiation (radio waves)