Search Results for: SID

Results 1 - 43 of 43 Page 1 of 1

SPE3 – Releases and Changelogs

     Posted on: 2020-10-14

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). This page lists changes in SPE releases. Releases Changelogs Speech Engine 3.35.1 (10/13/2020) - DB v1600, BSAPI 3.35.1 Public release Fixed: Missing input stream task name in log messages Fixed: Missing arguments in "word not found" error messages (when using preferred phrases) Speech Engine 3.35.0 (10/01/2020) - DB v1600, BSAPI 3.35.0 Public release New: LID model L4 was promoted to production (LID BETA_L4 renamed to LID L4) New: Added new language tag…

What is a user configuration file and how to use it

     Posted on: 2020-03-28

Advanced users with appropriate knowledge (gained e.g. by taking the Phonexia Academy Advanced Training) may want to fine-tune the behavior of the technologies to adapt to the nature of their audio data. Modifying the original BSAPI configuration files directly can be dangerous: inappropriate changes may cause unpredictable behavior, and without a backup of the unmodified file it is difficult to restore a working state. User configuration files provide a way to override processing parameters without modifying the original BSAPI configuration files. WARNING: Inappropriate configuration changes may cause serious issues! Make sure you really know what you are doing. User configuration file is a…

How to configure STT realtime stream word detection parameters

     Posted on: 2020-03-28

One of the improvements implemented since Speech Engine 3.24 is a neural-network based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward and forward extensions are intervals in milliseconds which extend the part of the signal going to the decoder. The decoder is a component which determines what a particular part of the signal contains (speech, silence, etc.). Based on that, the decoder also decides whether a segment has finished or not. Unlike in file processing…
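The default values quoted above can be read and applied roughly as follows. This is a minimal sketch, assuming the two extensions simply widen a detected speech interval on both sides before it reaches the decoder; the helper names `load_segmenter_params` and `extend_segment` and the sample interval are hypothetical and not part of BSAPI.

```python
import configparser

# Default segmenter section as quoted in the excerpt.
DEFAULT_CONFIG = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"


def load_segmenter_params(text):
    """Parse the INI-style section into typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[SECTION]
    return {
        "backward_ms": section.getint("backward_extensions_length_ms"),
        "forward_ms": section.getint("forward_extensions_length_ms"),
        "speech_threshold": section.getfloat("speech_threshold"),
    }


def extend_segment(start_ms, end_ms, params):
    """Hypothetical helper: widen a speech interval by the two extensions.

    Assumption: the backward extension is subtracted from the start
    (clamped at 0) and the forward extension is added to the end.
    """
    new_start = max(0, start_ms - params["backward_ms"])
    new_end = end_ms + params["forward_ms"]
    return new_start, new_end


params = load_segmenter_params(DEFAULT_CONFIG)
segment = extend_segment(1000, 2000, params)
```

Under these assumptions, a larger forward extension keeps more trailing signal in the segment, which is one way the parameters could influence when a segment is considered finished.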

Performance of the Speaker Identification 4th generation (SID4): Intel® Xeon® Platinum 8124M

     Posted on: 2019-10-30

Benchmark goals Find realistic performance using total recording length Find FTRT based exactly on net_speech (engineering sizing data) Find system performance using all physical cores Find system performance using all logical cores Infrastructure setup Intel® Xeon® Platinum 8124M is used in virtual machine with 8 physical cores reserved exclusively for this VM, Hyper Threading is enabled [16 logical cores available], 32GB RAM, 30GB SSD based storage, 1000 I/O.s-1  reserved per core Benchmark data setup Data set statistic: Number of files: 32 [300 seconds each] RAW recordings length ∑: 9600 [sec] Net speech length ∑: 4224.77 [sec] In the data set…
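The two FTRT variants named in the benchmark goals differ only in which audio length is divided by the processing time. The sketch below uses the data set figures from the excerpt; the processing time is a hypothetical placeholder, since no measured value appears here, and the function name `ftrt` is an assumption.

```python
# Data set figures quoted in the excerpt.
NUM_FILES = 32
FILE_LENGTH_S = 300
RAW_LENGTH_S = 9600.0
NET_SPEECH_S = 4224.77


def ftrt(audio_seconds, processing_seconds):
    """Faster-than-real-time factor: seconds of audio per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Share of the raw recordings that is net speech (about 44 %).
net_speech_ratio = NET_SPEECH_S / RAW_LENGTH_S

# Hypothetical wall-clock time, used only to illustrate the two variants.
example_processing_s = 60.0
ftrt_raw = ftrt(RAW_LENGTH_S, example_processing_s)  # based on total recording length
ftrt_net = ftrt(NET_SPEECH_S, example_processing_s)  # based on net speech only
```

Because net speech is less than half of the raw length in this data set, an FTRT figure based on net speech is correspondingly lower than one based on total recording length for the same run.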

Workflow – Releases and Changelogs

     Posted on: 2019-10-07

Phonexia Workflow is a set of tools complementing Phonexia Speech Engine (SPE), which allow users to chain speech technologies into scenarios and process audio recordings automatically using these scenarios. This page lists changes in Workflow releases. Changelogs == Phonexia Workflow v1 == Phonexia Workflow 1.4.1 (10/07/2019) - SPE 3.16 - 3.17 Support for IPv4 only (since SPE does not support IPv6) Configurable application webhook address in both Workflow Runner and Data Discovery Tool This address is auto-detected when no value is supplied - default In some cases like network specific configuration it might be necessary to configure it manually Rapid…

Technical Training Essentials

     Posted on: 2019-09-27

Core objective: Understanding technical essentials of using Phonexia technologies and products Duration: ~94 minutes (7 + 19 + 22 + 23 + 23 min chapters) intended for product architects or developers assumes you have already watched Phonexia technologies introduction video assumes understanding of working in command line REST API principles processing JSON or XML Introduction (7 min) technologies recap CLI, REST and GUI interfaces overview https://youtu.be/xzrHyyIl01s MODULE 1: Getting started with Speech Engine (19 min) Installation Technologies configuration Server and database configuration Users configuration Files processing Synchronous and asynchronous requests, results polling Stream processing https://youtu.be/4qrB-GfFdWY MODULE 2: Filtering and supporting…

Voice Inspector – supporting technologies

     Posted on: 2019-06-28

Automatic Speaker Identification (SID) is the most important but not the only Phonexia technology that is implemented in Voice Inspector (VIN). Apart from SID, forensic experts, users of VIN, can benefit from automatic Signal-to-Noise Ratio calculation, Voice Activity detection, Phoneme search, and a Wave editor which incorporates the waveform, spectrum and power panel. Let's have a look at how to utilize individual technologies. Signal-to-Noise Ratio Recording quality can strongly influence the reliability of SID results and so the outcome of a forensic case. Therefore, VIN uses a module of Phonexia Speech Quality Estimation (SQE) to calculate the Signal-to-Noise Ratio (SNR)…

Voice Inspector – Interpretation of results

     Posted on: 2019-06-24

Introduction Phonexia Voice Inspector (VIN) is a tool for forensic automatic speaker identification, compliant with the Methodological Guidelines for Best Practice in Forensic Semiautomatic and Automatic Speaker Recognition, published by the European Network of Forensic Science Institutes.  This post explains individual SID score types and ways to visualize the results in a speaker identification case implemented in Voice Inspector. Evidence In VIN, the term evidence has two meanings. In general, it refers to any SID score that the system calculates for any pair of recordings in the case. These scores are the output of the Phonexia SID technology which runs…

Speaker Identification (SID)

     Posted on: 2019-06-13

Phonexia Speaker Identification uses the power of voice biometry to recognize speakers by their voice, i.e. to decide whether the voice in two recordings belongs to the same person or to two different people. The high accuracy of Speaker Identification, Phonexia's flagship technology, has been validated in NIST Speaker Recognition Evaluations. Basic use cases and application areas The technology can be used for various speaker recognition tasks. One basic distinction is based on the kind of question we want to answer. Speaker Identification is the case when we are asking "Whose voice is this?", such as in fake emergency calls.…

Speaker Identification: Results Enhancement

     Posted on: 2019-05-29

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system is robust to such factors, several result enhancement procedures can provide even better results and stronger evidence. Audio Source Profile An Audio Source Profile is a representation of the speech source, e.g., device, acoustic channel, distance from microphone, language, gender,…

Phonexia End User License Agreement

     Posted on: 2019-02-27

Please read the terms and conditions of this End User License Agreement (the “Agreement”) carefully before you use the Phonexia proprietary software providing speech solutions, technologies and accompanying services (the “Software”) delivered and marketed by Phonexia s.r.o.

Phonexia technologies introduction

     Posted on: 2019-01-25

Core objective: Basic understanding of Phonexia speech technologies and products; typical use cases, implementations and deployment topologies Duration: 35 minutes intended for idea makers and product designers assumes generic knowledge of Phonexia and speech technologies in general Content 00:00 Introduction What information can we get from speech? Overview of basic use cases Phonexia Speech Platform brief 4:21 Phonexia technologies overview and their usages Filtering and supporting technologies 04:32 Speech Quality Estimation (SQE) 05:27 Voice Activity Detection (VAD) 06:37 Diarization (DIAR) 07:41 Age Estimation (AGE) 08:14 Waveform Denoiser Voice Biometrics technologies 08:56 Speaker Identification (SID) 10:18 Language Identification (LID) 11:10 Gender…

Error 1007: Unsupported audio format

     Posted on: 2018-12-10

The Phonexia Browser application may return the error "1007: Unsupported audio format" while uploading an audio file. Please check whether your audio files are in one of the supported audio formats. If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. The recommended tool is the ffmpeg utility, which is powerful and well documented. Please find your distribution package at http://ffmpeg.org. Then continue as described below: Using Phonexia Browser with embedded SPE Open the Browser configuration dialog by clicking the "Settings" button located in the tool ribbon. Select the "Speech Engine" tab and configure SPE as described…

Supported audio formats

     Posted on: 2018-12-10

Supported audio formats are: WAVE (*.wav) container including any of: unsigned 8-bit PCM (u8) unsigned 16-bit PCM (u16le) IEEE float 32-bit (f32le) A-law (alaw) µ-law (mulaw) ADPCM FLAC codec inside FLAC (*.flac) container OPUS codec inside OGG (*.opus) container Other audio formats must be converted using external tools. The SPE server can be configured to support automated conversion in the background, see SPE configuration hints. Good tools for converting unsupported formats to supported ones are ffmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are multiplatform software tools for MS Windows, Linux and Apple OS X. Example of usage (ffmpeg): ffmpeg -i <source_audio_file_name>…

Error 1013: Unsupported: Server does not support authentication with token

     Posted on: 2018-12-10

Please check the SPE subdirectory ./settings for configuration files. If only phxspe.browser.properties exists, then your Browser uses SPE as an embedded component and this directive is set inside the file: server.enable_authentication_token = false In that case you can still use SPE with Basic HTTP authentication, as described in the documentation, section "Basic authentication". If you would like to play with a "pure" daemon installation, then the phxspe.properties file should exist in the ./settings subdirectory. The phxspe.properties file is created by the phxadmin utility or can be created from the ./data/phxspe.properties.default template file. Copy the template file to the ./settings directory Rename it to phxspe.properties Check for the server.enable_authentication_token directive and set it as…
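The check described above can be sketched in a few lines: read the directive from a properties file and, when token authentication is disabled, fall back to an HTTP Basic `Authorization` header. This is only an illustration. The parser handles plain `key = value` lines, treating a missing directive as disabled is an assumption, and the credentials are placeholders.

```python
import base64


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def token_auth_enabled(props):
    """Return True only when the directive is explicitly set to 'true'.

    Assumption: a missing directive is treated as disabled.
    """
    value = props.get("server.enable_authentication_token", "false")
    return value.lower() == "true"


def basic_auth_header(user, password):
    """Build the standard HTTP Basic Authorization header value."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Directive as quoted in the excerpt; credentials are placeholders.
sample = "server.enable_authentication_token = false\n"
props = parse_properties(sample)
header = basic_auth_header("user", "secret")
```

In practice the text would come from the properties file in the ./settings subdirectory rather than from an inline string.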

Phonexia technology models EoL

     Posted on: 2018-07-11

Information about release dates, support and maintenance periods of Phonexia technology models.

Age Estimation

     Posted on: 2018-04-12

Phonexia Age Estimation (AGE) estimates the age of a speaker from an audio recording. The process of voiceprint extraction is similar to the extraction of SID, but as a result different features get extracted; therefore, the voiceprints extracted from AGE and SID are not mutually compatible. Technology Trained with emphasis on spontaneous telephony conversation The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies channel compensation techniques): GSM/CDMA, 3G, VoIP, landlines, etc. Input Input format for processing: WAV or RAW (8 or 16 bits linear coding), A-law or Mu-law, PCM, 8kHz+ sampling…

VIN – Releases and Changelogs

     Posted on: 2018-04-08

Phonexia Voice Inspector (VIN) is developed as a desktop application for forensic speaker comparison. This page lists changes in VIN releases. Releases Changelogs Voice Inspector v4.0.0, BSAPI 3.23.0 - Dec 11 2019 - VIN is available with L4 technology model - Other technology models (S2, L2, L3, XL3) are no longer supported - Added Diarization Technology (available in waveform editor) - Population Sets structure changed - Reworked dialog for population set management - Added possibility to set type of estimation of the Target distribution - Using population set to estimate Target distribution allows 1:1 comparison - Bug fixes Voice Inspector…

Voice Biometrics

     Posted on: 2018-04-07

Overview Phonexia Voice Biometrics is a special edition of Phonexia Speech Platform which allows you to understand the nature of audio without having to listen to it. The product helps people to utilize the power of voice biometrics to verify speakers or identify crimes. The technologies automatically reveal WHO is speaking, what GENDER and what LANGUAGE, and many other metadata. Voice Biometrics - Typical Use-Cases Use case Speaker Verification is tailored to banks, insurance companies, money lending companies and others who need to confirm whether the caller or the voice in an audio file is the same person who is known to the customer. For this use…

Speech Analytics

     Posted on: 2018-04-06

Overview Phonexia Speech Analytics allows you to understand the content of audio without having to listen to it. The results help both commercial entities and security/defense forces to make immediate, precise decisions and responses. The technologies automatically reveal WHAT content, TOPIC and KEY PHRASES are spoken, and many other metadata. Speech Analytics - Typical Use-Cases Speech transcription is used in various applications. Knowing the content of the whole call brings business value to the customer, compared to having an analyst or supervisor listen to the audio files. Reading the text is also faster than listening to the audio. Speech Analytics output is often…

Software Vetting

     Posted on: 2018-04-06

The purpose of this document is to help clients satisfy their high security standards during integration of Phonexia software into their critical infrastructure. The vetting ensures that Phonexia software is not dangerous to the client’s infrastructure in any way. It means there are no backdoors, viruses, worms, Trojan horses, spyware, adware, critical bugs or unwanted functionality, and no information is sent outside the client’s infrastructure. Vetting context Speech technology is a very dynamic area with very fast development. For example, the speaker identification error rate is halved between every two consecutive evaluations organized by National Institute of Standards and Technology,…

Open Source Acknowledgement

     Posted on: 2018-04-06

This page collects information about Open Source code and licenses. You might be interested to ask your Phonexia contact what part of the page is relevant to your project. Phonexia Voice Verify dependencies Name  Version  License  Django  2.1.11  BSD Jinja2  2.11.2  BSD-3-Clause  MarkupSafe  1.1.1  BSD-3-Clause  Pygments  2.6.1  BSD License beautifulsoup4  4.9.1  MIT  behave  1.2.6  BSD behave-django  1.4.0  MIT  certifi  2020.6.20  MPL-2.0  chardet  3.0.4  LGPL  coreapi  2.3.3  BSD coreschema  0.0.4  BSD  defusedxml  0.6.0  PSFL  django-allauth  0.39.1  MIT  django-constance  2.7.0  BSD  django-cors-headers  3.4.0  MIT License  django-environ  0.4.5  MIT  django-extra-fields  2.0.5  Apache-2.0  django-picklefield  3.0.1  MIT  django-rest-auth  0.9.3  MIT  djangorestframework  3.9.1  BSD  docker  4.2.2 …

Speech Quality Estimator – Essential

     Posted on: 2018-04-04

Phonexia’s Speech Quality Estimator quantifies the acoustic quality of recordings. This helps the user to quickly determine whether the acoustic quality of a recording is good for processing with other speech technologies or not. As an answer for SQE, the SPE returns a json/xml file. This file includes general information about the technology and statistics of all (one or two) channels. The statistics of all channels include the numbers for many aspects of recording quality, and the overall global score. Technology The technology is language-, accent-, text-, and channel- independent Compatibility with the widest range of audio sources possible (applies…

Phonexia Ethical Code

     Posted on: 2018-03-24

Application of the Code It is the policy of Phonexia, s.r.o. (“Phonexia”, “we”) to maintain the highest level of ethical standards in the conduct of our business affairs. Our values guide our actions in all cases. The actions and conduct of our officers, directors and employees (collectively, “Phonexia personnel”), as well as others acting on our behalf, are essential to maintain these standards and promote the highly ethical reputation of Phonexia. To that end, all our personnel including agents, consultants and contractors as well as distribution partners involved in Phonexia's international business activities must read, become familiar with and comply with this…

Terms of Service

     Posted on: 2018-03-24

Description of the Services provided by Phonexia s.r.o. 1. Acceptance of Terms of Service (Terms as a Contract) 1.1. PHONEXIA-User Relationship. These Terms of Service (hereinafter referred to as "Agreement" or "Terms of Service") and the PHONEXIA Privacy Policy govern the relationship between Phonexia s.r.o. (ID No.: 27680258, VAT No.: CZ27680258, registered seat at: Chaloupkova 3002/1a, 61200 Brno, registered by the County Court in Brno under file C, insert 5124), provider of the PHONEXIA technology (hereinafter referred to as "PHONEXIA") and you ("you", "your", "user" or "Member"), and your use of and access to the website, PHONEXIA services or any…

Privacy Policy

     Posted on: 2018-03-24

Phonexia s.r.o. with registered seat at Chaloupkova 3002/1a, 612 00 Brno, Czech Republic, is a developer and provider of speech technologies software products and related services. We appreciate your visit on our websites and we are pleased that you are interested in our software products and related services. We conform our data use to the European Union’s (“EU”) General Data Protection Regulation (“GDPR”). This Privacy Policy should help you to understand how we as a data controller gather, use and protect your personal information. 1. COLLECTING PERSONAL INFORMATION When you sign up for a Phonexia Account to allow you using…

Licensing (technical details)

     Posted on: 2018-03-02

This document describes all licensing types for Phonexia product licensing available to our partners and customers. Each partner/customer can choose the licensing variant which best fits the current project or infrastructure. The document does not describe business conditions of Phonexia licensing. What is the License? The License is a formal agreement regarding “The Product Usage Rights” between Phonexia s.r.o. and a user of any Phonexia technology or Phonexia product. Licenses are issued by the Business Department for all speech technologies and products, and may be required in order to use utilities and tools developed by Phonexia or partners. For technical…

SPE configuration

     Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.

Sizing of the computing units for speech technologies

     Posted on: 2018-02-02

Best practices for good sizing of Phonexia technologies depend on a few facts: Intense work with large data sets requires good performance and bandwidth between RAM and CPU. It all depends on the size of the files with technological models data, usually loaded into RAM and used intensively for computing operations Always think only about physical cores of CPU (HT, VT features can't help in performance) Also look for CPUs with a large L3 cache. The better CPUs are those with a higher l3_cache_size/#_of_physical_CPU_cores ratio. We currently assume that CPUs from the current Intel Xeon Family in the 4th generation…
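The L3-cache-per-physical-core ratio mentioned above is easy to compute when comparing candidate CPUs. The following sketch ranks CPUs by that ratio only; the CPU names and figures are hypothetical placeholders, not benchmark data, and the `Cpu` class is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    l3_cache_mb: float
    physical_cores: int

    @property
    def l3_per_core_mb(self):
        """L3 cache size divided by the number of physical cores."""
        return self.l3_cache_mb / self.physical_cores


# Hypothetical CPUs, for illustration only.
candidates = [
    Cpu("cpu-a", l3_cache_mb=16.0, physical_cores=8),
    Cpu("cpu-b", l3_cache_mb=24.0, physical_cores=16),
    Cpu("cpu-c", l3_cache_mb=32.0, physical_cores=8),
]

# Highest cache-per-core ratio first.
ranked = sorted(candidates, key=lambda cpu: cpu.l3_per_core_mb, reverse=True)
```

The ratio is only one of the sizing criteria listed in the excerpt; memory bandwidth and the size of the model files matter as well.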

VP

     Posted on: 2018-02-01

Voice Print – output from spoken speech extraction process of SID. Unique mathematical representation of the specific speaker or recording is created in form of the iVector (for SID generation 3) or xVector (Deep Embeddings for SID generation 4).

SID

     Posted on: 2018-02-01

Phonexia Speaker Identification, multiple generations available marked by version like SIDv2 or SIDv3

ARPA

     Posted on: 2018-02-01

Advanced Research Projects Agency created in 1958 by President Dwight D. Eisenhower for the purpose of forming and executing research and development projects to expand the frontiers of technology

Q: Please describe how to get the results for a pending operation.

     Posted on: 2017-06-27

A: If the server responds to a pending request with status 200 - OK, the body of the response contains the result (the server already has the result in cache memory and there is no need to process the file again). If the server responds with status 202 - Accepted, the server creates a task and begins to process the file. The "Location" parameter of the response HTTP header contains the path of the pending resource, and the body contains the ID of the pending operation. Polling: The client queries the pending resource (e.g. "GET /pending/{ID}"). Server will answer with…
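The 200/202 handling and the polling loop described in this answer can be sketched as below. The excerpt is cut off before it says what the server returns while polling, so the completion check is left to a caller-supplied `is_finished` predicate; `fetch`, `is_finished` and both function names are hypothetical hooks, not SPE API calls.

```python
import time


def handle_initial_response(status, headers, body):
    """Interpret the first response as described in the answer above.

    Returns ("done", body) for 200, or ("pending", location) for 202.
    """
    if status == 200:
        return "done", body
    if status == 202:
        return "pending", headers["Location"]
    raise RuntimeError(f"unexpected status {status}")


def poll(fetch, location, is_finished, interval_s=1.0, max_attempts=30,
         sleep=time.sleep):
    """Repeatedly query the pending resource until is_finished(response) is true.

    fetch(location) must return a (status, headers, body) tuple.
    """
    for _ in range(max_attempts):
        response = fetch(location)
        if is_finished(response):
            return response
        sleep(interval_s)
    raise TimeoutError(f"{location} still pending after {max_attempts} attempts")
```

Passing `fetch` and `sleep` in as arguments keeps the loop independent of any particular HTTP client and makes it straightforward to exercise with canned responses.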

Software Vetting (Best Practice)

     Posted on: 2017-06-15

The purpose of this document is to help clients satisfy their high security standards during integration of Phonexia software into their critical infrastructure. The vetting ensures that Phonexia software is not dangerous to the client’s infrastructure in any way. It means there are no backdoors, viruses, worms, Trojan horses, spyware, adware, critical bugs or unwanted functionality, and no information is sent outside the client’s infrastructure. Vetting context Speech technology is a very dynamic area with very fast development. For example, the speaker identification error rate is halved between every two consecutive evaluations organized by National Institute of Standards and Technology,…

Glossary

     Posted on: 2017-06-15

Glossary terms are automatically propagated through Partner portal content and shown as tool-tip over specific term. Examples: NIST, SID3, REST ... Available Glossary Categories:

Terminology

     Posted on: 2017-06-15

Document which briefly describes processes and relations in Phonexia Technologies with attention to correct word usage. SID - Speaker Identification Technology (about SID technology) which recognizes the speaker in the audio based on the input data (usually a database of voiceprints). XL3, L3, L2, S2 - Technology models of SID. Speaker enrollment - Process in which the speaker model is created (usually a new record in the voiceprint database). Speaker model: 1/ should reach recommended minimums (net speech, audio quality), 2/ should be made with more net speech and thus be more robust. The test recordings (payload) are then compared to the model (see…

Speech Analytics Course (technical training)

     Posted on: 2017-05-18

The Speech Analytics course consists of the following modules. Please ask your Phonexia contact for detailed description. (YES = this part of the course is obligatory)   SAL course Required time [h] Block name Block description YES 0,5 Intro & Phonexia Portfolio Intro & Phonexia Portfolio YES 0,5 Project focus – Explain basic needs Discussion of partner project focused mainly on finalizing the training topics and agenda. YES 0,75 Application Design & Development – Licensing Presentation of types of licensing, and how to use the license file. YES 0,75 Technologies – Data gathering and Quality measurement – basic Description of…

Voice Biometrics Course (technical training)

     Posted on: 2017-05-18

The Voice Biometrics course consists of the following modules. Please ask your Phonexia contact for a detailed description. (YES = this part of the course is mandatory) VBS course Required time [h] Block name Block description YES 0,5 Intro & Phonexia Portfolio Intro & Phonexia Portfolio YES 0,5 Project focus - Explain basic needs Partner project related discussion focused mainly on finalizing training topics and agenda YES 0,75 Apps Designing and Developing - Licensing Gives the trainee knowledge about types of licensing, and how to use the license file YES 0,75 Technologies - Data gathering and Quality measurement - basic Data gathering…

Speech Intelligence Resolver v1

     Posted on: 2017-05-18

About Phonexia Speech Intelligence Resolver v1 (SIR1) combines the power of speech technologies within a single application. The application automatically performs visualization of the record as well as filtering the speech metadata uncovered from your records effectively. Speech technologies implemented: Phonexia Speaker Identification (SID2) Phonexia Language Identification (LID2) Phonexia Gender identification (GID) Phonexia Voice Activity Detection (VAD) Phonexia Speaker Diarization (DIAR) Phonexia Keyword Spotting (KWS) Phonexia Speech Quality Estimator (SQE) Phonexia Speech Transcription (STT) SIR is a client application cooperating with REST servers. It can be used as a standalone application due to the integrated local REST server. It was…

Phonexia Browser

     Posted on: 2017-05-18

About Phonexia Browser v3 (Browser v3) is software that combines the power of speech technologies in a single desktop application. The application automatically performs visualization of records as well as effective filtering of speech metadata uncovered from the user's records. Speech technologies implemented: Speaker Identification (SID) Language Identification (LID) Gender identification (GID) Voice Activity Detection (VAD) Speaker Diarization (DIAR) Keyword Spotting (KWS, 10+ languages available) Speech Quality Estimator (SQE) Speech to Text (STT, 10+ languages available) Age Estimation (AGE) Browser v3 is a client application cooperating with Speech Engine v3 (SPE3). It is possible to use it as a client -…

Phonexia Academy

     Posted on: 2017-05-18

About The main idea of the Phonexia Academy is to help partners understand the market and Phonexia’s products and technologies. Sell more, deliver your projects on time and at the highest quality, and support your clients effectively. We provide the following trainings: Phonexia technologies introduction (online video course) Technical Training Essentials (online video course) Technical Training Advanced - 2 courses: Voice Biometrics Course (in-person, 2 days) Speech Analytics Course (in-person, 2 days) In Technical Training Advanced courses, we share best practices, detailed use-case analysis, and hands-on practice. Both courses are adjusted to our partners’ requests considering their typical projects. You might be…

Phonexia Speech Platform for Government

     Posted on: 2017-05-18

Phonexia Voice Biometrics GOV is a special edition of Phonexia Speech Platform for Government which allows you to understand the nature of audio without having to listen to it. The product helps people to utilize the power of voice biometrics to filter audio and prevent or identify crimes. The technologies reveal automatically WHO, what GENDER, what LANGUAGE is speaking, and many other metadata. The product can be used typically for investigation support, SIGINT or other types of operations. It serves 4 main use-cases: Voice Biometrics - Speaker Search in Archive (Investigation) Voice Biometrics - Speaker Spotting Tactical Voice Biometrics -…

Phonexia Speech Platform for Commerce

     Posted on: 2017-05-18

Phonexia Speech Analytics is a special edition of Phonexia Speech Platform COM which allows you to boost analysis of your call traffic. It is an effective solution for commercial, telecom, utilities, financial sector, and other contact centers. It provides 4 main parts: Dialog Analysis, Demographic Information, Script Alignment, Speech Transcription (automatic). Phonexia Voice Biometrics is a special edition of Phonexia Speech Platform COM which allows you to boost security and enhance customer experience with voice biometrics technologies. It is an effective solution for commercial and financial sectors, especially for banks, insurance companies, and call centers. It covers both use cases: Fraud Detection…