
Voice Verify 1.4, 1.4.1

This section provides information for the integrator and administrator roles of Phonexia Voice Verify. The reader should be able to deploy, integrate, configure, and maintain the Phonexia Voice Verify solution.

This document does not describe all the knowledge required for the correct implementation of voice biometrics in a call center.

In this section the following terms are used:

  • Client – the company installing, using, and integrating Phonexia Voice Verify
  • Customer – the person or company utilizing voice biometrics technology for voiceprint enrollment or identity verification

Voice Verify utilizes the Speaker Identification technology as a voice biometrics system.

To solve the speaker verification problem, two processes are used:

  1. Enrollment, or Customer Registration
  2. Verification

During Enrollment (1), the voiceprint of a speaker is created and saved to a database. This baseline voiceprint is later used during all subsequent verifications of the same speaker. For this reason, being sure of a speaker’s identity during enrollment is crucial and needs to be verified by other means. Other parameters like utterance richness or the quality of speech used for enrollment also need to be checked.

The Verification (2) phase takes place many times. After the Customer has gone through Enrollment, every subsequent call can verify the Customer with the use of voice biometrics. For this, the Customer’s voice is used again to create an additional voiceprint, which is then compared to the baseline (Enrollment) voiceprint. The comparison of these two voiceprints determines whether or not they come from one and the same speaker.

Phonexia Voice Verify Overview

Phonexia Voice Verify is a server solution running in a VirtualBox or VMware appliance.

Phonexia delivers the appliance; the role of the client/integrator is to prepare and run an environment with VirtualBox/VMware software.

Phonexia Voice Verify includes the following components:

  • SIP connector – providing SIP signaling connectivity
  • RTP handler – receiving and managing RTP streams
  • Phonexia Voice Verify stack – stack of components providing the core service; includes Speaker Identification technology
  • Data storage – a separate disk for customer’s voiceprints, logs, etc.

All the application components are installed on a system disk, while data are stored on a data disk. This is important for updates of the application and back-ups.

Only two data types are kept on the system disk (port 0) – a license file and a Speaker Identification calibration profile.

Phonexia Voice Verify API is accessible on the server URL at {ip_address}:8000/swagger/ where {ip_address} designates the IP address dedicated to Phonexia Voice Verify defined by the Client during deployment.

The SIP connector and RTP handler utilize various port numbers.

Installation – Deployment

Phonexia Voice Verify is delivered as a virtual appliance for VirtualBox/VMware virtualization environment.

When delivered by Phonexia, a virtual appliance includes only one virtual disk; the second disk needs to be attached by the Client (see below):

  • System disk (delivered by Phonexia), where Phonexia Voice Verify is installed
  • Data disk (attached by the Client), where all the necessary information is stored. This disk is crucial from a Disaster Recovery (DR) perspective. It also needs to be kept during updates. It includes:
    • Customer database (internal ID, voiceprint)
    • Logs
    • PBX instances’ database, …
Prerequisites

  • A server with HW sufficient to manage the load expected during production. See the Hardware requirements section
  • Oracle VirtualBox/VMware software availability
  • IP address available within the current infrastructure

Installation steps:

  1. Prepare the machine according to the required hardware
  2. Install VirtualBox/VMware
  3. Decide on the IP address and MAC address (optional – can be generated by Phonexia) for Phonexia Voice Verify operations and provide the address to Phonexia
  4. Receive the Phonexia Voice Verify appliance and import it into the VirtualBox/VMware software
  5. Before running the virtual machine, set the MAC address of the virtual machine to the one provided by Phonexia (this step is crucial: the virtual machine binds to its IP address through the MAC address)
  6. Attach the data disk in VirtualBox/VMware; the database of Phonexia Voice Verify is stored on it. In the VirtualBox/VMware configuration, the data disk needs to be on SATA port 1.
  7. Allocate the proper network adapter. The virtual appliance uses the host’s network adapter, so ensure that the adapter can be reached from the network on the allocated IP address. Phonexia recommends using a Bridged Adapter.
  8. Run the virtual appliance

After running the virtual appliance, Phonexia Voice Verify is accessible on the server IP within a few seconds. Installing Phonexia Voice Verify is a necessary but not sufficient step for utilizing voice biometrics. Two other steps – PBX connectivity and Call Center SW integration – also have to take place.

Configuration

Configuration is done by Phonexia. The delivered VirtualBox/VMware appliance already includes all necessary parameters.

SIP documentation/requirements reference

Phonexia Voice Verify uses the SIP protocol to connect to the PBX. To the PBX, Phonexia Voice Verify then appears as a single client.

The PBX must be configured to provide a copy of a stream coming from a customer and initiate a call to Phonexia. The parameter UUID serves as an identifier used later for making Enrollment and Verification actions on this audio stream.

Configuration of the PBX depends on the vendor of the PBX. The Integrator of Phonexia Voice Verify is responsible for this part of configuration. Phonexia aims to support a variety of PBX solutions. Due to the variety of PBX providers, Phonexia is not able to support all of them, and the party taking care of the PBX has to have knowledge about that particular PBX product.

Call Center SW integration

When Phonexia Voice Verify is installed, all the functionality is accessible via API. The software used in the call center can then start requesting the API for enrollments, verifications, and other actions. It is up to the vendor of this Software to modify it with the aim of sending requests and enabling call center agents to perform the actions necessary for the verification process using voice biometrics.

During the PoC or production set up, Phonexia provides support to define the verification process and visualize the verification results.

Feature Description

Phonexia Voice Verify provides functionality for several processes.

  • Main functionality – Voice Verification
  • Support administrative process – PBX connectivity, reports, logging, …
  • Maintenance – backups, restore

Note that all the endpoints are documented in detail in the API description.

Prerequisites

  • Phonexia Voice Verify has been successfully installed in the client environment
  • PBX has been configured according to Phonexia Voice Verify requirements
  • Call Center SW has been integrated to utilize the voice biometrics value by showing the verdicts of verification

Phonexia delivers the Phonexia Voice Verify package. Other tasks need to be done by another party or parties, such as the PBX/CC SW vendor, the Client, or whoever is responsible for the call center infrastructure.

PBX Connectivity

To give Phonexia Voice Verify access to live streams, it needs to be connected to the PBX. Phonexia Voice Verify connects to the PBX as a SIP endpoint.

Phonexia Voice Verify keeps a list of possible PBX instances in a database. It can connect to or disconnect from any of them on API request.

The Client needs to create the PBX instance entry in the Phonexia Voice Verify database before such a PBX can be connected. A PBX instance entry can be created (POST /pbx/), retrieved (GET /pbx/{ID}) or removed (DELETE /pbx/{ID}). All PBX entries can be listed as well (GET /pbx/).

When a PBX instance entry exists in Phonexia Voice Verify, the connection can be started (POST /pbx/{ID}/start) or closed (POST /pbx/{ID}/stop).

When the PBX is connected, Phonexia Voice Verify listens and receives SIP calls. Once such a call is received, its binary content is then redirected to the processing unit. From that point, a so-called stream is created.

Phonexia Voice Verify works with an internal stream identifier, stream_uuid, which is generated by the SIP connector component of Phonexia Voice Verify. After the call is connected, all subsequent API requests related to streams work with this identifier. As the PBX has different means of call identification (callid, caller or callee), the Call Center SW can ask for stream details to obtain the stream_uuid (POST /streams).
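
The PBX entry lifecycle described above can be sketched as a small request builder. This is a minimal illustration, not Phonexia client code: BASE_URL is a placeholder address, plain HTTP on port 8000 is assumed from the Swagger URL, the paths are taken as written in this section (the deployed API may add a prefix), and the action names are hypothetical labels. The body expected by POST /pbx/ is not described here, so none is built.

```python
from typing import Optional
from urllib.request import Request

# Assumption: placeholder address; use the IP address chosen during deployment.
BASE_URL = "http://192.0.2.10:8000"

# Method and path per operation, as listed in this section.
# The keys are hypothetical labels, not API terms.
PBX_ROUTES = {
    "create": ("POST", "/pbx/"),
    "list": ("GET", "/pbx/"),
    "get": ("GET", "/pbx/{ID}"),
    "delete": ("DELETE", "/pbx/{ID}"),
    "start": ("POST", "/pbx/{ID}/start"),
    "stop": ("POST", "/pbx/{ID}/stop"),
}


def pbx_request(action: str, pbx_id: Optional[str] = None) -> Request:
    """Build (but do not send) the request for one PBX operation."""
    method, path = PBX_ROUTES[action]
    if "{ID}" in path:
        if pbx_id is None:
            raise ValueError(f"action {action!r} requires a PBX ID")
        path = path.replace("{ID}", str(pbx_id))
    return Request(BASE_URL + path, method=method)
```

A built request could then be sent with urllib.request.urlopen once the authorization header described under Access Management has been added.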

HTTP streams

Since Voice Verify version 1.4, voice can also be sent to Voice Verify via HTTP streaming. This method is an alternative to sending voice via the SIP (RTP) protocol. Both streaming methods can be used in parallel.

HTTP streaming consists of three steps:

  1. Opening a stream – done by POST /api/v2/stream/HTTP endpoint.

    Default sampling frequency is 8 000 Hz (a different frequency has to be specified by frequency parameter).

    In the response, uuid (unique ID of the stream) is returned. This uuid will later be used for sending voice and enrollment/verification.
  2. Sending data (voice) to the stream – using POST /api/v2/stream/HTTP/data/{uuid}.

    Only mono-channel streaming is supported. Stream is automatically closed if no data is sent for more than 10 seconds.

    During streaming, enrollments/verifications can be requested.
  3. Closing the stream – DELETE /api/v2/stream/HTTP/{uuid}.

Endpoint GET /api/v2/status can be used to:

  1. see how many HTTP streams are currently running
  2. check the maximal count of HTTP streams running at the same time
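
The second and third steps of HTTP streaming can be sketched as a request plan. This is an illustration under assumptions: the uuid is taken to come from the response to the opening call, the audio is assumed to be sent as the raw request body in chunks (the exact body encoding is defined in the API description, not here), and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

STREAM_ROOT = "/api/v2/stream/HTTP"
IDLE_TIMEOUT_S = 10  # per this section: the stream closes after 10 s without data


def chunks(audio: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Split mono audio bytes into consecutive pieces of at most chunk_bytes."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


def plan_stream(uuid: str, audio: bytes, chunk_bytes: int) -> List[Tuple[str, str, bytes]]:
    """Return the ordered (method, path, body) requests that follow the opening call."""
    plan = [("POST", f"{STREAM_ROOT}/data/{uuid}", piece)
            for piece in chunks(audio, chunk_bytes)]
    # Close the stream explicitly instead of relying on the idle timeout.
    plan.append(("DELETE", f"{STREAM_ROOT}/{uuid}", b""))
    return plan
```

When sending in real time, consecutive data requests should be spaced well under IDLE_TIMEOUT_S apart so the stream is not closed automatically.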

Voice Biometrics

When Phonexia Voice Verify is connected to the PBX, voice biometrics can be used.

Enrollment

A Customer’s voice can be enrolled by two methods:

  • During a SIP call to Phonexia Voice Verify, enrolling the voice from a current stream (POST /enroll); it is possible to list all the current streams (GET /streams) or details of a stream (POST /streams).
  • Enrolling from a recording including the Customer’s voice (POST /import)

The Customer’s voiceprint is bound to external_id, which is an arbitrary string (length max 256 characters) defined by the Client. Other SW can always refer to the Customer’s voiceprint via this external_id string.

Enrolling saves the identifier of the Customer (an arbitrary string denoted in the API as external_id) together with the corresponding voiceprint. Phonexia Voice Verify keeps neither recordings nor information about the content of speech.
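
Because external_id is chosen by the Client, it may help to check it before calling the API. The following is a minimal sketch: the 256-character limit comes from this section, while rejecting empty values and counting Python characters (rather than bytes) are assumptions, and the function name is hypothetical.

```python
MAX_EXTERNAL_ID_LENGTH = 256  # limit stated in this section


def validate_external_id(external_id: str) -> str:
    """Return external_id unchanged if it looks acceptable, else raise ValueError."""
    if not isinstance(external_id, str) or not external_id:
        # Assumption: an empty identifier is never useful, so reject it early.
        raise ValueError("external_id must be a non-empty string")
    if len(external_id) > MAX_EXTERNAL_ID_LENGTH:
        raise ValueError(
            f"external_id has {len(external_id)} characters, "
            f"maximum is {MAX_EXTERNAL_ID_LENGTH}"
        )
    return external_id
```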

Verification

When a Customer is enrolled, his/her identity can be verified on a current stream (POST /verify). Phonexia Voice Verify always expects the external_id as part of the request, to know whose voice to verify.

Verification can be requested at any time and repeatedly. Especially for passive verification, the frequency of verification can be high, e.g. every half second.

Removal

Enrollments can also be removed (DELETE /leave) when they are not required for a Customer.

Back-ups and Restore

Phonexia Voice Verify offers two options for creating backups for disaster recovery and restoring from them:

  • On a virtualization layer
  • Using Phonexia Voice Verify’s API

Backup

As Phonexia Voice Verify is a fully virtual machine, it can easily be backed up by creating a snapshot in the virtualization environment. Phonexia recommends putting Phonexia Voice Verify into Maintenance Mode (GET /lock) and ending all running streams (POST /maintenance/force_off) before taking a snapshot. With this option, both the System and Data disks are backed up, and in case of disaster the system can be completely recovered.

The other option is using the API to export the database. Phonexia Voice Verify provides an endpoint to do this safely. Note that license file and Speaker Identification calibration profiles are not backed up. As Phonexia provides both the license file and calibration profile as part of the installation image (first set up), there is an integrity process in place.

Full export of a database can be done only in Maintenance Mode. The backup process is:

  1. Put the system in Maintenance Mode (GET /lock); this disables the opportunity to receive new streams for biometry processing
  2. (Optional) wait till all the running streams are processed for enrollment or verification and then closed (GET /streams)
  3. Close all the open streams (POST /maintenance/force_off) to ensure database stability during export
  4. Export the database (GET /maintenance/backup) to backup. Backup is returned as a file via API. This file is to be securely saved by the Client.
  5. Stop Maintenance Mode (GET /maintenance/unlock)
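
The API backup steps above can be sketched as a short sequence. This is an illustration only: send is a hypothetical callable supplied by the integrator that performs one authenticated HTTP request and returns the response body, the paths are taken as written in this section, and the optional waiting step is omitted. Leaving Maintenance Mode in a finally block is a design choice of this sketch, not something the section prescribes.

```python
from typing import Callable

# send(method, path) -> response body; hypothetical, provided by the integrator.
Sender = Callable[[str, str], bytes]


def run_backup(send: Sender) -> bytes:
    """Export the database and return the backup file content."""
    send("GET", "/lock")  # enter Maintenance Mode
    try:
        send("POST", "/maintenance/force_off")  # close all open streams
        return send("GET", "/maintenance/backup")  # backup returned as a file
    finally:
        # Always try to leave Maintenance Mode, even if the export failed.
        send("GET", "/maintenance/unlock")
```

The returned bytes would then be written to secure storage by the Client.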

Restore

In case of a disaster, or when a return to the previous backed-up state is needed, the virtual machine can be recreated by restoring the previously saved snapshot in VirtualBox/VMware.

If the database export was done via API, the restoration of Phonexia Voice Verify can be done as follows:

  1. (Optional, if the virtual machine with VirtualBox/VMware application has been lost) make a full install of Phonexia Voice Verify according to the installation process; it is necessary to use the installation package provided by Phonexia with the latest calibration profile
  2. Put the system in Maintenance Mode (GET /lock)
  3. (Optional) wait till all the running streams are processed for enrollment or verification and then closed (GET /streams)
  4. Close all the open streams (POST /maintenance/force_off) to ensure database stability during import
  5. Import the backed-up database (POST /maintenance/loadbackup) from the backup file. A successful upload of the file restores the database to the state it had when the backup file was created. This means that all voiceprints enrolled between the creation of the backup and its restoration are lost.
  6. Stop Maintenance Mode (GET /maintenance/unlock) and enable all system endpoints.

Voiceprint handling

Phonexia Voice Verify offers several options for working with enrolled voiceprints in a database. These options enable various use cases.

The basic process uses biometrics, as specified in the respective section. This is done by enrolling from a stream (POST /enroll) or recording (POST /import) and verifying the voice from the stream (POST /verify).

The existence of a voiceprint can be checked (POST /voiceprint), and a voiceprint can also be removed (DELETE /leave). As voiceprints are bound to the external_id of a customer, this parameter needs to be provided.

For mass voiceprint/enrollment processing there are options to extract all the voiceprints (GET /snapshot/generate) or import them back to Phonexia Voice Verify (POST /snapshot/import). The export includes information about voiceprint creation time, external_id, stream_uuid and more. It is useful for automated statistics and for reconciling the enrollment database with other Client systems. When importing a snapshot, existing voiceprints can be either overwritten or skipped.

Logging

Phonexia Voice Verify gathers information about internal events. The Client can extract various logs according to their needs. Logs include information about:

  • All SIP calls made
  • All enrollment actions
  • All verification API calls
  • Errors
  • Much more…

Logs are indexed by Elasticsearch. Phonexia Voice Verify provides a Kibana tool for visualization and export of logs. The Kibana tool is accessible on {ip_address}:5601, where {ip_address} designates the IP address dedicated to Phonexia Voice Verify defined by the Client during deployment. The Client can refer to the Kibana manual.

To log in to Kibana, a user account has to be created via POST /api/v1/maintenance/elasticuser. Each user has an ID, a username, and a password, and a note can be assigned to each user (for example, an e-mail address). To edit a user’s information, PUT /api/v1/maintenance/elasticuser/{id} is used. To remove a user, DELETE /api/v1/maintenance/elasticuser/{id} is used.

All these changes (adding new user, editing user, removing user) do not take place until you submit changes by GET /api/v1/maintenance/elasticuser/commit. This can only be done in maintenance mode.

Finally, you can list all Kibana users by GET /api/v1/maintenance/elasticuser.

Logs are deleted after 90 days due to storage capacity.

For billing purposes, the API provides an endpoint to extract logs of invoiceable actions (GET /logging_aggregate). The Client is obliged by contract to provide Phonexia with logs of invoiceable actions at the frequency determined by the contract. Note that this endpoint provides various values; however, invoicing is governed by business terms (see the contract or the Phonexia Voice Verify pricelist), and only invoiceable actions as defined by the contract are used for invoicing.

Calibration technically

As the calibration process requires in-depth knowledge of Speaker Identification technology, Phonexia takes care of it for its Clients. Please note that for this step Phonexia needs purpose-bound and limited access to Client data.

Calibration is part of the Proof of Concept or Set Up phase and belongs under Professional Services.

  1. The Client prepares a dataset as specified below
  2. The Client provides data to Phonexia using one of the following options.
    1. Transfer of the dataset to Phonexia premises
    2. VPN access from Phonexia to Client storage with datasets
    3. Business trip of a Phonexia Technical consultant to the Client location
  3. Phonexia prepares a Calibration profile, called the Audio Source Profile (ASP). For the creation of ASP and the correct calibration, a discussion about the use case and CX vs. security preferences needs to take place between the Client and Phonexia.
  4. During the first installation, the ASP is provided as a separate file within the package
  5. If a change is needed, Phonexia provides a new ASP and allows the Client to apply it to the running Phonexia Voice Verify instance (POST /api/v1/maintenance/load_asp/{name_of_ASP}/)

The Client is requested to maintain the history and versioning of all datasets and ASPs.

Getting information about Phonexia Voice Verify

There are several ways to find out what is happening inside Phonexia Voice Verify. Although many of the options were specified in the feature sections above, a list of useful endpoints follows.

  1. Status of the Phonexia Voice Verify system, including information about the PBX connection, running streams and others (GET /status)
  2. Information about PBXs registered to Phonexia Voice Verify (GET /pbx/ or GET /pbx/{id})
  3. List of current streams (GET /streams)
  4. Details of a stream; in particular, when the Call Center SW has the call identifiers from the PBX, the related stream_uuid can be found (POST /streams)
  5. Information about voiceprints enrolled to Phonexia Voice Verify (POST /voiceprint or GET /snapshot/generate)
  6. Logging of various invoiceable actions via Kibana tool (see feature section Logging above)

Access Management

Phonexia Voice Verify provides limited rest-auth access management based on a token.

Only one user account exists in the system. The Client can login to the system using access credentials (POST /rest-auth/login/). The returned token is used in all follow-up queries.

The token has to be added to the HTTP header as "Authorization: Token <ACCESS_TOKEN>". If you are using Swagger (more information in the API reference chapter) to send your requests, click the “Authorize” button, insert the token in the same format, Token <ACCESS_TOKEN>, and confirm with the “Authorize” button.

The Client can change the password for the user account (POST /rest-auth/password/change/).

The initial access credentials are delivered by Phonexia to the Client. After successful deployment of Phonexia Voice Verify the Client must change the password. In case of a forgotten password, Phonexia can reset the password (GET /maintenance/reset_password); access to the system is necessary (e.g. VPN).
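
Adding the token to a request can be sketched as follows. The header format is the one given in this section; the helper name, the placeholder address, and the use of Python's urllib are assumptions made for illustration.

```python
from urllib.request import Request


def with_token(request: Request, token: str) -> Request:
    """Attach the access token in the format 'Authorization: Token <ACCESS_TOKEN>'."""
    request.add_header("Authorization", f"Token {token}")
    return request


# Example: prepare an authenticated status query (placeholder address and token).
status_request = with_token(Request("http://192.0.2.10:8000/status"), "abc123")
```

The token itself would be obtained beforehand from POST /rest-auth/login/; the shape of that response is documented in the API description.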

Voice processing best practices

All the features of Phonexia Voice Verify can be combined in many ways. The following best practices support a smooth implementation in the Client infrastructure.

Phonexia does not deliver changes to the Call Center Software (CC SW), but only the Phonexia Voice Verify solution. Integration of the following process is up to the CC SW and PBX vendors.

Enrollment process

  1. A call comes in to the call center, and the audio stream from the Customer side is sent to Phonexia Voice Verify; the callid (defined by the PBX) is sent in the metadata of the SIP call, as well as to the CC SW.
  2. The CC SW receives a call with callid (and possibly caller, callee) identifier(s). Based on other metadata (e.g. phone number) the customer is identified; external_id is set by Client; any existing identifier from CC SW can be used (string up to 256 characters).
  3. The CC SW obtains Phonexia’s stream_uuid identifier (POST /streams, callid to be provided as part of request).
  4. The CC SW asks if the Customer already has a voiceprint created (POST /voiceprint)
  5. If the customer does not yet have a voiceprint, the CC SW GUI displays the possibility of enrollment to the agent
  6. The Agent has the usual conversation with the Customer. The Customer’s identity is verified by the usual means, such as Knowledge Based Authentication (KBA). The Customer’s voice is processed by Speaker Identification technology in the background for the whole call duration; no artefacts are saved yet.
  7. The Agent asks the Customer to provide consent for the use of his/her voice for Voice Biometrics (where legislation requires it):
    1. If consent is given, the Agent triggers enrollment in the GUI, e.g. by clicking a button; the CC SW requests customer enrollment (POST /enroll) together with stream_uuid and external_id.
    2. If consent is not given, the Agent marks the Customer as not registered for enrollment; Voice Verify is not requested for enrollment, thus no information is saved about the Customer or the audio stream, and no voiceprint is created.
  8. Phonexia Voice Verify saves the Customer’s voiceprint (if it was created) together with the identifier external_id to the database
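
The decision logic of the enrollment process above can be sketched from the CC SW point of view. This is a hedged illustration: client is a hypothetical wrapper object whose three methods stand for POST /streams, POST /voiceprint and POST /enroll, and the returned labels are invented for this example.

```python
def handle_enrollment(client, callid: str, external_id: str, consent_given: bool) -> str:
    """Decide whether to enroll once the Agent has recorded the consent outcome.

    `client` is a hypothetical wrapper around the Voice Verify API offering
    find_stream_uuid, has_voiceprint and enroll.
    """
    stream_uuid = client.find_stream_uuid(callid)   # POST /streams
    if client.has_voiceprint(external_id):          # POST /voiceprint
        return "already_enrolled"
    if not consent_given:
        # No enrollment request is sent, so nothing is saved for this Customer.
        return "not_enrolled"
    client.enroll(stream_uuid, external_id)         # POST /enroll
    return "enrolled"
```

Keeping the consent check in front of the only call that stores data mirrors step 7: without consent, Voice Verify is never asked to enroll.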

Verification process

  1. A call comes in to the call center, and the audio stream from the Customer side is sent to Phonexia Voice Verify; the callid (defined by the PBX) is sent in the metadata of the SIP call, as well as to the CC SW.
  2. The CC SW receives a call with callid (and possibly caller, callee) identifier(s). Based on other metadata (e.g. phone number) the customer is identified; external_id is set by Client; any existing identifier from CC SW can be used (string up to 256 characters).
  3. The CC SW obtains Phonexia’s stream_uuid identifier (POST /streams, callid to be provided as part of request).
  4. The CC SW asks if the Customer already has a voiceprint created (POST /voiceprint)
  5. If the Customer does have a voiceprint, the GUI displays this information to the Agent
  6. The Agent has the usual conversation with the Customer. The CC SW starts requesting Phonexia Voice Verify for identity verification (POST /verify) with stream_uuid and external_id as parameters. The interval between requests depends on the Client’s expectations; 500 ms is recommended.
  7. Based on verification, the result is displayed on the Agent’s screen
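
The repeated verification requests in step 6 can be sketched as a polling loop. The 500 ms default follows the recommendation above; verify and call_active are hypothetical callables supplied by the CC SW (the first stands for one POST /verify request and returns the status string), and the injectable sleep exists only to make the sketch testable.

```python
import time
from typing import Callable, Iterator


def poll_verification(
    verify: Callable[[], str],
    call_active: Callable[[], bool],
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield one verification status per request for as long as the call lasts."""
    while call_active():
        yield verify()      # one POST /verify with stream_uuid and external_id
        sleep(interval_s)   # wait before the next request
```

Each yielded status can be passed straight to the GUI so the Agent’s screen is refreshed after every request.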

API reference

An API description is available on the server URL at {ip_address}:8000/swagger where {ip_address} designates the IP address dedicated to Phonexia Voice Verify defined by the Client during deployment.

Security 

As Phonexia Voice Verify is designed to run on the premises of the Client, it does not include encryption of API calls and answers. It is up to the Client to secure connectivity between Phonexia Voice Verify and other components inside the internal infrastructure.

Verification results interpretation

Inside Phonexia Voice Verify the Speaker Identification technology compares the voice from the incoming stream with the enrolled voice of the same Customer every time the verification request is put to Phonexia Voice Verify. As a result, the status of verification is provided in the API response with options:

  • not_verified – the questioned voice does not match the enrolled one
  • not_sure – the voices are too similar to reject the Customer, but not similar enough to be sure of a match
  • verified – the questioned voice is the same as the enrolled one

These results are based on the verification score and the desired Threshold(s). See the section Calibration.

The authentication process to the call center might then be set up as follows:

  1. After the call is connected to the CC SW, the Customer’s identity is determined. How that is done depends on the Client.
  2. When the CC SW knows the identity, it can start requesting Phonexia Voice Verify for verification on a call provided to an Agent. The trunked stream of the call is already provided to Phonexia Voice Verify by the PBX.
  3. The Agent starts the dialogue with the Customer to resolve a Customer query. The Customer speech is processed inside Phonexia Voice Verify.
  4. The CC SW starts repetitive queries for the verification of the current customer
  5. The GUI displays traffic lights to the Agent; various colors have various meanings:
    1. AMBER: the system cannot verify the Customer. The Agent is requested to continue the dialogue to obtain a larger sample of the Customer’s speech
    2. GREEN: the Customer has been verified – the Agent can continue providing information to the Customer, make transactions, etc. If another verification factor is used (recommended by Phonexia), the other factors (KBV, …) are to be checked as well
    3. RED: the Customer was rejected; a different authentication process is to be initiated based on the Client’s risk processes

As Phonexia Voice Verify monitors Customer identity all the time, the recommended API usage is as follows:

  1. Verification continues for the whole call duration. If, after the Customer has been verified (GREEN), the system returns RED again during the call, this signals that the speaker has changed on the Customer side. The Agent should stop providing service or information and take another action to verify the Customer’s identity. A visual notification to the Agent should occur (RED status indication).
    The Customer can also be asked to provide more speech.
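
The traffic-light display and the speaker-change signal can be sketched as two small functions. The status strings are the ones listed in this section; pairing not_sure with AMBER and not_verified with RED is inferred from the descriptions rather than stated outright, and the function names are hypothetical.

```python
# Mapping inferred from the status and colour descriptions in this section.
STATUS_TO_LIGHT = {
    "not_verified": "RED",
    "not_sure": "AMBER",
    "verified": "GREEN",
}


def traffic_light(status: str) -> str:
    """Translate an API verification status into the colour shown to the Agent."""
    try:
        return STATUS_TO_LIGHT[status]
    except KeyError:
        raise ValueError(f"unknown verification status: {status!r}") from None


def speaker_changed(previous_light: str, current_light: str) -> bool:
    """True when a verified Customer (GREEN) is followed by a rejection (RED)."""
    return previous_light == "GREEN" and current_light == "RED"
```

Raising on an unknown status, instead of silently showing a colour, is a choice of this sketch so that API changes are noticed by the integrator.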

Integration Checklist

For a minimal successful integration of Phonexia Voice Verify, the following technical activities need to be done:

  • Phonexia Voice Verify deployed
  • PBX configured – audio coming from Customer is trunked to Phonexia Voice Verify
  • Integration with Contact Center software completed – Call Center SW sends requests for enrollments, receives verification results, and performs other activities according to Voice Verify API availability
  • Regular (e.g. monthly) extraction of invoiceable actions is in place & automated
  • Backup of Phonexia Voice Verify database is in place & automated

Hardware requirements

As a virtual appliance, Phonexia Voice Verify has specific hardware requirements for successful operation. The HW specification is as follows.

During PoC or Set Up phase Phonexia provides support on how to build optimal sizing.

CPU

Phonexia technologies are optimized for INTEL CPUs. Recommended series are

  • INTEL Xeon E5 generation 3 or 4
  • INTEL Xeon Gold
  • INTEL Xeon Platinum

Specific selection of model depends on expected traffic. To cover the peaks in the estimated load on Phonexia Voice Verify, the system needs enough dedicated CPU cores. A rough formula to calculate CPU sizing is that 1 CPU core can handle 7 concurrent calls.

CPU cores are the bottleneck for scaling. Phonexia considers the other components less critical and less costly.

Memory

RAM dedicated for the smooth functioning of Phonexia Voice Verify also depends on the expected traffic.

1GB of RAM for 7 concurrent streams, plus 4GB for the whole system is a sufficient estimation.
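
The CPU and memory rules of thumb can be turned into a quick estimate. This is a rough sketch based only on the figures above (7 concurrent calls per core, 1 GB per 7 concurrent streams, plus 4 GB for the system); rounding up to whole cores and whole gigabytes is an assumption, and the function names are hypothetical.

```python
import math

CALLS_PER_CORE = 7   # rough formula from the CPU section
STREAMS_PER_GB = 7   # 1 GB of RAM per 7 concurrent streams
BASE_RAM_GB = 4      # RAM for the whole system


def cpu_cores(peak_concurrent_calls: int) -> int:
    """Dedicated CPU cores needed to cover the expected peak load."""
    return math.ceil(peak_concurrent_calls / CALLS_PER_CORE)


def ram_gb(peak_concurrent_streams: int) -> int:
    """RAM in GB: system base plus 1 GB per started group of 7 streams."""
    return BASE_RAM_GB + math.ceil(peak_concurrent_streams / STREAMS_PER_GB)


# Example: a peak of 50 concurrent calls.
print(cpu_cores(50), ram_gb(50))  # prints: 8 12
```

These numbers are only a starting point; final sizing is agreed with Phonexia during the PoC or Set Up phase.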

Disk storage

There are two virtual disks required by Phonexia Voice Verify – a system disk and a data disk.

The system disk requires 0.5GB.

Data disk capacity is defined by logs saved. Logs are created during various activities by Phonexia Voice Verify (mainly API requests) and are deleted after some time (90 days). The amount of logs depends on the traffic.

The basic formula for estimating the required disk capacity depends on the amount of audio processed by Phonexia Voice Verify: 1 minute of 1 audio stream with usual usage (2 verification queries per second) creates 100 kB of logs.

As an example, one stream running 24 hours a day straight generates 15GB logs during 90 days. This disk capacity is then required to keep all the necessary logs for this stream.
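
The log-volume formula can be checked with a short calculation. This sketch uses only the figures above (100 kB per stream-minute, 90-day retention) and assumes decimal units (1 GB = 1,000,000 kB); the function name is hypothetical. For one stream running around the clock, the raw arithmetic gives about 13 GB, so the 15 GB quoted above presumably includes some rounding or headroom.

```python
LOG_KB_PER_STREAM_MINUTE = 100  # usual usage: 2 verification queries per second
RETENTION_DAYS = 90             # logs are deleted after 90 days


def log_storage_gb(concurrent_streams: float, hours_per_day: float = 24) -> float:
    """Estimate the log volume (decimal GB) kept on the data disk at any time."""
    minutes = hours_per_day * 60 * RETENTION_DAYS
    kilobytes = concurrent_streams * minutes * LOG_KB_PER_STREAM_MINUTE
    return kilobytes / 1_000_000


# One stream, 24 hours a day: 129,600 minutes * 100 kB = 12.96 GB.
print(round(log_storage_gb(1), 2))  # prints: 12.96
```

The estimate scales linearly, so ten streams busy eight hours a day come to roughly 43 GB under the same assumptions.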

Audio requirements

Inside the SIP call, audio (voice) is transmitted via the RTP protocol. For more information, see RFC 3550.
Supported RTP Payload types are:

  • 0 (PCMU, Little-Endian, 8000 Hz, 1 channel)
  • 8 (PCMA, Little-Endian, 8000 Hz, 1 channel)
  • 10 (L16, Little-Endian, 44100 Hz, 2 channels)
  • 11 (L16, Little-Endian, 44100 Hz, 1 channel)
  • 35 (L16, Little-Endian, 8000 Hz, 2 channels)
  • 36 (L16, Little-Endian, 8000 Hz, 1 channel)

For calibration and enrollment from a pre-existing database, recordings should be used in these formats:

  • WAVE (*.wav) container including any of:
    • unsigned 8-bit PCM (u8)
    • unsigned 16-bit PCM (u16le)
    • IEEE float 32-bit (f32le)
    • A-law (alaw)
    • μ-law (mulaw)
    • ADPCM
  • FLAC codec inside FLAC (*.flac) container
  • OPUS codec inside OGG (*.opus) container

Other formats are converted using ffmpeg, but it cannot be guaranteed that the quality of these recordings will be sufficient.

One recording should contain only one speaker.

Updates

Phonexia Voice Verify runs on the server and uses an allocated disk for the database of all information. Updates are done by replacing the whole Phonexia Voice Verify appliance. The new appliance is provided by Phonexia. It is crucial to preserve the dedicated data disk in order to keep the production data.

Reasons for update

  • New version released – feature change

Process of the update

  1. Put the system in maintenance mode (POST /lock)
  2. Wait till all running streams finish or force their closure (POST /maintenance/force_off)
  3. (Recommended) make backup of the system (GET /maintenance/backup)
  4. Turn the appliance off from the VirtualBox/VMware environment
  5. Detach data disk
  6. Remove the virtual appliance from the VirtualBox/VMware
  7. Add the updated version of the Voice Verify appliance to the VirtualBox/VMware
  8. Attach data disk
  9. Start the appliance from the VirtualBox/VMware; Phonexia Voice Verify will be available within a few seconds

Maintenance

Phonexia Voice Verify system is designed to require the minimum of maintenance. Phonexia recommends a few steps to be included in client processes.

Health check

Regular queries on API to check if the system is fully operational (GET /status).

System Backup

For disaster recovery purposes, a full backup of the system is recommended. See the Back-ups and Restore section above for the process.

Voiceprint integrity

Phonexia recommends keeping the actual state of voiceprints in Phonexia Voice Verify system. When a Customer terminates their contract with the Client and when Customer data are removed from the Client’s CRM or Call Center SW, the Customer voiceprint should be also removed from the Phonexia Voice Verify database (DELETE /leave).

Maintaining accuracy

During the project lifetime it is possible that some components in the Client infrastructure may change. When there is any change in any component providing a channel connection from the Customer to Phonexia Voice Verify, it can affect the accuracy of the system.

If any component affecting the channel changes, Phonexia recommends creating a new evaluation set, having it evaluated by Phonexia, and loading the new calibration profile (POST /maintenance/load_asp/{name}/).

A regular update of the evaluation set is recommended during the lifetime of a project. The evaluation on the updated evaluation set (by Phonexia) should be done early.

Phonexia support

Phonexia provides maintenance and support for Phonexia Voice Verify. Please see the contract for details.

Licensing

Licensing is generally a business topic. This section describes the technical perspective only.

Phonexia provides a license for a limited time. The license is represented by a licensing file.

The license is included in the installation package. When a license needs to be extended or replaced, the Client can provide a new license file to Phonexia Voice Verify (POST /maintenance/load_license).

FAQ

What version of VirtualBox/VMware are we using?

If you encounter problems running the virtual machine, the root cause may be an old version of the virtualization software. We currently use VirtualBox 6.1 and VMware 15.

What quality should the audio stream/file have?

For enrollment, the user should use their usual device and call from their normal environment. The user should avoid excessive background noise, such as loud music or a street with heavy traffic. During the enrollment call, the agent should warn the user if the quality of speech is not good (audible) enough.

During verification, if the user cannot be authenticated, the agent should ask the user to move to a quieter environment if possible. If verification is still not possible, the agent should switch to a different authentication method.

In general, audio coming to Voice Verify should follow standard telephony encoding without alteration (i.e., it should not be compressed to a lossy format and then decompressed back to a telephony standard before being provided to Voice Verify).

Can I use stereo recordings for enrollment?

Voice Verify can enroll users from recordings, but only mono files are accepted. It is up to the user of Voice Verify to manage the processes that supply appropriate recordings, containing only the voice of the desired speaker. Voice Verify cannot recognize which channel of a stereo or multi-channel recording should be used for enrollment.
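
Because only mono files are accepted, the Client’s pipeline may want to check the channel count before submitting a recording. The sketch below uses Python’s standard `wave` module and assumes uncompressed PCM WAV input; other encodings (for example A-law or mu-law WAV files) may be rejected by `wave.open` and would need a different reader. The function names are illustrative only.

```python
import wave


def channel_count(path):
    """Return the number of audio channels in a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels()


def is_mono(path):
    """Return True when the recording has exactly one channel."""
    return channel_count(path) == 1
```

A recording for which `is_mono` returns False should be rejected or reduced to the desired speaker’s channel by the Client’s own tooling before enrollment.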

Does Voice Verify provide information about the gender, age, or language of a speaker during verification?

Even though Phonexia supports these technologies, they are predominantly used as part of the Platform for Government. Voice Verify is a solution for the commercial segment and for use cases that solve user verification. Other information, such as demographics, is not in the scope of Voice Verify.

What is the implementation time?

Installing Voice Verify is a matter of a day or two. Most of the time is taken by integration into the existing infrastructure.

The whole process, from defining processes through deployment of Voice Verify, configuration, and calibration to pulling results into the Call Center SW, is estimated at a few weeks. The Client’s internal processes affect this a lot.

When deploying into a development environment, to test the integration and get a feel for internal use, Voice Verify can be deployed in a week or two.

Can I use Phonexia Voice Verify for Active Authentication?

Active authentication means that the Customer is verified at the end of the transaction only by assessing the sample of the Customer’s voice provided at that time. This way of verification has many limitations and downsides:

– It is easily bypassed by replay attacks

– Speaker changes are not identifiable

– The Customer needs to spend some time on the authentication process, which affects their experience

– Accuracy on a very short phrase might be challenging

Technically, Phonexia Voice Verify can also provide Active Verification. As this is not the primary purpose of the solution, the integration might be different (only streamed audio is accepted). Also, the speaker’s utterance is expected to be at least 3 seconds long for accuracy purposes, which might affect the implementation process.

Does Phonexia Voice Verify provide information about sound quality?

No, the current version of Phonexia Voice Verify does not provide such information.

During both enrollment and verification, quality is considered from the voiceprint creation perspective. An audio segment of bad quality is excluded from voiceprint creation. As a result, the voiceprint is created only from audio segments whose quality reaches the minimum necessary level.

For implementation, if the API returns information about insufficient utterance length even though the Customer was speaking, it is an indication that the audio quality is low.
