
Voice Verify 1.5.1

This documentation is valid for Voice Verify version 1.5.1. For older versions please visit Older versions.

Phonexia Voice Verify introduction

This section provides information for the integrator and administrator roles of Phonexia Voice Verify. It covers the deployment, integration and maintenance of the Phonexia Voice Verify solution.

This document does not cover all the knowledge required to correctly implement voice biometrics in a call center.

In this section the following terms are used:

  • Client – the company installing, using and integrating Phonexia Voice Verify
  • Customer – the person or company utilizing voice biometrics technology for voiceprint enrollment or identity verification

Voice Verify utilizes Speaker Identification technology as a voice biometrics system.

To solve the speaker verification problem, two processes are used:

  1. Enrollment or Customer Registration
  2. Verification

During Enrollment (1), the voiceprint of a speaker is created and saved to a database. This baseline voiceprint is later used during all subsequent verifications of the same speaker. For this reason, being sure of a speaker’s identity during enrollment is crucial and needs to be verified by other means. Other parameters like utterance richness or the quality of speech used for enrollment also need to be checked.

The Verification (2) phase takes place multiple times. After the Customer has gone through Enrollment, every subsequent call can verify the Customer through voice biometrics. The Customer’s voice is used again to create an additional voiceprint, which is then compared to the baseline (Enrollment) voiceprint. The comparison of these two voiceprints determines whether they come from one and the same speaker.

Voice Verify scalable vs. non-scalable version

Voice Verify can be deployed in two variants:

  1. Non-scalable
  2. Scalable

The main differences are summarized in the following table:

Feature | Non-scalable | Scalable
Number of servers/VMs needed | 1 | 10+
Number of concurrent calls | 1-350* | unlimited
Vertical scaling | |
Horizontal Scaling | |
Public domain needed** | |
Secured communication*** | |
Accepts SIP calls | |
Accepts HTTP streaming | |
Accepts WebSocket streaming | |
Batch import of voice recordings | |
WebHook support | |

*If a server/virtual machine with more than 50 CPUs is used – see Hardware requirements.
**A public domain is needed only for secured communication if no SSL certificate is owned.
***Secured communication is optional.

Typically, the scalable version is suitable for large deployments with more than 350 calls processed in parallel.


Voice Verify deployment

Non-scalable version

Voice Verify is delivered as a single virtual machine. Both on-premise and cloud deployments are supported. The package can be provided for the following platforms:

  1. VMware
  2. Amazon Web Services
  3. Microsoft Azure

Hardware requirements

Phonexia Voice Verify runs as a virtual machine with the following hardware requirements.

CPU

Phonexia technologies are optimized for INTEL CPUs. The recommended series are:

  • INTEL Xeon E5 generation 3 or 4
  • INTEL Xeon Gold
  • INTEL Xeon Platinum

The specific model selection depends on the expected traffic. To cover peaks in the estimated load, the system needs enough dedicated CPU cores. As a rough rule for CPU sizing, 1 CPU core can handle 7 concurrent calls.

CPU cores are the bottleneck for scaling. Phonexia considers the other components less critical and less costly.

RAM

RAM required for the smooth operation of Phonexia Voice Verify also depends on the expected traffic.

A sufficient estimate is 1 GB of RAM per 7 concurrent streams, plus 8 GB for the whole system.
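The two rules of thumb above (7 concurrent calls per CPU core; 1 GB of RAM per 7 concurrent streams plus 8 GB for the system) can be combined into a quick sizing helper. This is a minimal sketch: the function name is hypothetical, rounding up to whole cores and whole gigabytes is our assumption, and the result is only a rough starting point rather than a replacement for Phonexia's sizing.

```python
import math
from typing import Tuple

CALLS_PER_CORE = 7     # 1 CPU core handles about 7 concurrent calls
CALLS_PER_GB_RAM = 7   # 1 GB of RAM per 7 concurrent streams
BASE_RAM_GB = 8        # fixed RAM for the whole system


def estimate_non_scalable(concurrent_calls: int) -> Tuple[int, int]:
    """Return (cpu_cores, ram_gb) for the expected peak number of concurrent calls.

    Both values are rounded up to whole cores and whole GB (an assumption).
    """
    if concurrent_calls < 0:
        raise ValueError("concurrent_calls must be non-negative")
    cores = math.ceil(concurrent_calls / CALLS_PER_CORE)
    ram_gb = BASE_RAM_GB + math.ceil(concurrent_calls / CALLS_PER_GB_RAM)
    return cores, ram_gb
```

For example, `estimate_non_scalable(350)` returns `(50, 58)`: the 350-call ceiling of the non-scalable version corresponds to 50 cores, which lines up with the 50-CPU footnote of the comparison table.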

Disk storage

There are two virtual disks required by Phonexia Voice Verify – a system disk and data disk.

  • System disk
    • Requires 10 GB
    • Contains:
      • Voice Verify
      • License file
      • Audio Source Profile ( = calibration profile)
  • Data disk
    • Capacity is defined by the number of calls processed in parallel
    • Crucial from a Disaster Recovery (DR) and update perspective
    • Contains:
      • Voiceprints
      • Logs
      • PBX instances database

Logs are created by Phonexia Voice Verify during various activities (mainly API requests) and are deleted after 90 days. The amount of logs depends on the traffic.

The required disk capacity depends on the amount of audio processed by Phonexia Voice Verify. The basic estimate is: 1 minute of 1 audio stream with usual usage (2 verification queries per second) creates 100 kB of logs.

For example, one stream running 24 hours a day generates about 15 GB of logs over 90 days. This disk capacity is then required to keep all the necessary logs for this stream.
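The log estimate can be written out as plain arithmetic. The sketch below assumes decimal units (1 GB = 1,000,000 kB) and the usual usage profile stated in the formula above; the constants come from this section and the function name is hypothetical. For one stream running around the clock it yields about 13 GB over the 90-day retention, so the 15 GB figure above leaves some headroom.

```python
KB_PER_STREAM_MINUTE = 100  # ~100 kB of logs per stream-minute (2 verification queries/s)
RETENTION_DAYS = 90         # logs are deleted after 90 days


def log_storage_gb(streams: int, hours_per_day: float = 24.0) -> float:
    """Estimated steady-state log volume in decimal GB for the retention window."""
    minutes_retained = hours_per_day * 60 * RETENTION_DAYS
    return streams * minutes_retained * KB_PER_STREAM_MINUTE / 1_000_000


print(round(log_storage_gb(1), 2))    # 12.96
print(round(log_storage_gb(100), 1))  # 1296.0
```

Because the estimate is linear, 100 streams running around the clock need roughly 1.3 TB for logs alone.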

Networking requirements

The Voice Verify virtual machine needs to meet the following requirements:

  • static IP address (typically an IP reservation on a DHCP server)
  • must be reachable for API requests
  • must be able to connect to a PBX (if SIP calls are used)
  • allowed ports
    • TCP 22 – SSH connection
    • TCP 80 – WebSockets, Kibana, Grafana
    • TCP 5060 – SIP
    • TCP 8000 – Voice Verify
    • UDP 20000-20350 – RTP
  • domain requirements
    • the customer chooses the domain name on which Voice Verify will run
    • the required DNS configuration for the domain "mydomain.com" is shown in the following table:
Record | Type | Value
mydomain.com | A | Public IP address
*.mydomain.com | CNAME | mydomain.com

The domain does not need to exist; instead, the domain name and the corresponding host names can be added to the hosts file (C:\Windows\System32\drivers\etc\hosts on Windows, /etc/hosts on Linux/Mac). Example:

<IP_ADDRESS> voiceverify.mydomain.com
<IP_ADDRESS> mydomain.com
<IP_ADDRESS> websocket.mydomain.com
<IP_ADDRESS> elasticsearch.mydomain.com
<IP_ADDRESS> kibana.mydomain.com
<IP_ADDRESS> grafana.mydomain.com

where <IP_ADDRESS> is the IP address of the virtual machine.

Architecture

To obtain more detailed information about the non-scalable Voice Verify infrastructure, please contact Phonexia’s consulting team.

Scalable version

The Voice Verify scalable version needs at least 10 virtual machines (or physical servers) to run. The solution consists of several components designed for scalability and, for most of the components, high availability. Both on-premise and cloud deployments are possible. Deployment is semi-automatic and is performed in the customer’s environment by the customer’s and Phonexia’s DevOps engineers working together.

Hardware requirements

The customer defines the maximum expected load for Voice Verify. Phonexia’s DevOps Engineers calculate the server requirements, including the count of virtual machines/servers and specifications for CPU, RAM and HDD.

CPU

Phonexia technologies are optimized for INTEL CPUs. The recommended series are:

  • INTEL Xeon E5 generation 3 or 4
  • INTEL Xeon Gold
  • INTEL Xeon Platinum

Example

Configuration for 500 parallel calls:

Virtual machine/server count | CPU cores (per VM) | RAM (per VM) | HDD (per VM)
4x | 4 | 8 GB | 30 GB
18x | 8 | 16 GB | 30 GB
3x | 4 | 4 GB | 30 GB
In total, 25 virtual machines/servers with 172 CPU cores, 332 GB RAM and 750 GB HDD.
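As a quick consistency check, the per-VM values in the example table can be multiplied out. This sketch only reproduces the totals stated above; the class and function names are hypothetical, and the actual sizing for a given load is calculated by Phonexia's DevOps engineers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeGroup:
    count: int      # number of VMs/servers in this group
    cpu_cores: int  # cores per VM
    ram_gb: int     # RAM per VM in GB
    hdd_gb: int     # disk per VM in GB


# Example configuration for 500 parallel calls, taken from the table above.
CONFIG_500 = [
    NodeGroup(4, 4, 8, 30),
    NodeGroup(18, 8, 16, 30),
    NodeGroup(3, 4, 4, 30),
]


def totals(groups: List[NodeGroup]) -> Dict[str, int]:
    """Sum the per-VM resources over all node groups."""
    return {
        "servers": sum(g.count for g in groups),
        "cpu_cores": sum(g.count * g.cpu_cores for g in groups),
        "ram_gb": sum(g.count * g.ram_gb for g in groups),
        "hdd_gb": sum(g.count * g.hdd_gb for g in groups),
    }


print(totals(CONFIG_500))
# {'servers': 25, 'cpu_cores': 172, 'ram_gb': 332, 'hdd_gb': 750}
```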

Networking requirements

Network communication is secured: all endpoints use certificates on a customer-defined subdomain. Most services communicate over standard HTTP on TCP 80.

  • all servers must have an internal static IP address and they have to be able to communicate with each other
  • one of them must be publicly accessible with a static public IP address
    • the customer chooses the domain name on which Voice Verify will run
    • this domain must point to the static public IP address mentioned above
      The required DNS configuration for the domain "mydomain.com" is shown in the following table:
Record | Type | Value
mydomain.com | A | Public IP address
*.mydomain.com | CNAME | mydomain.com
  • SSH access to virtual machines/servers (for deployment and updates only)
  • all servers must be deployed to the same subnet
  • allowed ports
    • inside the subnet
      • TCP port 2377
      • TCP and UDP port 7946
      • UDP port 4789
    • incoming communication from the public – only to the public IP address defined above (the server/virtual machine hosting the proxy server)
      • TCP 80 – HTTP
      • custom
        • TCP <custom_port_1> – can be set for WebSocket connection
        • TCP <custom_port_2> – can be set for Voice Verify API
        • by default, both of these run on TCP 80

Architecture

To obtain more detailed information about the scalable Voice Verify infrastructure, please contact Phonexia’s consulting team.

HTTPS

There are two ways of running Voice Verify (scalable and non-scalable) on HTTPS:

  • Use customer’s wildcard certificate
  • Use a valid certificate assigned online. In this case, there are two requirements:
    • the virtual machine/server must have access to the internet
    • the SSL domain must have a public DNS record

In both cases, it is necessary to inform Phonexia’s DevOps team before deployment.

API

Once Phonexia Voice Verify is deployed, all its functionality is accessible via API. API integration into the software used in the call center consists of two parts:

  1. giving instructions to Voice Verify (enroll a person, verify, create a back-up,…)
  2. representation of the verification result (verified, not sure, not verified)

Phonexia Voice Verify provides API functionality for several processes:

  • Main functionality – voice verification
  • Supporting administrative processes – PBX connectivity, voice streaming, reports, logging, …
  • Maintenance – back-ups, restore

All the endpoints are documented in detail in the API description.

The current offline version of the API documentation can be downloaded here:

and can be viewed in the Swagger editor – https://editor.swagger.io/

Non-scalable version

The Phonexia Voice Verify API is accessible at http://voiceverify.mydomain.com:8000/swagger/, where mydomain.com is the domain dedicated to Phonexia Voice Verify or assigned in the hosts file – see Networking requirements.

Scalable version

The Phonexia Voice Verify API is accessible on the server URL at https://voiceverify.mydomain.com/swagger/, where mydomain.com is the domain dedicated to Phonexia Voice Verify, defined during deployment.

By default, Voice Verify is accessible on the standard HTTPS port 443; however, a custom port can be defined during deployment.

Both versions

Access Management

Phonexia Voice Verify provides limited rest-auth access management based on a token.

Only one user account exists in the system. The Client can log in using the access credentials (POST /rest-auth/login/). The returned token is used in all follow-up requests.

The token has to be added to the HTTP header as "Authorization: Token <ACCESS_TOKEN>". If you use Swagger (more information in the API reference chapter) to send your requests, click the “Authorize” button in the Swagger interface.

Then insert the token in the same format, Token <ACCESS_TOKEN>, and confirm with the “Authorize” button.
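Outside Swagger, the same login flow can be scripted. The sketch below uses only the Python standard library. The login field names (username, password) and the response field holding the token (key) are assumptions modelled on the common django-rest-auth convention and should be checked against the API description; the base URL is the non-scalable example from this section, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict

# Non-scalable example URL; the scalable version uses https://voiceverify.mydomain.com
BASE_URL = "http://voiceverify.mydomain.com:8000"


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """POST /rest-auth/login/ with the credentials (field names are assumed)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rest-auth/login/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def auth_headers(token: str) -> Dict[str, str]:
    """Header required on all follow-up requests: Authorization: Token <ACCESS_TOKEN>."""
    return {"Authorization": f"Token {token}"}


def login(base_url: str, username: str, password: str) -> str:
    """Log in and return the access token (assumed to be returned under "key")."""
    with urllib.request.urlopen(build_login_request(base_url, username, password)) as resp:
        return json.load(resp)["key"]
```

For example, `token = login(BASE_URL, "<username>", "<password>")` followed by `urllib.request.Request(f"{BASE_URL}/api/v2/status", headers=auth_headers(token))` would query the stream status endpoint, assuming it is served under the same base URL.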

The Client can change the password for the user account (POST /rest-auth/password/change/).

The initial access credentials are delivered by Phonexia to the Client. After successful deployment of Phonexia Voice Verify, the Client must change the password. If the password is forgotten, Phonexia can reset it (GET /maintenance/reset_password); this requires access to the system (e.g. via VPN).

Voice input

After Voice Verify is deployed and the Swagger API interface is loaded, the next step is to provide voice input. The following options are available:

  • real-time streams
    • SIP calls (not yet available in the scalable version)
    • HTTP streams
    • WebSocket streams
  • offline audio file processing
    • voice recordings (for enrollment only)

Each real-time stream is internally bound to a unique uuid. Using this uuid, enrollment or verification can be requested on a specific voice stream. All voice input options can be combined.

SIP calls

Phonexia Voice Verify uses the SIP protocol to register to a PBX and acts as a standard SIP endpoint. Voice Verify is then able to accept SIP calls and process the incoming voice stream.

The PBX (for example Genesys, Cisco, Avaya, Asterisk, …) must be configured to provide a copy of the voice stream coming from the customer and to initiate a call to Phonexia Voice Verify. The UUID of each stream serves as the identifier used later for enrollment and verification requests on this voice stream.

Configuration of the PBX depends on the vendor of the PBX. The Integrator of Phonexia Voice Verify is responsible for this part of configuration.

PBX Connectivity

To allow Phonexia Voice Verify access to live streams, it needs to be registered to a PBX. Phonexia Voice Verify registers/connects as a SIP endpoint to the PBX.

Phonexia Voice Verify keeps a list of available PBX instances in a database. It can connect to or disconnect from any of them via an API request.

A PBX instance entry needs to be created in the Phonexia Voice Verify database before that PBX can be connected. A PBX instance entry is created (POST /pbx/), retrieved (GET /pbx/{ID}) or removed (DELETE /pbx/{ID}). All PBX entries can also be listed (GET /pbx/).

Once a PBX instance entry exists in Phonexia Voice Verify, the connection can be started, i.e. Voice Verify registers to this PBX (POST /pbx/{ID}/start), or closed (POST /pbx/{ID}/stop).

When the PBX is connected, Phonexia Voice Verify listens and receives SIP calls. Once such a call is received, its binary content is then redirected to the processing unit. From that point, a so-called stream is created.

Phonexia Voice Verify works with an internal stream identifier, stream_uuid, which it generates itself. After the call is connected, all subsequent stream-related API requests use this identifier. Since the PBX identifies calls by other means (callid, caller or callee), the call center software can request stream details (POST /streams) to obtain the stream_uuid.

Audio requirements

Within a SIP call, audio is transmitted via the RTP protocol (see RFC 3550).
Supported RTP payload types are:

  • 0 (PCMU, Little-Endian, 8000 Hz, 1 channel)
  • 8 (PCMA, Little-Endian, 8000 Hz, 1 channel)

HTTP streams

Sending voice to Voice Verify can also be done via HTTP streaming.

HTTP streaming consists of three steps:

1) Opening a stream – using the POST /api/v2/stream/HTTP endpoint.

The default sampling frequency is 8000 Hz; a different frequency has to be specified with the frequency parameter.

Any number of optional “key”:”value” pairs can be added to the info field. For WebHook subscriptions, the info field can include source_uuid, the identifier for the WebHook subscription, as follows:

{
  "info": {"source_uuid": "<string>"},
  "frequency": 8000
}

The response returns uuid, the unique ID of the stream. This uuid is later used for sending voice and for enrollment/verification.

2) Sending data (voice) to the stream – using POST /api/v2/stream/HTTP/data/{uuid}.

Only mono-channel streaming is supported. A stream is automatically closed if no data is sent for more than 10 seconds.

During streaming, enrollments/verifications can be requested.

3) Closing the stream – DELETE /api/v2/stream/HTTP/{uuid}.

Endpoint GET /api/v2/status can be used to:

  1. check how many HTTP streams are currently running
  2. check the maximum count of HTTP streams running at the same time

Audio requirements

HTTP – raw s16le (signed 16-bit little-endian PCM); the frequency and number of channels are defined in the API request
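The three HTTP streaming steps can be sketched with the standard library as follows. Assumptions: the endpoints live under the same base URL as the rest of the API, the token header from Access Management applies, the open-stream response carries the stream ID in a uuid field (as described in step 1), and audio chunks are posted as raw bytes with an application/octet-stream content type. All helper names are hypothetical.

```python
import json
import struct
import urllib.request
from typing import Iterable, List, Optional

BASE_URL = "http://voiceverify.mydomain.com:8000"  # assumed API base URL


def to_s16le(samples: List[int]) -> bytes:
    """Pack integer PCM samples (-32768..32767) as raw mono s16le bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def _request(method: str, url: str, token: str, data: Optional[bytes] = None,
             content_type: Optional[str] = None) -> urllib.request.Request:
    headers = {"Authorization": f"Token {token}"}
    if content_type:
        headers["Content-Type"] = content_type
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def open_stream_request(base_url: str, token: str, source_uuid: Optional[str] = None,
                        frequency: int = 8000) -> urllib.request.Request:
    """Step 1: POST /api/v2/stream/HTTP (source_uuid only needed for WebHooks)."""
    info = {"source_uuid": source_uuid} if source_uuid else {}
    body = json.dumps({"info": info, "frequency": frequency}).encode("utf-8")
    return _request("POST", f"{base_url}/api/v2/stream/HTTP", token, body, "application/json")


def send_data_request(base_url: str, token: str, uuid: str, pcm: bytes) -> urllib.request.Request:
    """Step 2: POST raw s16le mono audio (the content type is an assumption)."""
    return _request("POST", f"{base_url}/api/v2/stream/HTTP/data/{uuid}", token, pcm,
                    "application/octet-stream")


def close_stream_request(base_url: str, token: str, uuid: str) -> urllib.request.Request:
    """Step 3: DELETE /api/v2/stream/HTTP/{uuid}."""
    return _request("DELETE", f"{base_url}/api/v2/stream/HTTP/{uuid}", token)


def stream_audio(base_url: str, token: str, chunks: Iterable[bytes]) -> str:
    """Open a stream, send the chunks, always attempt to close it; return the uuid.

    Chunks must keep coming: the stream closes after 10 s without data.
    """
    with urllib.request.urlopen(open_stream_request(base_url, token)) as resp:
        uuid = json.load(resp)["uuid"]
    try:
        for chunk in chunks:
            with urllib.request.urlopen(send_data_request(base_url, token, uuid, chunk)):
                pass
    finally:
        with urllib.request.urlopen(close_stream_request(base_url, token, uuid)):
            pass
    return uuid
```

In a real integration, the uuid from step 1 is what enrollment and verification requests refer to while audio is still flowing; this sketch only returns it at the end for simplicity.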

WebSockets

Non-scalable version

To send a voice stream to Voice Verify via WebSocket, a connection has to be established first. All WebSocket messages are sent to http://websocket.mydomain.com.

Scalable version

To send a voice stream to Voice Verify via WebSocket, a connection has to be established first. All WebSocket messages are sent to https://websocket.mydomain.com.

By default, WebSockets are accessible on the standard HTTPS port 443; however, a custom port can be defined during deployment.

Both versions

As the first step, a WebSocket connection has to be established. This can be achieved by sending the following WebSocket JSON request:

{
  "event": "connected",
  "protocol": "Call",
  "version": "1.0.0"
}

Voice Verify currently accepts only version 1.0.0.
After successful connection, the following message will appear as a response:

{
  "version": "1.0.0",
  "event": "connected",
  "msg": "Connection established.",
  "status": 200
}

For the sake of keeping the connection alive, Voice Verify sends the following WebSocket message every 5 seconds:

{
  "event": "ping",
  "msg": "Ping.", 
  "status": 100
}

Now, a WebSocket stream can be started. Multiple streams can be sent through one WebSocket connection.
To start a WebSocket stream, send the following request:

{
    "event": "start",
    "sequenceNumber": "1",
    "start": {
        "accountSid": "",
        "streamSid": "<streamSid>",
        "callSid": "",
        "tracks": ["inbound"],
        "mediaFormat": {
            "encoding": "audio/x-mulaw",
            "sampleRate": 8000,
            "channels": 1
        },
        "customParameters": {
            "uuid": "<uuid>",
            "source_uuid": "<string>",
            "additional_parameter_1": "value_1"
        }
    },
    "streamSid": "<streamSid>"
}

Where:

  • sequenceNumber should always be “1” for the start message
  • streamSid is a 34-character string [a-zA-Z0-9] used for pairing purposes
  • mediaFormat contains information about the expected voice stream – these values cannot be changed
    • only single-channel audio with an 8 kHz sample rate is supported
  • customParameters contains “key”:”value” pairs, either required or optional
    • uuid – required – the unique ID by which Voice Verify internally identifies the voice stream; it must be a valid version 4 UUID (UUID4)
    • source_uuid – optional – identifier for WebHook subscriptions
    • optional – any other information can be added as a “key”:”value” pair; this information is stored in the stream object and can be used to search for this stream
  • accountSid, streamSid, and callSid parameters are optional and exist only for Twilio protocol compatibility. Leave them blank or make sure they are 34 characters long; otherwise, a Validation Scheme Exception is raised
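Because the field rules above are easy to get wrong, it can help to build the connect and start messages in code. The sketch below only produces the JSON text; sending it requires a WebSocket client library, which is left out. The helper names are hypothetical, while the field values follow the messages shown in this section.

```python
import json
import re
import secrets
import string
import uuid as uuidlib
from typing import Optional

SID_PATTERN = re.compile(r"[A-Za-z0-9]{34}")
SID_ALPHABET = string.ascii_letters + string.digits


def new_stream_sid() -> str:
    """Random 34-character [a-zA-Z0-9] streamSid."""
    return "".join(secrets.choice(SID_ALPHABET) for _ in range(34))


def connect_message() -> str:
    """First message on a new WebSocket connection (only version 1.0.0 is accepted)."""
    return json.dumps({"event": "connected", "protocol": "Call", "version": "1.0.0"})


def start_message(stream_sid: str, stream_uuid: str,
                  source_uuid: Optional[str] = None, **extra: str) -> str:
    """Start message; extra keyword arguments become optional customParameters."""
    if not SID_PATTERN.fullmatch(stream_sid):
        raise ValueError("streamSid must be 34 characters of [a-zA-Z0-9]")
    if uuidlib.UUID(stream_uuid).version != 4:  # UUID() itself rejects malformed strings
        raise ValueError("uuid must be a version 4 UUID")
    custom = {**extra, "uuid": stream_uuid}  # the required uuid cannot be overridden
    if source_uuid is not None:
        custom["source_uuid"] = source_uuid
    return json.dumps({
        "event": "start",
        "sequenceNumber": "1",  # always "1" for the start message
        "start": {
            "accountSid": "",
            "streamSid": stream_sid,
            "callSid": "",
            "tracks": ["inbound"],
            "mediaFormat": {"encoding": "audio/x-mulaw", "sampleRate": 8000, "channels": 1},
            "customParameters": custom,
        },
        "streamSid": stream_sid,
    })
```

A typical call is `start_message(new_stream_sid(), str(uuid.uuid4()), source_uuid="<string>")`, sent right after the connection has been confirmed.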

After successful stream creation, a similar message will appear as a response:

{
    "id": "<streamSid>",
    "uuid": "<uuid>",
    "retries": 0,
    "event": "start",
    "msg": "Stream started.",
    "status": 201
}

The stream times out after 10 seconds of inactivity (no packets with streamSid received). The following message will be sent:

{
    "id": "<streamSid>",
    "uuid": "<uuid>",
    "event": "media",
    "msg": "Request timeout.",
    "status": 408
}

Possible error messages when starting a stream:
Response:

{
    "id": "<streamSid>",
    "uuid": "<uuid>",
    "event": "start",
    "msg": "<MSG>",
    "status":  <STATUS_CODE>
}

Error codes:

STATUS_CODE | MSG | NOTE
201 | Stream started. |
403 | Resource is unavailable. | This could happen due to licensing issues (not enough stream slots).
404 | Stream not found. |
408 | Request timeout. |
409 | Stream already exists. | Occurs when starting multiple streams with the same UUID.
423 | Maintenance mode set. | Voice Verify is in maintenance mode.
500 | Internal server error. |
502 | Connection error. |
503 | Resource is unavailable. | This could happen due to licensing issues (not enough technology slots).

After the stream is created successfully, the voice stream should be sent via a similar message:

{
    "event": "media",
    "sequenceNumber": "2",
    "media": {
        "track": "inbound",
        "chunk": "1",
        "timestamp": "128",
        "payload": "<DATA>"
    },
    "streamSid": "<streamSid>"
}

The payload <DATA> is Base64-encoded RTP data (PCMU, Little-Endian, 8000 Hz, 1 channel).
The next voice stream message would look like this:

{
    "event": "media",
    "sequenceNumber": "3",
    "media": {
        "track": "inbound",
        "chunk": "2",
        "timestamp": "158",
        "payload": "<DATA>"
    },
    "streamSid": "<streamSid>"
}

And so on. To avoid unnecessary bandwidth use, no message is sent back to the client when data is transmitted successfully.
To stop a stream, the following message is expected:

{
    "event": "stop",
    "sequenceNumber": "<N>",
    "streamSid": "<streamSid>",
    "stop": {
        "accountSid": "",
        "callSid": ""
    }
}

Where <N> corresponds to the next logically expected number (the last media message would have sequenceNumber = <N-1>). accountSid and callSid are optional and exist only for Twilio compatibility; leave them blank or provide 34-character string IDs.
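The numbering rules for media and stop messages can be captured in a small generator. Two points are assumptions rather than documented behavior: the payload is taken to be the raw μ-law bytes of each chunk (as in the Twilio Media Streams format this protocol mirrors), and timestamp is filled with milliseconds elapsed since the stream start. The chunk counter and sequenceNumber follow the examples above, with sequenceNumber continuing after the start message's "1"; the function name is hypothetical.

```python
import base64
import json
from typing import List

SAMPLE_RATE = 8000  # 8 kHz μ-law: one byte per sample


def media_and_stop_messages(stream_sid: str, chunks: List[bytes]) -> List[str]:
    """Media messages for each μ-law chunk, followed by the stop message.

    The start message used sequenceNumber "1", so the first media message is "2"
    and the stop message takes the number after the last media message.
    """
    messages = []
    elapsed_ms = 0
    for chunk_no, chunk in enumerate(chunks, start=1):
        messages.append(json.dumps({
            "event": "media",
            "sequenceNumber": str(chunk_no + 1),
            "media": {
                "track": "inbound",
                "chunk": str(chunk_no),
                "timestamp": str(elapsed_ms),  # assumption: ms since stream start
                "payload": base64.b64encode(chunk).decode("ascii"),  # assumption: raw μ-law bytes
            },
            "streamSid": stream_sid,
        }))
        elapsed_ms += len(chunk) * 1000 // SAMPLE_RATE
    messages.append(json.dumps({
        "event": "stop",
        "sequenceNumber": str(len(chunks) + 2),
        "streamSid": stream_sid,
        "stop": {"accountSid": "", "callSid": ""},
    }))
    return messages
```

With 160-byte chunks, each media message carries 20 ms of 8 kHz audio; in practice the messages would be sent as audio arrives rather than built all at once.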

Successful stream stopping results in the following response:

{
    "id": "<streamSid>",
    "uuid": "<uuid>",
    "event": "stop",
    "msg": "Stream stopped.",
    "status": 204
}

Voice recordings

Voice recordings can be used for enrollment.

  • multiple recordings can be enrolled in one API request
  • each recording must contain more than 20 seconds of speech
  • each recording must contain only one speaker
  • the audio quality must be good

Audio requirements

For calibration and enrollment from a pre-existing database, recordings should be in one of these formats:

  • WAVE (*.wav) container including any of:
    • unsigned 8-bit PCM (u8)
    • signed 16-bit PCM (s16le)
    • IEEE float 32-bit (f32le)
    • A-law (alaw)
    • μ-law (mulaw)
    • ADPCM
  • FLAC codec inside FLAC (*.flac) container
  • Opus codec inside OGG (*.opus) container

Other formats are converted using ffmpeg, but the quality of these recordings cannot be guaranteed to be sufficient.

One recording should contain only one speaker.
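Before uploading a recording for enrollment, a cheap pre-check can reject files that are obviously too short. Since net speech can never exceed the total duration, a file of 20 seconds or less cannot meet the requirement; passing the check does not guarantee enough speech. The sketch uses Python's wave module, which reads only plain PCM WAV files, so A-law, μ-law, float and ADPCM files would need another tool (for example ffprobe). The function names are hypothetical.

```python
import io
import wave

MIN_SPEECH_SECONDS = 20.0  # each recording needs more than 20 s of speech


def wav_duration(source) -> float:
    """Duration in seconds of a PCM WAV file given as a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough(data: bytes) -> bool:
    """False if the recording is too short to possibly contain 20 s of speech."""
    return wav_duration(io.BytesIO(data)) > MIN_SPEECH_SECONDS
```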

Enrollment/verification

Once deployment is finished successfully and voice streams are connected, voice biometrics can be used.

Both enrollment and verification actions can be performed on any voice stream. In addition, enrollment can be done offline using voice recordings.

It is possible to list all the current streams (GET /streams) or get the details of a stream (POST /streams).

Enrollment

A Customer’s voice can be enrolled by two methods:

  1. Enrolling the voice from a current voice stream
  2. Enrolling from a recording including the Customer’s voice (POST /import)

The Customer’s voiceprint is bound to an external_id, an arbitrary string (maximum length 256 characters) defined by the Client. Other software can always refer to the Customer’s voiceprint via this external_id string.

By enrolling, the customer’s ID (arbitrary string denoted in API as external_id) and the corresponding voiceprint are saved. Phonexia Voice Verify keeps neither recordings nor information on the speech content.

The existence of a voiceprint can be checked (POST /voiceprint), and a voiceprint can also be removed (DELETE /leave). As voiceprints are bound to the external_id of a customer, this parameter needs to be provided.

When a voiceprint is created via POST /enroll, Voice Verify reports the amount of net speech present in the enrolled voiceprint. The more net speech a voiceprint includes, the better.

For mass voiceprint/enrollment processing, all voiceprints can be exported (GET /snapshot/generate) or imported back into Phonexia Voice Verify (POST /snapshot/import). The export includes the voiceprint creation time, external_id, stream_uuid and more. It is useful for automated statistics and for cross-checking the enrollment database against other Client systems. When a snapshot is imported, existing voiceprints can be either overwritten or skipped.

Verification

When a Customer is enrolled, his/her identity can be verified on a current stream. Phonexia Voice Verify always expects the external_id as part of the request, to know whose voice to verify against.

Verification can be done by:

  1. Polling – (POST /verify)
  2. WebHooks

Verification by polling can be requested at any time, repeatedly. Especially for passive verification, the verification frequency can be high, e.g. every half second.
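A polling loop for passive verification might look like the sketch below. The request body field names (uuid, external_id) and the result field in the response are assumptions, the latter borrowed from the verification WebHook payload. Stopping at the first final verdict is one possible design; a passive setup may prefer to keep polling for the whole call. The HTTP call is injected as send, so the loop can be tested without a server; the helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional

POLL_INTERVAL_S = 0.5  # e.g. every half second for passive verification
FINAL = {"verified", "not_verified"}


def verify_request(base_url: str, token: str, stream_uuid: str,
                   external_id: str) -> urllib.request.Request:
    """POST /verify against the voiceprint of external_id (field names assumed)."""
    body = json.dumps({"uuid": stream_uuid, "external_id": external_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/verify", data=body, method="POST",
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"})


def poll_verification(send: Callable[[], Dict], max_polls: int = 60,
                      interval: float = POLL_INTERVAL_S) -> Optional[Dict]:
    """Call send() until it reports a final verdict or max_polls is reached."""
    last = None
    for _ in range(max_polls):
        last = send()
        if last.get("result") in FINAL:
            return last
        time.sleep(interval)
    return last
```

In production, send would wrap `urllib.request.urlopen(verify_request(...))` and parse the JSON body of the response.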

Verification results interpretation

Inside Phonexia Voice Verify, the Speaker Identification technology compares the voice from the incoming stream with the enrolled voiceprint of the same customer every time a verification request is sent. The API response then provides the verification status and the verification score. The possible statuses are:

  • not_verified – the questioned voice does not match the enrolled one
  • not_sure – the voices are too similar to reject the Customer, but not similar enough to confirm the match with certainty
  • verified – the questioned voice is the same as the enrolled one

These results are based on the verification score and the desired threshold(s). To understand verification scores, please see the Speaker Identification chapter.
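One way to read the three statuses is as two thresholds splitting the score axis, with higher scores meaning more similar voices. The sketch below assumes exactly that; the function name is hypothetical, the threshold values are deployment-specific, and the ones in the usage example are purely illustrative rather than Phonexia defaults.

```python
def verdict(score: float, reject_below: float, accept_at: float) -> str:
    """Map a verification score to not_verified / not_sure / verified."""
    if reject_below > accept_at:
        raise ValueError("reject_below must not exceed accept_at")
    if score >= accept_at:
        return "verified"
    if score >= reject_below:
        return "not_sure"
    return "not_verified"
```

For instance, with illustrative thresholds `reject_below=1.0` and `accept_at=3.0`, a score of 2.0 falls into the not_sure band.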

WebHooks

Voice Verify can send an HTTP callback (WebHook) for the following actions:

  • a voice stream starts/stops (only HTTP or WebSocket streams)
  • an enrollment attempt is made on a voice stream
  • a verification attempt is made on a voice stream

WebHook callback for starting/stopping of a voice stream

To set up a WebHook callback for starting/stopping of a specific voice stream with source_uuid, use POST /api/v2/subscription with the following request body:

{
    "channel_uuid": "<source_uuid>",
    "webhook": "<webhook_URL>",
    "type": "streams",
    "info": {}
}

The URL of the callback must be specified in <webhook_URL> parameter.

It is possible to attach start/stop WebHook callbacks to any HTTP and WebSocket voice stream. Start/stop WebHook callback cannot be used for SIP calls, as it is not possible to define source_uuid in this process.

When the voice stream starts or stops, all subscribers to the specific source_uuid are notified via the WebHook with the following message:

{
    "channel_uuid": "<source_uuid>",
    "type": "stream",
    "status": <status>,
    "payload": {
        "action": "<action>",
        "stream_uuid": "<uuid>"
    }
}

Where <action> can either be added or removed, <status> returns a numerical value of HTTP response code and <uuid> is the unique internal stream identifier.

WebHook callback for enrollment on a voice stream

To set up a WebHook callback for enrollment on a specific voice stream with uuid, use POST /api/v2/subscription with the following request body:

{
    "channel_uuid": "<uuid>",
    "webhook": "<webhook_URL>",
    "type": "enroll",
    "info": {
        "external_id": "<external_id>"
    }
}

Where <uuid> is the internal unique ID of the voice stream, the URL of the callback must be specified in the <webhook_URL> parameter and <external_id> is the unique ID of the created voiceprint.

Voice Verify will then send the enrollment WebHook callback every 1 second:

{
    "channel_uuid": "<uuid>",
    "type": "enroll",
    "payload": {
        "stream_uuid": "<uuid>",
        "external_id": "<external_id>",
        "detail": "<detail>",
        "speech_length": <speech_length>
    },
    "status": <status>
}

Where <detail> can state one of the following:

  • there is not enough net speech to create the enrollment yet
  • enrollment created successfully
  • enrollment with this external_id already exists

speech_length returns the number of seconds of net speech present in the enrollment and <status> returns a numerical value of HTTP response code.

WebHook callback for verification on a voice stream

To set up a WebHook callback for verification on a specific voice stream with uuid, use POST /api/v2/subscription with the following request body:

{
    "channel_uuid": "<uuid>",
    "webhook": "<webhook_URL>",
    "type": "verify",
    "info": {
        "external_id": "<external_id>"
    }
}

Where <uuid> is the internal unique ID of the voice stream, the URL of the callback must be specified in the <webhook_URL> parameter and <external_id> is the unique ID of the voiceprint to verify the current voice stream against.

Voice Verify will then send the verification WebHook callback every 1 second:

{
    "channel_uuid": "<uuid>",
    "type": "verify",
    "status": <status>,
    "payload": {
        "stream_uuid": "<uuid>",
        "external_id": "<external_id>",
        "result": "<verdict>",
        "speech_length": <speech_length>,
        "score": <score>
    }
}

Where <verdict> can state one of the following:

  • there is not enough net speech to make the verification yet
  • voiceprint with external_id does not exist
  • verified
  • not verified
  • not sure

speech_length returns the number of seconds of net speech present in the enrollment, <score> returns the verification score and <status> returns a numerical value of HTTP response code.

WebHook callback removal

Any WebHook callback can be deleted by DELETE /api/v2/subscription.
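The subscription bodies and the callbacks they trigger can be handled with two small helpers. This is a sketch: the helpers are hypothetical, the HTTP server that receives callbacks (for example one built on http.server or a web framework) is left out, and because the exact detail and result strings are not listed verbatim, they are passed through unchanged. Subscriptions use type "streams" while stream callbacks carry type "stream", as in the examples above.

```python
import json
from typing import Optional

SUBSCRIPTION_TYPES = ("streams", "enroll", "verify")


def subscription_body(kind: str, channel_uuid: str, webhook_url: str,
                      external_id: Optional[str] = None) -> str:
    """JSON body for POST /api/v2/subscription.

    channel_uuid is the source_uuid for "streams" and the stream uuid otherwise.
    """
    if kind not in SUBSCRIPTION_TYPES:
        raise ValueError(f"unknown subscription type: {kind}")
    if kind != "streams" and external_id is None:
        raise ValueError("enroll/verify subscriptions require external_id")
    info = {} if kind == "streams" else {"external_id": external_id}
    return json.dumps({"channel_uuid": channel_uuid, "webhook": webhook_url,
                       "type": kind, "info": info})


def describe_callback(raw: bytes) -> str:
    """Turn a received WebHook callback body into a one-line summary."""
    msg = json.loads(raw)
    kind, p = msg.get("type"), msg.get("payload", {})
    if kind == "stream":
        return f"stream {p['stream_uuid']} {p['action']}"
    if kind == "enroll":
        return f"enroll {p['external_id']}: {p['detail']} ({p['speech_length']} s of speech)"
    if kind == "verify":
        return f"verify {p['external_id']}: {p['result']} (score {p.get('score')})"
    return f"unhandled callback type {kind!r}"
```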

Voice Verify maintenance

Back-up and restore

Preparation

Phonexia Voice Verify has to enter the Maintenance Mode (GET /lock) and end all running streams (POST /maintenance/force_off) before making a back-up.

API back-up and restore

There are two ways of making a back-up via Voice Verify API:

  1. Voiceprint snapshot
    • lets you export all voiceprints as JSON or a compressed JSON file
  2. Back-up
    • lets you back-up the whole database including:
      • voiceprints
      • Audio Source Profiles
      • PBX instances
      • Voice Verify settings

If the database export was done via API, the restoration of Phonexia Voice Verify can be done as follows:

  1. Put the system in Maintenance Mode (GET /lock)
  2. (Optional) wait until all the running streams are processed for enrollment or verification and then closed (GET /streams)
  3. Close all the open streams (POST /maintenance/force_off) to ensure database stability during import
  4. Import the backed-up database (POST /maintenance/loadbackup) from the backup file, or the voiceprint snapshot (POST /snapshot/import). A successful upload returns the database to the state it was in when the back-up was created, so all voiceprints enrolled between the back-up and its restoration are lost.
  5. Stop Maintenance Mode (GET /maintenance/unlock) and enable all system endpoints.
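Since the order of these steps matters, it may be worth scripting them. In the sketch below, call(method, path) is a hypothetical helper that performs one authenticated API request and returns the parsed JSON, and upload() stands for posting the back-up file (POST /maintenance/loadbackup) or the snapshot (POST /snapshot/import), whose upload details are not covered here.

```python
import time
from typing import Any, Callable, Optional

ApiCall = Callable[[str, str], Any]  # call(method, path) -> parsed JSON (hypothetical)


def restore(call: ApiCall, upload: Callable[[], Any],
            streams_idle: Optional[Callable[[Any], bool]] = None,
            poll_s: float = 5.0) -> None:
    """Restore a back-up following the documented order of steps."""
    call("GET", "/lock")                    # 1. maintenance mode (the Updates section lists POST /lock)
    if streams_idle is not None:            # 2. optional: wait for running streams to finish
        while not streams_idle(call("GET", "/streams")):
            time.sleep(poll_s)
    call("POST", "/maintenance/force_off")  # 3. close all open streams
    upload()                                # 4. import the back-up or snapshot
    call("GET", "/maintenance/unlock")      # 5. leave maintenance mode
```

If any step raises, the system deliberately stays in maintenance mode, so a half-finished import is not exposed to traffic.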

Non-scalable version

Back-up can also be done on the virtualization layer.

As Phonexia Voice Verify is a fully virtual machine, it can easily be backed up by creating a snapshot from the virtual environment. Through this option, both System and Data disks are backed up and in case of disaster the system can be completely recovered.

In case of a disaster or a need to return to the previous backed-up state, the virtual machine can be recreated from the previously saved snapshot in VMware.

Another option is to back up the data disk and re-attach it to restore the backed-up state.

Updates

Reasons for update

  • New version released – feature change

Preparation for the update

  1. Put the system in maintenance mode (POST /lock)
  2. Wait until all running streams finish, or force their closure (POST /maintenance/force_off)
  3. (Recommended) make a back-up of the system

Downtime is expected during the update.

Non-scalable version

Phonexia Voice Verify runs on the server and uses a dedicated disk for its database. Updates are done by replacing the entire Phonexia Voice Verify machine with a new one provided by Phonexia. It is crucial to preserve the dedicated disk, as it holds the production data.

Scalable version

Updates are performed by Phonexia’s DevOps team. An internet connection and SSH access to the public IP address corresponding to mydomain.com are required.

Logging

Phonexia Voice Verify records events that occur within the system. The Client can extract various logs as needed. Logs include information about:

  • Voice streams
  • Enrollment actions
  • Verification actions
  • Errors
  • Much more…

Logs are indexed by Elasticsearch and are deleted after 90 days due to storage capacity limits. Phonexia Voice Verify provides the Kibana tool for visualizing and exporting logs; for details, the Client can refer to the Kibana manual. Login credentials can be obtained from Phonexia’s Pre-Sale/Consulting teams, as the monitoring tool requires a deeper understanding of the whole Voice Verify architecture.

For billing purposes, the API provides an endpoint to extract logs of invoiceable actions (GET /logging_aggregate). The Client is contractually obliged to provide Phonexia with logs of invoiceable actions at the frequency specified in the contract. Note that this endpoint returns various values; however, invoicing is governed by the business terms (see the contract or the Phonexia Voice Verify price list), and only the actions defined as invoiceable in the contract are used for invoicing.

The best practice is to export the invoiceable actions once per day, with each daily export covering the whole period from the beginning of the billing cycle.
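Because each daily export covers the period from the start of the billing cycle, the export window grows every day and resets when a new cycle begins. The sketch below only computes that cumulative window; it assumes a monthly cycle starting on a fixed day of the month (a hypothetical contract parameter, limited here to days 1 to 28), and it does not show how the window is passed to GET /logging_aggregate, since the endpoint's parameters are not described in this section.

```python
from datetime import date
from typing import Tuple


def billing_window(today: date, cycle_start_day: int = 1) -> Tuple[date, date]:
    """Return (start, end) of the cumulative export for today's daily run.

    Assumes a monthly billing cycle that starts on cycle_start_day (1-28).
    """
    if not 1 <= cycle_start_day <= 28:
        raise ValueError("cycle_start_day must be between 1 and 28")
    if today.day >= cycle_start_day:
        # The current cycle began earlier this month.
        start = today.replace(day=cycle_start_day)
    else:
        # The current cycle began in the previous month (wrapping over January).
        year, month = (today.year - 1, 12) if today.month == 1 else (today.year, today.month - 1)
        start = date(year, month, cycle_start_day)
    return start, today
```

For example, with a cycle starting on the 10th, the run on 5 January covers 10 December to 5 January.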

Non-scalable version

The Kibana tool is accessible on http://kibana.mydomain.com.

Scalable version

The Kibana tool is accessible on https://kibana.mydomain.com.

In addition to the standard logs, the scalable version of Voice Verify also contains advanced monitoring logs, which are deleted after 24 hours due to storage capacity limits.

Monitoring

Voice Verify includes Grafana, an advanced monitoring tool, accessible at http://grafana.mydomain.com (non-scalable) or https://grafana.mydomain.com (scalable). Login credentials can be obtained from Phonexia’s Pre-Sale/Consulting teams, as the monitoring tool requires a deeper understanding of the whole Voice Verify architecture.

Calibration

As the calibration process requires in-depth knowledge of Speaker Identification technology, Phonexia takes care of it for its Clients. Please note that for this step Phonexia needs purpose-bound and limited access to Client data.

Calibration is part of the Proof of Concept or Set Up phase and belongs under Professional Services.

  1. The Client prepares a dataset as specified below
  2. The Client provides data to Phonexia using one of the following options:
    1. Transfer of the dataset to Phonexia premises
    2. VPN access from Phonexia to Client storage with datasets
    3. Business trip of a Phonexia Technical consultant to the Client location
  3. Phonexia prepares a calibration profile, called the Audio Source Profile (ASP). To create the ASP and calibrate correctly, the Client and Phonexia need to discuss the use case and the Client’s CX vs. security preferences.
  4. During the first installation, the ASP is provided as part of the package

ASP management

When a change is required, Phonexia provides a new ASP, which the Client can apply to the running Phonexia Voice Verify instance (POST /api/v2/maintenance/asp/).

To list all available ASPs, use GET /api/v2/maintenance/asp/. It is possible to obtain a detailed description of a specific ASP via GET /api/v2/maintenance/asp/{uuid} or remove it via DELETE /api/v2/maintenance/asp/{uuid}.

Multiple ASPs can be stored; however, only one ASP can be used for verifications. This ASP is flagged as active. To get the currently active ASP, use GET /api/v2/maintenance/asp/active. To set or unset a specific ASP as active, use POST /api/v2/maintenance/asp/{uuid}/active.
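The ASP operations above map one-to-one to HTTP requests. The following Python sketch builds (but does not send) a `urllib.request.Request` for each operation; the paths and methods are copied from this section, while the action names, the hypothetical base URL, and the assumption that an ASP file is uploaded as the raw request body are illustrative only. Authentication is omitted.

```python
import urllib.request
from typing import Optional

ASP_ROOT = "/api/v2/maintenance/asp/"


def asp_request(base_url: str, action: str, uuid: Optional[str] = None,
                profile: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) the request for one ASP management operation."""
    routes = {
        "upload": ("POST", ASP_ROOT),                       # apply a new ASP
        "list": ("GET", ASP_ROOT),                          # list all ASPs
        "get": ("GET", f"{ASP_ROOT}{uuid}"),                # describe one ASP
        "delete": ("DELETE", f"{ASP_ROOT}{uuid}"),          # remove one ASP
        "get_active": ("GET", f"{ASP_ROOT}active"),         # currently active ASP
        "set_active": ("POST", f"{ASP_ROOT}{uuid}/active"),  # set/unset active flag
    }
    if action not in routes:
        raise ValueError(f"unknown action: {action}")
    if action in ("get", "delete", "set_active") and not uuid:
        raise ValueError(f"{action} requires a uuid")
    if action == "upload" and profile is None:
        raise ValueError("upload requires the ASP file contents")
    method, path = routes[action]
    body = profile if action == "upload" else None  # assumption: raw body upload
    return urllib.request.Request(base_url.rstrip("/") + path, data=body, method=method)


# Usage (not executed here):
# urllib.request.urlopen(asp_request("https://mydomain.com", "get_active"))
```

Keeping request construction separate from sending makes it easy to log or review exactly which ASP call will be made before touching a production instance.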

The Client is requested to maintain the history and versioning of all datasets and ASPs.

Maintaining accuracy

During the project’s lifetime, some components of the Client infrastructure may change. Any change to a component on the channel between the Customer and Phonexia Voice Verify can affect the accuracy of the system.

If such a change affects the channel, Phonexia recommends creating a new evaluation set, having Phonexia run an evaluation on it, and applying a new calibration profile (POST /maintenance/load_asp/{name}/).

Regularly updating the evaluation set is recommended throughout the lifetime of a project, and each updated evaluation set should be evaluated early (by Phonexia).

FAQ

What version of VMware are we using?

If you encounter problems running the virtual machine, the root cause may be an old version of the virtualization software. We currently use VMware 15.

What quality should the audio stream/file follow?

For enrollment, the user should call from their usual device in a normal environment and avoid excessive background noise, such as loud music or a street with heavy traffic. The agent should warn the user during the enrollment call if the speech quality is not good (audible) enough.

During verification, if the user cannot be authenticated, the agent should ask the user to move to a quieter environment if possible. If verification is still not possible, the agent should switch to a different authentication method.

In general, audio sent to Voice Verify should follow standard telephony encoding without alteration (i.e., it should not be compressed to a lossy format and then decompressed back to the telephony standard before being provided to Voice Verify).

Can I use stereo recordings for enrollment?

Voice Verify can enroll users from recordings, but only mono files are accepted. It is up to the user of Voice Verify to manage the processes that provide appropriate recordings containing only the voice of the desired speaker. Voice Verify cannot recognize which channel of a stereo or multi-channel recording should be used for enrollment.
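Since only mono files are accepted and Voice Verify cannot tell which channel carries the Customer, the integrator has to pick the channel before upload. A minimal sketch with Python’s standard `wave` module, assuming uncompressed linear PCM WAV input (`wave` cannot decode formats such as G.711 µ-law WAV) and assuming you already know which channel holds the Customer’s voice; that mapping depends on your recording setup and is hypothetical here.

```python
import io
import wave


def is_mono_wav(data: bytes) -> bool:
    """Return True if the PCM WAV file in `data` has exactly one channel."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnchannels() == 1


def extract_channel(data: bytes, channel: int) -> bytes:
    """Return a mono PCM WAV containing only `channel` (0-based) of the input."""
    with wave.open(io.BytesIO(data), "rb") as src:
        n_channels = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    if not 0 <= channel < n_channels:
        raise ValueError(f"channel must be in 0..{n_channels - 1}")
    # Interleaved PCM: each frame holds one sample of `width` bytes per channel.
    frame_size = n_channels * width
    mono = bytearray()
    for offset in range(channel * width, len(frames), frame_size):
        mono += frames[offset:offset + width]
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(bytes(mono))
    return out.getvalue()
```

The samples are copied byte for byte without resampling or re-encoding, which keeps the audio in its original telephony form.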

Does Voice Verify provide information about the gender, age, or language of a speaker during verification?

Even though Phonexia supports these technologies, they are predominantly used as part of Platform for Government. Voice Verify is a solution for the commercial segment, addressing user verification use cases. Other information, such as demographics, is not in the scope of Voice Verify.

What is the implementation time?

Installing Voice Verify is a matter of a day or two. What takes the most time is integration into the current infrastructure.

The whole process, from defining processes through deployment of Voice Verify, configuration and calibration to pulling results into the call center software, is estimated to take a few weeks. The Client’s internal processes can affect this considerably.

For deployment into a development environment, to test the integration and get a feel for internal use, Voice Verify can be deployed in a week or two.

Can I use Phonexia Voice Verify for Active Authentication?

Active authentication means that the Customer is verified at the end of the transaction only by assessing the sample of the Customer’s voice provided at that time. This mode of verification has many limitations and downsides:

– It is easily bypassed by replay attacks

– Speaker changes are not identifiable

– The Customer needs to spend time on the authentication process, which impacts their experience

– Accuracy on a very short phrase might be challenging

Technically, Phonexia Voice Verify can also provide Active Verification. As this is not the primary purpose of the solution, the integration might differ (only streamed audio is accepted). In addition, speaker utterances are expected to be at least 3 seconds long for accuracy purposes, which might affect the implementation process.
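When streaming audio for Active Verification, the integrating application can at least avoid closing a stream before 3 seconds of audio have been sent. The sketch below is a hypothetical helper, not part of the Voice Verify API: it assumes raw 8 kHz, 16-bit, mono linear PCM chunks and counts elapsed audio from byte counts. Audio duration is only a necessary condition, because silence and noisy segments are counted too; Voice Verify itself decides how much usable speech the stream contained.

```python
class StreamedAudioMeter:
    """Track how much raw PCM audio has been streamed (assumed 8 kHz, 16-bit, mono)."""

    def __init__(self, sample_rate: int = 8000, sample_width: int = 2,
                 channels: int = 1, min_seconds: float = 3.0) -> None:
        self.bytes_per_second = sample_rate * sample_width * channels
        self.min_seconds = min_seconds
        self.total_bytes = 0

    def add(self, chunk: bytes) -> None:
        """Record a chunk that was sent to the stream."""
        self.total_bytes += len(chunk)

    @property
    def seconds(self) -> float:
        """Audio duration streamed so far, in seconds."""
        return self.total_bytes / self.bytes_per_second

    def long_enough(self) -> bool:
        # Necessary, not sufficient: silence and noise also count toward this total.
        return self.seconds >= self.min_seconds
```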

Does Phonexia Voice Verify provide information about sound quality?

No, the current version of Phonexia Voice Verify does not provide such information.

During both enrollment and verification, quality is assessed from the perspective of voiceprint creation. Audio segments of poor quality are excluded from voiceprint creation. As a result, the voiceprint is created only from the audio segments whose quality reaches the minimum required level.

For implementation purposes: if the API reports an insufficient utterance length even though the Customer was speaking, this indicates that the audio quality is low.
