
Voice Verify Sandbox

End User License Agreement (EULA)

By using Voice Verify Sandbox, you accept all the terms and conditions of the EULA. If you do not agree with the terms and conditions of the EULA, do not continue using Voice Verify Sandbox.

Phonexia Voice Verify SANDBOX is a cloud appliance, accessible via a RESTful API, that enables testing of our technologies on one’s own recordings. The SANDBOX allows clients to get a feel for the complete Voice Verify solution and simulates a real communication scenario between Voice Verify and another piece of software.

Requirements for using our SANDBOX are:

  • RESTful API knowledge
  • custom SANDBOX link generated by Phonexia

The Sandbox focuses only on voice verification: you expect a certain speaker on the line and ask the question “Is the speaker calling right now really the person you expect them to be?”

To verify a speaker, Phonexia Speaker Identification technology is used. Speaker Identification uses the power of voice biometry to recognize speakers automatically by their voice with extremely high accuracy. Detailed information about Speaker Identification can be found here.

Sandbox server link

Phonexia provides a custom Sandbox server link for every user. The link may appear as follows:

https://sandbox.phonexia.com/d7v7gjb3ee4gb5

(WARNING – this is a non-functional link that serves only as an example; a working link has to be generated by Phonexia)

To get your own Sandbox link, send a message to [email protected]. Sandbox access expires after 30 days.

All RESTful API requests are sent to this link. The request examples in this document use the server link https://partner.phonexia.com:8080, which has to be replaced with your unique server link.

Example – instead of

$ curl 'https://partner.phonexia.com:8080/audios' -i -X GET

it has to be changed to

$ curl 'https://sandbox.phonexia.com/d7v7gjb3ee4gb5/audios' -i -X GET
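When the requests are scripted rather than typed, the server link can be kept in one place and joined with each API path. The following is a minimal sketch only: `SANDBOX_LINK` and `endpoint` are hypothetical names chosen for illustration, and the link is the non-functional example shown above.

```python
# Hypothetical helper: join a Sandbox server link with an API path.
SANDBOX_LINK = "https://sandbox.phonexia.com/d7v7gjb3ee4gb5"  # non-functional example link


def endpoint(path: str, base: str = SANDBOX_LINK) -> str:
    """Return base + path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


print(endpoint("/audios"))
```

Plain string joining is used here instead of `urllib.parse.urljoin`, because `urljoin` would discard the unique path segment of the link when resolving a path such as `/audios`.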

Streams

The Sandbox simulates a real call center environment, where customers speak with the call center agent via a mobile phone and Speaker Identification technology performs real-time analysis for fast voice verification. For technical reasons, Sandbox does not work with a real phone call, but with audio recordings. For this reason, a real-time stream is created from every audio recording file that is about to be processed.

There are two possible ways of creating a stream:

From a Phonexia test audio file

By default, one test speech recording is provided by Phonexia. The audio files from which streams can be created are listed with:

$ curl 'https://partner.phonexia.com:8080/audios' -i -X GET

Response body:

[
    "david.wav"
]

Now a stream can be created from this recording so that it can be processed in real-time with Speaker Identification technology:

$ curl 'https://partner.phonexia.com:8080/stream?audioFile=david.wav' -i -X GET

Response body:

{
  "streamId" : "f509581e-9c7a-4838-81f0-1d20c05d99ca"
}

The streamId hash code will be used later for comparison to the chosen speaker. The result will be an answer to the question “Is the speaker speaking in this audio recording the same speaker you chose from the database?”.
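The same step can be prepared from a script. The sketch below builds the request URL and reads `streamId` from a response body; it does not contact the server. The function names `stream_url` and `parse_stream_id` are hypothetical, and the base link is the placeholder used throughout this document.

```python
import json
from urllib.parse import urlencode

BASE = "https://partner.phonexia.com:8080"  # placeholder; replace with your own Sandbox link


def stream_url(audio_file: str, base: str = BASE) -> str:
    """Build the GET /stream URL for a file name returned by /audios."""
    return f"{base}/stream?{urlencode({'audioFile': audio_file})}"


def parse_stream_id(response_body: str) -> str:
    """Extract the streamId value from the JSON response body."""
    return json.loads(response_body)["streamId"]


# Sample body copied from the response shown above.
body = '{ "streamId" : "f509581e-9c7a-4838-81f0-1d20c05d99ca" }'
print(stream_url("david.wav"))
print(parse_stream_id(body))
```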

From own audio files

To process one’s own audio recordings, it is necessary to upload them to the Sandbox server so that they are ready to be streamed. No audio recordings are saved in the Sandbox.

Uploading can be done with:

$ curl 'https://partner.phonexia.com:8080/stream' -i -X POST --data-binary @'AUDIO_FILE_PATH'

Response body:

{
  "streamId" : "f509581e-9c7a-4838-81f0-1d20c05d99ca"
}

The streamId hash code will be used later for comparison to the chosen speaker. The result will be an answer to the question “Is the speaker speaking in this audio recording the same speaker you chose from the database?”.
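For scripted uploads, an equivalent of the curl call can be sketched with Python's standard library. The example only prepares the request so that it can be inspected before sending. `build_upload_request` is a hypothetical name, and the bytes are a stand-in for the content of a real audio file. Whether the server expects a particular Content-Type header is not stated here, so none is set explicitly.

```python
import urllib.request

BASE = "https://partner.phonexia.com:8080"  # placeholder; replace with your own Sandbox link


def build_upload_request(audio_bytes: bytes, base: str = BASE) -> urllib.request.Request:
    """Prepare (but do not send) POST /stream with the raw audio as the body."""
    return urllib.request.Request(f"{base}/stream", data=audio_bytes, method="POST")


# Placeholder bytes; in practice use Path(AUDIO_FILE_PATH).read_bytes().
req = build_upload_request(b"RIFF....WAVE")
print(req.get_method(), req.full_url, len(req.data))
# Sending is left to the caller, e.g. urllib.request.urlopen(req).read()
```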

Audio recording requirements

Supported audio formats to be uploaded are:

  • WAVE (*.wav) container including any of:
    • unsigned 8-bit PCM (u8)
    • unsigned 16-bit PCM (u16le)
    • IEEE float 32-bit (f32le)
    • A-law (alaw)
    • µ-law (mulaw)
    • ADPCM
  • FLAC codec inside FLAC (*.flac) container
  • OPUS codec inside OGG (*.opus) container

Other formats must be converted using external tools.

One recording should contain only one speaker.
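A quick check before uploading can catch files in the wrong container. The sketch below looks at the file extension only, using the extensions from the list above; it does not inspect the codec inside a WAVE file or the number of speakers. `has_supported_extension` is a hypothetical helper name.

```python
from pathlib import Path

# Container extensions taken from the supported-format list.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".opus"}


def has_supported_extension(file_name: str) -> bool:
    """Return True if the file extension matches a supported container."""
    return Path(file_name).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ("david.wav", "Call.FLAC", "talk.mp3"):
    print(name, has_supported_extension(name))
```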

Speaker enrollment

When audio recordings are ready to be streamed and the streamId hash code is known, the Speaker Identification technology is ready to compare the streamed audio file with any speaker from the database.

By default, one speaker is provided by Phonexia:

$ curl 'https://partner.phonexia.com:8080/speakers' -i -X GET

Response body:

[
    "david"
]

To add a new speaker, the speaker has to be enrolled from a stream, for which the streamId hash code is needed. Enrollment stores the speaker's voiceprint in the database. Once the Sandbox is terminated, all voiceprints are deleted.

$ curl 'https://partner.phonexia.com:8080/enroll?streamId=f509581e-9c7a-4838-81f0-1d20c05d99ca&speakerId=julia' -i -X POST

The speakerId parameter sets the speaker name in the database.

For speaker enrollment, the audio file has to contain at least 20 seconds of net speech; otherwise, accuracy may decrease.
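The enrollment call can be prepared in a script in the same way. This is a sketch under stated assumptions: `build_enroll_request` is a hypothetical name, the base link is the placeholder used in this document, and the request is only constructed, not sent.

```python
import urllib.request
from urllib.parse import urlencode

BASE = "https://partner.phonexia.com:8080"  # placeholder; replace with your own Sandbox link


def build_enroll_request(stream_id: str, speaker_id: str, base: str = BASE) -> urllib.request.Request:
    """Prepare (but do not send) POST /enroll for the given stream and speaker name."""
    query = urlencode({"streamId": stream_id, "speakerId": speaker_id})
    return urllib.request.Request(f"{base}/enroll?{query}", method="POST")


enroll = build_enroll_request("f509581e-9c7a-4838-81f0-1d20c05d99ca", "julia")
print(enroll.get_method(), enroll.full_url)
```

Using `urlencode` keeps speaker names that contain spaces or other special characters valid inside the query string.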

Again, available speakers can be checked:

$ curl 'https://partner.phonexia.com:8080/speakers' -i -X GET

Response body:

[
    "david",
    "julia"
]

Speaker verification

The last step is the comparison of the streamed audio recording with any speaker from the database.

$ curl 'https://partner.phonexia.com:8080/compare?streamId=f509581e-9c7a-4838-81f0-1d20c05d99ca&speakerId=david' -i -X GET

Both streamId and speakerId parameters are required.

For reliable verification, the audio file used for comparison has to contain at least 5 seconds of net speech. This limit applies to comparison only, not to enrollment.
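Because both parameters are required, a script can reject an incomplete call before it reaches the server. The sketch below shows one way to do that; `compare_url` is a hypothetical name and the base link is the placeholder used in this document.

```python
from urllib.parse import urlencode

BASE = "https://partner.phonexia.com:8080"  # placeholder; replace with your own Sandbox link


def compare_url(stream_id: str, speaker_id: str, base: str = BASE) -> str:
    """Build the GET /compare URL; raise if either required parameter is empty."""
    if not stream_id or not speaker_id:
        raise ValueError("both streamId and speakerId are required")
    query = urlencode({"streamId": stream_id, "speakerId": speaker_id})
    return f"{base}/compare?{query}"


print(compare_url("f509581e-9c7a-4838-81f0-1d20c05d99ca", "david"))
```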

Interpreting the results – Examples

Comparison of David’s voice (david.wav) recording with David’s voiceprint

Response body:

{
  "streamId" : "f509581e-9c7a-4838-81f0-1d20c05d99ca",
  "speakerId" : "david",
  "score" : 7.87523,
  "finalResult" : true
}

The score value is 7.87523. As the value is greater than 3, it can be said that this voice belongs to David with extremely high probability.

Comparison of David’s voice (david.wav) recording with Julia’s voiceprint

Response body:

{
    "streamId": "84256d45-ee89-45a4-b99b-2264f10bf527",
    "speakerId": "julia",
    "score": -10.741536,
    "finalResult": true
}

The score value is -10.741536. As the value is much smaller than 0, it can be said with extremely high probability that this voice does not belong to Julia.
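The two examples suggest a simple way to read the score in a script. The sketch below is an illustration, not an official decision rule. The limits 3 and 0 come from the examples above, while the `inconclusive` band between them and the name `interpret_score` are assumptions, since scores in that range are not described here. The `finalResult` field is not interpreted.

```python
import json

STRONG_MATCH_ABOVE = 3.0  # from the first example: "greater than 3"
NO_MATCH_BELOW = 0.0      # simplification of the second example: "much smaller than 0"


def interpret_score(score: float) -> str:
    """Map a comparison score to a coarse label (assumed three-way split)."""
    if score > STRONG_MATCH_ABOVE:
        return "match"
    if score < NO_MATCH_BELOW:
        return "no match"
    return "inconclusive"  # assumption: scores from 0 to 3 are not described in the text


# Response body shaped like the examples above.
response = json.loads('{"speakerId": "david", "score": 7.87523, "finalResult": true}')
print(response["speakerId"], interpret_score(response["score"]))
```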

RESTful API documentation available with the Sandbox link

The Voice Verify Sandbox link provided by Phonexia can be opened in a web browser. It shows concise RESTful API documentation with examples and descriptions. It adds no new information, but it serves as a quick reference without further explanation of the underlying technology.