FAQ Phonexia Browser

Q: What operating systems can your application run on?

Our technologies run on both Windows and Linux.

For details on the supported operating systems and the recommended hardware setup, see Recommended OS and HW.

Q: What are the supported audio formats?

Formats supported directly and natively are:

WAVE (*.wav) container including any of:
- unsigned 8-bit PCM (u8)
- signed 16-bit PCM (s16le)
- IEEE float 32-bit (f32le)
- A-law (alaw)
- µ-law (mulaw)
- ADPCM
FLAC codec inside FLAC (*.flac) container
OPUS codec inside OGG (*.opus) container

Other audio formats must be converted to one of the natively supported formats using external tools.
The SPE server can be configured to do this conversion automatically in the background; see the Understand SPE audio converter article.

Good tools for converting unsupported formats to supported ones are FFmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are cross-platform tools available for Microsoft Windows, Linux and Apple OS X. Examples of usage:

FFmpeg

ffmpeg -i <source_audio_file_name> <output_audio_base_name>.wav

This command converts an audio file in any format or codec that FFmpeg supports to WAV with 16-bit little-endian PCM, which is FFmpeg’s default encoding for WAV output. For more parameters, please check the FFmpeg manual pages.

SoX

sox <source_audio_file_name> -b 16 <output_audio_base_name>.wav

The number of bits per sample must be specified with the -b parameter.
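
When many files need converting, the FFmpeg command above can be generated per file. The following is a minimal Python sketch, not an official Phonexia tool: the function names and the output directory are hypothetical, and it decides by file extension only, so a .wav file with an unsupported codec inside would be skipped. It passes -acodec pcm_s16le to request 16-bit little-endian PCM explicitly instead of relying on the default.

```python
from pathlib import Path

# Extensions of the containers this FAQ lists as natively supported.
NATIVE_EXTENSIONS = {".wav", ".flac", ".opus"}


def build_ffmpeg_command(source: Path, output_dir: Path) -> list:
    """Return the ffmpeg argument list that converts `source` to 16-bit PCM WAV."""
    target = output_dir / (source.stem + ".wav")
    # -y overwrites an existing output file;
    # -acodec pcm_s16le requests 16-bit little-endian PCM explicitly.
    return ["ffmpeg", "-y", "-i", str(source), "-acodec", "pcm_s16le", str(target)]


def plan_conversions(files: list, output_dir: Path) -> list:
    """Build one command for each file whose extension is not natively supported."""
    return [
        build_ffmpeg_command(f, output_dir)
        for f in files
        if f.suffix.lower() not in NATIVE_EXTENSIONS
    ]
```

Each returned list can be run with subprocess.run(command, check=True), provided ffmpeg is installed and on the PATH.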

Q: How to fix Error 1007: Unsupported audio format?

The Phonexia Browser application may return the error “1007: Unsupported audio format” when uploading an audio file. Please check whether your audio files are in one of the formats listed under Q: What are the supported audio formats?

If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. We recommend the ffmpeg utility, which is powerful and well documented. You can find the package for your distribution at http://ffmpeg.org

Then continue as described below:

Using Phonexia Browser with embedded SPE

Open the Browser configuration dialog by clicking the “Settings” button in the tool ribbon. Select the “Speech Engine” tab and configure SPE as described in the documentation. Don’t forget to select the “Enable audio converter” checkbox.

Using SPE as service/daemon

Open the file settings\phxspe.properties in a standard text editor. Then change the following line to enable background conversion:

audio_converter.enabled = false # change it to 'true'

Please check that the conversion tool configured below this line in phxspe.properties is set up properly. Here is an example configuration for FFmpeg:

# Set converter command
# %1 is for input file
# %2 is for output file
# ffmpeg example:
audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2
# sox example:
# audio_converter.command = sox %1 %2

Important note: By design, to save computing resources, the audio converter is not used if the input file name ends with the extension .wav. If such a file is not in one of the natively supported WAV encodings, you must pre-process the audio recording before uploading it to the Phonexia SPE or using it in the Phonexia Browser.
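
Because .wav files bypass the converter, it can help to check their encoding before upload. The sketch below reads the format tag and bit depth from the 'fmt ' chunk of a RIFF/WAVE header and compares them with the encodings listed in this FAQ. It is an illustration only: the function names are hypothetical, and treating both Microsoft ADPCM (tag 0x0002) and IMA ADPCM (tag 0x0011) as "ADPCM" is an assumption, since the FAQ does not say which variants SPE accepts. Files using the WAVE_FORMAT_EXTENSIBLE tag (0xFFFE) are reported as needing pre-processing.

```python
import struct


def read_wav_format(data: bytes) -> tuple:
    """Return (format_tag, bits_per_sample) from the 'fmt ' chunk of WAV bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            (bits,) = struct.unpack_from("<H", data, pos + 8 + 14)
            return tag, bits
        # Chunk payloads are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    raise ValueError("no 'fmt ' chunk found")


def is_natively_supported(tag: int, bits: int) -> bool:
    """Compare a WAV format against the list of encodings in this FAQ."""
    if tag == 0x0001:                    # linear PCM: 8-bit or 16-bit
        return bits in (8, 16)
    if tag == 0x0003:                    # IEEE float: 32-bit
        return bits == 32
    # A-law, mu-law, and (assumed) Microsoft and IMA ADPCM.
    return tag in (0x0006, 0x0007, 0x0002, 0x0011)
```

For a file on disk, the first few kilobytes are normally enough, for example read_wav_format(open(path, "rb").read(4096)).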

Q: What languages do you offer?

It depends on the technology.

Phonexia Language Identification (LID) is pre-trained for 60+ languages.

Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) are available for 20+ languages, including English, French, German, Russian, Spanish and many more.

Q: What languages are supported by LID?

A: Please see List of supported LID Languages. For more details, see LID technology documentation.

Q: How to fix the Error 1013: Unsupported: Server does not support authentication with token?

A: Please check SPE subdirectory ./settings for configuration files.

If only phxspe.browser.properties exists, your Browser uses SPE as an embedded component, and the file contains this directive:
server.enable_authentication_token = false
In that case you can still use SPE with Basic HTTP authentication, as described in the documentation, section “Basic authentication”.
If you would like to try a “pure” daemon installation, the phxspe.properties file should exist in the ./settings subdirectory. The file is created by the phxadmin utility, or it can be created from the ./data/phxspe.properties.default template file:
1. Copy the template file to the ./settings directory.
2. Rename it to phxspe.properties.
3. Check the server.enable_authentication_token directive and set it as needed.
4. Restart phxspe.

Basic installation steps are described in the ./doc/INSTALL.html document.
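
Steps 1 to 3 above can be scripted. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the paths are the ones named in this FAQ relative to the SPE directory, and the parser assumes simple "key = value" lines with "#" comment lines. Restarting phxspe (step 4) is left to you.

```python
import re
import shutil
from pathlib import Path
from typing import Optional


def install_properties(spe_root: Path) -> Path:
    """Copy the default template to settings/phxspe.properties (steps 1 and 2)."""
    template = spe_root / "data" / "phxspe.properties.default"
    target = spe_root / "settings" / "phxspe.properties"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # never overwrite an existing configuration
        shutil.copyfile(template, target)
    return target


def read_directive(properties_text: str, key: str) -> Optional[str]:
    """Return the value assigned to `key`, skipping comment lines (step 3)."""
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=\s*(.*?)\s*$")
    for line in properties_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None  # directive not present
```

For example, read_directive(target.read_text(), "server.enable_authentication_token") shows the current setting before you edit the file.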

Q: What languages are supported by KWS?

A: Please see List of supported KWS Languages. For more details, see KWS technology documentation.

Q: What languages are supported by STT?

A: Please see List of supported STT Languages. For more details, see STT technology documentation.

Q: I am getting an SPE-related error after starting the Browser (e.g. SPE server crashed, Error Downloading…, unable to connect to the SPE server, unable to start the localhost…)

A:

Windows:

Open a terminal in the folder where PhxBrowser.exe is located (hold Shift, right-click in free space in Windows Explorer and select “Open command window here”).
Run the PhxBrowser software with the command:

        PhxBrowser.exe /spe-debug /spe-output

The PhxBrowser software will start with an “SPE output” tab, which shows the debug output of SPE.

Linux:

Run the PhxBrowser software in a terminal with the command:

        ./PhxBrowser --spe-debug --spe-output

The PhxBrowser software will start with an “SPE output” tab, which shows the debug output of SPE.

Q: Why does the system show high score (>90%) even for non-targets?

A: The score threshold isn’t set up correctly. Adjust the speaker score sharpness value to calibrate how the score is recalculated.

Please see Calibration in the technology documentation.

Q: What do LLR, LR and score mean?

A: These abbreviations mean the following:

LR – likelihood ratio, the result of a statistical test that compares two models. It expresses how many times more likely the data are under one model than under the other. LR takes values in the interval <0;+inf).
LLR – log-likelihood ratio, the logarithm of LR. LLR takes values in the interval (-inf;+inf).
Percentage (normalised) score – a commonly used mathematical transformation of the LLR to a percentage. This number is easier for humans to read, but it can be hard to interpret when LLR values are very high (typically in non-adapted installations). It lies in the interval <0;100> in % (or sometimes <0;1>). The higher the score, the better the match.
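
The relationship between the three numbers can be sketched in a few lines of Python. This is an illustration, not the exact formula used by the product: it assumes the LLR is a natural logarithm and that the percentage score is a logistic function of the LLR. The sharpness and shift parameters are hypothetical names for the calibration knobs.

```python
import math


def llr_to_lr(llr: float) -> float:
    """LR = exp(LLR), assuming the LLR is a natural logarithm."""
    return math.exp(llr)


def llr_to_percent(llr: float, sharpness: float = 1.0, shift: float = 0.0) -> float:
    """Map an LLR to a 0..100 score with a logistic function (assumed form)."""
    x = sharpness * (llr - shift)
    # Numerically stable logistic: never exponentiate a large positive number.
    if x >= 0:
        return 100.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return 100.0 * e / (1.0 + e)
```

Under these assumptions an LLR of 0 gives LR = 1 and a score of 50%, and large LLR values saturate close to 100%. A smaller sharpness flattens the curve, so very high LLRs no longer all collapse into near-100% scores.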

Q: I can’t manage to run Phonexia Browser software. I always get an error.

I always get the same error messages:

unable to connect to the SPE
unable to start the localhost: giving up and kill the localhost.

A: This error may happen if the initialization of the SPE takes too long. The Phonexia Browser software treats this as an initialization failure and kills the server.

You can fix this by doing the following:

Increase the timeout in Settings > Speech Engine tab > First connection timeout
Use fewer instances of technologies, so that the Speech Engine starts faster
Use smaller models of technologies

Q: We prefer a USB dongle, but without the USB storage

A: We don’t provide USB dongles without memory storage. Possible solutions are:

establish security directives for working with the USB dongle (who is allowed to handle it, memory scan checks on the way in and out),
use HW based licensing,
use license server.

Q: I am getting the error message “Your license is not for this application.”

A: Check your license file (license.dat) by opening it in Notepad.

Make sure the license contains records for all required modules.

See the Licensing article for additional information.

Q: What are the requirements for SID evaluation dataset?

To evaluate Phonexia Speaker Identification technology in a real-life scenario, the system needs to be calibrated with a SID dataset.

SID dataset (minimum requirements):

To measure SID performance precisely, it’s important to prepare the set of evaluation recordings very carefully.

The requirements are:

50+ known speakers, 200+ recordings in total (i.e. 3 to 5 recordings per speaker*)
1+ minute of net speech in each recording (i.e. usually 2+ minutes recording length)
only one speaker in each recording
wide variety of gender and age is recommended
recordings should be as similar to the target use case as possible (device, channel, distance from mic, languages distribution)
audio files should be mono, in lin16 format (16-bit linear PCM), with a sample rate of 8 kHz or higher

*Note: Splitting a single recording into multiple shorter recordings in order to meet the criterion of at least 3 recordings per speaker is not the right way to proceed. This way you are not adding any details; you are essentially analyzing the details of a single recording several times.
In contrast, by using 5 unique recordings coming from different audio environments, or even from different times of the day, additional details can be analyzed, leading to better results.

Warning: Any human error in the preparation of the evaluation set (in speaker uniqueness, placing recordings into the wrong folder, etc.) affects the evaluation results, so it’s very important to prepare the data carefully.

See SID Evaluation for more details.
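
The numeric minimums listed above can be checked automatically before an evaluation. The Python sketch below is a hypothetical helper, not part of any Phonexia tool: the Recording structure and the function name are invented for illustration, and the net speech length is assumed to come from a separate voice activity detection step. It cannot verify that each recording contains only one speaker or that the speaker labels are correct.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Metadata for one evaluation recording (hypothetical structure)."""
    speaker: str
    channels: int
    sample_width_bytes: int  # 2 bytes = 16-bit linear PCM
    sample_rate_hz: int
    net_speech_s: float      # seconds of speech, e.g. from voice activity detection


def check_sid_dataset(recordings: list) -> list:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    per_speaker = {}
    for rec in recordings:
        per_speaker[rec.speaker] = per_speaker.get(rec.speaker, 0) + 1

    if len(per_speaker) < 50:
        problems.append(f"only {len(per_speaker)} speakers, 50+ required")
    if len(recordings) < 200:
        problems.append(f"only {len(recordings)} recordings, 200+ required")
    for speaker, count in sorted(per_speaker.items()):
        if count < 3:
            problems.append(f"speaker {speaker}: only {count} recordings, 3+ required")

    for index, rec in enumerate(recordings):
        if rec.channels != 1:
            problems.append(f"recording {index}: not mono")
        if rec.sample_width_bytes != 2:
            problems.append(f"recording {index}: not 16-bit")
        if rec.sample_rate_hz < 8000:
            problems.append(f"recording {index}: sample rate below 8 kHz")
        if rec.net_speech_s < 60:
            problems.append(f"recording {index}: less than 1 minute of net speech")
    return problems
```

For WAV files, the channel count, sample width and sample rate can be read with Python’s standard wave module (getnchannels, getsampwidth, getframerate).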
