Our technologies run on both Windows and Linux.
For details on the supported operating systems and the recommended hardware setup, see Recommended OS and HW.
Formats supported directly and natively are:
Other audio formats must be converted to one of those natively supported using external tools.
The SPE server can be configured to do this conversion automatically in the background; see the Understand SPE audio converter article.
Good tools for converting unsupported formats to supported ones are FFmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are cross-platform tools available for Microsoft Windows, Linux and Apple OS X. Example usage:
ffmpeg -i <source_audio_file_name> <output_audio_base_name>.wav
This command converts an audio file in any format or codec supported by FFmpeg to WAV with 16-bit little-endian PCM, which is FFmpeg's default encoding for WAV output. For more parameters, please check the FFmpeg manual pages.
sox <source_audio_file_name> -b 16 <output_audio_base_name>.wav
The number of bits, defined by the -b parameter, must be specified.
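When many files need converting, the ffmpeg call above can be wrapped in a small script. The following is a minimal sketch, not part of the Phonexia tooling: the function names `build_ffmpeg_command` and `convert` are hypothetical, and it assumes `ffmpeg` is available on the PATH. It passes `-acodec pcm_s16le` explicitly so the 16-bit PCM output does not rely on defaults.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source, output_dir="."):
    """Build an ffmpeg command that writes <base name>.wav as 16-bit PCM."""
    source = Path(source)
    target = Path(output_dir) / (source.stem + ".wav")
    if target == source:
        # ffmpeg cannot read and write the same file in one run.
        raise ValueError("output would overwrite the input: %s" % source)
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(source),       # input file in any format ffmpeg can read
        "-acodec", "pcm_s16le",  # explicit 16-bit little-endian PCM
        str(target),
    ]


def convert(source, output_dir="."):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(source, output_dir), check=True)
```

Calling `convert("call.mp3", "converted")` would write `converted/call.wav`, provided the `converted` directory already exists.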
Phonexia Browser may return the error “1007: Unsupported audio format” when uploading an audio file. Please check whether your audio files are in one of the formats listed under Q: What are the supported audio formats?
If you need to use audio recordings in other formats as input, you can configure SPE to convert audio automatically. As a prerequisite, install an external tool for audio conversion. We recommend the ffmpeg utility, which is powerful and well documented. You can find a package for your distribution at http://ffmpeg.org
Then continue as described below:
Open the Browser configuration dialog by clicking the “Settings” button located in the tool ribbon. Select the “Speech Engine” tab and configure SPE as described in the documentation. Don’t forget to select the “Enable audio converter” checkbox.
Open settings\phxspe.properties using a standard text editor. Then change the following line in “phxspe.properties” to enable background conversion:
audio_converter.enabled = false # change it to 'true'
Please check that the conversion tools configured below this line in phxspe.properties are set up properly. Here is an example configuration for ffmpeg:
# Set converter command
# %1 is for input file
# %2 is for output file
# ffmpeg example:
audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2
# sox example:
# audio_converter.command = sox %1 %2
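Before relying on background conversion, it can help to dry-run the configured command by hand. The sketch below shows one way to expand the %1 and %2 placeholders of an audio_converter.command value into an argument list. How SPE itself performs this substitution is not documented here, so treat this as an assumption; the function name `expand_converter_command` is hypothetical.

```python
import shlex


def expand_converter_command(template, input_path, output_path):
    """Replace %1 and %2 in a converter command and return an argv list."""
    mapping = {"%1": input_path, "%2": output_path}
    # Split first, then substitute, so paths with spaces stay single arguments.
    return [mapping.get(token, token) for token in shlex.split(template)]
```

The resulting list can be passed to `subprocess.run(..., check=True)` to confirm that the converter is installed and produces a usable output file.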
Important note: By design, and to save computing resources, the audio converter is not used if the input file name ends with the extension .wav. If such a file is not in a supported format, you must convert it yourself before uploading it to the Phonexia SPE or using it in the Phonexia Browser.
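Because .wav files bypass the converter, it can be useful to inspect them before uploading. The following minimal sketch uses Python's standard `wave` module to report the basic properties of a WAV file so that you can compare them with the list of supported formats. The function name `describe_wav` is hypothetical, and the sketch does not itself decide whether SPE accepts the file.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    wave.open raises wave.Error for files it cannot parse, for example
    WAV containers that hold compressed (non-PCM) audio.
    """
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
```

A `wave.Error` raised here is a hint that the file needs to be converted manually, for example with the ffmpeg or sox commands shown earlier.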
It depends on the technology.
Phonexia Language Identification (LID) is pre-trained for 60+ languages.
Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) are available for 20+ languages, including English, French, German, Russian, Spanish and many more.
A: Please check the ./settings subdirectory of SPE for configuration files.
server.enable_authentication_token = false
phxadmin utility or can be created from ./data/phxspe.properties.default template file.
server.enable_authentication_token directive and set it as needed.
Basic installation steps are described in the ./doc/INSTALL.html document.
PhxBrowser.exe /spe-debug /spe-output
./PhxBrowser --spe-debug --spe-output
A: The score threshold isn’t set up correctly. Adjust the speaker score sharpness value to calibrate the score recalculation.
Please see Calibration in the technology documentation.
A: These abbreviations mean the following:
I always get the same error messages:
A: This error may happen if the initialization of the SPE engine takes too long. Phonexia Browser treats this as an initialization failure and kills the server.
You can fix this by doing the following:
A: We don’t provide USB without memory storage. Possible solutions are:
To evaluate Phonexia Speaker Identification technology in a real-life scenario, the system needs to be calibrated with a SID dataset.
SID dataset (minimum requirements):
To measure SID performance precisely, it’s important to prepare the set of evaluation recordings very carefully.
The requirements are:
*Note: Splitting a single recording into multiple shorter recordings in order to meet the criterion of at least 3 recordings for each speaker is not the right way to proceed. Doing so adds no new information: you are essentially analyzing the details of a single recording five times.
In contrast, using 5 unique recordings that come from different audio environments, or even from different times of the day, allows additional details to be analyzed, which leads to better results.
Warning: Any human error in preparing the evaluation set (speaker uniqueness, placing recordings into the wrong folder, etc.) affects the evaluation results, so it’s very important to prepare the data carefully.
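Some of these mistakes can be caught automatically before running an evaluation. The sketch below assumes a layout of one folder per speaker containing .wav recordings, which is an assumption and not a format required by Phonexia. It flags speakers with fewer recordings than a chosen minimum and lists byte-identical files, which usually indicate a recording copied into a second folder. It cannot detect one recording split into shorter pieces, nor two folders that hold the same person. The function name `check_evaluation_set` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

MIN_RECORDINGS = 3  # taken from the note above; adjust to your requirements


def check_evaluation_set(root, min_recordings=MIN_RECORDINGS):
    """Return (speakers with too few recordings, groups of identical files)."""
    root = Path(root)
    too_few = {}
    by_hash = defaultdict(list)
    # Assumed layout: root/<speaker_id>/<recording>.wav
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        recordings = sorted(speaker_dir.glob("*.wav"))
        if len(recordings) < min_recordings:
            too_few[speaker_dir.name] = len(recordings)
        for rec in recordings:
            digest = hashlib.sha256(rec.read_bytes()).hexdigest()
            by_hash[digest].append(rec.relative_to(root).as_posix())
    duplicates = [paths for paths in by_hash.values() if len(paths) > 1]
    return too_few, duplicates
```

Both return values being empty does not prove the set is correct; it only rules out the two mechanical errors checked here.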
See SID Evaluation for more details.