Please check the SPE subdirectory ./settings for configuration files.
- If only phxspe.browser.properties exists, then your Browser uses SPE as an embedded component and sets this directive inside the file:
server.enable_authentication_token = false
In that case you can still use SPE with Basic HTTP authentication, as described in the documentation, section “Basic authentication”.
- If you would like to try a “pure” daemon installation, the phxspe.properties file must exist in the ./settings subdirectory. The file is created by the phxadmin utility, or you can create it from the ./data/phxspe.properties.default template file:
  - Copy the template file to the ./settings directory.
  - Rename it to phxspe.properties.
  - Check the server.enable_authentication_token directive and set it as needed.
Basic installation steps are described in the ./doc/INSTALL.html document.
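The check described above can also be scripted. The following is a minimal sketch, not part of SPE: the helper names `read_properties` and `describe_setup` are hypothetical, and it assumes the properties files use plain `key = value` lines with `#` comments.

```python
from pathlib import Path


def read_properties(path):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: anything after '#' on the value side is a note, not data.
        props[key.strip()] = value.split("#", 1)[0].strip()
    return props


def describe_setup(settings_dir):
    """Return (mode, token_setting) for the given ./settings directory.

    mode is "daemon" if phxspe.properties exists, "embedded" if only
    phxspe.browser.properties exists, and "none" otherwise.
    """
    settings = Path(settings_dir)
    daemon = settings / "phxspe.properties"
    browser = settings / "phxspe.browser.properties"
    if daemon.exists():
        mode, path = "daemon", daemon
    elif browser.exists():
        mode, path = "embedded", browser
    else:
        return "none", None
    token = read_properties(path).get("server.enable_authentication_token")
    return mode, token


if __name__ == "__main__":
    print(describe_setup("./settings"))
```

Run it from the SPE installation directory; the second value is the directive exactly as written in the file, or None if the directive is missing.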
The supported audio formats are:
- WAVE (*.wav) container including any of:
  - unsigned 8-bit PCM (u8)
  - signed 16-bit PCM (s16le)
  - IEEE float 32-bit (f32le)
  - A-law (alaw)
  - µ-law (mulaw)
- FLAC codec inside FLAC (*.flac) container
- OPUS codec inside OGG (*.opus) container
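To see which encoding a .wav file actually uses, you can read the format tag from its header. The sketch below is an illustration only, not an SPE tool: it assumes the standard RIFF/WAVE layout with a plain (non-extensible) "fmt " chunk, and the names `wav_encoding` and `supported_name` are hypothetical.

```python
import struct

# (format tag, bits per sample) pairs for the WAVE encodings listed above.
SUPPORTED = {
    (0x0001, 8): "8-bit PCM",
    (0x0001, 16): "16-bit PCM",
    (0x0003, 32): "IEEE float 32-bit",
    (0x0006, 8): "A-law",
    (0x0007, 8): "mu-law",
}


def wav_encoding(data):
    """Return (format_tag, bits_per_sample) from RIFF/WAVE bytes, else None."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = pos + 8
        if chunk_id == b"fmt ":
            if size < 16 or body + 16 > len(data):
                return None
            tag, _channels, _rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, body
            )
            return tag, bits
        pos = body + size + (size & 1)  # chunks are padded to an even length
    return None


def supported_name(data):
    """Name of the listed encoding, or None if the file is not one of them."""
    return SUPPORTED.get(wav_encoding(data))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(supported_name(f.read(4096)))
```

Files that use the extensible format tag (0xFFFE) are reported as None by this sketch, even if the underlying samples are PCM.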
Other audio formats must be converted using external tools. The SPE server can be configured to convert them automatically in the background; see the corresponding SPE configuration settings.
Good tools for converting unsupported formats to supported ones are FFmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are cross-platform tools available for MS Windows, Linux and Apple OS X. Example of usage:
ffmpeg -i <source_audio_file_name> <output_audio_base_name>.wav
This converts an audio file in any format/codec that FFmpeg supports to a normalised WAV file with 16-bit little-endian PCM, which is the default. For more parameters, please check the manual pages.
sox <source_audio_file_name> -b 16 <output_audio_base_name>.wav
The number of bits must be specified with the -b parameter.
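When many files need converting, the two commands above can be generated from a script. This is a minimal sketch under stated assumptions: the function names are hypothetical, the output file takes the source base name with a .wav extension, and ffmpeg or sox is expected on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_command(source, out_dir):
    """Build: ffmpeg -i <source> <out_dir>/<base>.wav"""
    target = Path(out_dir) / (Path(source).stem + ".wav")
    return ["ffmpeg", "-i", str(source), str(target)]


def sox_command(source, out_dir, bits=16):
    """Build: sox <source> -b <bits> <out_dir>/<base>.wav"""
    target = Path(out_dir) / (Path(source).stem + ".wav")
    return ["sox", str(source), "-b", str(bits), str(target)]


def convert(source, out_dir):
    """Run whichever tool is installed and return the output path."""
    if shutil.which("ffmpeg"):
        cmd = ffmpeg_command(source, out_dir)
    elif shutil.which("sox"):
        cmd = sox_command(source, out_dir)
    else:
        raise RuntimeError("neither ffmpeg nor sox found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.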
The Phonexia Browser application may return the error “1007: Unsupported audio format” while uploading an audio file. Please check whether your audio files are in one of the formats listed in Q: Supported audio formats.
If you need to use audio recordings in other formats as input, you can configure SPE for automated audio conversion. As a prerequisite, install an external tool for audio conversion. We recommend the ffmpeg utility, which is powerful and well documented. Please find your distribution package at http://ffmpeg.org
Then continue as described below:
Using Phonexia Browser with embedded SPE
Open the Browser configuration dialog by clicking the “Settings” button in the tool ribbon. Select the “Speech Engine” tab and configure SPE as described in the documentation. Don’t forget to select the “Enable audio converter” checkbox.
Using SPE as service/daemon
Open settings\phxspe.properties in a standard text editor. Then change the following line in “phxspe.properties” to enable background conversion:
audio_converter.enabled = false # change it to 'true'
Please check that the conversion tools configured below this line in phxspe.properties are set up properly. Here is an example configuration for ffmpeg:
# Set converter command
# %1 is for input file
# %2 is for output file
# ffmpeg example:
audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2
# sox example:
# audio_converter.command = sox %1 %2
Important note: By design, and to save computing resources, the audio converter is not used if the input file name ends with the extension .wav. In that case you must pre-process the audio recording yourself before uploading it to the Phonexia SPE or using it in the Phonexia Browser.
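The two rules above (the %1/%2 placeholders and the .wav exception) can be modelled in a few lines. This is only an illustration of the documented behaviour, not SPE's actual implementation: both function names are hypothetical, and the exact, case-sensitive match on ".wav" is an assumption.

```python
import re
import shlex


def converter_is_used(input_name):
    """Files ending in .wav bypass the converter.

    Assumption: the match is on the literal lower-case extension; how SPE
    treats ".WAV" is not stated in the text above.
    """
    return not input_name.endswith(".wav")


def expand_command(template, input_path, output_path):
    """Split the command template and substitute %1 (input) and %2 (output)."""
    values = {"1": input_path, "2": output_path}
    return [
        re.sub(r"%([12])", lambda m: values[m.group(1)], arg)
        for arg in shlex.split(template)
    ]


if __name__ == "__main__":
    template = "ffmpeg -loglevel warning -y -i %1 %2"
    for name in ("call.mp3", "call.wav"):
        if converter_is_used(name):
            print(expand_command(template, name, "converted.wav"))
        else:
            print(name, "is passed to SPE unchanged")
```

A .wav file in an unsupported encoding therefore reaches SPE unconverted, which is why it has to be pre-processed by hand.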
It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 30+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) are available for 10+ languages, including English, French, German, Russian and American Spanish.
- Open a terminal in the folder where PhxBrowser.exe is located (hold Shift, right-click in free space in Windows Explorer and select “Open command window here”)
- Run PhxBrowser software with command:
PhxBrowser.exe /spe-debug /spe-output
- PhxBrowser software will start with an “SPE output” tab which shows the debug output of SPE
- Run PhxBrowser software in a terminal with command:
./PhxBrowser --spe-debug --spe-output
- PhxBrowser software will start with an “SPE output” tab which shows the debug output of SPE
A: The score threshold isn’t set up correctly. Adjust the speaker score sharpness value to calibrate the recalculation. Please see Calibration in the technology documentation.
A: These abbreviations mean the following:
- LR – likelihood ratio, the result of a statistical test comparing two models. It expresses how many times more likely the data are under one model than under the other. LR takes values in the interval <0;+inf).
- LLR – log-likelihood ratio, the logarithm of LR. LLR takes values in the interval (-inf;+inf).
- Percentage (normalised) score – a commonly used mathematical transformation of the LLR to a percentage. This number is easier for humans to read, but may raise doubts if LLR values are too high (typically for some non-adapted installations). It lies in the interval <0;100> % (or sometimes <0;1>). The higher the score, the better the match.
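The relationship between the three scores can be sketched as follows. The LR/LLR conversion is just the natural logarithm; the percentage mapping shown here is the plain logistic function, which is an assumption for illustration and not necessarily the exact transformation (or calibration) used by the Phonexia technologies.

```python
import math


def llr_from_lr(lr):
    """LLR is the natural logarithm of LR; LR = 0 maps to -inf."""
    return -math.inf if lr == 0 else math.log(lr)


def lr_from_llr(llr):
    """Inverse of llr_from_lr."""
    return math.exp(llr)


def percentage_from_llr(llr):
    """Map an LLR to the interval <0;100> with a logistic function.

    Assumed form: 100 / (1 + exp(-LLR)), written in a numerically stable way.
    """
    if llr >= 0:
        return 100.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return 100.0 * e / (1.0 + e)


if __name__ == "__main__":
    for lr in (0.05, 1.0, 20.0):
        llr = llr_from_lr(lr)
        print(f"LR={lr:g}  LLR={llr:.3f}  score={percentage_from_llr(llr):.1f} %")
```

With this mapping, LR = 1 (both models equally likely) gives LLR = 0 and a score of 50 %, and large LLR values saturate close to 100 %, which illustrates why very high LLRs are hard to tell apart on the percentage scale.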
I always get the same error messages:
- unable to connect to the SPE
- unable to start the localhost: giving up and kill the localhost.
A: This may be because the initialization of the SPE engine takes too long. Phonexia Browser treats this as an initialization failure and kills the server. You can proceed as follows:
- Increase the timeout in Settings > Speech Engine tab > First connection timeout
- Use fewer instances of technologies
- Use smaller models of technologies
A: We don’t provide USB dongles without memory storage. Possible solutions are:
- establish security directives for working with the USB dongle (who is allowed to handle it, memory scan checks on the way in and out),
- use HW based licensing,
- use license server.
A: Check whether your license file (license.dat) contains the correct modules.