Understand SPE technologies configuration file
This article explains the purpose and structure of the SPE technologies configuration file technologies.xml, or technologies.json when the file is created by Phonexia Browser.
An SPE installation usually includes multiple speech technologies (e.g. Speaker Identification, Speech To Text, etc.), available in various technological models (e.g. L4, XL4, etc.) or supporting various languages (e.g. 6th generation of EN_US, CS_CZ, etc.). You can choose which of these technologies and models are enabled in your SPE installation. Typically, you may want to try various models during initial testing to see how they perform on your audio, or enable additional technologies while developing your application.
To select the technologies and models to be enabled in your SPE, you can use one of the SPE administration tools, phxadmin or phxadmin2. The resulting configuration is then stored in the technologies.xml configuration file, placed in the SPE settings directory. SPE reads this configuration file during startup and initializes the technology instances according to the information in the file.
The file has a very simple structure and can also be created or modified using any plain-text editor, or programmatically.
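As a rough illustration of the programmatic route, the following Python sketch builds a document with the same element names as the XML example shown later in this article. The helper name build_technologies_xml and the input dictionary layout are assumptions made for this illustration only; they are not part of SPE or its tools.

```python
import xml.etree.ElementTree as ET


def build_technologies_xml(setup):
    """Build an ElementTree from {technology: {model: n_instances}}.

    Hypothetical helper; element names follow the technologies.xml example.
    """
    root = ET.Element("technology_subsystem_settings")
    technologies = ET.SubElement(root, "technologies")
    for tech_name, models in setup.items():
        tech_item = ET.SubElement(technologies, "item")
        ET.SubElement(tech_item, "name").text = tech_name
        models_el = ET.SubElement(tech_item, "models")
        for model_name, n_instances in models.items():
            model_item = ET.SubElement(models_el, "item")
            ET.SubElement(model_item, "name").text = model_name
            ET.SubElement(model_item, "n_instances").text = str(n_instances)
            # config_file is intentionally left empty (the normal case)
            ET.SubElement(model_item, "config_file")
    return ET.ElementTree(root)


setup = {"STT": {"SK_SK_5": 8}, "SID4E": {"L4": 2, "XL4": 3}}
tree = build_technologies_xml(setup)
if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
    ET.indent(tree)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)

# To save the result, something like the following could be used
# (the target path depends on your installation and is not shown here):
# tree.write("technologies.xml", encoding="utf-8", xml_declaration=True)
```

Since SPE reads the file during startup, a file generated this way would presumably take effect only after SPE is restarted.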
Example
The example below shows a technologies.xml file containing the following setup:
- STT (Speech To Text) with 8 instances of the SK_SK_5 model
- STT_STREAM (Speech To Text for stream processing) with 2 instances of the CS_CZ_6 model
- SID4E (Speaker Identification 4 Voiceprint Extractor) with 2 instances of the L4 model and 3 instances of the XL4 model
- SID4C (Speaker Identification 4 Voiceprint Comparator) with 2 instances of the L4 model and 3 instances of the XL4 model
<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item>
          <name>SK_SK_5</name>
          <n_instances>8</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>STT_STREAM</name>
      <models>
        <item>
          <name>CS_CZ_6</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4E</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4C</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
The meaning of the individual elements should be self-explanatory.
The only element that deserves more explanation is config_file. It should normally be kept empty, but it allows you to specify the name of a *.bs BSAPI configuration file to be used by the technology initializer instead of the default file belonging to the technology and model.
However, this feature should be used only in special cases, e.g. when suggested by Phonexia experts. SPE users should normally not modify BSAPI configuration files; if some technology configuration customization is needed, the user configuration file is the right method.
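To review an existing file, for example to spot models that have a non-empty config_file, a short Python sketch along these lines could be used. The function name list_models and the file name custom.bs in the sample are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the technologies.xml layout; "custom.bs" is a made-up name.
SAMPLE = """<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>SID4E</name>
      <models>
        <item><name>L4</name><n_instances>2</n_instances><config_file /></item>
        <item><name>XL4</name><n_instances>3</n_instances><config_file>custom.bs</config_file></item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>"""


def list_models(xml_text):
    """Return (technology, model, n_instances, config_file or None) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for tech in root.findall("./technologies/item"):
        for model in tech.findall("./models/item"):
            # An empty <config_file /> element is reported as None
            config = (model.findtext("config_file") or "").strip() or None
            rows.append((
                tech.findtext("name"),
                model.findtext("name"),
                int(model.findtext("n_instances")),
                config,
            ))
    return rows


for tech, model, count, config in list_models(SAMPLE):
    note = f" (custom BSAPI config: {config})" if config else ""
    print(f"{tech}/{model}: {count} instance(s){note}")
```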
Technology names supported in technologies configuration file:
- AGE: Age Estimation
- DENOISER: Denoiser
- DIAR: Diarization
- GID: Gender Identification
- KWS: Keyword Spotting
- KWS_STREAM: Keyword Spotting Stream
- LIDC: Language Identification Languageprint Comparator
- LIDE: Language Identification Languageprint Extractor
- PHNREC: Phoneme Recognition
- SID4C: Speaker Identification 4 Voiceprint Comparator
- SID4C_STREAM: Speaker Identification 4 Voiceprint Stream Comparator
- SID4CALIB: Speaker Identification 4 VoicePrint Calibration
- SID4E: Speaker Identification 4 Voiceprint Extractor
- SID4E_STREAM: Speaker Identification 4 Voiceprint Stream Extractor
- SQE: Speech Quality Estimation
- SQE_STREAM: Speech Quality Estimation Stream
- STT: Speech To Text
- STT_STREAM: Speech To Text Stream
- TAE: Time Analysis Extraction
- TAE_STREAM: Time Analysis Extraction Stream
- VAD: Voice Activity Detection
- VAD_STREAM: Voice Activity Detection Stream
- SIDC: Speaker Identification Voiceprint Comparator (legacy)
- SIDC_STREAM: Speaker Identification Voiceprint Stream Comparator (legacy)
- SIDCALIBSET: Speaker Identification VoicePrint Calibration (legacy)
- SIDCALIBSET_STREAM: Speaker Identification VoicePrint Stream Calibration (legacy)
- SIDE: Speaker Identification Voiceprint Extractor (legacy)
- SIDE_STREAM: Speaker Identification Voiceprint Stream Extractor (legacy)
- DICTATE: Dictate (valid only in SPE 3.17 and older)
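When the file is edited by hand, a mistyped technology name is an easy error to make. The Python sketch below checks the names in a configuration against the list above. The helper name unknown_technologies is hypothetical, and how SPE itself reacts to an unknown name is not covered here.

```python
import xml.etree.ElementTree as ET

# Names copied from the list of supported technology names above.
SUPPORTED = {
    "AGE", "DENOISER", "DIAR", "GID", "KWS", "KWS_STREAM", "LIDC", "LIDE",
    "PHNREC", "SID4C", "SID4C_STREAM", "SID4CALIB", "SID4E", "SID4E_STREAM",
    "SQE", "SQE_STREAM", "STT", "STT_STREAM", "TAE", "TAE_STREAM",
    "VAD", "VAD_STREAM",
    # legacy names
    "SIDC", "SIDC_STREAM", "SIDCALIBSET", "SIDCALIBSET_STREAM",
    "SIDE", "SIDE_STREAM",
    # valid only in SPE 3.17 and older
    "DICTATE",
}


def unknown_technologies(xml_text):
    """Return technology names from the XML that are not in SUPPORTED."""
    root = ET.fromstring(xml_text)
    names = [item.findtext("name") for item in root.findall("./technologies/item")]
    return [name for name in names if name not in SUPPORTED]


sample = """<technology_subsystem_settings><technologies>
  <item><name>STT</name><models /></item>
  <item><name>STTT</name><models /></item>
</technologies></technology_subsystem_settings>"""

print(unknown_technologies(sample))
```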
JSON-formatted file created by Phonexia Browser
If SPE technologies are configured from Phonexia Browser, which is possible only if SPE is used in the special “embedded SPE” (or “SPE on localhost”) mode from Phonexia Browser, the technologies configuration is stored in the JSON-formatted technologies.json file in the SPE settings directory. This separates the Browser-made configuration for this special SPE mode from the normal SPE technologies configuration. Therefore, the configurations in the XML and JSON files can differ.
Example
The example below shows a technologies.json file containing (almost) the same setup as in the XML file example above. There are two differences compared to the XML example:
- the STT_STREAM technology is missing: Phonexia Browser does not support stream processing, i.e. does not allow configuration of stream technologies
- the config_file setting is also missing: Phonexia Browser does not support this special expert-level feature, i.e. does not store the setting
{
  "technology_subsystem_settings": {
    "technologies": [
      {
        "name": "STT",
        "models": [
          { "name": "SK_SK_5", "n_instances": 8 }
        ]
      },
      {
        "name": "SID4E",
        "models": [
          { "name": "L4", "n_instances": 2 },
          { "name": "XL4", "n_instances": 3 }
        ]
      },
      {
        "name": "SID4C",
        "models": [
          { "name": "L4", "n_instances": 2 },
          { "name": "XL4", "n_instances": 3 }
        ]
      }
    ]
  }
}
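Because the XML and JSON configurations can differ, it may be handy to compare them. The following Python sketch loads both formats into the same dictionary shape and reports what is configured in one but missing or different in the other. The function names and the shortened sample data are assumptions made for this illustration.

```python
import json
import xml.etree.ElementTree as ET


def from_json(text):
    """Parse technologies.json content into {technology: {model: n_instances}}."""
    settings = json.loads(text)["technology_subsystem_settings"]
    return {
        tech["name"]: {m["name"]: m["n_instances"] for m in tech["models"]}
        for tech in settings["technologies"]
    }


def from_xml(text):
    """Parse technologies.xml content into {technology: {model: n_instances}}."""
    root = ET.fromstring(text)
    return {
        tech.findtext("name"): {
            m.findtext("name"): int(m.findtext("n_instances"))
            for m in tech.findall("./models/item")
        }
        for tech in root.findall("./technologies/item")
    }


def only_in_first(first, second):
    """Technologies whose model setup in `first` is missing or different in `second`."""
    return {tech: models for tech, models in first.items() if second.get(tech) != models}


# Shortened samples in the two layouts shown in this article
XML_TEXT = """<technology_subsystem_settings><technologies>
  <item><name>STT</name><models>
    <item><name>SK_SK_5</name><n_instances>8</n_instances><config_file /></item>
  </models></item>
  <item><name>STT_STREAM</name><models>
    <item><name>CS_CZ_6</name><n_instances>2</n_instances><config_file /></item>
  </models></item>
</technologies></technology_subsystem_settings>"""

JSON_TEXT = """{"technology_subsystem_settings": {"technologies": [
  {"name": "STT", "models": [{"name": "SK_SK_5", "n_instances": 8}]}
]}}"""

xml_cfg = from_xml(XML_TEXT)
json_cfg = from_json(JSON_TEXT)
print("Only in XML:", only_in_first(xml_cfg, json_cfg))
print("Only in JSON:", only_in_first(json_cfg, xml_cfg))
```

The comparison ignores config_file on purpose, since that setting is not stored in the JSON file.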