
Understand SPE technologies configuration file

This article explains the purpose and structure of the SPE technologies configuration file, technologies.xml, and of the technologies.json file created by Phonexia Browser.

An SPE installation usually includes multiple speech technologies (e.g. Speaker Identification, Speech To Text), available in various technological models (e.g. L4, XL4) or supporting various languages (e.g. the 6th generation of EN_US, CS_CZ). You can select which of these technologies and models are enabled in your SPE installation. Typically, you may want to try several models during initial testing to see how they perform on your audio, or enable additional technologies while developing your application.

To select the technologies and models to be enabled in your SPE, you can use one of the SPE administration tools, phxadmin or phxadmin2. The resulting configuration is stored in the technologies.xml configuration file, placed in the SPE settings directory. SPE reads this file during startup and initializes the technology instances according to the information in it.

The file has a very simple structure and can also be created or modified using any plain-text editor, or programmatically.

Example

The example below shows a technologies.xml file containing the following setup:

  • STT (Speech To Text) with
    • 8 instances of SK_SK_5 model
  • STT_STREAM (Speech To Text for stream processing) with
    • 2 instances of CS_CZ_6 model
  • SID4E (Speaker Identification 4 Voiceprint Extractor) with
    • 2 instances of L4 model
    • 3 instances of XL4 model
  • SID4C (Speaker Identification 4 Voiceprint Comparator) with
    • 2 instances of L4 model
    • 3 instances of XL4 model
<?xml version="1.0"?>
<technology_subsystem_settings>
  <technologies>
    <item>
      <name>STT</name>
      <models>
        <item>
          <name>SK_SK_5</name>
          <n_instances>8</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>STT_STREAM</name>
      <models>
        <item>
          <name>CS_CZ_6</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4E</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
    <item>
      <name>SID4C</name>
      <models>
        <item>
          <name>L4</name>
          <n_instances>2</n_instances>
          <config_file />
        </item>
        <item>
          <name>XL4</name>
          <n_instances>3</n_instances>
          <config_file />
        </item>
      </models>
    </item>
  </technologies>
</technology_subsystem_settings>
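
Because the file can also be created programmatically, the structure above can be generated from a simple mapping. The following is a minimal Python sketch using only the standard library; the function name build_technologies_xml and the shape of the setup dictionary are illustrative assumptions, not part of SPE or its tools.

```python
import xml.etree.ElementTree as ET


def build_technologies_xml(setup):
    """Build an ElementTree from {technology: {model: n_instances}}.

    Hypothetical helper for illustration; element names follow the
    technologies.xml example shown above.
    """
    root = ET.Element("technology_subsystem_settings")
    technologies = ET.SubElement(root, "technologies")
    for tech_name, models in setup.items():
        tech_item = ET.SubElement(technologies, "item")
        ET.SubElement(tech_item, "name").text = tech_name
        models_el = ET.SubElement(tech_item, "models")
        for model_name, n_instances in models.items():
            model_item = ET.SubElement(models_el, "item")
            ET.SubElement(model_item, "name").text = model_name
            ET.SubElement(model_item, "n_instances").text = str(n_instances)
            # config_file is kept empty so the default file is used
            ET.SubElement(model_item, "config_file")
    return ET.ElementTree(root)


# Same setup as in the example above
setup = {
    "STT": {"SK_SK_5": 8},
    "STT_STREAM": {"CS_CZ_6": 2},
    "SID4E": {"L4": 2, "XL4": 3},
    "SID4C": {"L4": 2, "XL4": 3},
}

tree = build_technologies_xml(setup)
ET.indent(tree, space="  ")  # pretty-printing; requires Python 3.9+
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
```

To store the result on disk, tree.write("technologies.xml", encoding="utf-8", xml_declaration=True) can be used. The target path is an assumption here; the file belongs in the SPE settings directory of your installation.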

The meaning of the individual elements should be largely self-explanatory: each technology item has a name and a list of models, and each model item has a name, a number of instances (n_instances) and a config_file element.

The only element that deserves more explanation is config_file. It should normally be kept empty, but it allows you to specify the name of a *.bs BSAPI configuration file to be used by the technology initializer instead of the default file belonging to the technology and model.
This feature should be used only in special cases, e.g. if suggested by Phonexia experts. SPE users should normally not modify BSAPI configuration files; if a technology configuration needs to be customized, the user configuration file is the right method.

Technology names supported in technologies configuration file:

AGE                 Age Estimation
DENOISER            Denoiser
DIAR                Diarization
GID                 Gender Identification
KWS                 Keyword Spotting
KWS_STREAM          Keyword Spotting Stream
LIDC                Language Identification Languageprint Comparator
LIDE                Language Identification Languageprint Extractor
PHNREC              Phoneme Recognition
SID4C               Speaker Identification 4 Voiceprint Comparator
SID4C_STREAM        Speaker Identification 4 Voiceprint Stream Comparator
SID4CALIB           Speaker Identification 4 VoicePrint Calibration
SID4E               Speaker Identification 4 Voiceprint Extractor
SID4E_STREAM        Speaker Identification 4 Voiceprint Stream Extractor
SQE                 Speech Quality Estimation
SQE_STREAM          Speech Quality Estimation Stream
STT                 Speech To Text
STT_STREAM          Speech To Text Stream
TAE                 Time Analysis Extraction
TAE_STREAM          Time Analysis Extraction Stream
VAD                 Voice Activity Detection
VAD_STREAM          Voice Activity Detection Stream

SIDC                Speaker Identification Voiceprint Comparator          (legacy)
SIDC_STREAM         Speaker Identification Voiceprint Stream Comparator   (legacy)
SIDCALIBSET         Speaker Identification VoicePrint Calibration         (legacy)
SIDCALIBSET_STREAM  Speaker Identification VoicePrint Stream Calibration  (legacy)
SIDE                Speaker Identification Voiceprint Extractor           (legacy)
SIDE_STREAM         Speaker Identification Voiceprint Stream Extractor    (legacy)

DICTATE             Dictate                     (valid only in SPE 3.17 and older)
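
When the file is edited by hand, a misspelled technology name is an easy mistake to make. The following is a minimal Python sketch that checks the names in a configuration against the list above. The function check_technology_names and the sample XML are illustrative assumptions; this is not a validation tool shipped with SPE.

```python
import xml.etree.ElementTree as ET

# Names copied from the list above. DICTATE is left out on purpose
# (valid only in SPE 3.17 and older), so this sketch reports it as unknown.
SUPPORTED = {
    "AGE", "DENOISER", "DIAR", "GID", "KWS", "KWS_STREAM", "LIDC", "LIDE",
    "PHNREC", "SID4C", "SID4C_STREAM", "SID4CALIB", "SID4E", "SID4E_STREAM",
    "SQE", "SQE_STREAM", "STT", "STT_STREAM", "TAE", "TAE_STREAM",
    "VAD", "VAD_STREAM",
}
LEGACY = {
    "SIDC", "SIDC_STREAM", "SIDCALIBSET", "SIDCALIBSET_STREAM",
    "SIDE", "SIDE_STREAM",
}


def check_technology_names(xml_text):
    """Return (unknown, legacy) technology names found in the XML text."""
    root = ET.fromstring(xml_text)
    names = [item.findtext("name") for item in root.findall("./technologies/item")]
    known = SUPPORTED | LEGACY
    unknown = [n for n in names if n not in known]
    legacy = [n for n in names if n in LEGACY]
    return unknown, legacy


# Made-up sample with one legacy name and one deliberate typo
sample = """<technology_subsystem_settings>
  <technologies>
    <item><name>STT</name><models /></item>
    <item><name>SIDE</name><models /></item>
    <item><name>STT_STREEM</name><models /></item>
  </technologies>
</technology_subsystem_settings>"""

unknown, legacy = check_technology_names(sample)
print("unknown:", unknown)
print("legacy:", legacy)
```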

 


JSON-formatted file created by Phonexia Browser

If SPE technologies are configured from Phonexia Browser (which is possible only if SPE is used in the special "embedded SPE", or "SPE on localhost", mode from Phonexia Browser), the technologies configuration is stored in the JSON-formatted file technologies.json in the SPE settings directory. This separates the Browser-made configuration for this special SPE mode from the normal SPE technologies configuration. Therefore, the configurations in the XML and JSON files can differ.

Example

The example below shows a technologies.json file containing (almost) the same setup as the XML example above. There are two differences from the XML example:

  • the STT_STREAM technology is missing – Phonexia Browser does not support stream processing, i.e. does not allow configuration of stream technologies
  • the config_file setting is also missing – Phonexia Browser does not support this special expert-level feature, i.e. does not store the setting
{
  "technology_subsystem_settings": {
    "technologies": [
      {
        "name": "STT",
        "models": [
          {
            "name": "SK_SK_5",
            "n_instances": 8
          }
        ]
      },
      {
        "name": "SID4E",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      },
      {
        "name": "SID4C",
        "models": [
          {
            "name": "L4",
            "n_instances": 2
          },
          {
            "name": "XL4",
            "n_instances": 3
          }
        ]
      }
    ]
  }
}
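
Since the XML and JSON configurations can differ, it may be useful to see where they diverge. The following is a minimal Python sketch that loads both formats into the same mapping and compares them. The helper names load_xml and load_json and the shortened sample data are illustrative assumptions, not part of SPE or Phonexia Browser.

```python
import json
import xml.etree.ElementTree as ET


def load_xml(xml_text):
    """Map (technology, model) -> n_instances from technologies.xml content."""
    root = ET.fromstring(xml_text)
    result = {}
    for tech in root.findall("./technologies/item"):
        for model in tech.findall("./models/item"):
            key = (tech.findtext("name"), model.findtext("name"))
            result[key] = int(model.findtext("n_instances"))
    return result


def load_json(json_text):
    """Map (technology, model) -> n_instances from technologies.json content."""
    data = json.loads(json_text)
    result = {}
    for tech in data["technology_subsystem_settings"]["technologies"]:
        for model in tech["models"]:
            result[(tech["name"], model["name"])] = model["n_instances"]
    return result


# Shortened sample data in the two formats shown in this article
xml_text = """<technology_subsystem_settings>
  <technologies>
    <item><name>STT</name><models>
      <item><name>SK_SK_5</name><n_instances>8</n_instances><config_file /></item>
    </models></item>
    <item><name>STT_STREAM</name><models>
      <item><name>CS_CZ_6</name><n_instances>2</n_instances><config_file /></item>
    </models></item>
  </technologies>
</technology_subsystem_settings>"""

json_text = """{"technology_subsystem_settings": {"technologies": [
  {"name": "STT", "models": [{"name": "SK_SK_5", "n_instances": 8}]}
]}}"""

xml_cfg = load_xml(xml_text)
json_cfg = load_json(json_text)

only_in_xml = sorted(set(xml_cfg) - set(json_cfg))
only_in_json = sorted(set(json_cfg) - set(xml_cfg))
# Entries present in both files but with a different instance count
changed = sorted(k for k in set(xml_cfg) & set(json_cfg) if xml_cfg[k] != json_cfg[k])

print("only in XML:", only_in_xml)
print("only in JSON:", only_in_json)
print("different n_instances:", changed)
```

With real files, the two strings would be read from technologies.xml and technologies.json in the SPE settings directory.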

 
