Adding words to STT language model on the fly

Words can be added to the STT language model on the fly as part of the Preferred Phrases V2 feature, available in SPE 3.45 or newer.

The POST /technologies/stt and POST /technologies/stt/input_stream REST calls serve two purposes:

  • specify the actual preferred phrases (in the phrases part)
  • specify words to be added to STT language model (in the dictionary part)

Example input for starting a transcription, specifying two preferred phrases and two words to be added (one with an explicitly specified pronunciation):

{
  "preferred_phrases": {
    "phrases": [
      {
        "phrase": "this is preferred phrase"
      },
      {
        "phrase": "some other phrase"
      },
      ...
    ],
    "dictionary": [
      {
        "word": "preferred"
      },
      {
        "word": "phrase",
        "pronunciations": [
          {
            "phonemes": "f r ey z"
          },
          ...
        ]
      },
      ...
    ]
  }
}

Each part can be used independently: you can specify only preferred phrases, only add words to the dictionary, or use both features at the same time.
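The request body shown above can also be assembled programmatically. The following is a minimal Python sketch that only builds the JSON payload; the helper name build_preferred_phrases and its arguments are illustrative assumptions, not part of the SPE API, and sending the payload to the REST endpoint is left out.

```python
import json


def build_preferred_phrases(phrases=None, words=None):
    """Build the "preferred_phrases" request body (hypothetical helper).

    phrases: list of phrase strings, or None to omit the "phrases" part
    words:   dict mapping word -> list of phoneme strings; an empty list
             means no explicit pronunciation is sent for that word
    """
    body = {}
    if phrases:
        body["phrases"] = [{"phrase": p} for p in phrases]
    if words:
        dictionary = []
        for word, pronunciations in words.items():
            entry = {"word": word}
            if pronunciations:
                # One object per pronunciation, as in the input example
                entry["pronunciations"] = [{"phonemes": p} for p in pronunciations]
            dictionary.append(entry)
        body["dictionary"] = dictionary
    return {"preferred_phrases": body}


payload = build_preferred_phrases(
    phrases=["this is preferred phrase", "some other phrase"],
    words={"preferred": [], "phrase": ["f r ey z"]},
)
print(json.dumps(payload, indent=2))
```

Passing only phrases or only words produces a body with just that part, which matches the independent use of the two features described above.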

Words and pronunciations

Words to be added to the language model can be specified without a pronunciation, in which case the system generates a default pronunciation in the background, based on internal linguistic rules for the given STT language.
However, the generated pronunciation may not match how the word is actually spoken, so it is recommended to define pronunciations explicitly. This helps prevent mistranscriptions caused by incorrect default pronunciations. It is also possible to define multiple pronunciations for a word, which is especially useful for uncommon or foreign words, slang words, etc.

Allowed characters

A word specified without a pronunciation must use only characters (graphemes) allowed in the given STT language.
A word can, however, be written using any characters if its pronunciation is also specified explicitly. A word that uses disallowed characters and has no explicit pronunciation is ignored during transcription (see the warning_message parameter below).

Pronunciations must always be specified using only phonemes allowed in the given STT language.
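The rules above can be checked on the client side before starting a transcription. The sketch below is only an illustration: the ALLOWED_GRAPHEMES and ALLOWED_PHONEMES sets are hypothetical placeholders (the real sets depend on the STT language and model), and check_entry is not an SPE function.

```python
# Hypothetical sets for illustration only; the actual allowed graphemes
# and phonemes depend on the STT language being used.
ALLOWED_GRAPHEMES = set("abcdefghijklmnopqrstuvwxyz'")
ALLOWED_PHONEMES = {"p", "r", "ih", "f", "er", "d", "ey", "z"}


def check_entry(word, pronunciations=()):
    """Return a list of problems likely to get the entry ignored."""
    problems = []
    # Disallowed characters are only a problem when no pronunciation is given
    if not pronunciations and not set(word) <= ALLOWED_GRAPHEMES:
        problems.append(
            f"word {word!r} uses disallowed characters and has no pronunciation"
        )
    # Every pronunciation must consist of allowed phonemes only
    for phonemes in pronunciations:
        unknown = [p for p in phonemes.split() if p not in ALLOWED_PHONEMES]
        if unknown:
            problems.append(
                f"pronunciation {phonemes!r} uses unknown phonemes: {unknown}"
            )
    return problems


print(check_entry("phrase"))                # allowed characters, no pronunciation
print(check_entry("señor"))                 # disallowed character, no pronunciation
print(check_entry("señor", ["f r ey z"]))   # any characters are fine with a pronunciation
print(check_entry("phrase", ["f r xx"]))    # unknown phoneme
```

Such a pre-check does not replace the server-side validation; the warning_message parameter in the transcription result remains the authoritative indication of ignored entries.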

Transcription result

If preferred phrases and/or words were specified when starting the transcription, the result contains the same phrases and dictionary structures which were used as input for the transcription task.

The dictionary structure is enriched with:

  • the pronunciations part, generated automatically for words that did not specify pronunciations in the input
  • the out_of_vocabulary parameter, indicating whether the word exists in the internal vocabulary or not
  • the warning_message parameter, containing a warning message, if any (if a warning message is present, the corresponding word/pronunciation was ignored and not used during transcription)

The example below shows the transcription result when the transcription was started using the input example shown above:

{
  "result": {
    "version": 5,
    "name": "SpeechRecognitionResult",
    "file": "/test.wav",
    "model": "EN_US_6",
.
.
.
    "phrases": [
      {
        "phrase": "this is preferred phrase"
      },
      {
        "phrase": "some other phrase"
      }
    ],
    "dictionary": [
      {
        "word": "preferred",
        "pronunciations": [
          {
            "phonemes": "p r ih f er d",
            "out_of_vocabulary": false,
            "warning_message": ""
          }
        ]
      },
      {
        "word": "phrase",
        "pronunciations": [
          {
            "phonemes": "f r ey z",
            "out_of_vocabulary": false,
            "warning_message": ""
          }
        ]
      }
    ]
  }
}
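The enriched dictionary structure can be inspected to find out which words or pronunciations were actually used. The following Python sketch assumes the result layout shown above and treats an empty warning_message as "no warning"; the function name summarize_dictionary and the trimmed sample data are illustrative only.

```python
def summarize_dictionary(result):
    """Split dictionary pronunciations into used and ignored entries.

    Returns two lists: (word, phonemes) tuples that were used, and
    (word, phonemes, warning_message) tuples that were ignored.
    """
    used, ignored = [], []
    for entry in result["result"].get("dictionary", []):
        for pron in entry.get("pronunciations", []):
            warning = pron.get("warning_message", "")
            if warning:
                # A warning means the word/pronunciation was not used
                ignored.append((entry["word"], pron["phonemes"], warning))
            else:
                used.append((entry["word"], pron["phonemes"]))
    return used, ignored


# Trimmed copy of the result example above
sample = {
    "result": {
        "dictionary": [
            {"word": "preferred", "pronunciations": [
                {"phonemes": "p r ih f er d", "out_of_vocabulary": False, "warning_message": ""}
            ]},
            {"word": "phrase", "pronunciations": [
                {"phonemes": "f r ey z", "out_of_vocabulary": False, "warning_message": ""}
            ]},
        ]
    }
}

used, ignored = summarize_dictionary(sample)
print("used:", used)
print("ignored:", ignored)
```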