STT: Adding words to language model on the fly

Adding words to the STT language model on the fly is possible in SPE 3.45 or newer, as part of the preferred phrases feature.

In the POST /technologies/stt and POST /technologies/stt/input_stream API calls, the preferred_phrases input serves two purposes:

specify the actual preferred phrases (in the phrases part)
specify words to be added to the STT language model (in the dictionary part)

Each part can be used independently, i.e. you can specify only preferred phrases, only add words to the dictionary, or use both features at the same time.

Example of input for starting transcription, specifying two preferred phrases and two words to be added (one with explicitly specified pronunciation):

{
  "preferred_phrases": {
    "phrases": [
      {
        "phrase": "this is preferred phrase"
      },
      {
        "phrase": "some other phrase"
      },
      ...
    ],
    "dictionary": [
      {
        "word": "preferred"
      },
      {
        "word": "phrase",
        "pronunciations": [
          {
            "phonemes": "f r ey z"
          },
          ...
        ]
      },
      ...
    ]
  }
}
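The same request body can be assembled programmatically. The following is a minimal sketch in Python: the helper name build_stt_request and its arguments are illustrative assumptions, not part of the SPE API. It only builds and prints the JSON body shown above and does not send it to the server.

```python
import json


def build_stt_request(phrases, words):
    """Build a request body with the preferred_phrases structure shown above.

    phrases: list of phrase strings.
    words:   dict mapping each word to a list of phoneme strings; an empty
             list means "let the system generate the pronunciation".
    This helper and its argument names are hypothetical.
    """
    preferred = {}
    if phrases:
        preferred["phrases"] = [{"phrase": p} for p in phrases]
    if words:
        dictionary = []
        for word, pronunciations in words.items():
            entry = {"word": word}
            # Only emit the pronunciations part when it was given explicitly.
            if pronunciations:
                entry["pronunciations"] = [{"phonemes": p} for p in pronunciations]
            dictionary.append(entry)
        preferred["dictionary"] = dictionary
    return {"preferred_phrases": preferred}


body = build_stt_request(
    ["this is preferred phrase", "some other phrase"],
    {"preferred": [], "phrase": ["f r ey z"]},
)
print(json.dumps(body, indent=2))
```

Because the phrases and dictionary parts are independent, passing an empty list of phrases or an empty dict of words simply omits that part from the body.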

Words and pronunciations

Words to be added to the language model can be specified without a pronunciation. In that case, the system generates a default pronunciation in the background, based on the letters of the word and the internal linguistic rules for the given STT language.
However, the automatically generated pronunciation may not match expectations, especially for foreign words (due to pronunciation differences between the word's native language and the STT language). It is therefore recommended to define pronunciations explicitly, to help prevent mistranscriptions caused by incorrect default pronunciations. It is also possible to define multiple pronunciations for one word; this is especially useful for uncommon or foreign words, slang words, etc., which people tend to mispronounce.
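As an illustration of defining several pronunciations for one word, here is a small Python sketch. The helper dictionary_entry is hypothetical, and it assumes that phonemes are written as space-separated symbols, as in the examples in this article. The second pronunciation of "phrase" ("f r ey s") is made up purely for illustration; real pronunciations must use phonemes supported by the STT language.

```python
def dictionary_entry(word, *pronunciations):
    """Build one dictionary item with zero or more explicit pronunciations.

    Hypothetical helper: whitespace is normalized and duplicate
    pronunciations are dropped while keeping the original order.
    """
    unique = []
    for pronunciation in pronunciations:
        normalized = " ".join(pronunciation.split())
        if normalized and normalized not in unique:
            unique.append(normalized)
    entry = {"word": word}
    if unique:
        entry["pronunciations"] = [{"phonemes": p} for p in unique]
    return entry


# No pronunciation given: the system is left to generate a default one.
auto = dictionary_entry("preferred")

# Two variants; "f r ey s" is an illustrative, made-up alternative.
multi = dictionary_entry("phrase", "f r ey z", "f  r ey z", "f r ey s")
print(auto)
print(multi)
```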

Allowed characters

In general, words should use only letters (graphemes) allowed in the given STT language (use GET /technologies/stt/graphemes to get the list of allowed graphemes).
However, it is actually allowed to use any letters, even from a different alphabet (e.g. a German word like “grüßen” in Czech transcription) or a different writing script (like Cyrillic or Japanese Kana). In that case, the word pronunciation MUST be explicitly specified. The pronunciation must use only phonemes supported by the STT language (use GET /technologies/stt/phonemes to get the list of allowed phonemes).
Specifying a word that uses disallowed characters without also specifying its pronunciation causes that word to be ignored during transcription (see the warning_message parameter below).
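These rules can be checked on the client side before starting a transcription. The sketch below is an assumption-laden illustration: the function check_entry is hypothetical, the grapheme and phoneme sets are small placeholders (the real lists come from the two GET calls above, whose response format is not covered here), and characters are compared exactly as given, without any case folding.

```python
def check_entry(entry, allowed_graphemes, allowed_phonemes):
    """Return a list of problems found in one dictionary item (empty if none).

    Hypothetical helper implementing the two rules described above:
    1. a word with characters outside the allowed graphemes needs an
       explicit pronunciation;
    2. every pronunciation may use only allowed phonemes.
    """
    problems = []
    pronunciations = entry.get("pronunciations", [])
    foreign = sorted({ch for ch in entry["word"] if ch not in allowed_graphemes})
    if foreign and not pronunciations:
        problems.append(
            "word %r uses disallowed characters %s and has no pronunciation"
            % (entry["word"], foreign)
        )
    for pronunciation in pronunciations:
        unknown = [p for p in pronunciation["phonemes"].split()
                   if p not in allowed_phonemes]
        if unknown:
            problems.append(
                "pronunciation %r uses unsupported phonemes %s"
                % (pronunciation["phonemes"], unknown)
            )
    return problems


# Placeholder sets for illustration only, not the real lists of any model.
graphemes = set("abcdefghijklmnopqrstuvwxyz")
phonemes = {"f", "r", "ey", "z", "p", "ih", "er", "d"}

ok = check_entry({"word": "phrase"}, graphemes, phonemes)
missing = check_entry({"word": "grüßen"}, graphemes, phonemes)
bad = check_entry(
    {"word": "phrase", "pronunciations": [{"phonemes": "f r ey zz"}]},
    graphemes, phonemes,
)
print(ok, missing, bad, sep="\n")
```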

Transcription result

If preferred phrases and/or words were specified when starting the transcription, the result contains the same phrases and dictionary structures which were used as input for the transcription task.

The dictionary structure is enriched with:

pronunciations part, generated automatically for words that did not specify pronunciations in the input
out_of_vocabulary parameter, indicating whether the word exists in the internal vocabulary or not
class parameter, containing the name of the word class to which the word belongs, if any
warning_message parameter, containing a warning message, if any (if a warning message is present, the corresponding word/pronunciation was ignored and not used during transcription)

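The example below shows the transcription result when the transcription was started using the input example shown above. Compared to the input, the result adds the generated pronunciation for the word "preferred" and the out_of_vocabulary, class, and warning_message parameters.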

{
  "result": {
    "version": 5,
    "name": "SpeechRecognitionResult",
    "file": "/test.wav",
    "model": "EN_US_6",
    ...
    "phrases": [
      {
        "phrase": "this is preferred phrase"
      },
      {
        "phrase": "some other phrase"
      }
    ],
    "dictionary": [
      {
        "word": "preferred",
        "pronunciations": [
          {
            "phonemes": "p r ih f er d",
            "out_of_vocabulary": false,
            "class": "",
            "warning_message": ""
          }
        ]
      },
      {
        "word": "phrase",
        "pronunciations": [
          {
            "phonemes": "f r ey z",
            "out_of_vocabulary": false,
            "class": "",
            "warning_message": ""
          }
        ]
      }
    ]
  }
}
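To see quickly which words were actually used, the enriched dictionary part of the result can be flattened into one row per pronunciation. This is a minimal Python sketch: the function summarize_dictionary is hypothetical, and the sample data is a trimmed copy of the result above, assuming the enrichment parameters sit inside each pronunciation item as shown there.

```python
def summarize_dictionary(result):
    """Flatten the dictionary part of a result into one row per pronunciation.

    Hypothetical helper; a row is marked as ignored when its
    warning_message is non-empty, as described in this article.
    """
    rows = []
    for entry in result["result"].get("dictionary", []):
        for pronunciation in entry.get("pronunciations", []):
            warning = pronunciation.get("warning_message", "")
            rows.append({
                "word": entry["word"],
                "phonemes": pronunciation["phonemes"],
                "out_of_vocabulary": pronunciation.get("out_of_vocabulary"),
                "ignored": bool(warning),
                "warning": warning,
            })
    return rows


# Trimmed copy of the example result above.
sample = {
    "result": {
        "dictionary": [
            {"word": "preferred", "pronunciations": [
                {"phonemes": "p r ih f er d", "out_of_vocabulary": False,
                 "class": "", "warning_message": ""}]},
            {"word": "phrase", "pronunciations": [
                {"phonemes": "f r ey z", "out_of_vocabulary": False,
                 "class": "", "warning_message": ""}]},
        ]
    }
}

rows = summarize_dictionary(sample)
for row in rows:
    print(row)
```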
