STT Language Model Customization tutorial

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model.

The language model is an important part of Phonexia Speech To Text. Put simply, it can be imagined as a large dictionary with accompanying statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio signals into the proper text equivalents.
Because spoken language is so diverse, the default generic language model may not give certain words the weight they deserve in certain situations. Language model customization is a way to inform the system about these words.

The basic principle of the LMC tool is that it takes an existing STT model as a source and creates a new STT model, with your customizations included, as a target.
To see the results of the customizations, you need to use the new STT model for transcription.

Currently supported language model customizations are:

  • adding new words and/or pronunciations
    This is intended for adding client-, domain- or product-specific words such as company names, product names, component names, etc.

Note: LMC works only with STT models of the 5th generation or newer.

LMC is provided as a command line tool and is available from Phonexia either as part of the Speech To Text package for command line, or as a separate download.


Customizing STT language model

1) Creating word list

The word list is a UTF-8 encoded text file containing a list of words to be added to the STT language model, one word per line.
Note: LMC v3.30.0 (March 2020) or older requires the text file to be without a Byte-Order-Mark (BOM).

Each word can optionally be followed by its pronunciation, separated from the word by a SPACE or TAB character. Pronunciations must use only phonemes allowed by the corresponding language; see the Phonemes_for_STT_and_KWS PDF file (or Annex2 in older versions).

If a pronunciation is not explicitly specified, a default one generated internally will be used. To add multiple pronunciation variants for the same word, enter multiple word-pronunciation pairs, each on a separate line.

An example of an English word list:
  • the words iPhone and contract don’t have any specific pronunciations defined
  • the word schneider has a specific pronunciation defined
  • the abbreviation MIT has two alternative pronunciations defined
iPhone
contract
schneider sh n ay d er
MIT eh m ay t iy
MIT m ih t
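
The word list format described above can be checked with a small script before it is handed to LMC. The following is a minimal sketch, not part of the LMC tool: the function names `parse_word_list` and `render_word_list` are hypothetical, skipping blank lines is an assumption of this sketch, and phonemes are not validated because the allowed set is defined only in the PDF mentioned above.

```python
BOM = "\ufeff"


def parse_word_list(text):
    """Split word list text into (word, pronunciation) pairs.

    The pronunciation is None when the line holds only a word.
    """
    entries = []
    for line in text.lstrip(BOM).splitlines():
        line = line.strip()
        if not line:
            continue  # assumption of this sketch: blank lines are skipped
        # A word is separated from its pronunciation by SPACE or TAB.
        parts = line.split(None, 1)
        word = parts[0]
        pronunciation = " ".join(parts[1].split()) if len(parts) > 1 else None
        entries.append((word, pronunciation))
    return entries


def render_word_list(entries):
    """Encode entries as UTF-8 bytes without a BOM, one entry per line."""
    lines = [w if p is None else f"{w} {p}" for w, p in entries]
    return ("\n".join(lines) + "\n").encode("utf-8")
```

Writing the result with `pathlib.Path("wordlist.txt").write_bytes(render_word_list(entries))` yields a BOM-free file, which matters for the older LMC versions noted earlier; the file name here is only an example.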

2) Creating customized STT model using LMC tool

The basic philosophy of the LMC tool is that it takes an existing model and creates a copy of it with the customizations added. The customized copy is marked by a name suffix to differentiate it from the source.
The word list file used is backed up to the target directory where the customized copy is created.

NOTE:
A customized model can NOT be used as the source for a subsequent customization (i.e. no cascading customizations are possible).
To accumulate customizations, it is necessary to create the customized model from the original source model using a cumulative word list; this is where the word list backup copied to the target model directory comes in handy.
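
One way to build such a cumulative word list is to merge the backed-up word list with the newly added entries. The sketch below is only an illustration: `merge_word_lists` is a hypothetical helper, and dropping exact duplicate entries is a tidy-up choice of this sketch, not a documented LMC requirement. Different pronunciations of the same word are kept, since each is a distinct line.

```python
def merge_word_lists(*texts):
    """Merge several word list texts, keeping first-seen order.

    Lines are compared after whitespace normalization, so an entry
    separated by TAB equals the same entry separated by SPACE.
    """
    seen = set()
    merged = []
    for text in texts:
        for line in text.lstrip("\ufeff").splitlines():
            entry = " ".join(line.split())
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return "\n".join(merged) + "\n"
```

The merged text can then be saved as a UTF-8 file and passed to LMC together with the original, non-customized model as the source.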

Basic LMC usage is:

lmc -config <configuration_file> -add-words <wordlist_file> -model-suffix <model_name_suffix> -out-model-dir <directory_to_place_customized_output>

Where:

<configuration_file> is the *.bs config file belonging to the existing model to be customized
<wordlist_file> is the word list file created in the previous step
<model_name_suffix> is the text that will be added as a suffix to the modified model name. For example, the default Polish model name is pl_pl_5, so specifying custom as the suffix results in the customized model being named pl_pl_5_custom.
<directory_to_place_customized_output> is the output directory where the resulting customized model will be placed (together with a copy of the word list file, as a backup)
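
When LMC is called from a script, the command line above can be assembled as an argument list. This is a minimal sketch that uses only the four parameters named above; the helper names `build_lmc_command` and `run_lmc`, and the assumption that the executable is reachable as `lmc`, are illustrative and not part of the tool.

```python
import subprocess


def build_lmc_command(config_file, wordlist_file, model_suffix,
                      out_model_dir, lmc_executable="lmc"):
    """Return the LMC command line as an argument list."""
    return [
        lmc_executable,
        "-config", str(config_file),
        "-add-words", str(wordlist_file),
        "-model-suffix", model_suffix,
        "-out-model-dir", str(out_model_dir),
    ]


def run_lmc(command):
    """Run LMC and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.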


Using customized STT model in Speech Engine STT

To use a customized STT model in Speech Engine STT, it is necessary to:

  • place the customized model in the correct location, so that Speech Engine can find it
  • register and enable the customized model in Speech Engine using phxadmin

1) Placing the customized STT model in correct location

To be recognized by Speech Engine, the customized STT model must be placed in the correct location. The location is <SPE_directory>/bsapi/stt; the data and settings directories of the customized STT model should go here.

So either copy the customized STT model there manually, or let LMC place its output there directly:

lmc -config ... ... ... -out-model-dir <SPE_directory>/bsapi/stt

2) Registering the customized STT model in Speech Engine

First make sure that Speech Engine is not running.

Then run phxadmin with the configure-tech parameter, select the STT technology and enable the customized model, which should be listed there.

Then launch Speech Engine.

3) Checking the customization result

You can then check that the customized STT model appears in the list returned by GET /technologies.

To use the customized STT model, put its name in the model parameter, like this:

GET /technologies/stt?path=foobar.wav&model=<customized_model_name>
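
For scripted use, the request URL can be built with proper query encoding. The sketch below only constructs the URL; the helper name `stt_request_url` and the base address in the usage note are assumptions, and authentication and response handling are deliberately left out.

```python
from urllib.parse import urlencode


def stt_request_url(base_url, audio_path, model_name):
    """Build the GET /technologies/stt URL for a given file and model."""
    query = urlencode({"path": audio_path, "model": model_name})
    return f"{base_url.rstrip('/')}/technologies/stt?{query}"
```

For example, `stt_request_url("http://localhost:8600", "foobar.wav", "pl_pl_5_custom")` produces a URL of the form shown above, where the host and port stand in for wherever your Speech Engine instance is listening.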

Using customized STT model in command line STT

To use a customized STT model in command line STT, simply specify the new configuration file belonging to the customized STT model in the -config parameter.

For example, assuming that the original pl_pl_5 model was customized with updated as the model suffix, the STT command line using the customized model would look similar to this:

stt -config settings\stt_pl_pl_5_updated.bs -in-file <input_file> -out-file <output_file> ...
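
The same command can be assembled in a script. This sketch covers only the three parameters shown above; `build_stt_command` is a hypothetical helper, and any further parameters your setup needs (the elided part of the command) are passed through unchanged via `extra_args`.

```python
def build_stt_command(config_file, input_file, output_file,
                      extra_args=(), stt_executable="stt"):
    """Return the command line STT invocation as an argument list."""
    return [
        stt_executable,
        "-config", str(config_file),
        "-in-file", str(input_file),
        "-out-file", str(output_file),
        *extra_args,
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the command line STT is installed.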