STT: What is the Words-To-Numbers feature and how to use it

Speech Engine 3.30 and later includes a new STT feature that outputs numbers and dates in their native form in the n-best output.
This article explains the details of the feature and gives some tips for fine-tuning the results.

NOTE: The feature is currently implemented for the Czech and Slovak languages only!
If you would like to help add support for other languages (available in 5th or newer generation), please contact your Phonexia sales representative.

What is the words-to-numbers feature

The words-to-numbers feature converts the raw transcription of numbers, dates, and similar patterns (such as credit card numbers) to their native form:

two thousand twenty one → 2021
fifteen hundred eighty six point zero three → 1586.03
sixty four million seven hundred thousand ninety → 64700090

This should simplify processing of the transcribed text by text analytics layers or NLP (Natural Language Processing) engines, for example in voicebot applications.
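To make the idea concrete, here is a minimal sketch of such a conversion for the English example phrases above. It is only an illustration: the function name `wordsToNumber` and the word tables are assumptions made for this example, and the code is not how Speech Engine implements the feature (which is grammar-based and currently supports Czech and Slovak only).

```javascript
// Illustrative toy converter for English number words.
// NOT the Speech Engine implementation; all names here are hypothetical.
const UNITS = new Map(Object.entries({
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19,
}));
const TENS = new Map(Object.entries({
  twenty: 20, thirty: 30, forty: 40, fifty: 50,
  sixty: 60, seventy: 70, eighty: 80, ninety: 90,
}));
const SCALES = new Map(Object.entries({ thousand: 1e3, million: 1e6 }));

function wordsToNumber(phrase) {
  const words = phrase.trim().toLowerCase().split(/\s+/);
  let total = 0;       // sum of completed thousand/million groups
  let current = 0;     // value of the group currently being built
  let fraction = null; // digits after "point", collected as a string

  for (const word of words) {
    if (fraction !== null) {
      // After "point", only single digits (zero to nine) are accepted.
      if (!UNITS.has(word) || UNITS.get(word) > 9) {
        throw new Error(`Unexpected word after "point": ${word}`);
      }
      fraction += String(UNITS.get(word));
    } else if (word === 'point') {
      fraction = '';
    } else if (UNITS.has(word)) {
      current += UNITS.get(word);
    } else if (TENS.has(word)) {
      current += TENS.get(word);
    } else if (word === 'hundred') {
      current = (current || 1) * 100; // "fifteen hundred" becomes 1500
    } else if (SCALES.has(word)) {
      total += (current || 1) * SCALES.get(word);
      current = 0;
    } else {
      throw new Error(`Not a number word: ${word}`);
    }
  }

  const integerPart = total + current;
  return fraction ? Number(`${integerPart}.${fraction}`) : integerPart;
}

console.log(wordsToNumber('two thousand twenty one'));
```

A downstream NLP layer can then work with the numeric value directly instead of parsing the spelled-out words itself.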

Where is the converted output available?

The words-to-numbers conversion is available only in the n-best output (i.e. where entire sentence variants are provided), for both file and stream transcription.

It is not available in the word-level outputs (One-best, Confusion Network) because that would cause problems in stream transcription. As new words arrive, they can change the previously converted output:

two… → 2
two thousand… → 2000 (replaces 2)
two thousand twenty… → 2020 (replaces 2000)
two thousand twenty one → 2021 (replaces 2020)

That would require retroactively changing text that has already been output, which is not possible. Alternatively, the output would have to be delayed, which is undesirable in real-time stream processing.

So, the best compromise is to keep the word-level outputs untouched and do the conversion only on the segment/sentence level.
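The difference between the two approaches can be sketched as follows. This is a toy illustration with an English vocabulary limited to the example phrase; `convert` and `convertEachPrefix` are hypothetical helpers, not Speech Engine functions.

```javascript
// Toy vocabulary, just enough for the example phrase (an assumption for this sketch).
const VALUES = new Map([['one', 1], ['two', 2], ['twenty', 20]]);

// Hypothetical converter: adds up small values and multiplies on "thousand".
function convert(words) {
  let total = 0;
  let current = 0;
  for (const word of words) {
    if (word === 'thousand') {
      total += (current || 1) * 1000;
      current = 0;
    } else if (VALUES.has(word)) {
      current += VALUES.get(word);
    } else {
      throw new Error(`Not a number word: ${word}`);
    }
  }
  return total + current;
}

// Word-level conversion: a value would have to be emitted after every new word.
function convertEachPrefix(words) {
  return words.map((_, i) => convert(words.slice(0, i + 1)));
}

const words = ['two', 'thousand', 'twenty', 'one'];
const perWord = convertEachPrefix(words); // each value but the last is later invalidated
const perSegment = convert(words);        // emitted once, when the segment is complete
console.log(perWord, perSegment);
```

The per-word values come out as 2, 2000, 2020 and 2021, so three already-emitted results would have to be retracted, while the segment-level conversion produces a single final 2021.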

How does it work? Can it be extended or tuned?

The words-to-numbers conversion is based on a set of grammar rules that describe how the conversion should work.

The conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory of the STT model directory. For example, for the Czech 6th generation STT model, it is located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm.

You can edit the numeric.pegjs file to tune or extend the conversion functionality.
⚠ WARNING: Create a backup copy of numeric.pegjs before editing the file! Incorrect changes can have unpredictable effects and may even make STT stop working.

The rules are written in PEG.js syntax, which is a JavaScript-like modification of Parsing Expression Grammar (PEG). Details about the syntax can be found on the PEG.js website at https://pegjs.org/documentation#grammar-syntax-and-semantics.

Here is a short example of the syntax, describing the conversion of cardinal and ordinal numbers 1 to 9, taken from the Czech definition file:

...

DIGITS
  = ('jedna' / 'jeden') { return 1 }
  / ('dva' / 'dvě' / 'dvou') { return 2 }
  / ('tři' / 'tří') { return 3 }
  / ('č' / 'š') 'ty' ('ry' / 'ři' / 'ř') { return 4 }
  / 'pět' 'i'? { return 5 }
  / 'šest' 'i'? { return 6 }
  / 'sed' ('mi' / 'm' / 'um') { return 7 }
  / 'os' ('mi' / 'm' / 'um') { return 8 }
  / ('devíti' / 'devět') { return 9 }
DIGITS_ORDINAL
  = 'první' ('ho' / 'mu')? { return 1 }
  / 'druh' DIGITS_ORDINAL_SUFFIX { return 2 }
  / 'třetí' ('ho' / 'mu')? { return 3 }
  / ('č' / 'š') 'tvrt' DIGITS_ORDINAL_SUFFIX { return 4 }
  / 'pát' DIGITS_ORDINAL_SUFFIX { return 5 }
  / 'šest' DIGITS_ORDINAL_SUFFIX { return 6 }
  / 'sedm' DIGITS_ORDINAL_SUFFIX { return 7 }
  / 'osm' DIGITS_ORDINAL_SUFFIX { return 8 }
  / 'devát' DIGITS_ORDINAL_SUFFIX { return 9 }
DIGITS_ORDINAL_SUFFIX = 'ého' / 'ýho' / 'ému' / 'ýmu' / 'ou' / 'ý' / 'á' / 'é'

...
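To help read the DIGITS rule, the sketch below restates its alternatives as plain JavaScript regular expressions: `/` in the grammar corresponds to `|`, a trailing `?` marks an optional part, and the `{ return N }` action becomes the second element of each pair. This is only an approximation for illustration. The name `parseDigit`, the whole-word anchoring and the NFC normalization are assumptions made here; the excerpt does not show how the engine tokenizes its input or how DIGITS is combined with other rules.

```javascript
// Each entry mirrors one alternative of the DIGITS rule from the excerpt.
// Assumption: the input is a single lowercase word; matching is anchored to the whole word.
const DIGITS = [
  [/^(?:jedna|jeden)$/, 1],
  [/^(?:dva|dvě|dvou)$/, 2],
  [/^(?:tři|tří)$/, 3],
  [/^(?:č|š)ty(?:ry|ři|ř)$/, 4],
  [/^pěti?$/, 5],
  [/^šesti?$/, 6],
  [/^sed(?:mi|m|um)$/, 7],
  [/^os(?:mi|m|um)$/, 8],
  [/^(?:devíti|devět)$/, 9],
];

// Hypothetical helper: returns the digit for a Czech word, or null if no alternative matches.
function parseDigit(word) {
  // Normalize so that composed and decomposed diacritics compare equal.
  const normalized = word.normalize('NFC');
  for (const [pattern, value] of DIGITS) {
    if (pattern.test(normalized)) return value; // first matching alternative wins
  }
  return null;
}

console.log(parseDigit('čtyři'), parseDigit('sedum'), parseDigit('deset'));
```

Seen this way, adding a colloquial or inflected variant to the grammar amounts to adding one more alternative to the matching rule, which is the kind of tuning the numeric.pegjs file allows.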