STT: What is the Words-To-Numbers feature and how to use it
Speech Engine 3.30 and later include a new STT feature that outputs numbers and dates in their native form in the n-best output.
This article explains the feature in detail and gives some tips for fine-tuning the results.
NOTE: The feature is currently implemented for the Czech and Slovak languages only!
If you would like to help add support for other languages (available in the 5th or newer generation), please contact your Phonexia sales representative.
What is the words-to-numbers feature?
The words-to-numbers feature converts the raw transcription of numbers, dates, and similar patterns (such as credit card numbers) to their native form:
two thousand twenty one | ⇒ | 2021 |
fifteen hundred eighty six point zero three | ⇒ | 1586.03 |
sixty four million seven hundred thousand ninety | ⇒ | 64700090 |
This should simplify processing of the transcribed text by text analytics layers or NLP (Natural Language Processing) engines, e.g. in voicebot applications.
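To make the idea concrete, here is a minimal Python sketch that reproduces the three English examples above. It is purely illustrative: the Speech Engine performs the conversion with grammar rules (described later in this article) and currently only for Czech and Slovak. The function name `words_to_number` and the English vocabulary tables are assumptions made for this example.

```python
# Illustrative sketch only: not the Speech Engine implementation.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text: str) -> str:
    """Convert English number words to a digit string (hypothetical helper)."""
    words = text.lower().split()
    if "point" in words:
        split_at = words.index("point")
        integer_words, fraction_words = words[:split_at], words[split_at + 1:]
    else:
        integer_words, fraction_words = words, []

    total = 0  # sum of completed thousand/million groups
    group = 0  # value of the group currently being read
    for word in integer_words:
        if word in UNITS:
            group += UNITS[word]
        elif word in TENS:
            group += TENS[word]
        elif word == "hundred":
            group = (group or 1) * 100  # a bare "hundred" counts as 100
        elif word in SCALES:
            total += (group or 1) * SCALES[word]
            group = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")

    result = str(total + group)
    if fraction_words:
        # Words after "point" are read digit by digit, so "zero three" -> "03".
        result += "." + "".join(str(UNITS[w]) for w in fraction_words)
    return result


print(words_to_number("two thousand twenty one"))
print(words_to_number("fifteen hundred eighty six point zero three"))
print(words_to_number("sixty four million seven hundred thousand ninety"))
```

The result is returned as a string so that leading zeros after the decimal point are preserved.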
Where is the converted output available?
The words-to-numbers conversion is available only in the n-best output (i.e. where entire sentence variants are provided), for both file and stream transcription.
It is not available in the word-level outputs (One-best, Confusion Network) because that would cause problems in stream transcription. As new words arrive, they can change the previously converted output:
two… | 2 |
two thousand… | 2000 |
two thousand twenty… | 2020 |
two thousand twenty one | 2021 |
This would require retroactively changing text that has already been output, which is impossible. Alternatively, the output would have to be delayed, which is undesirable in real-time stream processing.
So, the best compromise is to keep the word-level outputs untouched and do the conversion only on the segment/sentence level.
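The streaming problem can be illustrated with a short Python sketch. It converts every growing prefix of the streaming example above and shows that the value changes with each new word, so no word-level output could be final until the whole number has been heard. The helper names and the tiny English vocabulary are assumptions for this illustration, not part of the Speech Engine.

```python
# Illustrative sketch only: vocabulary limited to the words of the example.
VALUES = {"one": 1, "two": 2, "twenty": 20}


def convert(words):
    """Convert a list of number words (hypothetical minimal converter)."""
    total, group = 0, 0
    for word in words:
        if word == "thousand":
            total += group * 1000
            group = 0
        else:
            group += VALUES[word]
    return total + group


def streamed_outputs(words):
    """Return the number a word-level output would show after each new word."""
    return [convert(words[:i]) for i in range(1, len(words) + 1)]


partials = streamed_outputs("two thousand twenty one".split())
print(partials)  # [2, 2000, 2020, 2021]
```

Only the last value is correct; every earlier one would have to be retracted, which is why the conversion is applied to complete segments instead.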
How does it work? Can it be extended or tuned?
The words-to-numbers conversion is based on a set of grammar rules that describe how the conversion should work.
Conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory of the STT model directory. For example, for the Czech 6th generation STT it is located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm.
You can edit the numeric.pegjs file to tune or extend the conversion functionality.
⚠ WARNING: Create a backup copy of numeric.pegjs before editing the file! Incorrect changes can have unpredictable effects and may even make STT stop working.
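One possible way to make the backup is the following Python sketch. It copies numeric.pegjs to a separate backup directory under a timestamped name. The function name `backup_grammar` and the choice of a separate backup directory are assumptions for this example; only the grm/numeric.pegjs location inside the model directory comes from this article.

```python
import shutil
import time
from pathlib import Path


def backup_grammar(model_dir: Path, backup_dir: Path) -> Path:
    """Copy <model_dir>/grm/numeric.pegjs into backup_dir (hypothetical helper)."""
    grammar = model_dir / "grm" / "numeric.pegjs"
    if not grammar.is_file():
        raise FileNotFoundError(grammar)

    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = backup_dir / f"{grammar.name}.{stamp}.bak"
    shutil.copy2(grammar, backup)  # copy2 also preserves file timestamps
    return backup
```

Call it with the model directory of your installation, for example the models_cs_cz_6 directory under {SPE_directory}/bsapi/stt/data, and any writable backup location. To roll back, copy the backup file over numeric.pegjs again.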
Rules are written in the PEG.js grammar syntax, a Parsing Expression Grammar (PEG) notation in which the actions attached to rules are written in JavaScript. Details about the syntax can be found on the PEG.js website at https://pegjs.org/documentation#grammar-syntax-and-semantics.
Here is a short example of the syntax, taken from the Czech definition file, that describes the conversion of cardinal and ordinal numbers 1 to 9:
...
DIGITS =
    ('jedna' / 'jeden') { return 1 } /
    ('dva' / 'dvě' / 'dvou') { return 2 } /
    ('tři' / 'tří') { return 3 } /
    ('č' / 'š') 'ty' ('ry' / 'ři' / 'ř') { return 4 } /
    'pět' 'i'? { return 5 } /
    'šest' 'i'? { return 6 } /
    'sed' ('mi' / 'm' / 'um') { return 7 } /
    'os' ('mi' / 'm' / 'um') { return 8 } /
    ('devíti' / 'devět') { return 9 }

DIGITS_ORDINAL =
    'první' ('ho' / 'mu')? { return 1 } /
    'druh' DIGITS_ORDINAL_SUFFIX { return 2 } /
    'třetí' ('ho' / 'mu')? { return 3 } /
    ('č' / 'š') 'tvrt' DIGITS_ORDINAL_SUFFIX { return 4 } /
    'pát' DIGITS_ORDINAL_SUFFIX { return 5 } /
    'šest' DIGITS_ORDINAL_SUFFIX { return 6 } /
    'sedm' DIGITS_ORDINAL_SUFFIX { return 7 } /
    'osm' DIGITS_ORDINAL_SUFFIX { return 8 } /
    'devát' DIGITS_ORDINAL_SUFFIX { return 9 }

DIGITS_ORDINAL_SUFFIX =
    'ého' / 'ýho' / 'ému' / 'ýmu' / 'ou' / 'ý' / 'á' / 'é'
...
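As a reading aid, the DIGITS rule quoted above can be approximated with regular expressions. The Python sketch below mirrors each alternative of the rule with one pattern and returns the same value as the rule's action. It is not how the Speech Engine evaluates the grammar: the table `DIGIT_PATTERNS` and the function `digit_value` are assumptions for this example, and it assumes each alternative has to match a whole word, which in the real grammar depends on the surrounding rules not shown here.

```python
import re

# Each pattern mirrors one alternative of the DIGITS rule, in the same order.
DIGIT_PATTERNS = [
    (r"jedna|jeden", 1),
    (r"dva|dvě|dvou", 2),
    (r"tři|tří", 3),
    (r"[čš]ty(?:ry|ři|ř)", 4),   # ('č' / 'š') 'ty' ('ry' / 'ři' / 'ř')
    (r"pěti?", 5),               # 'pět' 'i'?
    (r"šesti?", 6),              # 'šest' 'i'?
    (r"sed(?:mi|m|um)", 7),      # 'sed' ('mi' / 'm' / 'um')
    (r"os(?:mi|m|um)", 8),       # 'os' ('mi' / 'm' / 'um')
    (r"devíti|devět", 9),
]


def digit_value(word):
    """Return the digit for a Czech number word, or None if nothing matches."""
    for pattern, value in DIGIT_PATTERNS:
        if re.fullmatch(pattern, word):
            return value
    return None


for word in ["čtyři", "štyry", "sedum", "osmi", "deset"]:
    print(word, digit_value(word))
```

The example shows how a single alternative covers several inflected or colloquial forms of the same number, which is the main reason the rules are written as prefixes plus optional endings.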