STT: What is the Words-To-Numbers feature and how to use it

This article explains the new STT feature that transcribes numbers and dates in their native numeric form in the n-best output, and gives some tips for fine-tuning the results.

NOTE: The feature works out-of-the-box in the following STT languages and models:

English – EN_US_6 and EN_US_A_6
Spanish – ES_6
Polish – PL_PL_6
Czech – CS_CZ_5 and CS_CZ_6
Slovak – SK_SK_5 and SK_SK_6

You can add this functionality to other languages, or tune the existing conversion, by adding or editing the conversion rules; see below for details.

What is the words-to-numbers feature

The words-to-numbers feature converts raw transcriptions of numbers, dates, and similar patterns (such as credit card numbers) to their native numeric form:

two thousand twenty one	⇒	2021
fifteen hundred eighty six point zero three	⇒	1586.03
sixty four million seven hundred thousand ninety	⇒	64700090

This should simplify processing of the transcribed text by text analytics layers or NLP (Natural Language Processing) engines, e.g. in voicebot applications.
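
To make the mapping in the examples above concrete, here is a minimal JavaScript sketch of an English words-to-number conversion. It is an illustration only and not the grammar shipped with the STT models: the names UNITS, SCALES and wordsToNumber are hypothetical, and the sketch handles only plain cardinal numbers with an optional "point" part. The result is returned as a string so that decimal digits such as "03" keep their leading zero.

```javascript
// Illustrative only: NOT the PEG.js grammar used by the STT models.
const UNITS = {
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19, twenty: 20, thirty: 30, forty: 40, fifty: 50, sixty: 60,
  seventy: 70, eighty: 80, ninety: 90,
};
const SCALES = { thousand: 1e3, million: 1e6, billion: 1e9 };

function wordsToNumber(text) {
  const words = text.trim().toLowerCase().split(/\s+/);
  let total = 0; // sum of the groups already closed by a scale word
  let group = 0; // value of the group currently being built
  let i = 0;
  // Integer part: everything up to an optional "point".
  for (; i < words.length && words[i] !== "point"; i++) {
    const w = words[i];
    if (w in UNITS) {
      group += UNITS[w];
    } else if (w === "hundred") {
      group *= 100;
    } else if (w in SCALES) {
      total += group * SCALES[w];
      group = 0;
    } else {
      throw new Error(`Unknown number word: ${w}`);
    }
  }
  let result = String(total + group);
  // Decimal part: digits after "point" are read one by one.
  if (words[i] === "point") {
    const digits = words.slice(i + 1).map((w) => {
      if (!(w in UNITS) || UNITS[w] > 9) {
        throw new Error(`Expected a single digit, got: ${w}`);
      }
      return UNITS[w];
    });
    result += "." + digits.join("");
  }
  return result;
}

console.log(wordsToNumber("two thousand twenty one")); // 2021
console.log(wordsToNumber("fifteen hundred eighty six point zero three")); // 1586.03
```

A grammar-based approach, as used by the actual feature, expresses the same idea declaratively and is easier to extend to dates, ordinals, and inflected word forms.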

Where is the converted output available?

The words-to-numbers conversion is available only in the n-best output (i.e. where entire sentence variants are provided), for both file and stream transcription.

The conversion is not available in the word-level outputs (One-best, Confusion Network) because it would cause difficulties in stream transcription. As new words arrive, they may change the output already produced:

two…	2
two thousand…	~~2~~ 2000
two thousand twenty…	~~2000~~ 2020
two thousand twenty one	~~2020~~ 2021

That would require retroactively changing text that has already been output, which is impossible.
Alternatively, the output would have to be delayed, which is undesirable in real-time stream processing.

So, the best compromise is to keep the word-level outputs untouched and do the conversion only on the segment/sentence level.
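
The effect can be reproduced with a short JavaScript sketch that converts the growing prefix of a sentence after each newly arrived word. The VALUES table and the functions convert and streamingOutputs are hypothetical names used only for this illustration, and the mini-converter covers just the words of the example above.

```javascript
// Hypothetical mini-converter covering only the words of the example.
const VALUES = { one: 1, two: 2, twenty: 20 };

function convert(words) {
  let total = 0;
  let group = 0;
  for (const w of words) {
    if (w === "thousand") {
      total += group * 1000;
      group = 0;
    } else if (w in VALUES) {
      group += VALUES[w];
    } else {
      throw new Error(`Unknown number word: ${w}`);
    }
  }
  return total + group;
}

// Simulate a stream: re-convert the whole prefix after every new word.
function streamingOutputs(words) {
  return words.map((_, i) => convert(words.slice(0, i + 1)));
}

const partials = streamingOutputs(["two", "thousand", "twenty", "one"]);
console.log(partials.join(" -> ")); // 2 -> 2000 -> 2020 -> 2021
```

Every intermediate value except the last one would have to be retracted, which is why the conversion is applied only once the whole segment is known.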

How does it work?

The words-to-numbers conversion is based on a set of grammar rules that describe how the conversion should work.
The conversion rules are stored in the numeric.pegjs file, located in the grm subdirectory inside the STT model directory. For example:

in Czech 6th generation STT, it is located in {SPE_directory}/bsapi/stt/data/models_cs_cz_6/grm
in Spanish 6th generation STT, it is located in {SPE_directory}/bsapi/stt/data/models_es_6/grm

Can it be extended or tuned?

You can edit the numeric.pegjs file to tune or extend the conversion functionality.
⚠ WARNING: Create a backup copy of numeric.pegjs before editing the file! Making incorrect changes can have unpredictable effects and eventually make STT stop working.
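
As a minimal sketch of the backup step, the following Node.js snippet builds the path to numeric.pegjs from the directory layout described above and copies the file next to the original. The function names grammarPath and backupGrammar and the ".bak" suffix are assumptions made for this example, not part of the product; any other way of copying the file works just as well.

```javascript
const fs = require("fs");
const path = require("path");

// Build the path to numeric.pegjs from the layout described above.
// speDir is the SPE installation directory, modelDir is the model
// directory name, e.g. "models_cs_cz_6".
function grammarPath(speDir, modelDir) {
  return path.join(speDir, "bsapi", "stt", "data", modelDir, "grm", "numeric.pegjs");
}

// Copy numeric.pegjs to numeric.pegjs.bak (hypothetical naming).
// COPYFILE_EXCL makes the copy fail instead of overwriting an existing
// backup, so an earlier good copy is never lost.
function backupGrammar(speDir, modelDir) {
  const source = grammarPath(speDir, modelDir);
  const backup = source + ".bak";
  fs.copyFileSync(source, backup, fs.constants.COPYFILE_EXCL);
  return backup;
}

// Example call (the directory is a placeholder, adjust to your installation):
// backupGrammar("/path/to/SPE_directory", "models_cs_cz_6");
```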

Rules are written in the grammar syntax of PEG.js, a parser generator for JavaScript based on the Parsing Expression Grammar (PEG) formalism. Details about the syntax can be found on the PEG.js website at https://pegjs.org/documentation#grammar-syntax-and-semantics.

Here are short excerpts of the rules, showing the conversion of cardinal and ordinal digits 1 to 9, first for English and then for Czech:

...
DIGITS
  = 'one' { return 1 }
  / 'two' { return 2 }
  / 'three' { return 3 }
  / 'four' & { return boundary() } { return 4 }
  / 'five' & { return boundary() } { return 5 }
  / 'six' & { return boundary() } { return 6 }
  / 'seven' & { return boundary() } { return 7 }
  / 'eight' & { return boundary() } { return 8 }
  / 'nine' & { return boundary() } { return 9 }

ZERO = 'zero' { return 0 }

...

DIGITS_ORDINAL_ST = 'first' { return 1 }
DIGITS_ORDINAL_ND = 'second' { return 2 }
DIGITS_ORDINAL_RD = 'third' { return 3 }
DIGITS_ORDINAL_TH
  = 'fourth' { return 4 }
  / 'fifth' { return 5 }
  / 'sixth' { return 6 }
  / 'seventh'  { return 7 }
  / 'eighth'  { return 8 }
  / 'ninth'  { return 9 }

ZERO_ORDINAL = 'zeroth' { return 0 }

...

...

DIGITS
  = ('jedna' / 'jeden') { return 1 }
  / ('dva' / 'dvě' / 'dvou') { return 2 }
  / ('tři' / 'tří') { return 3 }
  / ('č' / 'š') 'ty' ('ry' / 'ři' / 'ř') { return 4 }
  / 'pět' 'i'? { return 5 }
  / 'šest' 'i'? { return 6 }
  / 'sed' ('mi' / 'm' / 'um') { return 7 }
  / 'os' ('mi' / 'm' / 'um') { return 8 }
  / ('devíti' / 'devět') { return 9 }
DIGITS_ORDINAL
  = 'první' ('ho' / 'mu')? { return 1 }
  / 'druh' DIGITS_ORDINAL_SUFFIX { return 2 }
  / 'třetí' ('ho' / 'mu')? { return 3 }
  / ('č' / 'š') 'tvrt' DIGITS_ORDINAL_SUFFIX { return 4 }
  / 'pát' DIGITS_ORDINAL_SUFFIX { return 5 }
  / 'šest' DIGITS_ORDINAL_SUFFIX { return 6 }
  / 'sedm' DIGITS_ORDINAL_SUFFIX { return 7 }
  / 'osm' DIGITS_ORDINAL_SUFFIX { return 8 }
  / 'devát' DIGITS_ORDINAL_SUFFIX { return 9 }
DIGITS_ORDINAL_SUFFIX = 'ého' / 'ýho' / 'ému' / 'ýmu' / 'ou' / 'ý' / 'á' / 'é'

...
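
The two PEG mechanisms that matter most in these excerpts are the ordered choice ('/', where alternatives are tried in order and the first success wins) and the semantic predicate ('& { ... }', which must return true for the alternative to succeed). The following plain JavaScript sketch emulates both for the English digits four to nine. The boundary() helper is not shown in the excerpts, so the atBoundary function below is an assumption: it treats the end of input or a whitespace character as a word boundary, and the real helper in numeric.pegjs may be defined differently.

```javascript
// Assumption: a word boundary is the end of input or a whitespace character.
function atBoundary(input, pos) {
  return pos >= input.length || /\s/.test(input[pos]);
}

// Alternatives in the same order as in the DIGITS rule excerpt.
const DIGIT_WORDS = [
  ["four", 4], ["five", 5], ["six", 6],
  ["seven", 7], ["eight", 8], ["nine", 9],
];

// Emulates: 'four' & { return boundary() } { return 4 } / 'five' ...
// Returns { value, end } for the first alternative that succeeds, else null.
function matchDigit(input, pos, checkBoundary = true) {
  for (const [literal, value] of DIGIT_WORDS) {
    if (!input.startsWith(literal, pos)) continue; // literal does not match
    const end = pos + literal.length;
    // A failed predicate fails this alternative; the next one is tried.
    if (checkBoundary && !atBoundary(input, end)) continue;
    return { value, end };
  }
  return null;
}

console.log(matchDigit("four", 0)); // matches, value 4
console.log(matchDigit("fourteen", 0)); // null: 'four' is not followed by a boundary
console.log(matchDigit("fourteen", 0, false)); // without the predicate, wrongly yields value 4
```

Under this reading, the predicate keeps a short word such as 'four' from being consumed as a prefix of a longer word such as 'fourteen' or 'fourth'. Keep this in mind when adding new alternatives whose text is a prefix of another word.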
