STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual timeslots of the processed speech signal. Therefore, many applications want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article describes the recommended way to do the conversion.

Time slots and word alternatives:

The recommended algorithm for converting Confusion Network (CN) to One-best is as follows:

loop through all CN timeslots from start to end
- in each timeslot, get the input alternative with the highest score and, if it’s not <null/> or _DELETE_
  - add the input alternative at the end of your output
then, loop through all alternatives in your output
- for each alternative except the last one, amend its end time to match the start time of the following alternative
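
The two-pass procedure above can be sketched in Python as follows. This is a minimal illustration, not the Speech Engine API: the `Alternative` class, its field names (`word`, `score`, `start`, `end`), the use of seconds for times, and the representation of the CN as a list of timeslots (each a list of alternatives) are all assumptions made for the example. Only the `<null/>` and `_DELETE_` markers come from the description above.

```python
from dataclasses import dataclass, replace
from typing import List

# Markers that must never appear in the One-best output.
SKIP_WORDS = {"<null/>", "_DELETE_"}


@dataclass(frozen=True)
class Alternative:
    """Hypothetical shape of one word alternative in a CN timeslot."""
    word: str
    score: float
    start: float  # assumed to be in seconds
    end: float    # assumed to be in seconds


def cn_to_one_best_two_pass(timeslots: List[List[Alternative]]) -> List[Alternative]:
    # Pass 1: keep the highest-scoring alternative of each timeslot,
    # unless it is one of the skip markers.
    output: List[Alternative] = []
    for slot in timeslots:
        if not slot:
            continue  # defensive: ignore empty timeslots
        best = max(slot, key=lambda alt: alt.score)
        if best.word not in SKIP_WORDS:
            output.append(best)

    # Pass 2: stretch each word so that it ends where the next one starts.
    # The last word has no successor and keeps its original end time.
    for i in range(len(output) - 1):
        output[i] = replace(output[i], end=output[i + 1].start)
    return output
```

Because the alternatives are frozen, the second pass creates adjusted copies instead of modifying the input CN, so the original network can still be used for other purposes.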

Alternatively, the second step can be done right away when building the result:

loop through all CN timeslots from start to end
- in each timeslot, get the input alternative with the highest score and, if it’s not <null/> or _DELETE_
  - set the end time of the last alternative in your output (if there is one) to the start time of the input alternative
  - add the input alternative at the end of your output

Example:
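
A minimal Python sketch of the single-pass variant, run on a small made-up Confusion Network. The `Word` class, its field names, the times in seconds, and the sample data are hypothetical and only serve the illustration. They do not reflect the actual Speech Engine output schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical shape of one word alternative."""
    word: str
    score: float
    start: float  # assumed to be in seconds
    end: float    # assumed to be in seconds


def cn_to_one_best(timeslots: List[List[Word]]) -> List[Word]:
    output: List[Word] = []
    for slot in timeslots:
        if not slot:
            continue  # defensive: ignore empty timeslots
        best = max(slot, key=lambda alt: alt.score)
        if best.word in ("<null/>", "_DELETE_"):
            continue
        if output:
            # Close the previous word at the start of the current one.
            output[-1].end = best.start
        # Append a copy so that the input CN is left unmodified.
        output.append(Word(best.word, best.score, best.start, best.end))
    return output


# Made-up CN with three timeslots; the middle one is won by <null/>.
timeslots = [
    [Word("hello", 0.9, 0.0, 0.4), Word("hollow", 0.1, 0.0, 0.4)],
    [Word("<null/>", 0.8, 0.4, 0.5), Word("a", 0.2, 0.4, 0.5)],
    [Word("world", 0.7, 0.5, 1.0), Word("word", 0.3, 0.5, 1.0)],
]

for w in cn_to_one_best(timeslots):
    print(f"{w.start:.2f}-{w.end:.2f} {w.word}")
# 0.00-0.50 hello
# 0.50-1.00 world
```

The middle timeslot is dropped because its best alternative is `<null/>`, and the end time of "hello" is extended from 0.40 to 0.50 so that the output has no gap before "world".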
