
How to convert STT confusion network results to one-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual timeslots of the processed speech signal. Many applications therefore want to use it as the main source of speech transcription and convert it internally to less verbose output formats when needed. This article describes the recommended way to do the conversion.

Time slots and word alternatives:

The recommended algorithm for converting Confusion Network (CN) to One-best is as follows:

  1. loop through all CN timeslots from start to end
    • in each timeslot, take the input alternative with the highest score, and if it is not <null/> or _DELETE_
      • add the input alternative at the end of your output
  2. then, loop through all alternatives in your output
    • for each alternative, set its end time to the start time of the following alternative (the last alternative keeps its own end time)
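
The two steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the `Word` class, its field names, and the representation of the CN as a list of timeslots (each a list of alternatives carrying its own start time, end time and score) are hypothetical and do not reflect the actual Speech Engine result schema, so adapt them to the structure your application parses.

```python
from dataclasses import dataclass

# Alternatives that represent "no word" and must not appear in the output.
SKIP_WORDS = {"<null/>", "_DELETE_"}


@dataclass
class Word:
    # Hypothetical structure; field names are assumptions, not the real schema.
    text: str
    start: float
    end: float
    score: float


def cn_to_one_best(timeslots):
    """Convert a list of timeslots (lists of Word) to a one-best word list."""
    output = []

    # Step 1: keep the best-scoring alternative of each timeslot,
    # unless it is a null/delete marker.
    for slot in timeslots:
        if not slot:
            continue
        best = max(slot, key=lambda w: w.score)
        if best.text not in SKIP_WORDS:
            # Copy the alternative so the input CN is left unmodified.
            output.append(Word(best.text, best.start, best.end, best.score))

    # Step 2: stretch each word's end time to the start of the following word.
    # The last word has no follower and keeps its own end time.
    for current, following in zip(output, output[1:]):
        current.end = following.start

    return output
```

In this sketch, a timeslot whose best alternative is `<null/>` or `_DELETE_` contributes no word, and the preceding word's end time is extended over it by the second loop.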

 

Alternatively, the second step can be done right away when building the result:

  1. loop through all CN timeslots from start to end
    • in each timeslot, take the input alternative with the highest score, and if it is not <null/> or _DELETE_
      • set the end time of the last alternative in your output (if there is one) to the start time of the input alternative
      • add the input alternative at the end of your output
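
A minimal single-pass sketch of this variant in Python might look like the following. As before, the `Alternative` class, its field names, and the list-of-lists representation of the CN are hypothetical and chosen only for illustration; they are not the actual Speech Engine result schema.

```python
from dataclasses import dataclass, replace

# Alternatives that represent "no word" and must not appear in the output.
SKIP_WORDS = {"<null/>", "_DELETE_"}


@dataclass
class Alternative:
    # Hypothetical structure; field names are assumptions, not the real schema.
    word: str
    start: float
    end: float
    score: float


def one_best_single_pass(timeslots):
    """Build the one-best output and fix end times in a single loop."""
    output = []
    for slot in timeslots:
        if not slot:
            continue
        best = max(slot, key=lambda alt: alt.score)
        if best.word in SKIP_WORDS:
            continue
        # Close the previously added word at the start of the new one.
        if output:
            output[-1].end = best.start
        # replace() with no changes returns a copy, so the input stays intact.
        output.append(replace(best))
    return output
```

The single-pass form avoids a second traversal of the output, which can be convenient when the result is built incrementally while timeslots are still arriving.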

Example: