A: Like a human listener, the ASR (STT) engine adapts to the acoustic channel, the environment, and the speaker. It also learns more about the content as the recording progresses, and it uses that information to improve recognition. The dictate engine, also known as on-the-fly transcription, cannot look ahead, so at the beginning of a recording it has only a few seconds of speech to work with. Because the output is requested immediately while the audio is being processed, the engine cannot anticipate what will come in the next seconds of speech.
When the whole recording is available, as in off-line transcription, the speech engine can correct its result before it is output by also taking the subsequent segments into account. The beginning of the recording can then be recognized with high accuracy as well.
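
The difference can be illustrated with a toy sketch. This is not the engine's actual algorithm; it assumes, purely for illustration, that "adaptation" means estimating a constant channel offset from noisy frames. The names `streaming_offsets`, `offline_offsets`, and `channel_offset` are hypothetical. The causal (dictate-style) estimate at each frame uses only the audio seen so far, while the off-line estimate uses the whole recording for every frame, including the first ones.

```python
import random


def streaming_offsets(frames):
    """Causal estimate: at frame t, average only frames 0..t (no look-ahead)."""
    estimates = []
    total = 0.0
    for count, value in enumerate(frames, start=1):
        total += value
        estimates.append(total / count)
    return estimates


def offline_offsets(frames):
    """Whole-recording estimate: every frame uses the average of all frames."""
    mean = sum(frames) / len(frames)
    return [mean] * len(frames)


def mean_abs_error(estimates, truth, start, stop):
    """Average absolute estimation error over frames start..stop-1."""
    return sum(abs(e - truth) for e in estimates[start:stop]) / (stop - start)


if __name__ == "__main__":
    rng = random.Random(0)
    channel_offset = 5.0  # hypothetical constant channel bias (assumption)
    frames = [channel_offset + rng.gauss(0.0, 1.0) for _ in range(500)]

    streaming = streaming_offsets(frames)
    offline = offline_offsets(frames)

    # Compare estimation error at the start and at the end of the recording.
    for label, start, stop in [("first 10 frames", 0, 10), ("last 10 frames", 490, 500)]:
        print(label,
              "streaming:", round(mean_abs_error(streaming, channel_offset, start, stop), 3),
              "offline:", round(mean_abs_error(offline, channel_offset, start, stop), 3))
```

In this sketch the two estimates coincide at the last frame, where the streaming estimate has finally seen the same data as the off-line one. They differ at the beginning, where the streaming estimate rests on very few frames.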