
Time Analysis Extraction (TAE)

Technology description

Time Analysis Extraction (TAE) by Phonexia extracts base information from dialogue in a recording, providing essential knowledge about conversation flow.

This makes it easy to identify:

  • long reaction times
  • crosstalk
  • responses of speakers in both channels
  • speed of speech measured in phonemes per second

Typical usage domain

TAE is typically used in contact centers to indicate weak moments in a dialogue. This can help improve calls between operators and callers, or flag potential stress points in a phone call, for example a change of speech speed during the conversation.


TAE can process both audio files and streams (for format details, see the Speech Engine documentation). By its nature, TAE is mainly useful on two-channel phone call recordings, where the operator speaks on one channel and the caller on the other. TAE can also process mono-channel recordings, but it then provides only a limited set of dialogue statistics.

When the technology is applied to a stream, results are created and returned on every request, even while the stream is ongoing.


As with the whole SPE, results are provided in the form of a JSON or XML file.

The results contain information about monologues and dialogues.


The Monologue section describes the statistics of a recording for each channel separately.

It answers the following questions:

  • how long this speaker was talking alone
  • how much of that was net speech
  • what the average speech speed was
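The monologue statistics above can be illustrated with a small sketch. Note that the field names below (`speaker_alone_length`, `net_speech_length`, `phoneme_count`) are assumptions made for illustration, not the real TAE schema; consult the Speech Engine API documentation for the actual field names.

```python
# Hypothetical per-channel monologue statistics, illustrating the kind of
# values TAE reports. Field names are assumptions, not the real schema.
monologue = {
    "channel": 0,
    "speaker_alone_length": 84.2,   # seconds this speaker talked alone
    "net_speech_length": 71.5,      # seconds of actual speech, pauses excluded
    "phoneme_count": 960,           # phonemes detected in the net speech
}

def speech_speed(stats):
    """Average speech speed in phonemes per second, over the net speech."""
    return stats["phoneme_count"] / stats["net_speech_length"]

print(f"channel {monologue['channel']}: "
      f"{speech_speed(monologue):.1f} phonemes/s")
# → channel 0: 13.4 phonemes/s
```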


The Dialogue section describes the reactions of one channel to the other:

  • the places of this speaker's longest and shortest reaction, i.e., where this speaker stopped talking and the other speaker started talking
  • the average reaction times
  • the number of speaker turns in each direction
  • details about crosstalk, for example where the other speaker is talking “over” this speaker
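To make these statistics concrete, here is a minimal sketch of how reaction times and crosstalk can be derived from per-channel speech segments. TAE returns these statistics directly; the `(start, end)` segment representation below is an assumption used only to illustrate the definitions.

```python
# Hedged sketch: reaction times and crosstalk from per-channel speech
# segments, given as (start, end) pairs in seconds. Illustrative only.
operator = [(0.0, 3.0), (7.5, 10.0)]   # channel 0
caller   = [(3.8, 7.0), (9.5, 12.0)]   # channel 1

def reactions(a_segments, b_segments):
    """Gaps where speaker B starts talking after speaker A stops."""
    gaps = []
    for _, a_end in a_segments:
        starts_after = [s for s, _ in b_segments if s >= a_end]
        if starts_after:
            gaps.append(min(starts_after) - a_end)
    return gaps

def crosstalk(a_segments, b_segments):
    """Total time both speakers are talking at once."""
    total = 0.0
    for a_start, a_end in a_segments:
        for b_start, b_end in b_segments:
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total

print("caller reaction times:", [round(g, 2) for g in reactions(operator, caller)])
print("crosstalk:", round(crosstalk(operator, caller), 2), "s")
```

Here the caller reacts 0.8 s after the operator's first segment, and the two speakers overlap for 0.5 s near the end of the call.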


This section is optional and needs to be explicitly turned on. It describes segments of detected voice and silence (the same output as the Voice Activity Detection technology).
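A short sketch of working with such voice/silence segments follows. The JSON shape used here is an assumption for illustration; see the Speech Engine API documentation for the real schema.

```python
# Hedged sketch: reading hypothetical voice/silence segmentation output
# and totaling the detected voice on a channel. Schema is an assumption.
import json

result = json.loads("""
{
  "segments": [
    {"channel": 0, "start": 0.0, "end": 2.4, "type": "voice"},
    {"channel": 0, "start": 2.4, "end": 5.1, "type": "silence"},
    {"channel": 0, "start": 5.1, "end": 9.0, "type": "voice"}
  ]
}
""")

voice_total = sum(seg["end"] - seg["start"]
                  for seg in result["segments"] if seg["type"] == "voice")
print(f"voice on channel 0: {voice_total:.1f} s")
# → voice on channel 0: 6.3 s
```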

More information

You can find more information in the corresponding chapter of the API documentation.
