Speaker Diarization labels the segments of a single mono-channel audio recording according to each individual speaker's voice, so that all segments spoken by the same voice receive the same label. It is a language-, domain- and channel-independent technology. In addition to speakers, it also segments technical signals and silence. The output can be a log file with labels, audio output (either split audio files or one new multichannel audio file), or both. Accurate speaker diarization is still an open research problem.
Typical use cases:
- preprocessing for other speech recognition technologies,
- labeling the parts of an utterance according to the speakers,
- splitting a telephone conversation recorded in mono into several channels,
- determining how many speakers are present in the recording.
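As a rough illustration of the last two use cases, the sketch below counts speakers and builds one channel per speaker from a list of labeled segments. The `(start, end, label)` tuple layout, the label names and both function names are assumptions made for this example; they are not the product's actual log format or API.

```python
from collections import defaultdict

# Hypothetical diarization labels: (start_sec, end_sec, label).
# This layout is an assumption for illustration, not the real log format.
SEGMENTS = [
    (0.0, 2.0, "speaker_1"),
    (2.0, 2.5, "silence"),
    (2.5, 5.0, "speaker_2"),
    (5.0, 6.0, "speaker_1"),
]

# Labels treated as non-speech (assumed names).
NON_SPEECH = ("silence", "technical_signal")


def speaker_durations(segments, non_speech=NON_SPEECH):
    """Return total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for start, end, label in segments:
        if label not in non_speech:
            totals[label] += end - start
    return dict(totals)


def split_into_channels(samples, sample_rate, segments, non_speech=NON_SPEECH):
    """Return one channel per speaker, each as long as the mono input.

    A channel holds the original samples inside that speaker's segments
    and zeros everywhere else.
    """
    speakers = sorted({label for _, _, label in segments if label not in non_speech})
    channels = {speaker: [0] * len(samples) for speaker in speakers}
    for start, end, label in segments:
        if label in channels:
            lo = min(int(round(start * sample_rate)), len(samples))
            hi = min(int(round(end * sample_rate)), len(samples))
            channels[label][lo:hi] = samples[lo:hi]
    return channels


durations = speaker_durations(SEGMENTS)
speaker_count = len(durations)

# Toy mono signal: 6 seconds at 2 samples per second.
mono = list(range(1, 13))
channels = split_into_channels(mono, 2, SEGMENTS)
```

Keeping every channel the same length as the input preserves the original timing, so the channels could later be written out together as one multichannel file.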
Speaker Diarization runs at up to 50 times faster than real time (FtRT) per instance, depending on the technology model.
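To make the speed figure concrete, here is a small back-of-the-envelope calculation. It assumes that the figure is a faster-than-real-time factor (50 means 50 seconds of audio processed per second of wall-clock time) and that several instances scale linearly; both are simplifying assumptions, and the function name is hypothetical.

```python
def processing_seconds(audio_seconds, speed_factor=50.0, instances=1):
    """Estimate wall-clock processing time for a given amount of audio.

    speed_factor: seconds of audio processed per wall-clock second by
                  one instance (assumed reading of the speed figure).
    instances:    number of parallel instances, assumed to scale linearly.
    """
    if speed_factor <= 0 or instances < 1:
        raise ValueError("speed_factor must be positive and instances at least 1")
    return audio_seconds / (speed_factor * instances)


one_hour_single = processing_seconds(3600)              # one instance
one_hour_four = processing_seconds(3600, instances=4)   # four instances
```

Under these assumptions, one hour of audio takes about 72 seconds on a single instance at the maximum speed. Real throughput depends on the model and hardware.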