Voice Activity Detection (VAD) is a language-, domain-, and channel-independent technology that identifies which parts of an audio recording contain speech and which do not. It labels the speech and non-speech segments of the recording; these labels can then serve as a decision point for whether to pass the recording on to other technologies. In deployment, VAD is usually part of a rapid filtering stage.
Typical use cases are:
- detecting the presence or absence of human speech for voice processing,
- filtering out non-speech parts of the recording,
- filtering out recordings with too little net speech to be processed by other technologies,
- voice-activated processes, etc.
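The filtering use cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segment format `(start_s, end_s, label)`, the label strings `"speech"` and `"non-speech"`, the function names, and the 10-second threshold are all hypothetical and do not describe the actual output format or API of any particular VAD product.

```python
from typing import List, Tuple

# Hypothetical VAD output: (start in seconds, end in seconds, label).
Segment = Tuple[float, float, str]


def net_speech_seconds(segments: List[Segment]) -> float:
    """Sum the durations of all segments labeled as speech."""
    return sum(end - start for start, end, label in segments if label == "speech")


def speech_only(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Drop non-speech segments and keep only the speech time ranges."""
    return [(start, end) for start, end, label in segments if label == "speech"]


def has_enough_speech(segments: List[Segment], min_net_speech_s: float = 10.0) -> bool:
    """Decide whether a recording is worth passing to other technologies.

    The 10-second default is an arbitrary example threshold.
    """
    return net_speech_seconds(segments) >= min_net_speech_s


# Example labels for an 18-second recording (made-up values).
example = [
    (0.0, 2.5, "non-speech"),
    (2.5, 9.0, "speech"),
    (9.0, 12.0, "non-speech"),
    (12.0, 18.0, "speech"),
]
```

With these example labels, `net_speech_seconds(example)` gives 12.5 seconds of net speech, so the recording passes a 10-second threshold but would be filtered out with a 15-second one.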
The speed of Voice Activity Detection is 140 FtRT (140 times faster than real time) per instance.
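To make the speed figure concrete, the sketch below estimates processing time from audio duration. It assumes that the unit means "faster than real time", i.e. the ratio of audio duration to processing time, and that several instances scale linearly; the linear scaling and the function name are illustrative assumptions, not stated properties of the technology.

```python
def processing_time_seconds(audio_seconds: float,
                            ftrt: float = 140.0,
                            instances: int = 1) -> float:
    """Estimate wall-clock processing time for a given amount of audio.

    ftrt: speed-up factor relative to real time (140 per instance, as stated above).
    instances: number of parallel instances; linear scaling is an assumption.
    """
    if ftrt <= 0 or instances <= 0:
        raise ValueError("ftrt and instances must be positive")
    return audio_seconds / (ftrt * instances)


# One hour of audio on a single instance.
one_hour = processing_time_seconds(3600.0)
```

Under these assumptions, one hour of audio takes roughly 26 seconds on a single instance (3600 / 140), and 140 seconds of audio takes about one second.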