Phonexia Phoneme Recogniser (PHNREC) converts speech signals into pronunciation characters (so-called phonemes). After the conversion, the pronunciation (text) can be easily indexed and searched by third-party text data mining tools. The technology is optimized for noisy recordings and colloquial speech, can process audio files as well as audio streams, and can provide results in several output formats. Phoneme Recogniser is delivered as part of the Keyword Spotting (KWS) technology. It can also be used without the KWS technology.
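To illustrate how a phoneme transcription can be indexed like ordinary text, the following minimal sketch builds an inverted index of phoneme trigrams over several plain-text transcriptions. It is not part of the Phonexia API: the function names, the recording IDs, and the second transcription are assumptions made up for the example, and the only input it relies on is the space-separated plain-text output format.

```python
from collections import defaultdict


def phoneme_ngrams(transcript, n=3):
    """Yield overlapping n-grams of phoneme tokens from a plain-text transcript."""
    tokens = transcript.split()
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def build_index(transcripts, n=3):
    """Map each phoneme n-gram to the set of recording IDs that contain it."""
    index = defaultdict(set)
    for recording_id, transcript in transcripts.items():
        for gram in phoneme_ngrams(transcript, n):
            index[gram].add(recording_id)
    return index


# Hypothetical archive: IDs and the second transcript are invented for illustration.
archive = {
    "call_001": "sil hh ay dh ow s ih s l uw uw th sil",
    "call_002": "sil hh ay sil",
}
index = build_index(archive)
print(sorted(index[("l", "uw", "uw")]))
```

Looking up a trigram returns the recordings worth inspecting more closely, which is the same idea a text data mining tool would apply at scale.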
Typical use cases
- "search-in-speech" – search for specific information in large call archives (e.g., claims inspection),
- get a custom pronunciation of a word or phrase to use as a customized keyword in keyword spotting technology (better accuracy of KWS technology),
- get a custom pronunciation of a word that the user can add to the language model of speech-to-text technology (better accuracy of STT technology).
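The search-in-speech use case can be sketched as a token-level search for a phoneme sequence inside a transcription. This is a simplified illustration, not the method used by the KWS technology: `find_phoneme_sequence` is a hypothetical helper, it assumes the space-separated plain-text output, and it matches exactly, whereas real recordings would likely need tolerance for recognition errors.

```python
def find_phoneme_sequence(transcript, query):
    """Return token positions where the query phoneme sequence occurs in the transcript."""
    tokens = transcript.split()
    wanted = query.split()
    if not wanted:
        return []
    return [
        i
        for i in range(len(tokens) - len(wanted) + 1)
        if tokens[i:i + len(wanted)] == wanted
    ]


transcript = "sil hh ay dh ow s ih s l uw uw th sil"
print(find_phoneme_sequence(transcript, "l uw uw th"))  # [8]
```

Comparing whole tokens rather than raw characters avoids false hits such as the phoneme `s` matching inside the token `sil`.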
Input
- audio file (for format details, see the Speech Engine documentation); streams are not supported,
- technology model name (i.e., language code) to be used for phoneme transcription.
In the process of transcribing speech to phonemes, the Phoneme Recogniser identifies individual speech segments and converts them to pronunciation.
Input: "Hi, this is Lewis." (WAV file containing speech)
Output: sil hh ay dh ow s ih s l uw uw th sil (plain-text or XML/JSON output)
Note: The outputs can contain the following special tokens:
- sil: silent part (or no speech detected)
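When post-processing the plain-text output, the special tokens usually need to be separated from the phonemes. The sketch below splits a transcription into runs of phonemes delimited by `sil`. The helper name `speech_segments` and the `SPECIAL_TOKENS` set are assumptions for illustration; the set contains only the token documented here.

```python
SPECIAL_TOKENS = {"sil"}  # only the token documented above; extend if others apply


def speech_segments(transcript):
    """Split a plain-text phoneme string into runs of phonemes separated by special tokens."""
    segments, current = [], []
    for token in transcript.split():
        if token in SPECIAL_TOKENS:
            # A special token closes the current run, if there is one.
            if current:
                segments.append(current)
                current = []
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments


print(speech_segments("sil hh ay dh ow s ih s l uw uw th sil"))
```

For the example transcription this yields a single segment, because `sil` appears only at the beginning and the end.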
The list of phonemes is available in the document phonemes_for_stt_and_kws.pdf (delivered as part of manuals in SPE or STT or KWS).
The list of supported languages for Phoneme Recogniser is the same as for Keyword Spotting.
Link to API reference