Search Results for: adaptation

Results 1 - 5 of 5 Page 1 of 1
Results per-page: 10 | 20 | 50 | 100

Q: Please give me a recommendation for LID adaptation set.

Relevance: 100%      Posted on: 2017-06-27

A: The following is recommended: For adding new language to language pack 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% of speech) Only 1 language per record For adapting the existing language model (discriminative training) 10+ hours of audio for each language May be done on customer site. May be done in Phonexia using anonymized data (= language-prints extracted from a .wav audio)

Speech to Text (STT)

Relevance: 10%      Posted on: 2017-05-18

About STT Phonexia Speech Transcription (STT) converts speech in audio signals into plain text. Technology works with both acoustics as well as dictionary of words, acoustic model and pronunciation. This makes it dependent on language and dictionary – only some set of words can be transcribed. As an input, audio file or stream is needed, together with selection of language model to be used for transcription. As an output the transcription in one of the formats is provided. The technology extract features out of voice, using acoustic and language models together with pronunciation all in recognition network creates a hypothesis of transcribed words and „decode“ the most possible transcription. Based on requested output types one or more transcribed text are returned with score and time frame. Application areas: Maintain high reaction times by routing calls with specific content/topic to human operators Search for specific information in large call archives Data-mine audio content and index for search Advanced topic/content analysis provides additional value.   Technology overview Trained with emphasis on spontaneous telephony conversation Based on state-of-the-art techniques for acoustic modeling, including discriminative training and neural network-based features Output One-best transcription - i.e. a file with a…


Relevance: 10%      Posted on: 2017-06-15

Document which briefly describes processes and relations in Phonexia Technologies with consideration on correct word usage.   SID - Speaker Identification Technology (about SID technology) which recognize the speaker in the audio based on the input data (usually database of voiceprints). XL3, L3,L2,S2 - Technology models of SID. Speaker enrollment - Process, where the speaker model is created (usually new record in the voiceprint database). Speaker model: 1/ should reach recommended minimums (net speech, audio quality), 2/ should be made with more net speech and thus be more robust. The test recordings (payload) are then compared to the model (see…

Language Identification (LID)

Relevance: 10%      Posted on: 2017-06-26

About LID Phonexia uses state-of-the-art language identification (LID) technology based on iVectors that were introduced by NIST (National Institute of Standards and Technology, USA) during the 2010 evaluations. The technology is independent on any text, language, dialect, or channel. This highly accurate technology uses the power of voice biometrics to automatically recognize spoken language. Phonexia Language Identification (LID) will help you distinguish the spoken language or dialect. It will enable your system to automatically route valuable calls to your experts in the given language or to send them to other software for analysis. Application areas Preselecting multilingual sources and routing…

Difference between on-the-fly and off-line type of transcription (STT)

Relevance: 10%      Posted on: 2017-12-11

Similarly as human, the ASR (STT) engine is doing the adaptation to an acoustic channel, environment and speaker. Also the ASR (STT) engine is learning more information about the content during time, that is used to improve recognition. The dictate engine, also known as on-the-fly transciption, does not look to the future and has information about just a few seconds of speech at the beginning of recordings. As the output is requested immediately during processing of the audio, recording engine can't predict what will come in next seconds of the speech. When access to the whole recording is granted during off-line transcription…