Language Identification (LID)

Phonexia Language Identification (LID) identifies the language or dialect spoken in a recording. This lets your system automatically route valuable calls to experts in the given language, or pass them to other software for analysis.

Application areas

Preselecting multilingual sources and routing audio files to language-dependent technologies (transcribing, indexing, etc.)
Analyzing network traffic media (language statistics)
Routing particular calls (languages) to human operators (language experts)

Scoring and results

The LID language pack defines the set of recognizable languages, each represented by a language model.
When identifying the language of an audio recording (or of a languageprint), LID does the following:

creates a languageprint of the recording (if the input is an audio recording)
compares that languageprint with each language model in the language pack and calculates the probability that the two languages are the same

For an explanation of the terms languageprint, language model and language pack, refer to the LID: Terminology and adaptation article.

The final scores are returned as natural logarithms of these individual probabilities, i.e. as values from the (-inf, 0] interval, one for each language in the language pack.
(To convert a raw LID score to a percentage, use the formula e^score * 100.)
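
The conversion can be sketched in Python as follows. This is a minimal illustration, assuming the raw score is a natural logarithm of a probability, as the formula above implies. The function name score_to_percentage is hypothetical and is not part of any Phonexia API.

```python
import math

def score_to_percentage(raw_score: float) -> float:
    """Convert a raw LID score (natural log of a probability) to a percentage."""
    if raw_score > 0:
        # A log-probability can never be positive, so treat this as bad input.
        raise ValueError("raw LID scores are log-probabilities and cannot be positive")
    return math.exp(raw_score) * 100.0

# Raw score taken from the Turkish example later in this article.
print(round(score_to_percentage(-0.069), 1))  # prints 93.3
```

A score of 0 corresponds to 100%, and scores far below zero approach 0%.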

LID adaptation (custom language packs)

The scoring principle described above implies that the score is distributed among all languages in a language pack.
This means that every language receives a non-zero share, so the scores may get diluted as they are spread among many languages.
Additionally, if the language pack contains too many unequally trained languages (i.e. trained on very different amounts of source audio), the entire system can be affected and may produce low scores even for matching languages.

Therefore, it is a good idea to create a language pack containing only a limited number of languages, e.g. by excluding some of the more exotic ones, or by keeping only the few languages actually expected in your use case.
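
The dilution effect can be illustrated with a toy calculation. This is only a sketch: the article does not describe how LID actually computes or normalizes its probabilities, so softmax normalization is used here as a stand-in, and the similarity values are invented for illustration.

```python
import math

def softmax(similarities: dict[str, float]) -> dict[str, float]:
    """Turn arbitrary similarity values into probabilities that sum to 1."""
    peak = max(similarities.values())  # subtract the max for numerical stability
    exps = {lang: math.exp(value - peak) for lang, value in similarities.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}

# Hypothetical similarity values for one recording (not real LID output).
full_pack = {"Turkish": 4.0, "Uzbek": 2.0, "Azerbaijani": 1.5, "Dari": 0.5, "Albanian": 0.0}
# The same recording scored against a pack that keeps only two languages.
limited_pack = {lang: full_pack[lang] for lang in ("Turkish", "Albanian")}

p_full = softmax(full_pack)["Turkish"]
p_limited = softmax(limited_pack)["Turkish"]
print(f"full pack: {p_full:.3f}, limited pack: {p_limited:.3f}")
```

Because the probabilities always sum to 1, removing competing languages leaves a larger share for the best match. A real adapted language pack may behave differently from this simple renormalization.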

This process of tailoring the language pack to particular needs is called language pack adaptation and is described in the LID: Terminology and adaptation article.

Example usages of custom language packs

A law enforcement agency monitoring a criminal network that uses only a particular set of languages can keep only the languages expected to appear in the traffic.
This can reduce the number of scored languages to as few as 3 to 5.
A multilingual call center serving the European market can exclude languages that will almost certainly not appear in its traffic, such as African ones (Afan, Hausa, …) or Asian ones (Chinese, Japanese, …), while still keeping languages that are less likely but still possible.
This can reduce the number of scored languages from ~80 (included in the default out-of-the-box language pack) to about 20 or even fewer.

In both cases, limiting the number of languages in a language pack means the scores are distributed among fewer languages, so the score values get higher, with a clearer distinction between languages and a clearer gap between the best-scoring language and the rest.

Here is an example of identifying the language of a Turkish phone call.

You may notice a much sharper score when using a language pack with only relevant languages (77.3% with the default pack vs. 93.3% with the limited one):

Using default language pack with 60+ languages			Using limited language pack with 20 European languages
Language	Raw score	Percentage	Language	Raw score	Percentage
Turkish	-0.258	77.270	Turkish	-0.069	93.326
Uzbek	-2.436	8.753	Albanian	-4.347	1.294
Azerbaijani	-3.027	4.845	Hungarian	-4.657	0.949
Dari	-4.432	1.190	Ukrainian	-5.037	0.649
Albanian	-5.139	0.586	Swedish	-5.088	0.617
Tibetan	-5.270	0.515	French	-5.168	0.570
Georgian	-5.277	0.511	English_British	-5.316	0.491
Swedish	-5.384	0.459	Macedonian	-5.443	0.433
Farsi	-5.737	0.323	Greek	-5.698	0.335
Hungarian	-5.777	0.310	Serbian	-6.002	0.247
…			…
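
The gap between the best-scoring language and the runner-up can be computed directly from the raw scores. The following sketch uses the top three rows of each table above. The helper name best_and_margin is hypothetical, and expressing the gap in percentage points is an assumption made for illustration.

```python
import math

# Top raw scores copied from the two tables above (remaining languages omitted).
default_pack = {"Turkish": -0.258, "Uzbek": -2.436, "Azerbaijani": -3.027}
limited_pack = {"Turkish": -0.069, "Albanian": -4.347, "Hungarian": -4.657}

def best_and_margin(scores: dict[str, float]) -> tuple[str, float, float]:
    """Return the best language, its percentage, and the gap to the runner-up."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_lang, best_raw), (_, second_raw) = ranked[0], ranked[1]
    best_pct = math.exp(best_raw) * 100.0
    second_pct = math.exp(second_raw) * 100.0
    return best_lang, best_pct, best_pct - second_pct

for name, pack in (("default", default_pack), ("limited", limited_pack)):
    lang, pct, gap = best_and_margin(pack)
    print(f"{name}: {lang} {pct:.1f}% (gap {gap:.1f} points)")
```

A routing rule could, for example, accept the best-scoring language only when this gap exceeds a threshold chosen for the use case.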
