
LID adaptation

This article describes the available ways to adapt Language Identification (LID).

Basic terminology

Languageprint (*.lp file) – numeric representation of the audio, extracted from an audio file for the purpose of language identification (similar to a “voiceprint”, but representing the spoken language, not the speaking person)

Languageprint archive (*.lpa file) – multiple languageprints combined into a single archive

SPE does not support creating languageprint archives; they are supported as input only.

Language model – digital characteristics of a specific language
A language model can be trained from languageprints (*.lp), languageprint archives (*.lpa), or a combination of both.

A LID language model should not be confused with a LID technology model such as L4, L3, or XL3, which refers to the generation of the LID technology.

Language pack – a set of language models used for language identification

Adaptation types overview

  • Creating a new language model from your own audio files, to add a language that is not supported out of the box
    • at least 20 hours of audio are required, see the requirements below
  • Enhancing an existing language model by adding your own audio files to a built-in language
    • at least 5 hours of audio are required, see the requirements below
  • Creating a custom language pack consisting of your chosen set of languages, either pre-trained or created from your audio files

Audio recording requirements

  • Format: WAV, FLAC, or RAW with 16-bit or 8-bit linear coding, sampling rate 8 kHz or higher
  • A wide variety of speakers (50+) of various ages and genders is required, to ensure a rich variety of “language sounds”
  • Only a single language in the dataset
    • NOTE: mixing in a different language negatively affects the resulting recognition accuracy
  • Audio length: ideally between 1 and 5 minutes of speech signal per file
    • NOTE: it is not possible to train a language using just a few long audio files (like 5 files, 1 hour each)
  • Acoustic channels should be as close as possible to the channel of the intended deployment
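
The requirements above can be checked before any extraction is started. The following is a minimal sketch, not part of SPE: it assumes you already know the net speech duration and a speaker label for every file (the Recording class, its fields, and the check_dataset function are hypothetical names introduced here), and it only compares those numbers with the thresholds stated in this article.

```python
from dataclasses import dataclass

# Thresholds taken from this article.
MIN_HOURS = {"new": 20.0, "enhance": 5.0}  # new model vs. enhancing a built-in one
MIN_SPEAKERS = 50
MIN_FILE_SECONDS = 60.0    # 1 minute of speech
MAX_FILE_SECONDS = 300.0   # 5 minutes of speech


@dataclass
class Recording:
    """Hypothetical metadata record; the values come from your own tooling."""
    path: str
    speaker_id: str
    speech_seconds: float


def check_dataset(recordings, mode="new"):
    """Return a list of findings; an empty list means no threshold is violated."""
    findings = []

    total_hours = sum(r.speech_seconds for r in recordings) / 3600.0
    if total_hours < MIN_HOURS[mode]:
        findings.append(
            f"only {total_hours:.1f} h of speech, at least {MIN_HOURS[mode]:.0f} h required"
        )

    speakers = {r.speaker_id for r in recordings}
    if len(speakers) < MIN_SPEAKERS:
        findings.append(f"only {len(speakers)} speaker(s), {MIN_SPEAKERS}+ required")

    # The 1 to 5 minute range is a recommendation, so this is a warning only.
    outside = [
        r.path for r in recordings
        if not MIN_FILE_SECONDS <= r.speech_seconds <= MAX_FILE_SECONDS
    ]
    if outside:
        findings.append(f"{len(outside)} file(s) outside the 1 to 5 minute range")

    return findings
```

Calling check_dataset(recordings, mode="enhance") applies the 5-hour threshold instead of the 20-hour one. The check says nothing about language purity or channel match, which still have to be verified by other means.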

Adaptation in SPE 3.38 and newer

SPE 3.38 and newer include LID adaptation tasks in the REST API, which makes adaptation significantly easier than in previous versions.

Creating a language model

A language model can be created from languageprints (*.lp) extracted from audio files, from pre-trained languageprint archives (*.lpa), or from a combination of both.
A combination of .lpa and .lp files is used when enhancing an existing language model: the .lpa is the existing language model and the .lp files are created from your audio files.

  • Use the GET /technologies/languageid/extractlp endpoint to extract languageprints from your audio files
  • Use the POST /technologies/languageid/languagemodels/{name} endpoint to create a new (still empty) language model
  • Use the POST /technologies/languageid/languagemodels/{name}/file endpoint to upload a languageprint file or a languageprint archive file to the language model
    • repeat this upload for all necessary files – e.g. when creating a completely new language model from your own audio files, this means hundreds or thousands of files (see the audio requirements above)
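
The order of calls listed above can be sketched as a simple plan. This is an illustration only: the language_model_plan function is a hypothetical helper introduced here, it sends nothing to SPE, and it deliberately leaves out how the audio file is passed to extractlp and how the file is attached to the upload, because those details are not covered in this article (see the REST API documentation).

```python
from urllib.parse import quote

LID = "/technologies/languageid"  # endpoint prefix used in this article


def language_model_plan(model_name, audio_files, lpa_files=()):
    """Return the ordered (method, path, note) steps for building one model."""
    name = quote(model_name, safe="")  # make the name safe for use in a URL path
    plan = []

    # 1. Extract one languageprint per audio file.
    for audio in audio_files:
        plan.append(("GET", f"{LID}/extractlp", f"extract languageprint from {audio}"))

    # 2. Create the (still empty) language model.
    plan.append(("POST", f"{LID}/languagemodels/{name}", "create empty language model"))

    # 3. Upload every archive and every extracted languageprint, one call per file.
    for source in [*lpa_files, *audio_files]:
        plan.append(("POST", f"{LID}/languagemodels/{name}/file", f"upload data for {source}"))

    return plan


if __name__ == "__main__":
    for method, path, note in language_model_plan("my_language", ["a.wav", "b.wav"]):
        print(method, path, "#", note)
```

For a completely new language, the plan grows to two calls per audio file plus one call for creating the model, which is why scripting the sequence is usually preferable to issuing the requests by hand.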

More details are available in the REST API documentation: https://download.phonexia.com/docs/spe/#examples_languageid_create_lmodel

Creating a language pack

  • Use the GET /technologies/languageid/languagemodels endpoint to list the language models available for your language pack
  • Use the POST /technologies/languageid/languagepacks/{name} endpoint to create your custom language pack

More details are available in the REST API documentation: https://download.phonexia.com/docs/spe/#examples_languageid_create_lpack
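
As a rough sketch, the two language pack requests can be prepared with the Python standard library. Several things here are assumptions and not taken from this article: the BASE_URL value is a placeholder for your own SPE server address, both function names are hypothetical, authentication is omitted, and the request body that selects the language models for the pack is not shown because its format is described only in the REST API documentation.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8600"  # placeholder, replace with your SPE server address


def list_models_request(base_url=BASE_URL):
    """Prepare the request that lists the available language models."""
    return Request(f"{base_url}/technologies/languageid/languagemodels", method="GET")


def create_pack_request(pack_name, base_url=BASE_URL):
    """Prepare the request that creates a custom language pack.

    The body selecting the language models is intentionally left out;
    its format is defined in the REST API documentation.
    """
    name = quote(pack_name, safe="")  # make the name safe for use in a URL path
    return Request(f"{base_url}/technologies/languageid/languagepacks/{name}", method="POST")


if __name__ == "__main__":
    for req in (list_models_request(), create_pack_request("my_pack")):
        print(req.get_method(), req.full_url)
```

The prepared Request objects can be passed to urllib.request.urlopen once the required headers and body have been added according to the REST API documentation.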
