A: Language-prints do not depend on the language pack currently in use. You can use them both to train a new language pack and to test or compare against an existing one.
Language-prints only need to be compatible with the LID model that was used to extract them.
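The compatibility rule can be expressed as a simple filter. The sketch below is illustrative only: the `LanguagePrint` record, its field names, and model identifiers such as `"lid-A"` are hypothetical and do not reflect Phonexia's actual file format or API. Prints are matched on the LID model that extracted them, and the language pack is deliberately not checked.

```python
from dataclasses import dataclass


# Hypothetical representation of a language-print; the field names are
# illustrative assumptions, not Phonexia's actual format or API.
@dataclass(frozen=True)
class LanguagePrint:
    lid_model: str             # identifier of the LID model that extracted the print
    language: str              # language label of the source recording
    vector: tuple              # the extracted language-print values


def usable_prints(prints, lid_model):
    """Keep only the prints extracted with the given LID model.

    The language pack is intentionally not compared: language-prints
    do not depend on it, only on the extracting LID model.
    """
    return [p for p in prints if p.lid_model == lid_model]


prints = [
    LanguagePrint("lid-A", "cs", (0.1, 0.2)),
    LanguagePrint("lid-B", "en", (0.3, 0.4)),
]
print([p.language for p in usable_prints(prints, "lid-A")])
```

Only the Czech print extracted with the hypothetical `"lid-A"` model is kept. Prints from a different LID model would have to be re-extracted from the audio.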
A: The following is recommended:
For adding a new language to a language pack:
- 20+ hours of audio for each new language model (or 25+ hours of audio of which at least 80% is speech, i.e. about 20 hours of net speech)
- Only one language per recording
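A minimal sketch for checking a data set against these recommendations before submitting it. It assumes each recording is described by a hypothetical `(duration_hours, speech_ratio, languages)` tuple, and it assumes both alternatives above amount to roughly 20 hours of net speech (25 h × 0.8 = 20 h). Neither the tuple layout nor the helper names come from Phonexia tooling.

```python
# Assumption: both options in the recommendation reduce to ~20 h of net speech
# (25 h of audio x 80 % speech = 20 h).
MIN_NET_SPEECH_HOURS = 20.0


def net_speech_hours(recordings):
    """Sum the speech portion of every recording, in hours.

    Each recording is a hypothetical (duration_hours, speech_ratio, languages)
    tuple, where speech_ratio is in 0..1 and languages is a set of labels.
    """
    return sum(hours * ratio for hours, ratio, _ in recordings)


def check_new_language(recordings):
    """Return a list of problems; an empty list means the data looks sufficient."""
    problems = []
    # Each recording should contain exactly one language.
    mixed = [i for i, (_, _, langs) in enumerate(recordings) if len(langs) != 1]
    if mixed:
        problems.append(f"recordings {mixed} do not contain exactly one language")
    speech = net_speech_hours(recordings)
    if speech < MIN_NET_SPEECH_HOURS:
        problems.append(
            f"only {speech:.1f} h of net speech, need {MIN_NET_SPEECH_HOURS:.0f}+ h"
        )
    return problems


print(check_new_language([(20.0, 1.0, frozenset({"sk"}))]))
print(check_new_language([(10.0, 0.5, frozenset({"sk", "cs"}))]))
```

The first call returns an empty list. The second reports both a mixed-language recording and too little speech (5.0 h).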
For adapting an existing language model (discriminative training):
- 10+ hours of audio for each language
- Can be done on the customer's site, or at Phonexia using anonymized data (i.e. language-prints extracted from .wav audio)
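For adaptation, the requirement is per language, so a quick pre-check only needs the total hours collected for each language. In the sketch below, the `hours_per_language` dictionary and the language codes are hypothetical inputs chosen for illustration.

```python
MIN_ADAPTATION_HOURS = 10.0  # 10+ hours of audio for each language


def languages_below_minimum(hours_per_language):
    """Return, sorted by name, the languages with less audio than recommended.

    hours_per_language: hypothetical dict mapping a language code to the
    total hours of audio collected for that language.
    """
    return sorted(
        lang for lang, hours in hours_per_language.items()
        if hours < MIN_ADAPTATION_HOURS
    )


print(languages_below_minimum({"en": 12.5, "cs": 8.0, "de": 10.0}))
```

Here only `cs` falls short. `de` meets the recommendation because exactly 10 hours satisfies "10+ hours".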