To evaluate Phonexia Speaker Identification (SID) technology in a real-life scenario, the system needs to be calibrated with a SID dataset.
SID dataset (minimum requirements):
To measure SID performance precisely, it’s important to prepare the set of evaluation recordings very carefully.
The requirements are:
- 50+ known speakers, 200+ recordings in total (i.e. 3 to 5 recordings per speaker*)
- 1+ minute of net speech in each recording (i.e. usually a recording length of 2+ minutes)
- only one speaker in each recording
- wide variety of gender and age is recommended
- recordings should be as similar to the target use case as possible (device, channel, distance from the microphone, language distribution)
- audio files should be mono, lin16 format, 8 kHz+ sample rate
*Note: splitting a single recording into multiple shorter recordings in order to meet the criterion of at least 3 recordings per speaker is not the right way to proceed. Doing so adds no new information: you are essentially analyzing the details of the same recording several times.
In contrast, using 5 unique recordings from different audio environments, or even from different times of day, allows additional details to be analyzed, which leads to better results.
Warning: Any human error in the preparation of the evaluation set (a speaker appearing under more than one identity, recordings placed into the wrong folder, etc.) affects the evaluation results, so it’s very important to prepare the data carefully.
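Some of the minimum requirements above can be checked automatically before an evaluation is run. The following Python sketch is an illustration only and not part of any Phonexia tool. It assumes a hypothetical layout with one subfolder per speaker, each containing that speaker's recordings as WAV files. The function names and thresholds are made up for this example. Net speech cannot be measured without voice activity detection, so the sketch uses the total recording length of 2+ minutes as a rough proxy.

```python
import wave
from pathlib import Path

# Thresholds taken from the minimum requirements listed above.
MIN_SPEAKERS = 50
MIN_RECORDINGS = 200
MIN_PER_SPEAKER = 3
MIN_SAMPLE_RATE = 8000   # Hz
MIN_DURATION_S = 120     # assumption: 2+ minutes as a proxy for 1+ minute of net speech


def check_wav(path):
    """Return a list of problems found in a single WAV file."""
    problems = []
    try:
        with wave.open(str(path), "rb") as wav:
            if wav.getnchannels() != 1:
                problems.append("not mono")
            if wav.getsampwidth() != 2:
                problems.append("not 16-bit")
            rate = wav.getframerate()
            if rate < MIN_SAMPLE_RATE:
                problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
            duration = wav.getnframes() / rate if rate else 0.0
            if duration < MIN_DURATION_S:
                problems.append(f"length {duration:.0f} s is below {MIN_DURATION_S} s")
    except (wave.Error, EOFError):
        # The wave module only reads uncompressed PCM WAV files.
        problems.append("not readable as a PCM WAV file")
    return problems


def check_dataset(root):
    """Check a dataset laid out as <root>/<speaker>/<recording>.wav (assumed layout)."""
    root = Path(root)
    problems = []
    speakers = sorted(d for d in root.iterdir() if d.is_dir())
    total = 0
    for speaker in speakers:
        recordings = sorted(speaker.glob("*.wav"))
        total += len(recordings)
        if len(recordings) < MIN_PER_SPEAKER:
            problems.append(f"{speaker.name}: only {len(recordings)} recording(s)")
        for rec in recordings:
            for problem in check_wav(rec):
                problems.append(f"{speaker.name}/{rec.name}: {problem}")
    if len(speakers) < MIN_SPEAKERS:
        problems.append(f"only {len(speakers)} speaker(s), need {MIN_SPEAKERS}+")
    if total < MIN_RECORDINGS:
        problems.append(f"only {total} recording(s) in total, need {MIN_RECORDINGS}+")
    return problems
```

Calling `check_dataset("path/to/sid_dataset")` returns a list of human-readable problems, which is empty when all automated checks pass. The sketch cannot verify that each recording contains only one speaker, that speakers are unique, or that recordings were not produced by splitting a longer file. Those points still need manual review.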
See SID Evaluation for more details.