To evaluate Phonexia Speaker Identification (SID) technology in a real-life scenario, the system needs to be calibrated with a SID dataset.
SID dataset (minimum requirements):
- 500 speakers
- >5 individual recordings per speaker*
- >30s per recording (>20s speech on each recording)
- speaker labels
- 1 speaker per channel
- phone or mobile phone source
- spontaneous dialogue (better than scripted or read text)
- WAV, Opus, or FLAC audio format. For best results, use only the natively supported audio formats (see the list of supported audio formats)
- diversity of age, gender, and time of day
*Note: splitting a single recording into multiple shorter recordings to meet the criterion of at least 5 recordings per speaker is not the right way to proceed. Doing so adds no new information: you are essentially analyzing the same recording five times. In contrast, 5 unique recordings from different audio environments, or even from different times of day, provide additional details for analysis and lead to better results.
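The measurable requirements above (speaker count, recordings per speaker, durations, file format) can be checked automatically before calibration. The following is a minimal sketch in Python, not part of any Phonexia tooling: the `Recording` structure, the `check_sid_dataset` function, and all parameter names are hypothetical, and the duration and net-speech values are assumed to have been measured beforehand. Because the list says ">5" while the note says "at least 5", the per-speaker minimum is a parameter that defaults to 5.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import PurePath


@dataclass
class Recording:
    """Hypothetical manifest entry; values are assumed to be measured beforehand."""
    path: str          # audio file name
    speaker: str       # speaker label
    duration_s: float  # total recording length in seconds
    speech_s: float    # net speech in the recording, in seconds


# Formats named in the requirements list.
ALLOWED_EXTENSIONS = {".wav", ".opus", ".flac"}


def check_sid_dataset(recordings, min_speakers=500, min_recordings=5,
                      min_duration_s=30.0, min_speech_s=20.0):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    usable = Counter()  # usable recordings per speaker

    for rec in recordings:
        ext = PurePath(rec.path).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{rec.path}: unsupported format {ext!r}")
        elif rec.duration_s <= min_duration_s or rec.speech_s <= min_speech_s:
            # The requirements use strict inequalities (>30s, >20s speech).
            problems.append(f"{rec.path}: too short "
                            f"({rec.duration_s}s total, {rec.speech_s}s speech)")
        else:
            usable[rec.speaker] += 1

    speakers = {rec.speaker for rec in recordings}
    for speaker in sorted(speakers):
        if usable[speaker] < min_recordings:
            problems.append(f"speaker {speaker}: only {usable[speaker]} usable "
                            f"recordings (need at least {min_recordings})")

    if len(speakers) < min_speakers:
        problems.append(f"only {len(speakers)} speakers (need at least {min_speakers})")

    return problems


if __name__ == "__main__":
    # Tiny illustrative manifest; a real dataset would have hundreds of speakers.
    manifest = [Recording(f"a{i}.wav", "A", 45.0, 30.0) for i in range(5)]
    manifest.append(Recording("b0.mp3", "B", 45.0, 30.0))
    for problem in check_sid_dataset(manifest, min_speakers=2):
        print(problem)
```

A check like this only covers what can be counted. It cannot tell whether the recordings of a speaker are genuinely unique or were cut from one longer recording, nor can it verify one speaker per channel, spontaneous speech, or demographic diversity; those still need to be ensured when the dataset is collected.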