
SID4 performance on Intel® Xeon® Platinum 8124M

Benchmark goals

  • Find realistic performance using total recording length
  • Find FTRT (faster than real time) based exactly on net_speech (engineering sizing data)
  • Find system performance using all physical cores
  • Find system performance using all logical cores

Infrastructure setup

The benchmark runs in a virtual machine on an Intel® Xeon® Platinum 8124M with 8 physical cores reserved exclusively for this VM, Hyper-Threading enabled [16 logical cores available], 32 GB RAM, 30 GB of SSD-based storage, and 1000 I/O operations per second reserved per core.

Benchmark data setup

Data set statistics:

  • Number of files: 32 [300 seconds each]
  • RAW recordings total length: 9600 seconds
  • Net speech total length: 4224.77 seconds
  • Data set contains 44% of speech signal, 56% of silence or technical signal
  • Statistics computed by Phonexia VAD 3.22.1, “” settings (also known as strict VAD, without speech context)
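
The figures in the list above can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the list; the variable names are illustrative and are not part of any Phonexia tool.

```python
# Data set figures quoted in the list above.
NUM_FILES = 32
SECONDS_PER_FILE = 300
NET_SPEECH_SECONDS = 4224.77

# Total RAW recording length and the share of it that is speech.
total_seconds = NUM_FILES * SECONDS_PER_FILE
speech_ratio = NET_SPEECH_SECONDS / total_seconds

print(f"total length: {total_seconds} s")
print(f"speech: {speech_ratio:.0%}, non-speech: {1 - speech_ratio:.0%}")
```

The speech ratio computed here is the value that later links the two FTRT bases (recording length and net speech).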


SID4 performance was measured on a virtual machine running Ubuntu 18.04.
The SID4 v3.21.3 command-line tool was used, together with the VAD 3.22.1 command-line tool for collecting statistical metadata.

The Virtual Machine was reserved only for this measurement experiment.

Technical details:

  • The benchmark is driven by a bash script in a terminal emulator
  • The measuring script was run 50 times for each number of used cores (physical and logical)
  • Collected data are saved in a CSV file
  • FTRT numbers are calculated as the median of the collected measurements
  • Total system performance is obtained by simply multiplying the computed FTRT by the number of parallel SID4 instances
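
The calculation described in the last two points can be sketched as follows. It assumes that FTRT is the ratio of audio duration to wall-clock processing time, and it reads "simple multiplication" as per-instance FTRT times the number of parallel instances. The function names and the wall-clock times are hypothetical illustrations, not values from the benchmark or its CSV file.

```python
from statistics import median


def ftrt(audio_seconds: float, wall_seconds: float) -> float:
    """Faster-than-real-time factor: audio seconds processed per wall-clock second."""
    return audio_seconds / wall_seconds


def median_ftrt(audio_seconds: float, wall_times: list[float]) -> float:
    """Median FTRT over repeated runs of the same data set."""
    return median(ftrt(audio_seconds, t) for t in wall_times)


def system_ftrt(per_instance_ftrt: float, instances: int) -> float:
    """Assumed reading of 'simple multiplication': per-instance FTRT x instances."""
    return per_instance_ftrt * instances


# Hypothetical wall-clock times in seconds for three runs (NOT measured data).
example_wall_times = [100.0, 96.0, 120.0]
RECORDING_SECONDS = 9600.0  # total RAW length of the data set

per_instance = median_ftrt(RECORDING_SECONDS, example_wall_times)
print(per_instance, system_ftrt(per_instance, 8))
```

Using the median rather than the mean keeps a single slow or fast run from skewing the reported figure.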


Understanding the methodology:
At the beginning, customers can usually describe the set of recordings captured during a given time period only with figures such as:

  • total number of recordings
  • average file size of captured recording
  • or total number of captured hours

At the beginning, customers usually have no information about the ratio between the speech signal and the technical or silent parts of their recordings.

The speech / non-speech ratio is known only after the first Phonexia-controlled analysis, and it then becomes the main input for precise capacity planning in the following stages.


“Captured recordings” refers to archives of recordings gathered by various methods. A typical example is the archives created by call centres, which must retain recordings of business calls for long periods to comply with national legislation. Law enforcement agencies might gather recordings by different methods, but the principle is very similar.

Based on the data measured on the data set described above, we can draw the following conclusions for Intel® Xeon® Platinum 8124M:

  1. Phonexia SID4 using the L4 model can reach up to 180 FTRT on 1 physical CPU core when processing audio data containing 44% of speech
  2. Optimal system performance was observed with 8 SID4 instances using 8 physical CPU cores on a single CPU
  3. Under those conditions, the measured total system performance is 1200 FTRT on a single CPU
  4. The CPU Hyper-Threading feature doesn’t bring any performance improvement for SID4
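
For capacity planning, the total figure above can be turned into an estimated processing time. The sketch below assumes FTRT means hours of audio processed per hour of wall-clock time, and that the archive has a speech ratio similar to the benchmark data set (44%). The archive size is a hypothetical example and the function name is illustrative.

```python
def processing_hours(archive_hours: float, system_ftrt: float) -> float:
    """Estimated wall-clock hours needed to process an archive of recordings."""
    return archive_hours / system_ftrt


SYSTEM_FTRT = 1200.0      # total system performance quoted in the conclusions
archive_hours = 10_000.0  # hypothetical archive size, NOT from the benchmark

estimate = processing_hours(archive_hours, SYSTEM_FTRT)
print(f"{estimate:.2f} hours")
```

An archive with a noticeably higher share of speech would be expected to take longer than this estimate, which is why the speech / non-speech ratio matters for precise sizing.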

The following charts show the performance of Phonexia SID4 on this specific CPU family and type. An explanation of how to read the data is given below each chart. The RAW data collected during the measurement are included in Appendix 1.



  • Green line [FTRT based on recording length] shows how the system performs on the specified data set. This line shows the most realistic performance for a system where only the total number of hours of captured recordings is known.
  • Orange line [FTRT based on net speech] shows system performance based only on “net speech”. In other words, it shows the situation where 100% of the recordings’ duration contains speech (or utterance). This metric is an exact engineering measure, but it doesn’t directly reflect real-world data.
  • Orange bar, CPU core, shows how many physical cores are available on the tested system
  • Blue bar, SID4 instances, shows how many SID4 processes were run in parallel


  • X-axis shows how many SID4 instances were run in parallel
  • Blue bar shows total performance based on the RAW recordings length in the data set
  • Orange bar shows performance recalculated on the basis of the “Net_Speech” length computed from the original recordings in the data set
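
The two bases are linked by the speech ratio of the data set, because both divide a length of audio by the same processing time. The sketch below assumes FTRT is audio duration divided by wall-clock processing time. The input value of 100 FTRT is a hypothetical example, and the function name is illustrative.

```python
RECORDING_SECONDS = 9600.0    # RAW recordings total length of the data set
NET_SPEECH_SECONDS = 4224.77  # net speech total length of the data set


def net_speech_ftrt(recording_ftrt: float) -> float:
    """Convert FTRT based on recording length to FTRT based on net speech."""
    return recording_ftrt * NET_SPEECH_SECONDS / RECORDING_SECONDS


# Hypothetical recording-length FTRT, used only to show the conversion.
example = net_speech_ftrt(100.0)
print(f"{example:.2f}")
```

For this data set, the net-speech figure is therefore always about 44% of the recording-length figure for the same run.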

How to understand the results context

The measurement shows that performance rises steadily up to 8 SID4 instances running on 8 physical cores. A small performance drop is visible between the configuration with 1 SID4 instance and the configuration with 8 SID4 instances on 8 physical cores, as displayed in Figure 2.


Once all 8 physical cores are in use, the benchmark starts additional SID4 processes to test whether Hyper-Threading helps.

Unfortunately, this hypothesis could not be confirmed. Figure 2 shows that starting more SID4 processes than the number of physical cores reserved for the virtual machine does not provide better performance.

With Phonexia SID4, Hyper-Threading does not bring any advantage on the hardware configuration shown above, because Phonexia technology can already utilize the full capacity of the CPU’s physical cores. Hyper-Threading, which only adds logical cores on top of the physical ones, therefore can’t deliver better performance for this kind of parallel computation.

