Understand SPE workers configuration
A worker is a thread that performs the actual processing of files or realtime streams in Speech Engine.
This article explains what Speech Engine workers are and how to configure them for optimal performance and server utilization.
Starting from SPE 3.51, new defaults in settings/phxspe.properties make SPE configure workers automatically according to local conditions (physical CPU cores, configured technologies) to ensure optimal performance and server utilization.
These new defaults make the rest of this article obsolete. We keep it here for those who still want to fine-tune the configuration manually.
The default workers configuration in settings/phxspe.properties is shown below: 8 workers for file processing and 8 workers for realtime stream processing.
Each number is the maximum number of simultaneously running tasks of that type.
# Multithread settings
server.n_workers = 8
server.n_realtime_workers = 8
Requests for additional file processing tasks are put in a queue and processed according to their order and priority.
Requests for additional stream processing tasks are refused with HTTP status 403 (the realtime nature of stream processing does not allow any queuing).
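The difference between the two limits can be illustrated with a toy model. This is only a sketch of the behavior described above, not SPE code: the class `WorkerPool` and its methods are hypothetical names, and queue priorities are left out for brevity.

```python
from collections import deque

HTTP_FORBIDDEN = 403  # status the article says is returned for refused stream tasks


class WorkerPool:
    """Toy model of the admission rules described above (hypothetical, not SPE code)."""

    def __init__(self, n_workers: int, n_realtime_workers: int):
        self.n_workers = n_workers                    # mirrors server.n_workers
        self.n_realtime_workers = n_realtime_workers  # mirrors server.n_realtime_workers
        self.running_files = 0
        self.running_streams = 0
        self.file_queue = deque()  # file tasks waiting for a free worker

    def submit_file(self, task_id: str) -> str:
        # File tasks beyond the worker limit wait in a queue.
        if self.running_files < self.n_workers:
            self.running_files += 1
            return "running"
        self.file_queue.append(task_id)
        return "queued"

    def submit_stream(self, task_id: str):
        # Stream tasks beyond the worker limit cannot wait, so they are refused.
        if self.running_streams < self.n_realtime_workers:
            self.running_streams += 1
            return "running"
        return HTTP_FORBIDDEN


pool = WorkerPool(n_workers=2, n_realtime_workers=1)
results = [
    pool.submit_file("f1"),
    pool.submit_file("f2"),
    pool.submit_file("f3"),    # third file task exceeds 2 workers
    pool.submit_stream("s1"),
    pool.submit_stream("s2"),  # second stream exceeds 1 realtime worker
]
print(results)
```

With 2 file workers and 1 realtime worker, the third file task is queued while the second stream task is refused.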
File processing can process data faster than realtime, which allows a single file processing task to utilize 100% of a physical CPU core.
This means that for file processing technologies the number of workers should be set to the number of physical CPU cores in the server; there is no point in configuring more workers.
Stream processing can process data at realtime speed at most (no one can really speak faster than realtime 😉), so a single physical CPU core can process multiple realtime tasks simultaneously, depending on how much faster than realtime a particular technology is (and also on how much speech the audio contains).
This means that for stream processing technologies it makes sense to configure a higher number of workers than there are physical CPU cores in the server.
For example, Czech STT on stream is approx. 4 times faster than realtime, i.e. 1 CPU core can process 4 realtime streams simultaneously.
So a server with 8 CPU cores running only STT stream can be configured as follows:
- keep 1 core dedicated for operating system and SPE
- remaining 7 cores can handle 28 realtime workers (7 cores × 4 streams per core)
i.e. the realtime workers setting should be server.n_realtime_workers = 28.
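The arithmetic above can be written as a small helper for trying out other server sizes. This is a minimal sketch: the function name `realtime_workers` and its parameters are illustrative only, and the speed factor has to come from your own measurement of the technology in use.

```python
import math


def realtime_workers(physical_cores: int, speedup: float, reserved_cores: int = 1) -> int:
    """Estimate a value for server.n_realtime_workers.

    physical_cores: number of physical CPU cores in the server
    speedup: how many times faster than realtime the technology runs on one core
    reserved_cores: cores kept free for the operating system and SPE
    """
    usable = max(physical_cores - reserved_cores, 0)
    # Round down so the cores are never oversubscribed.
    return math.floor(usable * speedup)


n = realtime_workers(physical_cores=8, speedup=4)
print(f"server.n_realtime_workers = {n}")  # prints: server.n_realtime_workers = 28
```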
The number of initialized STT_STREAM technology instances (configured via phxadmin --configure-tech) should also be set to 28. There is no point in initializing more than that, since no more workers will be available, i.e. there will never be more than 28 simultaneously running tasks anyway.
Sitting on top of these numbers is the number of slots for a particular technology in a license.
For example, a license with 16 slots for STT won’t allow initializing more than 16 instances of STT, regardless of the configured number of workers or technology instances.
So for optimal and full utilization of your license and server, make sure that all these numbers and settings play together and reflect your actual server HW configuration.
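How these limits interact can be sketched as taking the smallest of them. This is an illustration of the reasoning above under simplifying assumptions, not an SPE API: the function `effective_stream_capacity` and its parameter names are hypothetical.

```python
def effective_stream_capacity(realtime_workers: int, requested_instances: int, license_slots: int) -> int:
    """Return how many stream tasks can actually run at once.

    The license caps how many technology instances can be initialized,
    and a running task needs both a free worker and an initialized instance.
    """
    initialized_instances = min(requested_instances, license_slots)
    return min(realtime_workers, initialized_instances)


# 28 workers and 28 requested instances, but only 16 license slots:
capacity = effective_stream_capacity(realtime_workers=28, requested_instances=28, license_slots=16)
print(capacity)  # prints: 16
```

In this example 12 of the 28 configured workers could never be used, which is the kind of mismatch worth checking for.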
P.S. The term “CPU core” means a physical CPU core. Hyper-Threading does not bring any benefits here as our processing is highly optimized and can really fully utilize the physical core.