Tips for best SPE performance
Here are a few tips for maximizing the SPE performance in terms of speed, throughput and hardware utilization.
Turn off the results caching
By default, SPE is configured to cache file processing results in its database. This prevents re-processing when the same file is submitted again for processing by the same technology, model, etc.
Assuming that you simply receive the processing results from the API and use/store them in your own application, you don’t need the results to be cached in the SPE database.
Therefore, results caching can be turned off, which
- avoids unnecessary database operations, i.e. reduces disk activity
- may also prevent the database from growing with data that might never be deleted (see below about deleting/unregistering files)
To turn off results caching, set server.db.save_results = false in the SPE configuration file.
For more details, refer to the articles Understand SPE database and Understand SPE configuration file.
Use file registration instead of upload
Depending on your overall workflow and the design of your application, consider changing how you get recordings into SPE: instead of uploading files via the POST /audiofile endpoint, copy the files directly to the SPE storage and register them via the POST /audiofile/registration endpoint.
Copying files at the filesystem level can be faster and more efficient than uploading, and can save machine resources.
When using this method, make sure you don’t run into access rights issues: the process that copies the files to the SPE storage must create them with access rights that allow the SPE process to read them.
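As a rough illustration of the copy step, the following Python sketch copies a recording into a storage directory and sets read permissions explicitly, so that a process running under a different account can read the file. The helper name, the directory layout and the 0644 mode are assumptions made for illustration; the correct ownership and mode depend on the account the SPE process runs under. The registration call itself (POST /audiofile/registration) is not shown.

```python
import os
import shutil
import stat
from pathlib import Path


def copy_into_storage(src, storage_root, relative_path):
    """Copy a recording into the storage directory and make it readable.

    Hypothetical helper: storage_root and relative_path are placeholders for
    wherever your SPE storage lives and how you organize files inside it.
    """
    dest = Path(storage_root) / relative_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfile copies the content only, so permissions are set explicitly below.
    shutil.copyfile(src, dest)
    # Assumption: owner read/write plus group/other read (0644) is enough
    # for the SPE process to read the file.
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
    return dest
```

Setting the mode explicitly, rather than relying on the umask of the copying process, is one way to keep the result predictable.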
Don’t forget to delete / unregister files
No matter how you get your recordings into SPE (see the previous section), always make sure to remove them properly when you no longer need them.
Proper housekeeping keeps your SPE storage free of leftover files and prevents the SPE database from growing excessively.
Keeping the SPE database at a reasonable size avoids processing slowdowns caused by database operations. With SQLite specifically, any write operation essentially creates a copy of the entire database file. If the file has several MB or even a few GB, this takes some time even on a fast SSD, which may delay processing by several seconds or even tens of seconds.
- either call DELETE /audiofile; this operation
  - deletes the audiofile database records
  - deletes all related cached processing results (if caching is on, see above)
  - physically deletes the file from SPE storage
- or call DELETE /audiofile/registration; this operation
  - deletes the audiofile database records
  - deletes all related cached processing results (if caching is on, see above)
  - does NOT physically delete the file, so an external process should eventually delete it to avoid accumulating obsolete audiofiles in SPE storage
NOTE: If you process realtime streams and use the option to save the incoming stream audio to a file (even if only occasionally, e.g. for troubleshooting), keep in mind that the audiofile is created in SPE storage and is therefore also registered in the SPE database, so proper housekeeping is needed in this case as well.
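The choice between the two DELETE calls can be sketched in Python as follows. This only builds the request object and does not send it. The base URL, the port and the path query parameter are assumptions made for illustration; check the SPE REST API documentation for the exact parameters and for any authentication your server requires.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_removal_request(base_url, path, delete_file=True):
    """Build a DELETE request that removes a recording from SPE.

    delete_file=True  -> DELETE /audiofile (also removes the physical file)
    delete_file=False -> DELETE /audiofile/registration (file stays on disk)

    Assumption: the recording is identified by a "path" query parameter.
    """
    endpoint = "/audiofile" if delete_file else "/audiofile/registration"
    query = urlencode({"path": path})
    return Request(f"{base_url}{endpoint}?{query}", method="DELETE")
```

The returned object can be passed to urllib.request.urlopen to actually send the request. When delete_file is False, remember that something else still has to remove the file from the storage.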
Use MariaDB instead of SQLite
Using MariaDB as the SPE database is generally recommended for bigger processing loads. Unlike SQLite, MariaDB is a high-performance, scalable database designed to handle very high loads easily.
Since MariaDB’s physical storage is designed for high performance, it does not suffer from the delays caused by handling huge files, as SQLite does (see the previous section).
Try these SQLite tuning tricks
If you cannot or don’t want to use MariaDB for some reason and need or want to keep using SQLite, you can try these tuning tricks and see if they improve the overall SPE performance:
- Create a RAMdisk and put the database file on the RAMdisk. It is hard to say how much this gains over today’s SSDs, but you can give it a try. The database file path is set using the server.db.sqlite.data_source option in the SPE configuration file; see the SPE configuration file documentation.
- Set the temp_store pragma to MEMORY to force temporary tables and indices to be created in memory instead of using the default, which is a disk file. See the SQLite documentation for this pragma for more details.
- Set the synchronous pragma to NORMAL or even OFF to let SQLite be less strict in synchronizing (writing) the changes to the database file. These options speed up database operations at the cost of possible database corruption if SPE or the operating system crashes. See the SQLite documentation for this pragma for more details.
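To show what the two pragmas do, here is a minimal sketch using Python's built-in sqlite3 module on a standalone database file. It does not show how to apply the pragmas to SPE itself: both settings apply per connection, so they have to be set by whatever opens the database. The function names are made up for this example.

```python
import sqlite3


def open_tuned(db_path):
    """Open an SQLite database and apply the two pragmas discussed above."""
    conn = sqlite3.connect(db_path)
    # Keep temporary tables and indices in memory instead of in a disk file.
    conn.execute("PRAGMA temp_store = MEMORY")
    # Sync to disk less often; faster, but less safe if the OS crashes.
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn


def current_settings(conn):
    """Return the numeric (temp_store, synchronous) values of a connection."""
    temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    return temp_store, synchronous
```

SQLite reports these settings as numbers: for temp_store, 2 means MEMORY; for synchronous, 0 is OFF, 1 is NORMAL and 2 is FULL.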
Do not process entire audio, but only part
You can significantly speed up technologies like Language ID, Speaker ID, Gender ID or Age Estimation by processing only the relevant parts of the input audio.
For example, instead of extracting a voiceprint or identifying a language from an entire 6-minute phone call, you can use only 1 minute, or even 30 seconds of speech. This is still more than enough speech for the technologies to give reliable results, and it dramatically speeds up the processing.
The simplest way to do that is to use the from_time and to_time parameters in REST API calls. However, note that these specify the amount of audio, not the amount of speech. In general, a 1-minute section of audio is likely to contain enough speech for the technologies to work reliably.
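As a small sketch of this approach, the Python helper below builds a request URL that limits processing to a time range. Only from_time and to_time come from the text above. The base URL, the endpoint, the path parameter and the use of seconds as the unit are assumptions made for illustration, so check them against the SPE REST API documentation.

```python
from urllib.parse import urlencode


def build_partial_processing_url(base_url, endpoint, path, from_time, to_time):
    """Build a technology request URL restricted to [from_time, to_time].

    Assumptions: the recording is identified by a "path" query parameter
    and the times are given in seconds.
    """
    if to_time <= from_time:
        raise ValueError("to_time must be greater than from_time")
    query = urlencode({"path": path, "from_time": from_time, "to_time": to_time})
    return f"{base_url}{endpoint}?{query}"
```

For example, passing from_time=0 and to_time=60 would request processing of the first minute of audio only.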
A more sophisticated way is to use Voice Activity Detection (VAD) to identify the sections of audio with speech – this technology is extremely fast, so it does not add too much processing overhead.
After getting the VAD results, you can then apply your own logic to choose the suitable part of audio – depending on the kind of audio in your use case, it could be something like “the longest part of speech within the second third of the call”, or something similar.
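The example rule quoted above, “the longest part of speech within the second third of the call”, could be sketched in Python like this. The input format is an assumption: the sketch expects the VAD results already converted to a list of (start, end) pairs in seconds, which is not necessarily how SPE returns them. The function name and the optional max_length cap are also made up for this example.

```python
def pick_speech_in_second_third(segments, duration, max_length=None):
    """Pick the longest speech segment inside the second third of a call.

    segments:   iterable of (start, end) speech intervals in seconds
                (assumed format, converted from the VAD output)
    duration:   total length of the recording in seconds
    max_length: optional cap on the length of the returned section

    Returns a (start, end) tuple, or None if there is no speech in the window.
    """
    window_start = duration / 3
    window_end = 2 * duration / 3
    best = None
    for start, end in segments:
        # Clip each speech segment to the second third of the recording.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end <= clipped_start:
            continue  # segment lies entirely outside the window
        if best is None or clipped_end - clipped_start > best[1] - best[0]:
            best = (clipped_start, clipped_end)
    if best is not None and max_length is not None:
        best = (best[0], min(best[1], best[0] + max_length))
    return best
```

The returned start and end can then be used as the from_time and to_time values of the actual processing request.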