
Understanding SPE metafiles

Certain SPE entities – SID Speaker models, SID Audio source profiles, LID Language packs – can have additional information associated with them in the form of “metafiles”. This article explains the intended usage of metafiles.

In general, SPE is intended as an under-the-hood engine that focuses purely on speech-related audio processing. Any additional functionality belongs on the application layer, i.e. it should be handled by the application built on top of the SPE API.
This includes handling any metadata associated with the processed audio files, such as phone numbers, the source of the recording, the date and time the audio was recorded, references to the persons speaking in the recording (names, photos, …), the languages spoken in the recording, etc. All this data is expected to be stored in a database managed by that application.

However, if you only want to create a very simple application, adding a database may be an unwanted complication, and the simple option of handling metadata directly in SPE may come in handy.

The ...../metafile endpoint lets you manage metadata directly in SPE: use the POST, GET or DELETE method to upload, download or delete a file with metadata of your choice, associated with the corresponding SPE entity.
There are no limits on the content of metafiles, their names, etc. (apart from those imposed by the underlying operating system and/or filesystem). Plain text files, structured formats like JSON or XML, pictures, documents, multimedia files… you can store whichever type of data helps your application.
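
The mapping between the three operations and their HTTP methods can be sketched in Python as follows. This is only an illustrative sketch, not the official client: the helper name build_metafile_request and the placeholder URL are hypothetical, and the complete metafile URL (including however the metafile name and authentication are conveyed) is left to the caller, since those details are covered by the SPE REST API reference rather than by this article. The function only builds the request object; sending it, for example with urllib.request.urlopen, is left out.

```python
from urllib.request import Request

# HTTP method for each metafile operation described in the article.
METHODS = {"upload": "POST", "download": "GET", "delete": "DELETE"}


def build_metafile_request(operation, metafile_url, payload=None, headers=None):
    """Build (but do not send) a request for one metafile operation.

    metafile_url is the complete URL of the entity's metafile endpoint.
    It is supplied by the caller because its exact form is an assumption
    here; take it from the SPE REST API documentation.
    """
    if operation not in METHODS:
        raise ValueError(f"unknown operation: {operation!r}")
    if operation == "upload" and payload is None:
        raise ValueError("upload needs the file content as bytes")
    if operation != "upload" and payload is not None:
        raise ValueError(f"{operation} does not take a payload")
    return Request(
        metafile_url,
        data=payload,
        headers=headers or {},
        method=METHODS[operation],
    )


# Hypothetical placeholder URL, not a real SPE address.
EXAMPLE_URL = "http://spe.example.invalid/ENTITY_PATH/metafile"

upload = build_metafile_request("upload", EXAMPLE_URL, payload=b"any file content")
download = build_metafile_request("download", EXAMPLE_URL)
delete = build_metafile_request("delete", EXAMPLE_URL)
```

Because SPE does not interpret the payload, the same helper works for text, JSON, images or any other file content.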

The files are physically stored in the SPE user’s “home”, in the data subdirectory (see the Understanding SPE home directory article for details). The maximum size of a single metafile can be set using the server.max_metadata_size setting in the SPE configuration file.

Example

As an example, the picture below shows how Phonexia Browser uses SPE metafiles to store SID speaker model metadata. Textual properties like name, date of birth, etc. are stored as a JSON file (note: its structure and meaning are defined and understood by Browser itself, not by SPE), while the speaker photo and any other files attached to the speaker model are stored as separate files.
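
To illustrate that the JSON structure belongs to the application and not to SPE, here is a minimal Python sketch of how an application might serialize speaker properties into bytes suitable for storing as a metafile, and read them back. The SpeakerProperties class, its field names and the sample values are hypothetical; they are not the actual schema used by Phonexia Browser.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SpeakerProperties:
    """Application-defined speaker metadata (hypothetical schema)."""
    name: str
    date_of_birth: str  # kept as ISO 8601 text, e.g. "1980-05-17"


def to_metafile_bytes(props):
    """Serialize the properties to UTF-8 encoded JSON."""
    text = json.dumps(asdict(props), ensure_ascii=False, sort_keys=True)
    return text.encode("utf-8")


def from_metafile_bytes(raw):
    """Rebuild the properties from previously stored metafile content."""
    return SpeakerProperties(**json.loads(raw.decode("utf-8")))


# Made-up sample data for illustration only.
original = SpeakerProperties(name="Jane Doe", date_of_birth="1980-05-17")
stored = to_metafile_bytes(original)
restored = from_metafile_bytes(stored)
```

SPE would store and return the bytes unchanged, so any validation or versioning of this structure is the application's responsibility.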

Another example is the information about the content of a created LID language pack: when a LID language pack is successfully created, SPE creates a metafile named report, which contains detailed information about the source files used to create the language pack. See the LID language pack creation REST endpoint documentation for more details about the report metafile content.