Search: process

63 results

Language Identification (LID)

…actually expected in your use case. Tailoring the language pack to particular needs is called language pack adaptation and is described in the article LID: Terminology and adaptation. Example usages of custom language packs: A law enforcement agency monitoring a network of criminals who use only a particular set of languages can use the approach of keeping only languages expected…

Q: What to do with the ApplicationStartup: Unhandled exception: BsapiException error?

…range (0,91840). A: This means the Opus file was created improperly: its header declares much more audio than the file actually contains. Check that your audio source/originator works properly, or use the ffmpeg or sox utility as a preprocessor and normalize the audio by converting it from Opus to Opus before the recordings are processed by SPE…
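The Opus-to-Opus self-conversion mentioned above could be scripted along these lines. This is a minimal sketch, not a documented procedure: the function name is hypothetical, and it assumes an ffmpeg build that includes the libopus encoder. It only builds the argument list, so nothing is executed until the list is handed to a process runner.

```python
from typing import List


def opus_reencode_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argument list that decodes src and re-encodes it as Opus."""
    if src == dst:
        # ffmpeg cannot safely overwrite the file it is reading from.
        raise ValueError("source and destination must be different files")
    return [
        "ffmpeg",
        "-loglevel", "warning",  # keep the output quiet
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input with the suspect header
        "-c:a", "libopus",       # re-encode the audio stream as Opus
        dst,
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would run the conversion and raise if ffmpeg reports a failure.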

Q: What is the difference between on-the-fly and off-line type of speech to text transcription (STT)?

…seconds of speech at the beginning of recordings. Because the output is requested immediately while the audio is being processed, the engine cannot predict what will come in the next seconds of speech. When the whole recording is available, as in off-line transcription, the speech engine can correct the result before it is printed out by also taking into account the subsequent…

Licensing (technical details)

…machine for processing (64-bit required) and make sure its OS, CPU, HDD and ETH configuration will not change over time (this is especially important in virtual or cloud environments)
Create a TXT file containing the HW profile
  download the HW-GEN tool and run it (choose below, based on your operating system)
  Linux 64-bit: https://download.phonexia.com/utils/hw-gen64 (or in ZIP)
  Windows 64-bit: https://download.phonexia.com/utils/hw-gen64.exe…

Q: How to fix Error 1007: Unsupported audio format?

…%2 is for output file
ffmpeg example:
audio_converter.command = ffmpeg -loglevel warning -y -i %1 %2
# sox example:
# audio_converter.command = sox %1 %2
Important note: By design, and to save computing resources, the ‘audio converter’ is not used if the INPUT file name ends with the extension .wav. In that case you must pre-process the audio recording before uploading it to the…
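To illustrate how a command template with %1 and %2 placeholders and the .wav bypass fit together, here is a small sketch. It is not how SPE implements the converter: the function name is hypothetical, and both the exact-match check on `.wav` and the token-wise substitution are assumptions made for the example.

```python
import shlex
from typing import List, Optional


def build_converter_argv(template: str, src: str, dst: str) -> Optional[List[str]]:
    """Expand a converter command template, or return None when it would be skipped."""
    if src.endswith(".wav"):
        # Per the note above, input ending in .wav bypasses the converter.
        return None
    # Split the template first and substitute afterwards, so that paths
    # containing spaces remain a single argument.
    argv = []
    for token in shlex.split(template):
        if token == "%1":
            argv.append(src)   # %1 stands for the input file
        elif token == "%2":
            argv.append(dst)   # %2 stands for the output file
        else:
            argv.append(token)
    return argv
```

For instance, `build_converter_argv("sox %1 %2", "call.mp3", "call.out.wav")` yields a three-element list ready for a process runner, while any input ending in `.wav` yields `None`, which is the case where the recording has to be pre-processed before upload.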

SID: Speaker Identification: Results Enhancement

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting/rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system…

LID: Terminology and adaptation

…added to user ‘admin’. Then launch Speech Engine. If everything was done successfully, you should see the new language pack
  listed in response to GET /technologies/languageid/languagepacks REST query, i.e. available for use in model=… parameter in GET /technologies/languageid REST queries
  listed in Language models pane in Phonexia Browser, i.e. available for selection for processing by Language Identification in Browser …
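The two REST queries named above can be assembled as follows. This is only a sketch of URL construction: the two paths and the `model` parameter come from the snippet, while the base URL, the function names and the pack name used below are hypothetical. Authentication, any further required parameters and the response format are not shown in this excerpt and are left out.

```python
from urllib.parse import urlencode

LANGUAGE_PACKS_PATH = "/technologies/languageid/languagepacks"
LANGUAGE_ID_PATH = "/technologies/languageid"


def language_packs_url(base_url: str) -> str:
    """URL of the query that lists the available language packs."""
    return base_url.rstrip("/") + LANGUAGE_PACKS_PATH


def language_id_url(base_url: str, model: str, **extra_params: str) -> str:
    """URL of a language identification query that selects a language pack."""
    # extra_params is a placeholder for whatever other parameters the
    # server expects; none are assumed here.
    query = urlencode({"model": model, **extra_params})
    return f"{base_url.rstrip('/')}{LANGUAGE_ID_PATH}?{query}"
```

With a hypothetical server at `http://localhost:8600`, `language_id_url("http://localhost:8600", "my_pack")` gives `http://localhost:8600/technologies/languageid?model=my_pack`.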

Q: How do you calculate SNR in Speech Quality Estimation?

A: Signal-to-Noise Ratio (SNR) is an important indicator of whether a recording is worth further processing by other speech technologies, so it is part of our Speech Quality Estimation. However, calculating SNR automatically is not a trivial task. We use the fact that the statistical distribution of the frequencies in the waveform of speech follows a Gamma distribution. In contrast, noise…

Voice Inspector – Interpretation of results


SID: TUTORIAL: Speaker Identification – How to Do a Basic Test

…Evaluation Package: The evaluation package (download page) consists of Phonexia Browser and Phonexia Speech Engine, including all necessary technologies. 2. Data: We have prepared a dataset for your testing. The package contains data for both speaker model creation and speaker spotting. The testing process is the same for a dataset you collect yourself. The dataset is available to download…

What is User configuration file and how to use it

…working state. User configuration files provide a way to override processing parameters without modifying the original BSAPI configuration files. WARNING: Inappropriate configuration changes may cause serious issues! Make sure you really know what you are doing. A user configuration file is a plain text file with the same name as the main configuration file, plus the additional extension .usr. For example: Main configuration file…
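The naming rule above (main file name plus an extra .usr extension) can be expressed as a one-line path helper. The function name and the file name in the usage note are hypothetical and used only for illustration.

```python
from pathlib import Path


def user_config_path(main_config: str) -> Path:
    """Return the path of the user configuration file for a main configuration file."""
    main = Path(main_config)
    # Append ".usr" to the full file name instead of replacing the existing extension.
    return main.with_name(main.name + ".usr")
```

For a hypothetical main file `settings/example.cfg`, this returns `settings/example.cfg.usr`.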

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Many applications therefore want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…
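As a rough illustration of what a one-best conversion involves, the sketch below keeps the highest-scoring alternative in each time slot. It is the naive approach, not the recommended procedure from the article, which is not visible in this excerpt and may handle more cases. The input structure (one dict of word to score per time slot) and the `<null>` marker for an empty alternative are assumptions, not the Speech Engine output format.

```python
from typing import Dict, List

NULL_WORD = "<null>"  # hypothetical marker for "no word in this slot"


def naive_one_best(slots: List[Dict[str, float]]) -> List[str]:
    """Pick the top-scoring word of every time slot, skipping empty winners."""
    words = []
    for alternatives in slots:
        if not alternatives:
            continue  # a slot without alternatives contributes nothing
        best = max(alternatives, key=alternatives.get)
        if best != NULL_WORD:
            words.append(best)
    return words
```

A slot whose best alternative is the null marker is dropped from the transcript, which is one of the decisions a real conversion has to make explicitly.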

STT: What is Preferred Phrases feature and how to use it

…it can help in other applications, too – e.g. when transcribing domain-specific audio, the frequently used domain-specific phrases can be boosted. How preferred phrases work: The picture below shows a simplified standard speech transcription process – the digitized speech signal spectrum is analyzed in the neural network acoustic model (which describes the pronunciations of a given language) and goes into…

Understand SPE audio converter

…following:
  it’s one of the natively supported formats – then SPE simply continues the processing
  it’s one of “internally recognized”, but not natively supported formats (e.g. MP3 audio) – then
    if converter is enabled, SPE tries to convert the file
    if converter is disabled, upload ends up with error
  it’s some “internally unrecognized” format – this causes error during format…
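The branching described above can be summarized as a small decision function. This is a sketch of the logic as it appears in the snippet, not SPE code: the enum, the function name and the outcome labels are hypothetical, and the truncated details of the last case are not modeled beyond "error".

```python
from enum import Enum


class FormatClass(Enum):
    NATIVE = "natively supported"
    RECOGNIZED = "internally recognized, not natively supported"
    UNRECOGNIZED = "internally unrecognized"


def upload_outcome(fmt: FormatClass, converter_enabled: bool) -> str:
    """Return 'process', 'convert' or 'error' for an uploaded file."""
    if fmt is FormatClass.NATIVE:
        return "process"  # SPE simply continues the processing
    if fmt is FormatClass.RECOGNIZED:
        # e.g. MP3 audio: the result depends on the converter setting
        return "convert" if converter_enabled else "error"
    return "error"  # unrecognized formats fail
```

Only the middle case depends on the converter setting, which is why enabling the converter changes the result for formats such as MP3 but not for natively supported files.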

Video – Filtering and supporting technologies

MODULE 2: Filtering and supporting technologies (22 min)
  Common generic rules for CLI, REST and GUI
  Filtering, sorting, pre-/post-processing overview
  Speech Quality Estimation (SQE) in CLI, REST and GUI
  Voice Activity Detection (VAD) in CLI, REST and GUI
  Diarization (DIAR) in CLI, REST and GUI
  Age Estimation (AGE) in CLI, REST and GUI
  Denoiser (DENOISER) in CLI, REST and GUI…