Search: results

43 results

Voice Inspector – supporting technologies

Keyword Spotting (KWS)

…Keyword Spotting. It’s a good idea to limit the start and end time of the Phoneme Recognizer transcription to only the time slot where the word or phrase of interest occurs. Thresholds: A threshold is a numeric value between 0 and 1 that limits the output results. Only words with confidence exceeding the threshold are returned as results. Command line implementation of Keyword Spotting…
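The thresholding rule in the Keyword Spotting snippet can be sketched as follows. This is only an illustration: the `Detection` class, its field names, and `filter_by_threshold` are hypothetical and are not part of the Keyword Spotting command line tool or its output format.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one detected keyword (not a real KWS type)."""
    keyword: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # assumed to lie between 0 and 1


def filter_by_threshold(detections, threshold):
    """Keep only detections whose confidence exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [d for d in detections if d.confidence > threshold]


# Made-up detections, for illustration only.
detections = [
    Detection("invoice", 1.2, 1.9, 0.91),
    Detection("invoice", 7.4, 8.0, 0.35),
    Detection("refund", 12.0, 12.6, 0.60),
]

kept = filter_by_threshold(detections, threshold=0.6)
print([d.keyword for d in kept])
```

With a threshold of 0.6, the detection scoring exactly 0.6 is dropped, because only confidences strictly above the threshold pass.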

Age Estimation (AGE)

…”AgeEstimationResult”, “file”: “/kelly_2.wav”, “model”: “L”, “channel_scores”: [ { “channel”: 0, “scores”: [ { “name”: “0”, “score”: }, { “name”: “1”, “score”: }, . . . { “name”: “41”, “score”: 1 }, { “name”: “42”, “score”: }, . . . In order to achieve the most representative results possible, a span of +/- 10 years should be added to the results….
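A minimal sketch of the ±10-year span mentioned in the Age Estimation snippet. It assumes that each `name` in the score list is an age in years and that the top-scoring entry is taken as the point estimate. Both are assumptions made for illustration, the helper names are hypothetical, and the score values below are made up.

```python
def top_scoring_age(scores):
    """Return the age (as int) of the entry with the highest score."""
    best = max(scores, key=lambda entry: entry["score"])
    return int(best["name"])


def age_range(estimate, span=10):
    """Widen a point estimate by +/- span years, never going below zero."""
    return max(0, estimate - span), estimate + span


# Made-up scores in the shape suggested by the snippet.
scores = [
    {"name": "40", "score": 0.0},
    {"name": "41", "score": 1.0},
    {"name": "42", "score": 0.0},
]

estimate = top_scoring_age(scores)
low, high = age_range(estimate)
print(f"estimated age {estimate}, reported range {low}-{high}")
```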

Arabic dialects in Phonexia LID and STT

…code ar-XL, where the XL means “cross-Levantine” 😉 NOTE: To get the best STT results, use the model that corresponds to the given dialect. The AR_XL_* model is best suited to Levantine dialect recordings. When the AR_XL_* model is used for a neighboring dialect, e.g. Iraqi, the results will be much worse… and for e.g. Maghrebi, the results will most probably be completely unusable…

Key Features (VIN)

…speakers) Supported audio format: MS Wave or RAW with linear coding (8 or 16 bits), A-law, Mu-law; sampling frequency 8 kHz or higher. Output: a scoring table with the results of comparisons as a Likelihood Ratio, Log-Likelihood Ratio (decimal or natural logarithm), and Verbal Ratio. The graphical presentation of results in the form of a Probability Density Function plot and a…
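The relationship between a Likelihood Ratio and its two logarithmic forms is plain arithmetic and can be sketched as below. The function name is hypothetical, and the code does not reproduce how Voice Inspector computes or presents its scores.

```python
import math


def log_likelihood_ratios(likelihood_ratio):
    """Return (decimal LLR, natural LLR) for a positive likelihood ratio."""
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log10(likelihood_ratio), math.log(likelihood_ratio)


# Illustrative value only: a likelihood ratio of 100.
decimal_llr, natural_llr = log_likelihood_ratios(100.0)
print(f"log10 LR = {decimal_llr:.3f}, ln LR = {natural_llr:.3f}")
```

The two forms differ only by a constant factor: the natural logarithm equals the decimal logarithm multiplied by ln 10 (about 2.303).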

Time Analysis Extraction (TAE)

…operator speaks on one channel and caller on another. TAE can also process mono-channel recordings, but it then provides only a limited set of dialogue statistics. When the technology is applied to a stream, the results are created and returned on every request, even during an ongoing stream. Output As with the whole SPE, results are provided in the form of JSON…

Speech to Text (STT)

…n-grams. Using this, the user can adjust a language model to focus on a specific domain and get better results. Result types During the process of transcribing the speech there are always several alternatives for a given speech segment. The technology can provide one or more results. The 1-best result type provides only the result with the highest score. Speech is returned in…

Language Identification (LID)

…Routing particular calls (languages) to human operators (language experts) Scoring and results The LID language pack defines a set of recognizable languages (represented by language models). When identifying the language in an audio recording (or languageprint), LID does the following: creates a languageprint of the recording (if the input is an audio recording) compares that languageprint with each language model in a…

SID4 performance on Intel® Xeon® Platinum 8124M

…only after the first Phonexia-controlled analysis and becomes the main part of the calculation for precise capacity planning in the following stages. Results “Captured recordings” refers to archives of recordings gathered by various methods. A typical example is the recording archives created by call centres that must record business calls over a long period to comply with the law of their country. Law…

What is User configuration file and how to use it

…example: When using Czech STT on realtime streams, the results show that the system outputs the end of a segment too often, i.e. longer pauses between words are misidentified as the end of a sentence, while in fact the speakers continue to speak. So it is desirable to fine-tune the system to accept a longer delay between words without ending a…

Q: What are the requirements for SID evaluation dataset?

…unique recordings coming from different audio environments or even different times of the day, additional details can be analyzed leading to better results. Warning: Any human error in evaluation set preparation (in speaker uniqueness, placing recordings into wrong folder, etc.) affects the evaluation results, so it’s very important to prepare the data carefully. See SID Evaluation for more details…

Understand SPE executable files

…registered Windows service phxclient phxclient is a simple command-line SPE client. It is specifically designed for the SPE REST API, e.g. it automatically handles polling for the results of asynchronous requests. phxclient also provides additional functionality related to SPE features, such as the ability to stream audio recordings via RTP or HTTP stream. Therefore it’s useful for quick testing of the SPE API without…

Input audio quality

Audio quality is extremely important for satisfactory results from any speech processing technology, be it simple voice activity detection, speech transcription, voice biometry, or another. There are two main aspects of audio quality: the technical quality of the audio data (format, codec, bitrate, SNR, …) and the sound quality of the actual content (background noise, reverberations, …). Technical quality Using inappropriate…

FAQs (PSP)

…SPE. Q: How do I get results for a pending operation? A: If the server responds to the pending request with status 200 – OK, the body of the response will contain the result (the server already has the result in cache memory, so there is no need to process the file again). If server responds on…

Documentation (VIN)

…Quick Start Guide Using the Application Interpretation of Results Description of Other Inbuilt Tools Troubleshooting In case of any problems or questions, please first search the manual for relevant keywords. If you don’t find an answer in the manual, contact our support. You can also browse the FAQ section or use the search function of the Partner Portal and look…