
Search: Configuración del servidor (English: "Server configuration")

64 results

SID: TUTORIAL: Speaker Identification – How to Do a Basic Test

Phonexia Speaker Identification is a voice biometrics tool that recognizes speakers by their voice. In this video, we will show you how to start using this technology! You will learn how to create a “Speaker Model” to identify a speaker in a set of data. Ready to test it? Start with our video: What else is needed? 1. Phonexia…

SID4 performance on Intel® Xeon® Platinum 8124M

…enforcement agencies might use different methods of gathering recordings, but the principle is very similar. Based on measurements on the data set described above, we can conclude the following for Intel® Xeon® Platinum 8124M: Phonexia SID4 using the L4 model can reach up to 180 FTRT on 1 physical CPU core when processing audio data containing 44% speech. Optimal system performance…

STT: How to properly convert Confusion Network results to One-best

Confusion Network output is the most detailed Speech Engine STT output, as it provides multiple word alternatives for individual time slots of the processed speech signal. Many applications therefore want to use it as the main source of speech transcription and perform any conversion to less verbose output formats internally. This article provides the recommended way to do the conversion. Time slots and…
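As a rough illustration of what a conversion from a confusion network to a one-best transcript involves, here is a minimal Python sketch. It is not the recommended method from the article, which is cut off in the snippet above. The data layout (a list of time slots, each a list of `(word, posterior)` pairs) and the `<null>` marker for an empty slot are assumptions made for this example only and are not taken from the Speech Engine output format.

```python
from typing import List, Tuple

# Assumed layout: one slot = list of (word, posterior) alternatives.
Slot = List[Tuple[str, float]]

# Hypothetical marker meaning "no word was spoken in this slot".
NULL_WORD = "<null>"


def one_best(network: List[Slot]) -> List[str]:
    """Pick the highest-posterior alternative in each time slot."""
    words = []
    for slot in network:
        if not slot:
            continue  # nothing to choose from in an empty slot
        best_word, _ = max(slot, key=lambda alt: alt[1])
        if best_word != NULL_WORD:
            words.append(best_word)  # drop slots won by the null word
    return words


network = [
    [("hello", 0.7), ("hallo", 0.3)],
    [(NULL_WORD, 0.6), ("a", 0.4)],
    [("world", 0.9), ("word", 0.1)],
]
print(" ".join(one_best(network)))
```

Picking the top alternative per slot is the simplest possible strategy; a real conversion may also need to handle slot timing and overlaps, which this sketch ignores.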

STT: What is Words-To-Numbers feature and how to use it

…that would require retroactively changing text that had already been output earlier… which is impossible. Alternatively, the output would have to be somehow delayed… which is undesirable in realtime stream processing, of course. So, the best compromise is to keep the word-level outputs untouched and do the conversion only on the segment/sentence level. How does it work? The words to…

Waveform Denoiser (DENOISER)

Phonexia Waveform Denoiser (DENOISER) performs automatic dereverberation (removal of echoes caused by sound in rooms) and automatic noise reduction of the speech signal. The data model is usually trained for various types of noise using the latest generation of algorithms based on neural networks. The noises removed automatically are mainly those similar to the ones the software was trained on. Conversely, the…

Phonexia Academy

About: The main idea of the Phonexia Academy is to help partners understand the market and Phonexia’s products and technologies. Sell more, deliver your projects on time and at the highest quality, and support your clients effectively. We provide the following trainings: Phonexia technologies introduction (online video course), Technical Training Essentials (online video course), Technical Training Advanced – 2 courses: Voice Biometrics…

Keyword Spotting (KWS)

…takes some time – the more keywords without pronunciations in the list, the longer the delay before processing starts. When the keyword list has pronunciations defined for each keyword, even thousands of defined keywords have no impact on performance. The technology searches the recording and returns the list of found keywords, together with a score and confidence for each found keyword. The score is…

Q: What do LLR, LR and score mean?

A: These abbreviations mean the following: LR – likelihood ratio, the result of a statistical test comparing two models. It is a number that expresses how many times more likely the data are under one model than the other. LR takes values in the interval <0;+inf). LLR – abbreviation for log-likelihood ratio statistic, a logarithmic function of LR. LLR takes values in the interval…
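The relationship between LR and LLR described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the natural logarithm is an assumption, since the snippet does not state which base is used.

```python
import math


def llr_from_lr(lr: float) -> float:
    """Convert a likelihood ratio to a log-likelihood ratio.

    Assumes the natural logarithm. LR = 0 maps to -inf.
    """
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    return -math.inf if lr == 0 else math.log(lr)


def lr_from_llr(llr: float) -> float:
    """Inverse conversion: LLR back to LR."""
    return math.exp(llr)


# LR = 1 means both models explain the data equally well, so LLR = 0.
print(llr_from_lr(1.0))
# LR > 1 favours the first model (positive LLR), LR < 1 the second (negative LLR).
print(llr_from_lr(20.0), llr_from_lr(0.05))
```

Working with LLR instead of LR makes the scale symmetric around zero: evidence of equal strength for either model yields values of equal magnitude and opposite sign.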

Licensing (technical details)

…dependency on Internet connection – there might be various obstacles in different customer environments (firewalls, air-gapped rooms, etc.) USB license: A USB license is specified by the USE_USB line in the license file (the SERVER line is ignored in this case, and no Internet connection is required). This license is bound to a physical USB token delivered by Phonexia. To successfully use…

SID: Speaker Identification: Results Enhancement

…– recordings from different speakers representing the source data, with a minimum of 60 seconds of net speech in each. The set must not contain duplicates or target speaker recordings. With FAR Calibration, the system is calibrated to a specific False Acceptance Rate (e.g., FAR = 1%) for each reference voiceprint (speaker model). Only one side (the enrollment side) is calibrated, using data representing the…

What is User configuration file and how to use it

…example: When using Czech STT on realtime streams, the results show that the system outputs an end of segment too often, i.e. longer pauses between words made by the speakers are misidentified as the end of a sentence, while in fact the speakers continue to speak. So it is desirable to fine-tune the system to accept a longer delay between words without ending a…

FAQs (Browser)

…are under one model than the other. LR takes values in the interval <0;+inf). LLR – abbreviation for log-likelihood ratio statistic, a logarithmic function of LR. LLR takes values in the interval (-inf;+inf). Percentage (normalised) score – a commonly used mathematical transformation of the LLR to a percentage. This number is easier for humans to read but may raise doubts if LLR numbers are too…

Support

…latest version of Phonexia’s software, it will not be considered a Critical Issue. A Critical Issue is fixed on a best-effort basis, and a fixed version of the Product is delivered within the next Bug fix release or Software Update. Minor Issue: Any scenario that does not fall under the Critical or Severe Issue definitions above. The Product…