Search Results for: mean normalization

Speaker Identification: Results Enhancement

Relevance: 100%      Posted on: 2019-05-29

Speaker Identification (SID) Results Enhancement is a process that adjusts the score threshold for detecting or rejecting speakers by removing the effect of speech length and audio quality. This is achieved by using Audio Source Profiles, which represent as closely as possible the source of the speech recording (device, acoustic channel, distance from microphone, language, gender, etc.). Although the out-of-the-box system is robust to such factors, several result enhancement procedures can provide even better results and stronger evidence. Audio Source Profile: An Audio Source Profile is a representation of the speech source, e.g., device, acoustic channel, distance from microphone, language, gender,…

Q: I found the following error: ApplicationStartup: Unhandled exception: BsapiException. What does it mean?

Relevance: 79%      Posted on: 2017-06-27

[Error] ApplicationStartup: Unhandled exception: BsapiException: SWaveformSegmenterI(/mnt/phxspe/home/phx/storage/dfs/a1cabcf7-c761-49f1 -a9bc-0a8209a09fd9.opus Requested segment (78056, 102056) is out of waveform range (0,91840). Any ideas what this means? A: It means that this Opus file was created improperly: its header declares much more audio than the file actually contains. Please check that your audio source/originator works properly. Alternatively, use the ffmpeg or sox utility as a preprocessor and normalize the audio by converting it from Opus to Opus before the recordings are processed by SPE.
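The self-conversion suggested in the answer above could be sketched as follows. This is a minimal illustration, not Phonexia's recommended procedure: it assumes an ffmpeg build with the libopus encoder is available on the PATH, and the helper names and file names are hypothetical.

```python
import subprocess


def build_reencode_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that decodes src and re-encodes it as Opus.

    Re-encoding writes a fresh container, so the declared length should
    match the audio that was actually decoded (assumption, not verified
    against SPE).
    """
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libopus", dst]


def reencode(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_reencode_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_reencode_command("broken.opus", "fixed.opus")))
```

Calling `reencode("broken.opus", "fixed.opus")` before submitting the recording to SPE would be one way to apply the preprocessing step.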

Q: What do LLR, LR and score mean?

Relevance: 78%      Posted on: 2017-06-27

A: These abbreviations mean the following: LR - likelihood ratio, the result of a statistical test that compares two models. It expresses how many times more likely the data are under one model than under the other. LR takes values in the interval <0;+inf). LLR - log-likelihood ratio, the logarithm of LR. LLR takes values in the interval (-inf;+inf). Percentage (normalised) score - a commonly used mathematical transformation of the LLR to a percentage. This number is easier for humans to read but may raise doubts if the LLR values are too high (typically in some non-adapted installations). Interval <0;100> (or…
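The relation between LR and LLR described above can be illustrated with a short sketch. It assumes the natural logarithm; the snippet does not state which base the product uses, and the function names are hypothetical. The percentage score is left out because its exact transformation is not given here.

```python
import math


def lr_to_llr(lr: float) -> float:
    """Convert a likelihood ratio in <0;+inf) to a log-likelihood ratio.

    Assumes the natural logarithm. LR = 0 maps to -inf.
    """
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    if lr == 0:
        return -math.inf
    return math.log(lr)


def llr_to_lr(llr: float) -> float:
    """Convert a log-likelihood ratio back to a likelihood ratio."""
    return math.exp(llr)


if __name__ == "__main__":
    # LR = 1 means both models explain the data equally well, so LLR = 0.
    print(lr_to_llr(1.0))
    # LR > 1 gives a positive LLR, LR < 1 a negative one.
    print(lr_to_llr(20.0), lr_to_llr(0.05))
```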

Speaker Identification (SID)

Relevance: 23%      Posted on: 2019-06-13

Phonexia Speaker Identification uses the power of voice biometry to recognize speakers by their voice, i.e., to decide whether the voice in two recordings belongs to the same person or to two different people. The high accuracy of Speaker Identification, Phonexia's flagship technology, has been validated in NIST Speaker Recognition Evaluations. Basic use cases and application areas: The technology can be used for various speaker recognition tasks. One basic distinction is based on the kind of question we want to answer. Speaker Identification is the case when we are asking "Whose voice is this?", such as in fake emergency calls.…

Voice Inspector – Interpretation of results

Relevance: 12%      Posted on: 2019-06-24

Introduction: Phonexia Voice Inspector (VIN) is a tool for forensic automatic speaker identification, compliant with the Methodological Guidelines for Best Practice in Forensic Semiautomatic and Automatic Speaker Recognition, published by the European Network of Forensic Science Institutes. This post explains individual SID score types and ways to visualize the results in a speaker identification case implemented in Voice Inspector. Evidence: In VIN, the term evidence has two meanings. In general, it refers to any SID score that the system calculates for any pair of recordings in the case. These scores are the output of the Phonexia SID technology which runs…

SPE3 – Releases and Changelogs

Relevance: 9%      Posted on: 2021-04-16

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) or as Phonexia Server (up to v3.2.x). Releases Changelogs Speech Engine 3.40.1, DB v1700, BSAPI 3.40.1 (2021-04-16) Public release Fixed: 6th generation STT/KWS stream result may start with words from end of previous stream Fixed: Some licensing error messages are not shown in log Fixed: Missing file names in log messages in SID and SID4 tasks Fixed: Keyword list may not work if XML is used as input and optional fields threshold or pronunciations are used Fixed: phxdamin2…

MAE

Relevance: 8%      Posted on: 2018-02-01

Mean Absolute Error – In statistics, the mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes.
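As a small worked illustration of the definition above, MAE is the average of the absolute differences between predictions and outcomes. The function name and sample values below are made up for the example.

```python
def mean_absolute_error(predictions, outcomes):
    """Return the mean of |prediction - outcome| over paired values."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predictions:
        raise ValueError("at least one pair of values is required")
    total = sum(abs(p - o) for p, o in zip(predictions, outcomes))
    return total / len(predictions)


if __name__ == "__main__":
    # Absolute errors are 1, 0 and 2, so the mean is 3 / 3 = 1.0.
    print(mean_absolute_error([2.0, 4.0, 6.0], [1.0, 4.0, 8.0]))
```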

How to configure STT realtime stream word detection parameters

Relevance: 8%      Posted on: 2020-03-28

One of the improvements implemented since Speech Engine 3.24 is neural-network based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect the realtime stream STT results. The default segmenter parameters are shown below:
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
Backward and forward extensions are intervals in milliseconds that extend the part of the signal going to the decoder. The decoder is a component that determines what a particular part of the signal contains (speech, silence, etc.). Based on that, the decoder also decides whether a segment has finished or not. Unlike in file processing…
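To make the effect of the two extension parameters concrete, the sketch below parses the default values quoted above and applies them to a detected speech interval. It is only an illustration under assumptions: it presumes the backward extension moves the segment start earlier and the forward extension moves the segment end later, and the helper names are hypothetical, not part of Speech Engine.

```python
import configparser

# Default segmenter settings as quoted in the article snippet.
SEGMENTER_CONFIG = """
[vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
backward_extensions_length_ms=150
forward_extensions_length_ms=750
speech_threshold=0.5
"""

SECTION = "vad.online_segmenter:SOnlineVoiceActivitySegmenterI"


def load_segmenter_params(text: str) -> dict:
    """Parse the INI-style segmenter section into typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[SECTION]
    return {
        "backward_ms": section.getint("backward_extensions_length_ms"),
        "forward_ms": section.getint("forward_extensions_length_ms"),
        "speech_threshold": section.getfloat("speech_threshold"),
    }


def extend_segment(start_ms: int, end_ms: int, backward_ms: int, forward_ms: int):
    """Widen a speech interval (assumed behaviour, see the note above)."""
    return max(0, start_ms - backward_ms), end_ms + forward_ms


if __name__ == "__main__":
    params = load_segmenter_params(SEGMENTER_CONFIG)
    # Speech detected from 1000 ms to 2000 ms becomes (850, 2750).
    print(extend_segment(1000, 2000, params["backward_ms"], params["forward_ms"]))
```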

How to configure Speech Engine workers

Relevance: 4%      Posted on: 2020-03-28

A worker is a thread that performs the actual processing of files or realtime streams in Speech Engine. This article explains Speech Engine workers and how to configure them for optimal performance and server utilization. The default worker configuration in settings/phxspe.properties is shown below: 8 workers for file processing and 8 workers for realtime stream processing. These numbers are the maximum numbers of simultaneously running tasks.
# Multithread settings
server.n_workers = 8
server.n_realtime_workers = 8
Requests for additional file processing tasks are put in a queue and processed according to their order and priorities. Requests for…

Privacy Policy

Relevance: 4%      Posted on: 2018-03-24

Phonexia s.r.o., with registered seat at Chaloupkova 3002/1a, 612 00 Brno, Czech Republic, is a developer and provider of speech technologies software products and related services. We appreciate your visit to our websites and we are pleased that you are interested in our software products and related services. We conform our data use to the European Union’s (“EU”) General Data Protection Regulation (“GDPR”). This Privacy Policy should help you to understand how we as a data controller gather, use and protect your personal information. 1. COLLECTING PERSONAL INFORMATION When you sign up for a Phonexia Account to allow you using…

Terms of Service

Relevance: 4%      Posted on: 2018-03-24

Description of the Services provided by Phonexia s.r.o. 1. Acceptance of Terms of Service (Terms as a Contract) 1.1. PHONEXIA-User Relationship. These Terms of Service (hereinafter referred to as "Agreement" or "Terms of Service") and the PHONEXIA Privacy Policy govern the relationship between Phonexia s.r.o. (ID No.: 27680258, VAT No.: CZ27680258, registered seat at: Chaloupkova 3002/1a, 61200 Brno, registered by the County Court in Brno under file C, insert 5124), provider of the PHONEXIA technology (hereinafter referred to as "PHONEXIA") and you ("you", "your", "user" or "Member"), and your use of and access to the website, PHONEXIA services or any…

Speech Quality Estimator – Essential

Relevance: 4%      Posted on: 2018-04-04

Phonexia’s Speech Quality Estimator (SQE) quantifies the acoustic quality of recordings. This helps the user quickly determine whether the acoustic quality of a recording is good enough for processing with other speech technologies. In response to an SQE request, SPE returns a JSON/XML file. This file includes general information about the technology and statistics for all (one or two) channels. The statistics for each channel include values for many aspects of recording quality, as well as the overall global score. Technology: The technology is language-, accent-, text-, and channel-independent. Compatibility with the widest range of audio sources possible (applies…