
Search: escore *100 formula

24 results

STT: Results explained

…: }, { "time_slot" : 3, "start_time" : 8150000, "end_time" : 10050000, "word" : "guess", "posterior_probability" : 1, "channel" : }, { "time_slot" : 4, "start_time" : 10050000, "end_time" : 10250000, "word" : "_DELETE_", "posterior_probability" : 1, "channel" : }, { "time_slot" : 4, "start_time" : 10050000, "end_time" : 10250000, "word" : "if", "posterior_probability" : 0.000000000001097692047105086, "channel" : }, {…

Understand SPE configuration file

…on one of them, set the IP address here. stream.rtp.min_port, stream.rtp.max_port # Sets starting port for creating input RTP sessions # ! Note that ‘stream.rtp.min_port’ is deprecated since SPE 3.23.x # ! Note that ‘stream.rtp.max_port’ is deprecated since SPE 3.23.x input_stream.rtp.min_port = 10000 input_stream.rtp.max_port = 11000 Sets the port number range for creating incoming RTP listeners. Default is from 10000

Speaker Identification (SID)

…well-calibrated system, the score of 1000 means that the user can be 1000 times more sure that the speaker in the questioned recording is the suspected speaker rather than someone else. Technically, it also means that 1 out of 1000 speakers was incorrectly detected in the development set. Another reason for calibration is for the score to be independent of the…

Q: How to fix Error 1007: Unsupported audio format?

Phonexia Browser may return the error “1007: Unsupported audio format” when uploading an audio file. Check whether your audio files are in one of the formats listed in Q: What are the supported audio formats? If you need to use audio recordings in other formats as input, you can configure SPE for automatic audio conversion. As a prerequisite, install an external tool for audio conversion. Recommended is…

SID: Speaker Identification: Results Enhancement

…User Calibration Mean Normalization Requirements: 100+ audio recordings from different speakers representing the source data, minimum 60 seconds net speech in each. Ideally, the set shouldn’t contain duplicates or target speaker recordings. Mean Normalization makes data coming from different domains comparable by compensating for the differences in channel, language etc. Mean Normalization is extremely lightweight and has little to no…

Recommended OS and HW (PSP)

…or 10th Gen Intel® Core Processor RAM: 16 GB Storage: 100 GB (depends on audio retention policy) SSD strongly recommended for superior performance over HDD Configuration includes: SID4 XL4, GID XL4, LID L4, AGE L4, VAD, SQE Transcription System, basic 100 hours/day package (***) files processing CPU: 8 physical cores, 1x Intel® Xeon E5-2640 v4 or similar or 10th Gen…

Understand SPE configuration

…of MySQL database connections at the time. Default is 32 # server.db.mysql.max_connections = 32 # Maximum size of in-memory cache for calibrated voice-prints of speaker models. Default is 100 # server.db.sid_model_calib_vp_cache_size = 100 Sizing of the system The selection of speech technologies and the number of instances per technology which are instantiated when starting the SPE is configured by the…

Understand SPE audio converter

…can’t be converted: Converter is disabled 2021-01-30 20:59:52 [Trace] ConverterSubsystem: Removed temporary file: C:\TMP\tmp11452aaaaaa 2021-01-30 20:59:52 [Error] Rest.Object.AudioFile: [RID=2] REST error: (1007) Unsupported audio format 2021-01-30 20:59:52 [Trace] Rest.Object.AudioFile: [RID=2] Response HTTP: 415 RESTError: 1007 JSON response (error) ===================== { "result" : { "version" : 2, "name" : "ErrorResult", "code" : 1007, "message" : "(1007) Unsupported audio format" } }…

Language Identification (LID)

…LID score to percentage, use the e^score * 100 formula) LID adaptation (custom language packs) The scoring principle described above implies that the score is distributed among all languages in a language pack. It means that every language has to score with a non-zero value, i.e. the scores may get diluted as they get spread among many languages. Additionally, if the…
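The score-to-percentage conversion mentioned in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, assuming the formula reads as e raised to the power of the score, multiplied by 100, and that the LID score is a natural-log probability (so valid scores are at most 0). The function name `lid_score_to_percent` is hypothetical and not part of any Phonexia API.

```python
import math


def lid_score_to_percent(score: float) -> float:
    """Convert a LID score to a percentage using e^score * 100.

    Assumption: the score is a natural-log probability, so 0.0 maps
    to 100 % and increasingly negative scores approach 0 %.
    """
    return math.exp(score) * 100.0


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate the conversion.
    for s in (0.0, math.log(0.5), -5.0):
        print(f"score={s:.4f} -> {lid_score_to_percent(s):.2f} %")
```

Because the scores of all languages in a pack share one probability mass, the percentages obtained this way for a single recording should add up to roughly 100.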

KWS: Results explained

…the detected pronunciation. Start and end times are in HTK units. 1 HTK unit is 100 nanoseconds, so dividing the times by 10000 gives the time in milliseconds. Score is a log likelihood ratio from the (-inf, +inf) interval. Confidence is a probability from the [0, 1] interval. To convert it to a percentage, multiply the confidence value by 100. Example This example of Keyword Spotting…
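The two conversions described in this excerpt (HTK units to milliseconds, and confidence to percentage) can be sketched as follows. The helper names are hypothetical, and the sample value is only an illustration of the arithmetic, not actual KWS output.

```python
# 1 HTK unit = 100 ns, so 10,000 HTK units = 1 ms.
HTK_UNITS_PER_MS = 10_000


def htk_to_ms(htk_time: int) -> float:
    """Convert a time expressed in HTK units (100 ns each) to milliseconds."""
    return htk_time / HTK_UNITS_PER_MS


def confidence_to_percent(confidence: float) -> float:
    """Convert a confidence from the [0, 1] interval to a percentage."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the [0, 1] interval")
    return confidence * 100.0


if __name__ == "__main__":
    # Illustrative values only.
    print(htk_to_ms(8_150_000))         # time in milliseconds
    print(confidence_to_percent(0.5))   # confidence as a percentage
```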

FAQs (PSP)

…Q: How to fix Error 1007: Unsupported audio format? Phonexia Browser may return the error “1007: Unsupported audio format” when uploading an audio file. Check whether your audio files are in one of the formats listed in Q: What are the supported audio formats? If you need to use audio recordings in other formats as input, you can configure SPE…

FAQs (Browser)

…format in 16-bit PCM little-endian as it is the default system. For more parameters, please check the FFmpeg manual pages. SoX: sox <source_audio_file_name> -b 16 <output_audio_base_name>.wav The number of bits, defined by the -b parameter, must be specified. Q: How to fix Error 1007: Unsupported audio format? Phonexia Browser may return the error “1007: Unsupported…

Measuring of a software processing speed – what is the FtRT (Faster than Real Time)

…removed. This metric is useful for comparing technology performance on different hardware configurations, or for comparing the performance of the same type of technology produced by different vendors. Same recording with silence segments stripped and only speech segments kept in waveform The calculation formula is very simple and is the same for both use-cases: FtRT = audio_length[s] / processing_time[s] Example Original audio length…
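The FtRT formula quoted in this excerpt can be sketched directly in Python. The function name `ftrt` and the sample durations are hypothetical; only the formula itself (audio length divided by processing time, both in seconds) comes from the excerpt.

```python
def ftrt(audio_length_s: float, processing_time_s: float) -> float:
    """Faster-than-Real-Time factor: audio_length[s] / processing_time[s].

    A value above 1 means the audio was processed faster than its
    playback duration.
    """
    if processing_time_s <= 0:
        raise ValueError("processing_time_s must be positive")
    return audio_length_s / processing_time_s


if __name__ == "__main__":
    # Illustrative numbers only: 120 s of audio processed in 4 s.
    print(ftrt(120.0, 4.0))
```

The same function applies to both use-cases mentioned in the excerpt; what changes is whether `audio_length_s` is the full recording length or the length with silence removed.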

Speech Quality Estimation (SQE)

…linear coding), A-law or Mu-law, PCM, 8kHz+ sampling Output global score – percentage expression of audio quality (range <0;100>); by default, the global score is calculated based on the waveform_n_bits and waveform_snr variables. pesq – value inspired by PESQ (Perceptual Evaluation of Speech Quality). The value ranges from -0.5 to 4.5; the higher the rating, the better the quality of the recording. Other important statistics