Search: speech to text

121 results

Speech Quality Estimation (SQE)

Phonexia’s Speech Quality Estimation quantifies the acoustic quality of recordings. This helps the user quickly determine whether the acoustic quality of a recording is good enough for processing with other speech technologies. In response to an SQE request, the SPE returns a JSON/XML file. This file includes general information about the technology and statistics of all (one or two)…

LID: Terminology and adaptation

…Engine chapter for details. Using a custom LID language pack in Speech Engine: To use a customized LID language pack in Speech Engine, it’s necessary to ensure that the language pack is placed in the correct location, so that Speech Engine can find it, and to register and enable the language pack in SPE using phxadmin. 1) Put the language pack in the correct location. In order…

Understand SPE configuration file

In this article we explain the details of the Speech Engine configuration file phxspe.properties, located in the settings subdirectory of the SPE installation location. Settings in this configuration file affect Speech Engine behavior and performance. The configuration file is usually created after SPE installation – on first use of phxadmin, a default configuration file phxspe.properties is created in the settings directory. The file…

Speaker Identification (SID)

…of data and 1:1 comparisons to evaluate evidence and to establish probability of the identity of a speaker and use it in court. How does it work? The technology is based on the fact that the speech organs and the speaking habits of every person are more or less unique. As a result, the characteristics (or features) of the speech…

STT: Results explained

This article aims to give more details about Speech To Text outputs and hints on how to tailor Speech To Text to best suit your needs. In the process of transcribing speech, the Speech To Text technology usually identifies multiple alternatives for individual speech segments, as multiple phrases can have similar pronunciations, possibly with different word boundaries, e.g. “eight tea…

Q: How do you calculate SNR in Speech Quality Estimation?

A: Signal-to-Noise Ratio (SNR) is an important indicator of whether a recording is worth further processing by other speech technologies, so it is part of our Speech Quality Estimation. However, calculating SNR automatically is not a trivial task. We use the fact that the statistical distribution of the frequencies in the waveform of speech follows a Gamma distribution. In contrast, noise…

Voice Inspector – supporting technologies

This part requires higher (and non-anonymous) access level.

STT: What is Preferred Phrases feature and how to use it

Preferred phrases is a feature available for 5th or newer generation STT models and Speech Engine 3.32 or later. This article explains what the feature is good for, how it works internally, and gives some tips for practical implementation. What are preferred phrases? In speech transcription tasks, there may be situations where similar-sounding words get confused,…

Q: What is the difference between on-the-fly and off-line type of speech to text transcription (STT)?

…seconds of speech at the beginning of recordings. As the output is requested immediately during processing of the audio, the engine can’t predict what will come in the next seconds of the speech. When access to the whole recording is available during off-line transcription, the speech engine can correct the result before it is printed out by also taking into account the subsequent…

Phonexia technologies introduction

Core objective: Basic understanding of Phonexia speech technologies and products; typical use cases, implementations and deployment topologies. Duration: 35 minutes. Intended for idea makers and product designers; assumes generic knowledge of Phonexia and speech technologies in general. Content: 00:00 Introduction: What information can we get from speech? Overview of basic use cases. Phonexia Speech Platform brief. 4:21 Phonexia technologies overview…

Phoneme Recogniser (PHNREC)

…user can add to the language model of speech-to-text technology (better accuracy of KWS technology). Input: audio file (format details – see Speech Engine documentation); stream not supported; technology model name (i.e. language code) to be used for phoneme transcription. Output: In the process of transcribing speech to phonemes, the Phoneme Recogniser usually identifies individual speech segments and converts them to pronunciation. Example…

Understand SPE technologies configuration file

…SQE_STREAM: Speech Quality Estimation Stream; STT: Speech To Text; STT_STREAM: Speech To Text Stream; TAE: Time Analysis Extraction; TAE_STREAM: Time Analysis Extraction Stream; VAD: Voice Activity Detection; VAD_STREAM: Voice Activity Detection Stream; SIDC: Speaker Identification Voiceprint Comparator (legacy); SIDC_STREAM: Speaker Identification Voiceprint Stream Comparator (legacy); SIDCALIBSET: Speaker Identification VoicePrint Calibration (legacy); SIDCALIBSET_STREAM: Speaker Identification VoicePrint Stream Calibration (legacy); SIDE: Speaker…

Speech Engine update

…of software and/or API (for example REST Server 2.1 -> SPE 3.0). It includes changes in components or technology models. Speech Engine update procedure: The update procedure is purely manual and relies heavily on your own detailed knowledge of your Speech Engine installation and its internal functionality and structures. This knowledge is crucial for tuning the Speech Engine for maximum…

Sizing of the computing units for speech technologies

…VT features can’t help performance). Also look for CPUs with a large L3 cache; the better CPUs are those with a higher l3_cache_size/#_of_physical_CPU_cores ratio. We currently assume that CPUs from the current Intel Xeon family in the 4th generation are the best. (For small computation tasks, i7 family CPUs also have a reasonable price/performance ratio.) Big challenge: correct SPE3/Speech platform…

Releases and Changelogs (VIN)

…Target score distribution. Fixed: Population Set selected correctly even if renamed in the selection window. Improved: Speech length display in the case view: added “Unlimited” option to display the speech length permanently. Improved: SID Evidence score aligned with Speech Engine output of SID score. Removed: Speech length compensation. Voice Inspector 5.1: Voice Inspector 5.1.0, BSAPI 3.60.0 (2023-12-07). New: A generalized…