
Search: real-time

34 results

STT: What is Preferred Phrases feature and how to use it

…from the preferred phrases and interpolate it in real time with the generic language model: P(word|history) = P_generic(word|history) + α · P_preferred(word|history). The preferred words and phrases are favored, while retaining the existing accuracy on common text. Preferred phrases in Speech Engine: use the POST /technologies/stt or POST /technologies/stt/input_stream call to start a transcription with a list of preferred phrases. To be precise, these actually…
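
As an illustration only, here is a minimal Python sketch of starting a file transcription with preferred phrases via the REST API. The two endpoints are taken from the excerpt above; the server URL, the credentials and the "path" and "preferred_phrases" parameter names are assumptions, so check the Speech Engine REST API reference for the actual request format.

    import requests  # third-party HTTP client (pip install requests)

    SPE_URL = "http://localhost:8600"   # assumed Speech Engine address and port
    AUTH = ("user", "password")         # assumed SPE account credentials

    # Hypothetical request: start transcription of a file already available to the
    # account and attach a list of preferred phrases. The "path" and
    # "preferred_phrases" parameter names are assumptions for illustration only.
    response = requests.post(
        f"{SPE_URL}/technologies/stt",
        params={
            "path": "recordings/call.wav",
            "preferred_phrases": "Phonexia, voice biometrics",
        },
        auth=AUTH,
    )
    print(response.status_code, response.json())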

Understand SPE user accounts

…prioritization section in the REST API documentation; maximum pending requests – a legacy REST Server 2.x attribute, ignored in SPE 3.x. It's important to realize that each SPE user account has its own home directory, where SPE stores the account's data; see the Understanding SPE home directory article. This means that by default the accounts' data is isolated from each other. Therefore,…

STT: What is Words-To-Numbers feature and how to use it

…that would require retroactively changing text which was already output earlier… which is impossible. Alternatively, the output would have to be somehow delayed… which is undesirable in real-time stream processing, of course. So, the best compromise is to keep the word-level outputs untouched and do the conversion only on the segment/sentence level. How does it work? The words to…
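
A toy Python sketch of the compromise described above: the word-level output is left untouched and only the finished segment text is converted. The conversion logic below is invented for illustration and is much simpler than the real feature.

    # Illustrative sketch of segment-level words-to-numbers conversion.
    # Word-level results stay exactly as recognized; only the segment text changes.
    WORD_VALUES = {"twenty": 20, "five": 5, "hundred": 100}  # toy vocabulary

    def words_to_numbers(segment_text: str) -> str:
        """Convert simple number words in a finished segment to digits (toy logic)."""
        out, number = [], None
        for token in segment_text.split():
            if token in WORD_VALUES:
                value = WORD_VALUES[token]
                number = value if number is None else number + value
            else:
                if number is not None:
                    out.append(str(number))
                    number = None
                out.append(token)
        if number is not None:
            out.append(str(number))
        return " ".join(out)

    words = ["the", "price", "is", "twenty", "five", "dollars"]  # word-level output, untouched
    segment = words_to_numbers(" ".join(words))                  # segment-level output
    print(words)    # ['the', 'price', 'is', 'twenty', 'five', 'dollars']
    print(segment)  # the price is 25 dollars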

Q: What are the requirements for SID evaluation dataset?

…recordings in order to meet the criterion of at least 3 recordings for each speaker is not the right way to proceed. This way you are not adding any new information. You are essentially analyzing details of a single recording five times. In contrast, by using 5 unique recordings coming from different audio environments or even different times of the day,…

Licensing (technical details)

…section for details about where to put the license file). Start Phonexia FLS. After starting FLS, start the computing machines. FLS deployment examples: Basic Phonexia FLS connection scheme – the basic configuration makes it possible to distribute license files dynamically to computing machines based on real-time and capacity needs. Figure: Example of a basic Phonexia FLS connection scheme/topology. Advanced Phonexia…

Designing and Developing Application

Before designing and developing the application, we encourage Partners to find clear answers to the following questions: Customer requirements: Do my customers need file processing (audio files) or real-time stream processing? How much manpower does the customer have available to analyze the results? How many minutes per day, or how many parallel streams, does my customer need to process?…

Phonexia Partner Program for Government Partners

…the Starter Kit during the onboarding period? Yes, the Starter Kit can be purchased anytime during our cooperation. Can I purchase the Starter Kit multiple times? Yes, for each project, proof of concept, and product line, you can purchase a Starter Kit again. Phonexia consultants can’t wait to support your business. How do you deliver technical training? Phonexia technical training…

Understand SPE database scripts

…for SPE database content update. As SPE evolves and new features are added or existing functionality is improved, the database structure needs to change from time to time. So, when updating from an older SPE version to a newer one, the database content needs to be updated as well. Therefore, the database structure is versioned – the database version is listed in the SPE changelog together with…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is a neural-network-based VAD, used for word and segment detection. This article describes the segmenter configuration parameters and how they affect real-time stream STT results. The default segmenter parameters are shown below:

    [vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
    backward_extensions_length_ms=150
    forward_extensions_length_ms=750
    speech_threshold=0.5

Backward and forward extensions are intervals in milliseconds which extend the part…
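
Purely as a numeric illustration of these two parameters, assuming they are simply added before the start and after the end of detected speech (the engine's internal handling may differ):

    # Illustrative arithmetic only: how the default extension values quoted above
    # would widen a detected speech interval under the stated assumption.
    backward_extension_ms = 150   # backward_extensions_length_ms (default)
    forward_extension_ms = 750    # forward_extensions_length_ms (default)

    def extend_segment(start_ms: int, end_ms: int) -> tuple[int, int]:
        """Return the segment boundaries after applying both extensions."""
        return (max(0, start_ms - backward_extension_ms),
                end_ms + forward_extension_ms)

    # Speech detected between 10.000 s and 12.300 s of the stream:
    print(extend_segment(10_000, 12_300))   # (9850, 13050)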

Understand SPE connectors for external TTS

…little-endian mono audio data. In SPE 3.46 and newer, the audio sampling frequency must be set to the naturalSampleRateHertz value provided in the TTS service capabilities information. In SPE 3.45 and older, the audio sampling frequency must be fixed to 8000 Hz. SPE then reads the audio and writes it either to a file, or to an output real-time stream,…
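
A small Python sketch of the sampling-frequency rule stated above; the function, the version tuple and the capabilities dictionary are simplified assumptions, not the actual connector interface.

    # Sketch of the sampling-frequency rule from the excerpt above.
    def required_sample_rate(spe_version: tuple[int, int], capabilities: dict) -> int:
        """Return the sampling frequency the TTS connector must deliver to SPE."""
        if spe_version >= (3, 46):
            # SPE 3.46 and newer: use the rate advertised by the TTS service
            return capabilities["naturalSampleRateHertz"]
        # SPE 3.45 and older: fixed 8 kHz
        return 8000

    print(required_sample_rate((3, 46), {"naturalSampleRateHertz": 24000}))  # 24000
    print(required_sample_rate((3, 45), {"naturalSampleRateHertz": 24000}))  # 8000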

Speaker Identification

…30 seconds 3.63 3.36 3.08. The above tables show the accuracy of an out-of-the-box solution measured for the enrollment/verification time-length combinations. The numbers reflect real-world use, as the datasets are selected to be as close to real-world usage as possible. Accuracy can be further improved by calibration. During the development of the algorithms, Phonexia achieved the best results…

What is User configuration file and how to use it

…example: When using Czech STT on real-time streams, the results show that the system outputs end-of-segment too often, i.e. longer pauses between words made by the speakers are misidentified as ends of sentences, while in fact the speakers continue to speak. So it is desirable to fine-tune the system to accept a longer delay between words without ending a…
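
Purely as an illustration, and assuming the user configuration file can override the segmenter section quoted in the stream-transcription result above, such a tuning might look like the following (the value 1500 is an arbitrary example, not a recommendation):

    [vad.online_segmenter:SOnlineVoiceActivitySegmenterI]
    # assumed override: a larger forward extension so that longer pauses between
    # words do not close the segment (default is 750 ms per the excerpt above)
    forward_extensions_length_ms=1500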

Understand SPE technologies, instances and workers

…for. Staffing the post office should then be done accordingly – ideally, there should be enough workers to allow having all counter desks open all the time. File processing workers cannot process real-time streams, and vice versa. Configuration of Speech Engine workers should then be done accordingly – ideally, there should be enough workers of each type to allow processing…

Age Estimation (AGE)

…coding), A-law or Mu-law, PCM, 8 kHz+ sampling. Voiceprints: the AGE L4 model supports SID4 L4 voiceprints; legacy AGE models support voiceprints created by AGE itself. Output: log file with processed information (age estimate). Processing speed: approx. 20x faster than real-time processing on 1 CPU core, i.e. a standard 8-CPU-core server processes 3,840 hours of audio in 1 day of computing…
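
A quick arithmetic check of the throughput figure quoted above (20x real time per core, 8 cores, 24 hours of computing):

    # Worked check of the stated throughput figure.
    speedup_per_core = 20      # hours of audio processed per core per hour of computing
    cores = 8
    hours_of_computing = 24

    audio_hours_per_day = speedup_per_core * cores * hours_of_computing
    print(audio_hours_per_day)   # 3840 hours of audio per day, matching the excerpt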