Search: Time%20%20%20%20%20ysis%20Extractor

75 results

Contact

Visit Us at Address: Chaloupkova 3002/1a, CZ 612 00 Brno, Czech Republic, European Union GPS: N 49° 13.426′, E 016° 35.898 General Queries and Sales [email protected] landline: +420 511 205 265 Company registration details Identification number (ICO): 27680258 VAT identification (DIC): CZ27680258 Registered in the Business Register kept at the District Court in Brno, File C, Inset 51524….

Language Identification (LID)

…less likely, but still possible to appear. This can reduce the number of scored languages from ~80 languages (included in the default out-of-the-box language pack) to like 20 or even less languages. In both cases, limiting the number of languages in a language pack results in the scores being distributed among less languages, i.e. the score values getting higher with…

Installation of Phonexia Browser

Some packages are distributed with only a limited set of speech technologies and languages or without speech technologies. First installation Our software is distributed as a ZIP file. Installation procedure is as simple as: unzip the archive paste additional KWS, STT… models paste the license.dat file to the root directory where you have BROWSER folder and run_browser(.exe) script run the…

SID: TUTORIAL: Speaker Identification – How to Do a Basic Test

…to download for commercial/research purposes under a Creative Commons 4.0 license. Data originates from OXFORD VGG VoxCeleb Dataset which detailed license can be found here. SpeakerID Example Data Set v1.0 83.89 MB Download Publications: S. Chung, A. Nagrani, A. Zisserman VoxCeleb2: Deep Speaker Recognition INTERSPEECH, 2018. Nagrani, J. S. Chung, A. Zisserman VoxCeleb: a large-scale speaker identification dataset INTERSPEECH, 2017….

Understand SPE directory structure

…on which technologies are included in the particular SPE installation. For testing and first-time evaluation we usually include the full set of technologies, other installations may contain only limited subset. Location of bsapi directory can be modified using bsapi.path option in SPE configuration file. This might be useful in complex network infrastructure, for sharing technologies between multiple SPEs, and similar…

Understand SPE connectors for external TTS

…capabilities of the TTS service is not a good idea as it might potentially get incorrect over the time, leading to obscure issues in the application relying on the info. Required capabilities information JSON structure: { “apiVersion”: 2, “vendor”: string, “author”: string, “version”: string, “voices”: [ { “name”: string, “languageCodes”: [string, string, …], “naturalSampleRateHertz”: number }, . . . ]…

Understand SPE multithreaded technologies initialization

The server.technology_multithread_initialization setting in SPE configuration allows SPE to initialize instances of technologies during startup using multiple parallel threads. Default setting is OFF, i.e. instances of technologies are initialized using single thread, one-by-one. This allows easier tracking of eventual issues during SPE startup and better readability of technologies initialization log messages (only single initialization happens at a time). The downside…

Q: I can’t manage to run Phonexia Browser software. I always get an error.

…happen if the initialization of SPE engine takes too long. Phonexia Browser software treats it as initialization failure and kills the server. You can fix this by doing the following: Increase timeout in Settings > Speech Engine tab > First connection timeout Use fewer instances of technologies, thus letting the Speech Engine to start faster Use smaller models of technologies…

Designing and Developing Application

Before designing and developing the application, we encourage Partner to find clear answer for the following questions: Customer requirements: Do my customers need file processing (audio) or stream processing in real time? What is the human power of the customer that can analyze the results? How many minutes per day or streams in parallel do my customer need to process?…

STT: Configuring word detection parameters for stream transcription

One of the improvements implemented since Speech Engine 3.24 is neural-network based VAD, used for word- and segment detection. This article describes the segmenter configuration parameters and how they are affecting the realtime stream STT results. The default segmenter parametrs are as shown below: [vad.online_segmenter:SOnlineVoiceActivitySegmenterI] backward_extensions_length_ms=150 forward_extensions_length_ms=750 speech_threshold=0.5 Backward- and forward extension are intervals in miliseconds, which extend the part…

What is User configuration file and how to use it

…example: When using Czech STT on realtime streams, the results show that system outputs end of segment too often, i.e. longer pauses between words made by the speakers are misidentified as end of sentence, while in fact the speakers actually continue to speak. So it is desired to finetune the system to accept longer delay between words without ending a…

Understand SPE technologies configuration file

…SQE_STREAM Speech Quality Estimation Stream STT Speech To Text STT_STREAM Speech To Text Stream TAE Time Analysis Extraction TAE_STREAM Time Analysis Extraction Stream VAD Voice Activity Detection VAD_STREAM Voice Activity Detection Stream SIDC Speaker Identification Voiceprint Comparator (legacy) SIDC_STREAM Speaker Identification Voiceprint Stream Comparator (legacy) SIDCALIBSET Speaker Identification VoicePrint Calibration (legacy) SIDCALIBSET_STREAM Speaker Identification VoicePrint Stream Calibration (legacy) SIDE Speaker…

Q: What are the recommendations for LID adaptation set?

A: The following is recommended: For adding new language to language pack 20+ hours of audio for each new language model (or 25+ hours of audio containing 80% of speech) Only 1 language per record For adapting the existing language model (discriminative training) 10+ hours of audio for each language May be done on customer site. May be done in…

Q: What languages do you offer?

It depends on the technology. Phonexia Language Identification (LID) is pre-trained for 60+ languages. Phonexia Keyword Spotting (KWS) and Phonexia Speech Transcription (STT) for 20+ languages including English, French, German, Russian, Spanish and many more….

STT: Language Model Customization tutorial

Language Model Customization tool (LMC) provides a way to improve the Speech To Text performance by creating customized language model. Language model is an important part of Phonexia Speech To Text. In a simplified way it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio…