Search Results for: file formats

Results 1 - 10 of 58 Page 1 of 6

Supported audio formats

Relevance: 100%      Posted on: 2018-12-10

Supported audio formats are: WAVE (*.wav) container with any of the following codecs: unsigned 8-bit PCM (u8), unsigned 16-bit PCM (u16le), IEEE float 32-bit (f32le), A-law (alaw), µ-law (mulaw) or ADPCM; FLAC codec inside a FLAC (*.flac) container; OPUS codec inside an OGG (*.opus) container. Other audio formats must be converted using external tools. The SPE server can be configured to convert them automatically in the background, see SPE configuration hints. Good tools for converting unsupported formats to supported ones are ffmpeg (http://www.ffmpeg.org) and SoX (http://sox.sourceforge.net/). Both are cross-platform tools available for MS Windows, Linux and Apple OS X. Example of ffmpeg usage: ffmpeg -i <source_audio_file_name>…
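The format check and conversion described above can be scripted. The following is a minimal Python sketch, assuming ffmpeg is installed and on the PATH. The mapping of the supported list to ffmpeg codec names (pcm_u8, pcm_u16le, pcm_f32le, pcm_alaw, pcm_mulaw, adpcm_ms, adpcm_ima_wav), the container keys, and the helper names `is_supported`, `ffmpeg_to_flac_cmd` and `convert` are assumptions made for illustration. FLAC is used as the target because it is lossless and appears on the supported list.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Supported container -> codec names, spelled the way ffmpeg/ffprobe report them.
# ASSUMPTION: this mapping of the supported list to ffmpeg names is illustrative;
# ffmpeg has many ADPCM variants and only two common WAV ones are listed here.
SUPPORTED = {
    "wav": {"pcm_u8", "pcm_u16le", "pcm_f32le", "pcm_alaw", "pcm_mulaw",
            "adpcm_ms", "adpcm_ima_wav"},
    "flac": {"flac"},
    "ogg": {"opus"},  # *.opus files use the Ogg container
}


def is_supported(container: str, codec: str) -> bool:
    """True if the container/codec pair is in the SUPPORTED table."""
    return codec.lower() in SUPPORTED.get(container.lower(), set())


def ffmpeg_to_flac_cmd(src: str, dst: Optional[str] = None) -> List[str]:
    """Build an ffmpeg command that re-encodes src to FLAC in a .flac file."""
    if dst is None:
        dst = str(Path(src).with_suffix(".flac"))
    if Path(dst) == Path(src):
        raise ValueError("output would overwrite the input file")
    # -y overwrites an existing output; -c:a flac selects the FLAC audio encoder.
    return ["ffmpeg", "-y", "-i", src, "-c:a", "flac", dst]


def convert(src: str, dst: Optional[str] = None) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_flac_cmd(src, dst), check=True)
```

For example, `ffmpeg_to_flac_cmd("call.mp3")` returns `["ffmpeg", "-y", "-i", "call.mp3", "-c:a", "flac", "call.flac"]`, which `convert` then executes.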

SPE configuration

Relevance: 94%      Posted on: 2018-02-02

Basic explanation of configuration directives for SPE with hints & tips. Overview of phxspe.properties for beginners.

What is a user configuration file and how to use it

Relevance: 76%      Posted on: 2020-03-28

Advanced users with appropriate knowledge (gained e.g. by taking the Phonexia Academy Advanced Training) may want to fine-tune the behavior of the technologies to adapt them to the nature of their audio data. Modifying the original BSAPI configuration files directly can be dangerous: inappropriate changes may cause unpredictable behavior, and without a backup of the unmodified file it is difficult to restore a working state. User configuration files provide a way to override processing parameters without modifying the original BSAPI configuration files. WARNING: Inappropriate configuration changes may cause serious issues! Make sure you really know what you are doing. A user configuration file is a…
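The override idea can be illustrated with a small Python sketch: the effective configuration is the base configuration with individual parameters replaced by user values, while the base itself is never modified. This is a generic illustration, not the actual BSAPI file format or loader; the section and parameter names in the example are hypothetical, and rejecting unknown keys is a design choice assumed here to catch typos early.

```python
from copy import deepcopy
from typing import Any, Dict

Config = Dict[str, Dict[str, Any]]  # section -> parameter -> value


def apply_user_overrides(base: Config, user: Config) -> Config:
    """Return base values overridden per parameter by user values.

    The base configuration is deep-copied, so the original stays untouched,
    mirroring the point of keeping user overrides in a separate file.
    """
    effective = deepcopy(base)
    for section, params in user.items():
        # Unknown sections/parameters raise instead of being silently added.
        if section not in effective:
            raise KeyError(f"unknown section: {section}")
        for key, value in params.items():
            if key not in effective[section]:
                raise KeyError(f"unknown parameter: {section}.{key}")
            effective[section][key] = value
    return effective
```

With a hypothetical base of `{"vad": {"threshold": 0.5, "min_len": 10}}` and a user override of `{"vad": {"threshold": 0.7}}`, the effective value of `threshold` becomes 0.7 while `min_len` keeps its base value, and deleting the override restores the original behavior.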

SPE3 – Releases and Changelogs

Relevance: 65%      Posted on: 2020-12-14

Speech Engine (SPE) is developed as a RESTful API on top of Phonexia BSAPI. SPE was formerly known as BSAPI-rest (up to v2.x) and as Phonexia Server (up to v3.2.x). This page lists changes in SPE releases. Speech Engine 3.35.4, DB v1601, BSAPI 3.35.4 (2020-12-14), public release. Fixed: STT/KWS model AR_XL_5 has an incorrect name and does not start. Fixed: Missing KWS model AR_XL_5. Fixed: Processing of some short recordings causes the "TwoGmmCalibThreshold is not finite" error. Fixed: STT preferred phrases "out of vocabulary" (OOV) warning message is now more verbose. Speech Engine 3.36.0, DB v1601, BSAPI 3.35.3 (2020-12-01), non-public. Feature…

Licensing (technical details)

Relevance: 61%      Posted on: 2018-03-02

This document describes all types of Phonexia product licensing available to our partners and customers. Each partner/customer can choose the licensing variant that best fits the current project or infrastructure. The document does not describe the business conditions of Phonexia licensing. What is the License? The License is a formal agreement regarding “The Product Usage Rights” between Phonexia s.r.o. and a user of any Phonexia technology or Phonexia product. Licenses are issued by the Business Department for all speech technologies and products, and may be required in order to use utilities and tools developed by Phonexia or partners. For technical…

Q: How can I tell which format a .wav file is in?

Relevance: 55%      Posted on: 2017-06-27

A: Run "ffprobe <file_name>"; it prints information about the file, including the codec. Note: the "ffprobe" utility is not included in our package(s). It is part of ffmpeg (https://ffmpeg.org/ffprobe.html) and must be installed separately.
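When many files need checking, the ffprobe call can be wrapped in a script. Below is a minimal Python sketch, assuming ffprobe is installed and on the PATH; the helper names `ffprobe_cmd`, `parse_ffprobe` and `describe` are hypothetical. It restricts ffprobe's JSON output to the container name and the first audio stream's codec, sample rate and channel count.

```python
import json
import subprocess
from typing import Any, Dict, List


def ffprobe_cmd(path: str) -> List[str]:
    """Build an ffprobe command printing container and first-audio-stream info as JSON."""
    return [
        "ffprobe", "-v", "error",            # only report errors, no informational logs
        "-select_streams", "a:0",            # first audio stream only
        "-show_entries",
        "format=format_name:stream=codec_name,sample_rate,channels",
        "-of", "json",                       # machine-readable output
        path,
    ]


def parse_ffprobe(output: str) -> Dict[str, Any]:
    """Extract the interesting fields from ffprobe's JSON output."""
    data = json.loads(output)
    streams = data.get("streams") or [{}]
    stream = streams[0]
    rate = stream.get("sample_rate")
    return {
        "container": data.get("format", {}).get("format_name"),
        "codec": stream.get("codec_name"),
        # ffprobe prints sample_rate as a JSON string, e.g. "8000"
        "sample_rate": int(rate) if rate is not None else None,
        "channels": stream.get("channels"),
    }


def describe(path: str) -> Dict[str, Any]:
    """Run ffprobe on path and parse its output."""
    result = subprocess.run(ffprobe_cmd(path), capture_output=True, text=True, check=True)
    return parse_ffprobe(result.stdout)
```

For a µ-law telephone recording the result would look like `{"container": "wav", "codec": "pcm_mulaw", "sample_rate": 8000, "channels": 1}`, and the codec name can then be compared with the list of supported audio formats.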

Speech Intelligence Resolver v1

Relevance: 23%      Posted on: 2017-05-18

Phonexia Speech Intelligence Resolver v1 (SIR1) combines the power of speech technologies within a single application. The application automatically visualizes the recording and efficiently filters the speech metadata extracted from your recordings. Speech technologies implemented: Phonexia Speaker Identification (SID2), Phonexia Language Identification (LID2), Phonexia Gender Identification (GID), Phonexia Voice Activity Detection (VAD), Phonexia Speaker Diarization (DIAR), Phonexia Keyword Spotting (KWS), Phonexia Speech Quality Estimator (SQE) and Phonexia Speech Transcription (STT). SIR is a client application cooperating with REST servers. Thanks to its integrated local REST server, it can also be used as a standalone application. It was…

Browser3 – Releases and Changelogs

Relevance: 21%      Posted on: 2020-10-23

Phonexia Browser v3 (Browser3) is developed as a client on top of Phonexia Speech Engine v3. Phonexia Browser is the successor of Phonexia Speech Intelligence Resolver v1 (SIR1). This page lists changes in Browser releases. Phonexia Browser v3.35.2, BSAPI 3.35.2 (Oct 21 2020), public release. Fixed: Speaker identification dialog in WaveEditor did not work for SID4. Fixed: Detection of certain USB license tokens. Phonexia Browser v3.35.0, BSAPI 3.35.0 (Oct 02 2020), public release. New: Compatibility with SPE 3.35. Phonexia Browser v3.30.12, BSAPI 3.30.11 (Aug 20 2020), public release. Fixed: Transcription results intermittently display words in wrong…

Language Identification (LID)

Relevance: 21%      Posted on: 2020-07-09

Phonexia Language Identification (LID) helps you distinguish the spoken language or dialect. It enables your system to automatically route valuable calls to your experts in the given language or to send them to other software for analysis. Phonexia uses state-of-the-art language identification (LID) technology based on iVectors, an approach that came to prominence in the 2010 evaluations organized by NIST (National Institute of Standards and Technology, USA). The technology is independent of text and channel. This highly accurate technology uses voice biometrics to automatically recognize the spoken language. Application areas: preselecting multilingual sources and routing audio streams/files to language dependent…
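To make "based on iVectors" more concrete, here is a minimal Python sketch of one simple, well-known iVector backend: cosine scoring of a recording's iVector against per-language mean iVectors. It assumes the iVectors have already been extracted (extraction is out of scope here), and it is a generic illustration, not necessarily the scoring used in Phonexia LID; the function names and example data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (length normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length vector")
    return [x / norm for x in v]


def train_language_means(training: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the length-normalized iVectors of each language into one model vector."""
    models: Dict[str, List[float]] = {}
    for lang, vecs in training.items():
        if not vecs:
            raise ValueError(f"no training vectors for {lang}")
        normed = [_normalize(v) for v in vecs]
        dim = len(normed[0])
        models[lang] = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    return models


def score(ivector: Vector, models: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Cosine similarity against every language model, best match first."""
    x = _normalize(ivector)
    scores = [(lang, sum(a * b for a, b in zip(x, _normalize(m))))
              for lang, m in models.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Because cosine scoring ignores vector length, it compares only the direction of the iVectors, which is one reason this backend became popular; production systems typically add further steps (for example calibration) on top of raw scores.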

STT Language Model Customization tutorial

Relevance: 19%      Posted on: 2019-04-24

The Language Model Customization tool (LMC) provides a way to improve Speech To Text performance by creating a customized language model. The language model is an important part of Phonexia Speech To Text. In simplified terms, it can be imagined as a large dictionary with multiple statistics. The Speech To Text technology uses this dictionary and statistical model to convert audio signals into the proper text equivalents. Due to the general diversity of spoken language, the default generic language model may not reflect the importance of certain words over others in certain situations. Language model customization is a way to inform the…
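The "dictionary with statistics" picture can be illustrated with a toy Python sketch: word probabilities estimated from a generic corpus are linearly interpolated with probabilities estimated from domain text, so that domain words gain weight. This is only a conceptual illustration assuming a unigram model and a hypothetical interpolation `weight`; it is not how LMC works internally, and real Speech To Text language models use n-gram or neural models with context and smoothing.

```python
from collections import Counter
from typing import Dict, Iterable


def unigram_probs(words: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each (lower-cased) word."""
    counts = Counter(w.lower() for w in words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(generic: Dict[str, float], custom: Dict[str, float],
                weight: float) -> Dict[str, float]:
    """P(w) = (1 - weight) * P_generic(w) + weight * P_custom(w)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    vocab = set(generic) | set(custom)
    return {w: (1 - weight) * generic.get(w, 0.0) + weight * custom.get(w, 0.0)
            for w in vocab}
```

A word that never occurs in the generic text, such as a product name found only in the domain text, receives a non-zero probability after interpolation, while the probabilities of all words still sum to one.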