Bertin IT Introduces MediaSpeech v6, Its Latest Multilingual Speech Recognition Solution
5 June 2019 (13:22)
PARIS--(BUSINESS WIRE)--Bertin IT (CNIM Group) announces the release of the new version of MediaSpeech®, its multilingual speech recognition solution that converts audio tracks to searchable text transcripts, enabling audio and video sources to be indexed, searched and analysed. MediaSpeech® now also comes in a live version for real-time audio streams, paving the way for new interactive and augmented communications applications.
Thanks to deep neural networks commonly used in Artificial Intelligence systems, MediaSpeech® creates an extremely fine model of the acoustic space that is robust across different speakers and acoustic conditions, offering even faster and more accurate transcription.
Features:
- Speech recognition with each word being transcribed within a millisecond and assigned a recognition confidence score.
- Automatic detection of spoken language (LID).
- Automatic segmentation of speaking turns and speakers, with gender recognition.
- Identification of the speaker from a biometric database.
- Automatic and semi-automatic adaptation of vocabularies and domains.
MediaSpeech® comes in several variants. MediaSpeech® Factory, deployed on site or in SaaS mode hosted on Bertin IT’s cloud, can handle large volumes of files with guaranteed performance levels. The new MediaSpeech® Live transcribes audio streams on the fly, opening the door to innovative real-time applications such as voice chatbots, call-bots and enhanced call centres, in which the adviser receives assistance during the call, thereby streamlining the dialogue and improving its quality.
Among the main improvements in the new version of MediaSpeech®:
- MediaSpeech® Live version for processing audio streams in real time.
- New neural models make transcription two to three times faster and more accurate.
- “Full” neural transition of all speech processing modules, including speech detection (VAD) and speaker segmentation (diarization), for even greater accuracy.
- Easy installation process, stronger security and new interfaces.
- A fully neural language identification module (LID) with increased accuracy, even for relatively short sections of speech.