
SeeStorm Phoneme Recognition & LipSync Technology

SeeStorm Phoneme Recognition technology analyzes the voice signal and identifies the phonemes of human speech. Recognition reliability is up to 95% in both automatic text-dependent and text-independent modes, regardless of the language of the input speech. Phoneme Recognition can run in real time or process a pre-recorded voice message.

Phoneme Recognition is based on Artificial Neural Network (ANN) technology, which classifies the acoustic characteristics of the signal (coefficients obtained from its analysis) in order to recognize phonemes. For visualization, phonemes are subdivided into several groups that share a similar basic articulation type. In addition, energy variation is used to generate satisfactory co-articulation for the speech-based animation.
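The grouping and energy steps described above can be illustrated with a minimal Python sketch. It is not SeeStorm's implementation: the group names, the phoneme sets in `VISEME_GROUPS`, and the rule that scales mouth opening by frame energy are all assumptions made for illustration, and the ANN classifier itself is not shown.

```python
import math

# Hypothetical articulation groups; the actual SeeStorm grouping is not published here.
VISEME_GROUPS = {
    "bilabial": {"p", "b", "m"},
    "labiodental": {"f", "v"},
    "open_vowel": {"a"},
    "rounded_vowel": {"o", "u"},
    "spread_vowel": {"e", "i"},
}


def viseme_for(phoneme):
    """Return the articulation group for a phoneme, or 'neutral' if unknown."""
    for group, members in VISEME_GROUPS.items():
        if phoneme in members:
            return group
    return "neutral"


def rms_energy(samples):
    """Root-mean-square energy of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mouth_opening(samples, max_energy=1.0):
    """Scale mouth opening to [0, 1] by frame energy (an assumed rule)."""
    return min(rms_energy(samples) / max_energy, 1.0)
```

In this sketch, a recognized phoneme selects the mouth shape through `viseme_for`, while `mouth_opening` modulates how far that shape is applied as the voice gets louder or quieter.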

Phoneme visualization is based on Face Mimic Modeling technology, which uses sequences of 3D model frames to display lip motions that correspond to the speech data, together with the accompanying facial movements (movements of the head, eyes, and eyebrows). All facial motions are synchronized with the fluctuation, timing, and nuances of the voice.
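One common way to produce a frame sequence between two mouth shapes is linear interpolation of pose parameters. The sketch below assumes this approach for illustration only; the text does not say how Face Mimic Modeling generates its frames, and the pose representation (a dictionary of named parameters such as `jaw`) is hypothetical.

```python
def blend_pose(pose_a, pose_b, t):
    """Linearly blend two poses; t=0 gives pose_a, t=1 gives pose_b."""
    t = max(0.0, min(1.0, t))  # clamp to the valid range
    return {key: (1 - t) * pose_a[key] + t * pose_b[key] for key in pose_a}


def frames_between(pose_a, pose_b, steps):
    """Return `steps` evenly spaced poses from pose_a to pose_b inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [blend_pose(pose_a, pose_b, i / (steps - 1)) for i in range(steps)]
```

For example, `frames_between({"jaw": 0.0}, {"jaw": 1.0}, 3)` yields a closed, half-open, and fully open jaw pose in that order.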

A 3D avatar can also be animated from text input if Text-To-Speech (TTS) software first converts the text into voice. Emoticons ('smileys') in the text are recognized as emotions, which the avatar can express. The reliability of Phoneme Recognition from TTS input is 100%, regardless of the TTS engine type or sound device.
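As a rough illustration of the emoticon step, the following Python sketch separates emoticons from the text before it would be passed to a TTS engine. The emoticon table and emotion labels are assumptions for the example; the set that SeeStorm actually recognizes is not listed in the text.

```python
import re

# Hypothetical emoticon-to-emotion table, for illustration only.
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad", ";)": "wink"}

# Try longer emoticons first so ":-)" is not partially matched.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def split_text_and_emotions(text):
    """Return (text without emoticons, list of emotions in order of appearance)."""
    emotions = [EMOTICONS[m.group(0)] for m in _PATTERN.finditer(text)]
    clean = " ".join(_PATTERN.sub("", text).split())  # also collapse leftover spaces
    return clean, emotions
```

The cleaned text would go to the TTS engine, while the emotion list could drive the avatar's facial expression.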

Phoneme Recognition combined with phoneme visualization constitutes the Lips Synchronization (LipSync) technology. The LipSync speech-to-motion engine provides real-time automatic animation of 3D avatars (characters) driven by human speech: the avatars' lips move in sync with the speech, and other facial movements accompany the lips. As a result, avatars (talking characters) behave naturally and give voice communication an impressive visual form.
