WavFrag Voice Recognition, Technology

The technology behind the scenes.

WaveFrag is a new development in recognition technology, intended to cut a new path in this area of research. It aims for simplicity and usability in voice recognition. It works on a completely new method, and its accuracy rivals that of competitors who have been developing these kinds of programs for a decade or more. Bear in mind that the WaveFrag engine is a new development and, as such, carries huge potential.
The technology itself represents a breakthrough. Below is a short description of how we achieved such high accuracy while consuming little computing power.

The incoming voice is digitized by the computer's sound card and presented to the voice recognition engine. The engine executes all of the functions needed to turn the sound into text. The text is then passed to the application, which interprets it according to the user's wishes.

The voice recognition functions are partitioned into several sub-functions. First, the sound levels are normalized to a median level for the engine to process. This is why WaveFrag is relatively insensitive to the amplitude of the sound. In the second stage, the normalized sound is converted into a spectral image. The spectral image is then normalized again and examined for dominant spectra. These dominant spectra are captured and fed to a comparison engine. The comparison engine examines them and executes a 'wobble' search function, which finds the closest match in the WaveFrag dictionary for the spectral image currently being examined.

Naturally, one can adjust the level of strictness required for a match, which allows the speech recognition engine to be very associative or very strict. With the recognition criteria adjustable, WaveFrag can be deployed in diverse situations, from entertainment-grade use, where easy recognition is an advantage, to industrial-grade use, where accurate recognition with few or no false positives is essential.

In tests, we installed a WebCam on top of the computer monitor, 6 feet away. The recognition engine recognized all of the words we pronounced. This is a test that no other voice recognition engine passes.
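The stages described above (level normalization, spectral image, dominant spectra, closest-match search with adjustable strictness) can be illustrated with a minimal sketch. This is not the WaveFrag implementation: the function names, the peak-based level scaling, the naive DFT, the dictionary format, and the `max_distance` strictness parameter are all assumptions made for illustration, and the simple nearest-match distance only stands in for the 'wobble' search, whose details are not given here.

```python
import cmath
import math

def normalize_level(samples, target=0.5):
    """Scale samples so the peak magnitude equals `target` (assumed scheme)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s * target / peak for s in samples]

def spectral_image(samples):
    """Magnitude spectrum via a naive DFT, normalized so the largest bin is 1."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(abs(acc))
    top = max(mags)
    return [m / top for m in mags] if top > 0 else mags

def dominant_bins(spectrum, count=2):
    """Indices of the `count` strongest frequency bins, in ascending order."""
    ranked = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)
    return sorted(ranked[:count])

def closest_match(features, dictionary, max_distance):
    """Nearest dictionary entry, or None if it is farther than `max_distance`.

    A small `max_distance` makes matching strict; a large one makes it lenient.
    """
    best_word, best_dist = None, float("inf")
    for word, reference in dictionary.items():
        dist = sum(abs(x - y) for x, y in zip(features, reference))
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None

def recognize(samples, dictionary, max_distance, count=2):
    """Run the whole hypothetical pipeline on one block of samples."""
    spectrum = spectral_image(normalize_level(samples))
    return closest_match(dominant_bins(spectrum, count), dictionary, max_distance)

def tones(bins, amplitude, n=64):
    """Synthetic test signal: a sum of sine waves at the given DFT bins."""
    return [amplitude * sum(math.sin(2 * math.pi * b * i / n) for b in bins)
            for i in range(n)]

# Hypothetical dictionary: each word is stored as its dominant bins.
dictionary = {"yes": [5, 12], "no": [9, 20]}
quiet = tones([6, 12], amplitude=0.05)
loud = tones([6, 12], amplitude=0.9)

print(recognize(quiet, dictionary, max_distance=2))  # yes
print(recognize(loud, dictionary, max_distance=2))   # yes
print(recognize(loud, dictionary, max_distance=0))   # None
```

In this sketch the quiet and loud signals give the same result because the levels are normalized before the spectrum is examined, and tightening `max_distance` to zero rejects the near match, which mirrors the lenient versus strict settings described above.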
From a higher-level point of view, the WavFrag system works by decomposing the elements of speech. The decomposition provides the basis for recognition, which is achieved by a routine called 'doughnut'. Following the traditional wisdom to 'Look at the doughnut, not the doughnut hole', the routine analyzes the dominant presence of the parts of speech. One of the outstanding features of the algorithm is 'push-ability'. When the user pronounces a word more distinctly, the algorithm responds with better recognition, as opposed to traditional algorithms, which degrade when someone enunciates harder (or 'pushes it').
Copyright (C) 2009 Peter Glen, (C) 2010 RobotMonkeySoftware LLC.