
WavFrag Voice Recognition, Technology


    The technology behind the scenes. WaveFrag is a new development in recognition technology, intended to cut a new path in this area of research. WaveFrag represents a new stage in voice recognition, marked by simplicity and usability. It works on a completely new method, and its accuracy rivals that of competitors who have been developing these kinds of programs for a decade or more. Bear in mind that the WaveFrag engine is a new development, and as such it carries huge potential.

    WaveFrag is very efficient when it comes to computing power. It requires a small fraction of the computing power used by traditional voice recognition. We installed it on an old Dell computer whose processor ran at well under one gigahertz, and it worked flawlessly. This suggests that WaveFrag voice recognition will also work on embedded platforms.

   WaveFrag is so simple that it can be trained with one or two pronunciations of a word, yet it is powerful enough to distinguish the word 'tree' from 'three'. WaveFrag is speaker independent; however, speech patterns that deviate strongly from the common ones need to be trained.
   
  
     The technology itself represents a breakthrough. Below is a short description of how we achieved such high accuracy while consuming little computing power.

    The incoming voice is digitized by the computer's sound card and presented to the voice recognition engine. The engine executes all of the functions needed to turn this into text. The text is then presented to the application, which interprets it according to the user's wishes. The voice recognition functions are partitioned into several sub-functions. First, the voice is treated with an effective normalization process: the sound levels are normalized to a median level for the engine to process. This is why WaveFrag is relatively insensitive to the amplitude of the sound.
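
    The level normalization described above can be illustrated with a minimal Python sketch. It is not the WaveFrag code: the function name normalize_level, the target value, and the choice to scale by the median absolute amplitude are all assumptions made for illustration.

```python
from statistics import median


def normalize_level(samples, target=0.25):
    """Scale samples so their median absolute amplitude equals `target`.

    Hypothetical helper: `target` and the median-of-absolute-values rule
    are assumptions, not details taken from the WaveFrag engine.
    """
    level = median(abs(s) for s in samples)
    if level == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target / level
    return [s * gain for s in samples]


# The same utterance recorded quietly and loudly (made-up sample values).
quiet = [0.01, -0.02, 0.03, -0.02, 0.01]
loud = [0.4, -0.8, 1.2, -0.8, 0.4]

# After normalization both recordings land on the same level.
print(normalize_level(quiet))
print(normalize_level(loud))
```

    Because both recordings are rescaled to the same median level, the later stages see nearly identical input regardless of how loudly the word was spoken.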

    In the second stage, this normalized sound is converted into a spectral image. The spectral image is then normalized again and examined for dominant spectra. These dominant spectra are captured and fed to a comparison engine. The comparison engine examines the spectral image and executes a 'wobble' search function. The wobble search finds the closest match in the WaveFrag dictionary and delivers that match for the currently examined spectral image. Naturally, one can adjust the level of strictness required for a match, which allows the speech recognition engine to be very associative or very strict. This allows WaveFrag to be deployed in diverse situations. With the recognition criteria adjustable, WaveFrag can serve a multitude of functions, from entertainment-grade use, where easy recognition is an advantage, to industrial-grade use, where accurate recognition with few or no false positives is essential.
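
    As a rough illustration of this second stage, the Python sketch below builds a spectrum, keeps only its dominant bins, and looks up the closest dictionary entry under an adjustable strictness. It is a sketch under stated assumptions, not the WaveFrag implementation: every function name, the keep, max_distance and wobble parameters, and the reading of 'wobble' as tolerance to small shifts between neighbouring frequency bins are our own guesses, since the search itself is not specified here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def dominant_spectrum(spectrum, keep=3):
    """Normalize to a peak of 1.0 and zero all but the `keep` strongest bins."""
    peak = max(spectrum)
    if peak == 0:
        return [0.0] * len(spectrum)
    order = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)
    strongest = set(order[:keep])
    return [spectrum[k] / peak if k in strongest else 0.0
            for k in range(len(spectrum))]


def distance(a, b):
    """Euclidean distance between two equally long spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shifted(vec, offset):
    """Shift bins by `offset` positions, padding with zeros."""
    n = len(vec)
    return [vec[i - offset] if 0 <= i - offset < n else 0.0 for i in range(n)]


def closest_match(query, dictionary, max_distance=0.5, wobble=1):
    """Return the best-matching word, or None if nothing is close enough.

    A smaller `max_distance` means a stricter match. `wobble` is our
    assumed interpretation: try the query shifted by up to that many bins.
    """
    best_word, best_dist = None, float("inf")
    for word, template in dictionary.items():
        d = min(distance(shifted(query, off), template)
                for off in range(-wobble, wobble + 1))
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= max_distance else None


def tone(freq_bin, n=32):
    """Synthetic test signal: a sine wave centred on one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


dictionary = {
    "low": dominant_spectrum(magnitude_spectrum(tone(3))),
    "high": dominant_spectrum(magnitude_spectrum(tone(10))),
}
query = dominant_spectrum(magnitude_spectrum(tone(4)))

print(closest_match(query, dictionary, wobble=1))  # associative: prints low
print(closest_match(query, dictionary, wobble=0))  # strict: prints None
```

    In this toy setup, loosening the search accepts a slightly off-pitch input, while the strict setting rejects it rather than risk a false positive.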

  In tests, we installed a WebCam on top of the computer monitor, 6 feet away from the speaker. The recognition engine recognized all of the words we pronounced. This is a test that no other voice recognition engine passes.
 

   
    Speaker independence and microphone independence. WaveFrag achieves its speaker independence by capturing the most common speech patterns (or pronunciation patterns, for that matter) and making its comparison accordingly. WaveFrag achieves its microphone independence by virtue of the spectral analysis and the dominant spectrum filtering. The dominant spectrum of the speech stays consistent, with little regard to the microphone's frequency characteristics or the artifacts introduced by the sound processing subsystem (such as distortion and clipping).
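
    The microphone independence argument can be checked on a toy example. The Python sketch below is illustrative only: the spectrum values, the mic_response roll-off curve, and the dominant_bins helper are hypothetical and are not taken from WaveFrag.

```python
def dominant_bins(spectrum, keep=2):
    """Return the indices of the `keep` strongest bins, in ascending order."""
    order = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)
    return sorted(order[:keep])


# Made-up voice spectrum with two strong components (bins 2 and 5).
voice = [0.1, 0.2, 5.0, 0.3, 0.2, 4.0, 0.1, 0.2]

# Hypothetical microphone whose sensitivity rolls off at higher frequencies.
mic_response = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65]
through_mic = [v * g for v, g in zip(voice, mic_response)]

print(dominant_bins(voice))        # [2, 5]
print(dominant_bins(through_mic))  # [2, 5]
```

    The strongest bins stay the same because the response changes gently across frequency. A microphone with a severe notch exactly at a dominant frequency could still change the outcome, so this shows the tendency rather than a guarantee.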
 
    Imagine a healthcare workstation where the medical professional enters the patient's room and pronounces the words 'Display Medication' or 'Show Status', and the workstation responds with the appropriate screens, presenting the desired information. With WaveFrag, this is now a practical possibility.
 
   From a higher-level point of view, the WavFrag system works by decomposing the elements of the speech. The decomposition provides the basis for the recognition, and recognition is achieved by a routine called 'doughnut', which, following the traditional wisdom, suggests to 'Look at the doughnut, not the doughnut hole': a way to analyze the dominant presence of the parts of the speech.

  One of the outstanding features of the algorithm is 'push-ability'. When the user pronounces a word more distinctly, the algorithm responds with better recognition, unlike traditional algorithms, which degrade when someone enunciates harder (or 'pushes it').


Copyright by (C) 2009,   Peter Glen,  (C) 2010 RobotMonkeySoftware LLC.