
Unlock Siri-like apps with 3-D voice processing

Lior Blanka, DSP Group - February 04, 2013

Voice quality is a hot issue due to the recent rise of voice control interfaces for tablets, computers, smart TVs and other consumer electronic devices.  Without intelligible speech, automatic voice recognition can't function properly or be considered a reliable input method.  This problem is compounded by noisy environments, which can degrade speech quality to the point where voice control is inoperable.

Traditional noise cancellation suffers from a tradeoff between the degree of noise reduction and voice quality: the higher the noise reduction level, the greater the potential for voice distortion.  Attempting to minimize this tradeoff, engineers have developed noise reduction algorithms that perform well mainly in stationary noise but poorly in non-stationary noise such as street noise.
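The tradeoff can be illustrated with classic spectral subtraction, a generic textbook technique and not necessarily what any particular product uses. The sketch below is a minimal illustration under stated assumptions: the per-bin magnitudes are made-up numbers, magnitudes are treated as additive (phase is ignored), and the names `spectral_subtract`, `alpha` and `floor` are hypothetical. The noise estimate is taken from an earlier, quieter moment, so it underestimates the noise in bin 0, as can happen with non-stationary noise.

```python
def spectral_subtract(noisy_mag, noise_est, alpha=1.0, floor=0.05):
    """Subtract alpha * noise estimate from each bin's magnitude.

    The spectral floor keeps a bin from dropping below a small
    fraction of its noisy magnitude (and from going negative).
    """
    return [max(y - alpha * n, floor * y) for y, n in zip(noisy_mag, noise_est)]


# Illustrative per-bin magnitudes for one frame (assumed values).
speech_mag = [0.0, 0.9, 0.3, 0.0]    # speech lives in bins 1 and 2
actual_noise = [0.3, 0.2, 0.2, 0.0]  # noise rose in bin 0 (non-stationary)
noise_est = [0.2, 0.2, 0.2, 0.2]     # estimate from an earlier, stationary period
noisy_mag = [s + n for s, n in zip(speech_mag, actual_noise)]

mild = spectral_subtract(noisy_mag, noise_est, alpha=1.0)
aggressive = spectral_subtract(noisy_mag, noise_est, alpha=2.0)

# Bin 0 holds only noise: the aggressive setting leaves less of it behind.
print("residual noise, bin 0:", mild[0], aggressive[0])
# Bin 2 holds weak speech: the aggressive setting cuts into it.
print("speech bin 2 (true 0.3):", mild[2], aggressive[2])
```

Raising `alpha` suppresses the leftover noise in bin 0 but also removes part of the weak speech component in bin 2, which is the distortion the paragraph above describes.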

Noise cancellation techniques took a leap forward with the introduction of a second microphone in smart phones, enabling the two microphones to operate in a manner similar to the human auditory system. However, this capability does not provide sufficient noise cancellation to eliminate all background noise for voice calls or voice control while driving, riding on public transportation, or even at home when, for instance, music is turned up loud.
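As a rough illustration of what a second microphone can and cannot buy, the sketch below averages two microphone signals. It assumes, purely for illustration, that the speech reaches both microphones identically and already time-aligned, and that the noise at each microphone is independent; real two-microphone processing is considerably more elaborate, and all names here are hypothetical.

```python
import math
import random


def power(samples):
    """Mean-square value of a signal."""
    return sum(v * v for v in samples) / len(samples)


random.seed(0)  # fixed seed so the run is repeatable
n = 4000
speech = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]

# Assumption: identical speech at both mics, independent noise at each.
noise1 = [random.gauss(0.0, 0.5) for _ in range(n)]
noise2 = [random.gauss(0.0, 0.5) for _ in range(n)]
mic1 = [s + v for s, v in zip(speech, noise1)]
mic2 = [s + v for s, v in zip(speech, noise2)]

# Averaging keeps the common speech and partially cancels the noise.
combined = [(a + b) / 2.0 for a, b in zip(mic1, mic2)]
residual = [c - s for c, s in zip(combined, speech)]

gain_db = 10.0 * math.log10(power(noise1) / power(residual))
print("noise reduction from averaging two mics: %.1f dB" % gain_db)
```

Under these idealized assumptions the noise power is roughly halved, an improvement of about 3 dB, which helps but falls well short of removing loud background noise.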

Adding a Sensor for Advanced Noise Cancellation

Advanced noise cancellation technology adds a sensor to the standard two audio microphones and then applies a 3D-Vocal algorithm to perform multiple voice processing tasks, including echo and background noise cancellation, loudness equalization and general voice enhancement.  Removing background noise significantly improves the accuracy rate of automatic speech recognition (ASR) and voice-call applications for smart phones, tablets and other mobile devices.

An example of how advanced noise cancellation affects noisy speech is shown in Figure 1.0 below. The upper waveform illustrates the noisy speech, which is the superposition of speech and ambient noise (S+N), while the lower waveform shows the resulting clean speech signal after 3D voice processing.

 

Figure 2.0 shows a spectrogram view: the upper graph presents the spectrogram of the noisy speech (S+N), while the lower spectrogram shows the resulting speech signal after 3D voice processing.
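One common way to put a number on the difference between a noisy S+N signal and a processed one is the signal-to-noise ratio (SNR) in decibels. The sketch below is only an illustration with synthetic signals: the sine "speech", the alternating "noise" and the assumption that processing leaves 10% of the noise amplitude are all made up, and none of these figures come from the measurements behind the figures.

```python
import math


def power(samples):
    """Mean-square value of a signal."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(speech, noise):
    """SNR in dB, given the speech and noise components separately."""
    return 10.0 * math.log10(power(speech) / power(noise))


# Synthetic stand-ins (assumed values, for illustration only).
speech = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noise = [0.1 * (-1) ** i for i in range(1000)]
noisy = [s + v for s, v in zip(speech, noise)]  # the S+N superposition

# Hypothetical processing result: speech untouched, noise amplitude cut to 10%.
residual_noise = [0.1 * v for v in noise]

snr_in = snr_db(speech, noise)            # roughly 17 dB
snr_out = snr_db(speech, residual_noise)  # roughly 37 dB
print("SNR before: %.1f dB, after: %.1f dB" % (snr_in, snr_out))
```

Cutting the noise amplitude by a factor of ten raises the SNR by 20 dB in this toy case; in practice the speech component is also altered by processing, so SNR alone does not capture voice quality.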


Using the expanded set of data provided by the sensor and the two mics, the 3D-Vocal algorithm extracts features that characterize the speech source and distinguishes between the sound components that belong to the desired speech and those that belong to ambient noise. The block diagram in Figure 3.0 below shows the audio path for the advanced noise cancellation technique.

 


The 3D voice processing diagram components are defined as follows:

3D-Vocal (Spectro-Temporal Analysis): Receives all signals from the microphone array and from the VSensor, and performs special spectro-temporal processing on the combined information.  Some correlated patterns in the 3D-Vocal data are associated with ambient noise, while others are identified as the user’s voice. The 3D-Vocal spectro-temporal process separates the user’s voice from the predicted ambient noise and produces reference information for the voice/noise Feature Extraction block.

 

Feature Extraction: Contains voice/noise data that is fed to the other blocks. The extracted features contain spectro-temporal, real-time information about the user’s speech and ambient noise. This information can be used to filter out ambient noise from the user’s speech, enhance echo cancellation performance, and more.

Ambient Noise Cancellation: Cancels various types of stationary and non-stationary, coherent and non-coherent ambient noise. The ambient noise cancellation algorithm uses the feature extraction information and the output of the 3D-Vocal block.

Equalization: Equalizes the spectral distribution of the received signal to match the requirements of the ASR process or of the voice call.
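The internals of the 3D-Vocal algorithm are not described here, so the following is only a generic sketch of how blocks like those above might fit together, not the actual implementation. It assumes a hypothetical per-frame `voice_active` flag standing in for the extracted voice/noise features, made-up per-band magnitudes, a Wiener-style gain for the noise cancellation step, and fixed per-band `eq_gains` for equalization; every name and value is an illustrative assumption.

```python
def process(frames, voice_active, eq_gains, smooth=0.5, min_gain=0.1):
    """Toy pipeline: noise tracking, noise suppression, equalization.

    frames       -- list of per-band magnitude lists, one per time frame
    voice_active -- hypothetical per-frame flag (True while the user speaks)
    eq_gains     -- fixed per-band equalization gains
    """
    noise_est = list(frames[0])  # assumption: the first frame is noise only
    output = []
    for mags, active in zip(frames, voice_active):
        # Feature use: update the noise estimate only while the user is
        # silent, so the estimate can follow noise that changes over time.
        if not active:
            noise_est = [smooth * n + (1.0 - smooth) * m
                         for n, m in zip(noise_est, mags)]
        cleaned = []
        for m, n, eq in zip(mags, noise_est, eq_gains):
            # Noise cancellation: Wiener-style gain, limited from below.
            gain = max(1.0 - (n * n) / (m * m), min_gain) if m > 0 else min_gain
            # Equalization: shape the band toward the target spectrum.
            cleaned.append(eq * gain * m)
        output.append(cleaned)
    return output


# Two bands, three frames (assumed values): noise rises, then speech starts.
frames = [[0.2, 0.2], [0.4, 0.4], [1.4, 0.4]]
voice_active = [False, False, True]
eq_gains = [1.0, 1.5]

out = process(frames, voice_active, eq_gains)
for before, after in zip(frames, out):
    print(before, "->", [round(v, 3) for v in after])
```

In this toy run the noise-only frames are attenuated, while the speech-dominated band in the last frame passes nearly unchanged. Gating the noise update on a voice flag is what lets the estimate follow the rising noise without absorbing the user's speech.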

 

