ECE5525 Speech Processing
Final Presentation
Microphone Array
and Beamforming
By
Pattarapong Rojanasthien
December 8, 2008
1. Problem Statement
The goal of this project is to study, at a top level, the concept of the microphone array and to process its acoustic output with a beamforming technique, producing sound of better quality, measured by signal-to-noise ratio, than the original sound from any single microphone.
2. Background Knowledge
A microphone array is a set of multiple microphones grouped in a uniform formation, such as a line or a circle. Typically, the sounds captured by the microphone array are sent to a computer for digital signal processing.
Sound travels as a wave, which means it spreads out over a conical angle. With a microphone array, we capture the sound with multiple microphones. Each microphone captures the same sound from the same source, but at a different time (one microphone may receive the sound before the others); we treat that difference as a time delay.
The source creates the sounds, which travel and are captured by the microphone array.
The figure shows 5 of the 15 channels from the file an103-mtms-arr3A.adc [4]. The x-axis is the sample index and the y-axis is the signal amplitude.
The figure above shows the first 1000 samples of each channel (microphone). The channels receive the sound signal at different times, so their phases are slightly shifted versions of one another. Also notice that each channel contains a different amount of noise: one may be noisier than another, and the amplitudes may differ slightly, but the general pattern (shape) is the same since it comes from the same source.
These are the first 1000 samples of the 5 signals from the previous figure.
Here are some examples of microphone arrays:
- A four-element linear array designed to sit on top of a monitor; it covers an office within a +/- 50 degree range [6].
- A circular microphone array for conference rooms, with 360-degree coverage [6].
- A microphone array for recording a symphony orchestra in France [7].
- A 1020-microphone array at MIT [7].
2.1. Time Delay
The time delay signifies how much time it takes one microphone to receive the sound from the source relative to another microphone. First, we must set one microphone as the reference. The easiest choice of reference is the microphone at the foot of the perpendicular from the source to the array. This makes the time-delay calculation much easier, because we can solve it with geometry, given the angle θ, the sampling frequency, the speed of sound, the distance N between microphones, and the distance d between the source and the reference microphone.
Geometry of the source and the array: d is the distance from the source to the reference mic, g the distance from the source to mic x, N the spacing between the mics, and d' = g − d the extra path length.
Note that the distance from another microphone (not the reference) to the source, denoted g, can be computed with trigonometry: cos θ = d/g, so d = g·cos θ.
For example, given the following conditions:
- Distance from source to reference mic: d = 30 in.
- Distance from source to mic x: g = 34.73 in.
- Distance between mics: N = 17.5 in.
- Sampling frequency: fs = 22050 samples/sec.
- Speed of sound: c = 345 m/sec.
Then:
- Extra path length: d' = 34.73 − 30 = 4.73 in.
- Samples per meter: 22050/345 = 63.913 samples/m.
- Converted to inches: 63.913 × 0.0254 = 1.62 samples/in.
- Time delay: d' × 1.62 = 4.73 × 1.62 = 7.7 samples, rounded up to 8 samples.
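The report's code is MATLAB, but the arithmetic above is easy to check in a short script; here is a Python sketch using the example's numbers (the variable names are mine):

```python
# Geometric time-delay estimate for one microphone pair
# (all numbers come from the worked example above).
d = 30.0        # source -> reference mic distance, inches
g = 34.73       # source -> mic x distance, inches
fs = 22050.0    # sampling frequency, samples/s
c = 345.0       # speed of sound, m/s

d_prime = g - d                        # extra path length, inches
samples_per_meter = fs / c             # 63.913 samples per meter of travel
samples_per_inch = samples_per_meter * 0.0254
delay = round(d_prime * samples_per_inch)  # ~7.7, rounded to 8 samples
print(delay)   # -> 8
```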
Also note that when the distance from the source to the microphones is very large compared with the spacing between the microphones, the calculation becomes much simpler: the angle approaches 90 degrees, so the sound reaches the microphones as nearly parallel (plane) wavefronts, and for a broadside source there is no longer any need to compute the delay.
2.2. Beamforming
Beamforming is a technique that takes multiple signals from the microphone array, aligns them, and adds them together. Parts of the signals that share the same phase reinforce each other, while parts in opposite phase cancel out.
In this project, we use beamforming to align the signals in time so that the same speech sounds from all channels are in phase. The speech then adds up and gains energy, while the noise tends to cancel out, since noise is random and not necessarily created by the same source.
Beamforming can be used for many applications, including but not limited to:
- Enhancing an acoustic signal from a known or unknown direction or position.
- Localizing acoustic source(s) in 2-D or 3-D.
- Blindly separating different acoustic sources by their spatial separation rather than their frequency characteristics.
The signals from the microphones are delayed before being combined to produce a better signal.
A sample directivity pattern for the beamformer. Note that the beam at 0 degrees gives the best gain, while the other directions are attenuated at most frequencies.
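A pattern like this can be reproduced for an idealized uniform linear delay-and-sum array; the sketch below is my own illustration (the element count, spacing, and analysis frequency are assumed values, not the report's):

```python
import numpy as np

# Far-field directivity of an ideal M-element uniform linear delay-and-sum
# array steered to broadside (0 degrees).
M = 4             # number of microphones (illustrative)
spacing = 0.05    # element spacing in meters (illustrative)
c = 345.0         # speed of sound, m/s
f = 1000.0        # analysis frequency, Hz (illustrative)

angles = np.linspace(-90.0, 90.0, 181)           # arrival angle, degrees
tau = spacing * np.sin(np.radians(angles)) / c   # per-element delay, seconds
m = np.arange(M)[:, None]                        # element indices (column)
# Sum the per-element phase terms and normalize so the broadside gain is 1.
response = np.abs(np.exp(-2j * np.pi * f * m * tau).sum(axis=0)) / M
print(float(response[90]))   # angle 0 degrees: gain 1.0
```

Plotting `response` against `angles` (e.g. with matplotlib) reproduces the main lobe at 0 degrees and the reduced gain elsewhere.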
There are two main types of beamforming: fixed (data-independent) and adaptive (data-dependent). The fixed version keeps the same parameters throughout the process, while the adaptive version adjusts to the input data, especially to changing noise conditions.
Fixed beamforming is simpler and easier to implement, but it has limited ability to eliminate highly directive noise (it may mistake that noise for signal). In this project we used the simplest beamforming technique, delay & sum [3]. Given a set of M microphones, where microphone m has delay Tm relative to the reference and signal xm[n], the beamformer output is

y[n] = (1/M) Σ (m = 1 to M) xm[n − Tm].
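The delay-and-sum operation itself is short; here is a minimal Python sketch of the technique (my own toy version with integer sample delays; np.roll wraps at the array edges, so a real implementation would zero-pad instead):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum: advance each channel by its integer delay in samples,
    then average.  np.roll wraps around, so this is only a toy version."""
    out = np.zeros(len(channels[0]))
    for x, d in zip(channels, delays):
        out += np.roll(x, -d)
    return out / len(channels)

# Toy check: the same unit pulse arriving 0, 2 and 5 samples late realigns.
pulse = np.zeros(32)
pulse[10] = 1.0
chans = [np.roll(pulse, d) for d in (0, 2, 5)]
y = delay_and_sum(chans, [0, 2, 5])
print(int(np.argmax(y)))   # -> 10
```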
2.3. Signal to Noise Ratio
In this project, the signal-to-noise ratio is used to measure the quality of the signals. An energy-averaging technique is used for voice activity detection: it locates the noise in the signal based on the average energy computed over a small and a large window. Speech starts where the energy in the small window becomes significantly higher than in the large window; before that point, the signal is simply noise.
The average energy is represented by two windows: the blue line shows the large window and the red line the small window. The bracketed part is where the speech starts.
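Given the speech and noise segments that the detector separates, the SNR itself is a one-line computation. A hedged Python sketch (the dB convention and the toy segments are my assumptions; the report does not state its exact formula):

```python
import numpy as np

def snr_db(speech_seg, noise_seg):
    """SNR in dB from a speech segment and a noise-only segment,
    using the mean power of each."""
    return 10.0 * np.log10(np.mean(speech_seg**2) / np.mean(noise_seg**2))

# Toy example: a 440 Hz tone over weak white noise at 16 kHz.
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(1000)
speech = (np.sin(2 * np.pi * 440 * np.arange(1000) / 16000)
          + 0.01 * rng.standard_normal(1000))
print(round(snr_db(speech, noise), 1))   # roughly 37 dB for this toy signal
```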
3. Solution and Implementation
The microphone-array data comes from multi-microphone recordings made by Tom Sullivan at Carnegie Mellon University. The data set I randomly chose for this project is a 15-channel utterance recorded with a microphone array. The data is saved with the .adc extension, so it cannot be imported into MATLAB directly; I had to write MATLAB code to extract the file. The basic idea is that the samples are interleaved: each sample index is grouped from channel 1 to channel 15, followed by the next index, and so on.
For example, let mk[n] be the n-th sample of the k-th microphone. The data is arranged as m1[0] m2[0] … mk[0] m1[1] m2[1] … mk[1] … m1[n] m2[n] … mk[n]. We can therefore write MATLAB code to extract the data for each channel as in the following lines.
m1 = ArrSamples(1:15:num)./2^15;
m2 = ArrSamples(2:15:num)./2^15;
.
.
.
m15 = ArrSamples(15:15:num)./2^15;
Since the sound is encoded as 16-bit PCM, we divide all elements by 2^15 to normalize the signal to fall within ±1 for further analysis. After extracting the signals into the arrays mk, I use MATLAB to export each channel to a wave file, keeping the original encoding and sampling frequency (in this case, 16-bit at 16,000 Hz).
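The extraction above is plain de-interleaving; the same idea in a Python sketch (the function name and the toy data are mine):

```python
import numpy as np

def split_channels(interleaved, n_channels=15):
    """De-interleave m1[0] m2[0] ... mN[0] m1[1] ... into one row per channel,
    normalized from 16-bit PCM to the +/-1 range."""
    samples = np.asarray(interleaved, dtype=np.float64)
    usable = len(samples) - len(samples) % n_channels   # drop any ragged tail
    return samples[:usable].reshape(-1, n_channels).T / 2**15

# Toy check with 3 channels: channel k carries the constant value k.
flat = np.tile([1, 2, 3], 4) * 2**15
chans = split_channels(flat, n_channels=3)
print(chans[0])   # -> [1. 1. 1. 1.]
```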
I then used these wave files to perform beamforming with a program called BeamformIt. The result is given in .sph format, so it must be converted into a MATLAB-compatible format; I used the Linux-based program sox to convert from .sph to .wav. The beamformed wave file is then imported into MATLAB to compare its signal-to-noise ratio with the original microphone signals.
The voice activity detection algorithm is roughly the following:
- Measure the mean energy over the last t1 seconds => E1.
- Measure the mean energy over the last t2 seconds => E2.
- Calculate the speech threshold Ts = E2 + Es.
- If E1 > Ts, speech is detected (speech onset).
- Freeze E2 and calculate the noise threshold Tn = E2 + En.
- Measure the mean energy over the most recent t1 seconds => E1.
- If E1 < Tn, noise is detected (speech offset).
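The steps above can be sketched as follows; the window lengths and the threshold offset are illustrative values of mine, and only the onset half of the algorithm is shown:

```python
import numpy as np

def detect_onset(x, fs, t1=0.02, t2=0.5, es_db=6.0):
    """Scan the signal; return the first sample index where the short-window
    mean energy E1 exceeds the long-window estimate E2 by es_db dB
    (the threshold Ts), i.e. the speech onset.  Returns None if never hit."""
    n1, n2 = int(t1 * fs), int(t2 * fs)
    for n in range(n2, len(x) + 1, n1):
        e1 = np.mean(x[n - n1:n] ** 2)       # short, recent window
        e2 = np.mean(x[n - n2:n] ** 2)       # long, background window
        if e1 > e2 * 10 ** (es_db / 10.0):   # E1 above Ts = E2 + Es (in dB)
            return n
    return None

# Toy signal: one second of weak noise followed by a louder tone.
fs = 8000
rng = np.random.default_rng(1)
x = np.concatenate([0.01 * rng.standard_normal(fs),
                    np.sin(2 * np.pi * 300 * np.arange(fs) / fs)])
onset = detect_onset(x, fs)
print(onset)   # shortly after sample 8000, where the tone begins
```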
4. Results
The beamforming result satisfies the goal: it reduces noise and increases the signal-to-noise ratio, making the speech clearer than the sound from any single microphone. The table below compares the SNR of each channel with the beamforming result for the speech sample an102-mtms-arr4A.adc. Notice that the beamformed SNR is about twice that of any microphone in the array.
Source of Signal        | SNR
Mic 1                   | 21.00937389
Mic 2                   | 22.48577961
Mic 3                   | 22.80745788
Mic 4                   | 25.49945084
Mic 5                   | 23.06647821
Mic 6                   | 26.87451164
Mic 7                   | 24.8231068
Mic 8                   | 21.54103218
Mic 9                   | 20.9528169
Mic 10                  | 23.69206855
Mic 11                  | 24.28739145
Mic 12                  | 23.09365172
Mic 13                  | 20.78050523
Mic 14                  | 20.96611645
Mic 15                  | 20.77844073
Result from Beamforming | 46.79534715
In theory, adding microphone signals coherently raises the signal power by about 6 dB per doubling, but uncorrelated noise power also grows, so the net SNR gain is only about 3 dB per doubling (10·log10(M) dB for M microphones). The measured improvement does not match the ideal figure, and the likely reason is the noise itself: it may not be completely random, and noise that is partially correlated across the microphones adds up during beamforming instead of cancelling out.
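The ideal figure is easy to put in numbers: with M microphones and perfectly uncorrelated noise, delay-and-sum multiplies the coherent signal power by M² and the noise power by M, for an SNR gain of 10·log10(M) dB. A quick arithmetic check (my own, not from the report):

```python
import math

# Ideal delay-and-sum array gain with uncorrelated noise: signal power
# grows as M^2, noise power as M, so SNR improves by 10*log10(M) dB.
def ideal_gain_db(m):
    return 10.0 * math.log10(m)

print(round(ideal_gain_db(2), 1))    # doubling the mics: 3.0 dB
print(round(ideal_gain_db(15), 1))   # a 15-mic array: 11.8 dB
```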
A better way to observe the result is to look at the spectrogram. The two figures below show spectrograms of the data an102-mtms-arr4A.adc: the top one is from one of the microphones, the bottom one from the beamforming result. Hot colors (red and yellow) represent high energy, while cool colors (blue and green) represent lower energy.
It is very noticeable that the top spectrogram contains a lot of noise energy, seen in the red scattered everywhere. The beamforming result removes much of that noise, though not completely.
5. Conclusion
A microphone array with beamforming is an alternative to relying on a single very expensive premium microphone (of course, multiple premium microphones would be even better). With a microphone array and beamforming, the signal-to-noise ratio can be twice that of a single microphone.
What I found most valuable about this project is learning the potential applications of the microphone array. The greatest difficulty was setting up the software, especially BeamformIt, since the program is built to run on Unix operating systems. Once compiled, however, it is a fast, easy, and robust beamforming tool.
References
1. Kung Yao. Acoustic Beamforming for Signal Enhancement, Localization, and Separation. DARPA Air-Coupled Acoustic Sensors Workshop, Aug 24, 1999. http://www.darpa.mil/mto/archives/workshops/sono/presentations/ucla_yao.pdf
2. Geert Van Meerbergen, et al. Audio and Speech Processing. http://homes.esat.kuleuven.be/~gvanmeer/s&a/oefenzittingen/opgave2/node3.html
3. Xavier Anguera. BeamformIt (Fast and Robust Acoustic Beamformer). Nov 11, 2008. http://www.icsi.berkeley.edu/~xanguera/beamformit/
4. Eugene Weinstein. LOUD: A 1020-Node Microphone Array and Acoustic Beamformer. http://www.cs.nyu.edu/~eugenew/publications/loud-slides.pdf
5. Microphone data by Tom Sullivan at Carnegie Mellon University.
6. Microphone Array Project at Microsoft Research. http://research.microsoft.com/users/ivantash/MicrophoneArrayProject.aspx
7. Trinnov SRP: Surround Microphone Array. http://www.trinnov.com/SRP.php