EEL6586: HW#4

Due Wednesday, April 18, 2001, in class. Late homework will lose $e^{\#\text{ of days late}} - 1$ percent. See http://www.cnel.ufl.edu/analog/harris/latepoints.html for details of the penalty.
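
To see how quickly the penalty grows, the formula can be tabulated. The following is a minimal sketch in Python; the function name `late_penalty_percent` is hypothetical, and counting lateness in whole days is an assumption (the linked page is the authority on how days are counted).

```python
import math

def late_penalty_percent(days_late):
    """Percent lost after `days_late` days, using e^(days late) - 1."""
    return math.exp(days_late) - 1.0

# Tabulate the penalty for the first few days (whole days assumed).
for days in range(0, 6):
    print(f"{days} day(s) late: {late_penalty_percent(days):.1f} percent lost")
```

The loss is under 2 percent after one day and about 19 percent after three days, and it passes 100 percent between the fourth and fifth day.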

PART A: Noncomputer Problems

A1

Assuming that

\begin{displaymath}H(z)=\sum_{n=0}^\infty h(n)z^{-n} = \frac{G}{1-\sum_{k=1}^{p}a(k)z^{-k}}\end{displaymath}

prove that the complex cepstrum $\hat{h}(n)$ can be derived from the linear prediction coefficients $a(k)$ using the following recursion:

\begin{displaymath}\hat{h}(n)=a(n) + \sum_{k=1}^{n-1}\frac{k}{n}\,\hat{h}(k)\,a(n-k)\end{displaymath}

for $n \ge 1$.
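
Before attempting the proof, it can help to check the recursion numerically on a case where the cepstrum is known in closed form. The sketch below is in Python rather than Matlab, purely for illustration. It assumes that $a(k)$ is defined for $1 \le k \le p$ and is zero for $k > p$, and it leaves out $\hat{h}(0) = \log G$, which the recursion does not cover. The function name `lpc_to_cepstrum` is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Return [h_hat(1), ..., h_hat(n_ceps)] from LPC coefficients.

    `a` holds a(1)..a(p), so a[0] is a(1). a(k) is taken as 0 for k > p
    (an assumption). h_hat(0) = log G is not produced by the recursion.
    """
    p = len(a)

    def coeff(k):
        # a(k) for 1 <= k <= p, zero beyond the model order
        return a[k - 1] if 1 <= k <= p else 0.0

    h_hat = [0.0] * (n_ceps + 1)  # index n holds h_hat(n); index 0 is unused
    for n in range(1, n_ceps + 1):
        acc = coeff(n)
        for k in range(1, n):
            acc += (k / n) * h_hat[k] * coeff(n - k)
        h_hat[n] = acc
    return h_hat[1:]

# Single-pole check: H(z) = 1 / (1 - 0.5 z^-1) has h_hat(n) = 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 5)
expected = [0.5 ** n / n for n in range(1, 6)]
print(max(abs(c - e) for c, e in zip(ceps, expected)))
```

For the single-pole model, $\log H(z) = -\log(1 - a z^{-1}) = \sum_{n \ge 1} (a^n/n) z^{-n}$, so the printed maximum error should be at the level of floating-point rounding.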

A2

As discussed in class, Euclidean distance in complex cepstral space can be related to an RMS log spectral distance measure. Assuming that

\begin{displaymath}\log S(\omega) = \sum_{n=-\infty}^{\infty}c_n e^{-jn\omega}\end{displaymath}

where $S(\omega)$ is the power spectrum (magnitude-squared Fourier transform), prove the following:

\begin{displaymath}\sum_{n=-\infty}^{\infty} (c_n - c'_n)^2 = \frac{1}{2 \pi} \int_{-\pi}^{\pi} \vert\log S(\omega)-\log S'(\omega)\vert^2 \, d\omega\end{displaymath}

where $S(\omega)$ and $S'(\omega)$ are the power spectra of two different signals, with cepstral coefficients $c_n$ and $c'_n$ respectively.
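
The identity can be sanity-checked numerically before proving it. The sketch below is in Python for illustration and is not a proof. It assumes the integral runs over one period, from $-\pi$ to $\pi$, and uses two made-up sets of real, even cepstral coefficients with only a few nonzero terms. All function and variable names are hypothetical.

```python
import cmath
import math

def log_spectrum(c, omega):
    """Evaluate sum_n c_n e^{-j n omega} for a dict {n: c_n}."""
    return sum(cn * cmath.exp(-1j * n * omega) for n, cn in c.items())

def cepstral_distance(c1, c2):
    """Left-hand side: sum over n of (c_n - c'_n)^2."""
    keys = set(c1) | set(c2)
    return sum((c1.get(n, 0.0) - c2.get(n, 0.0)) ** 2 for n in keys)

def log_spectral_distance(c1, c2, n_points=256):
    """Right-hand side: (1/2pi) * integral over [-pi, pi), as a Riemann sum."""
    total = 0.0
    for i in range(n_points):
        omega = -math.pi + 2.0 * math.pi * i / n_points
        total += abs(log_spectrum(c1, omega) - log_spectrum(c2, omega)) ** 2
    # (1/2pi) * sum * (2pi / n_points) simplifies to sum / n_points
    return total / n_points

# Made-up real, even cepstra (c_n = c_{-n}), for illustration only.
c_a = {0: 1.0, 1: 0.4, -1: 0.4, 2: -0.1, -2: -0.1}
c_b = {0: 0.8, 1: 0.1, -1: 0.1, 2: 0.2, -2: 0.2}
print(cepstral_distance(c_a, c_b), log_spectral_distance(c_a, c_b))
```

Because the integrand here is a trigonometric polynomial of low degree, the uniform Riemann sum is exact up to rounding, so the two printed values should agree closely.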

A3

Describe the advantages and disadvantages of HMM techniques as compared to DTW techniques for speech recognition.

PART B: Automatic Phoneme Recognition in Matlab

The training set consists of 66 utterances of each of nine vowel phonemes (iy ih ey eh ae aa ow ax er), from 22 speakers with 3 utterances each, and will be used to build a phoneme recognition algorithm. The test set consists of 21 utterances of each phoneme, from 7 speakers with 3 utterances each, and will be used to measure the classification rate of your classifier. No speaker is in both the training and test sets (why?). You should download the following files from the course website:

http://www.cnel.ufl.edu/hybrid/EEL6586/courses/hw4Readme.txt contains a description of all the files to download
http://www.cnel.ufl.edu/hybrid/courses/EEL6586/hw4Data.mat contains the Matlab binary data file (about 3 megabytes uncompressed). Read the readme file for a description of the variable formats.
http://www.cnel.ufl.edu/hybrid/courses/EEL6586/hw4Demo.m contains a Matlab program that demonstrates the use of hw4Data.mat. The formants F1 and F2 are extracted from each utterance and plotted. Notice the high degree of overlap in this feature space between the nine classes (vowels).
http://www.cnel.ufl.edu/hybrid/courses/EEL6586/hw4Baseline.m contains Matlab code that classifies the nine vowel phonemes of HW4 using F1 and F2 and a 1-NN classifier. This feature set and classifier give an overall accuracy rate of 27.0%. Your program should surpass these results by a significant margin.
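
The idea behind a 1-NN classifier on (F1, F2) features can be sketched in a few lines. This is an illustrative Python version, not the contents of hw4Baseline.m. The function name `nearest_neighbor_label` is hypothetical, and the formant values are made up rather than taken from hw4Data.mat.

```python
import math

def nearest_neighbor_label(train_features, train_labels, query):
    """Return the label of the training vector closest to `query` (1-NN)."""
    best_index = min(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),  # Euclidean distance
    )
    return train_labels[best_index]

# Made-up (F1, F2) values in Hz, for illustration only.
train_features = [(300.0, 2300.0), (700.0, 1200.0), (500.0, 1500.0)]
train_labels = ["iy", "aa", "ax"]
print(nearest_neighbor_label(train_features, train_labels, (320.0, 2200.0)))
```

With nine equally represented classes, random guessing would score about 11.1%, which puts the 27.0% baseline in context.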

All of the files can also be downloaded together as a single zip file at
http://www.cnel.ufl.edu/hybrid/courses/EEL6586/hw4.zip

B1: Choose a robust feature extraction technique that you think will provide the best results. You are free to look at several different types of feature sets or to invent your own, but do whatever you can to improve the recognition accuracy (without using test data during training). Explain your choice of feature set.
B2: Use a classification algorithm (for example, the nearest neighbor code in hw4Baseline.m is a good choice) to classify the test data. What is your final accuracy rate? Which two phonemes are most likely to be confused with one another?
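
The two quantities asked for in B2, the accuracy rate and the most frequently confused pair of phonemes, can be computed from the true and predicted labels of the test utterances. The following Python sketch shows one way to do this. The function names are hypothetical, treating a confusion between two phonemes as symmetric is an assumption, and the labels are made up rather than results from the assignment data.

```python
from collections import Counter

def accuracy(true_labels, predicted_labels):
    """Fraction of test utterances whose predicted label matches the truth."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def most_confused_pair(true_labels, predicted_labels):
    """Return the unordered phoneme pair with the most mutual errors."""
    errors = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t != p:
            # frozenset merges "iy taken for ih" with "ih taken for iy"
            errors[frozenset((t, p))] += 1
    if not errors:
        return None, 0
    pair, count = errors.most_common(1)[0]
    return tuple(sorted(pair)), count

# Made-up labels for illustration only.
true_labels = ["iy", "iy", "ih", "ih", "eh", "ae", "ae", "aa"]
predicted = ["iy", "ih", "iy", "ih", "eh", "ae", "eh", "aa"]
print(accuracy(true_labels, predicted))
print(most_confused_pair(true_labels, predicted))
```

A full confusion matrix (rows for true phonemes, columns for predicted ones) carries the same information and also shows whether the errors run in one direction or both.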

As usual, hand in all of your code. Ten bonus points will be awarded to the person with the best accuracy results.

Dr John Harris
2001-04-05