Next: EEL6825: HW#5 Up: EEL6825: Pattern Recognition Fall Previous: EEL6825: HW#3

EEL6825: HW#4

Due Thursday, November 7, 1996 at 3pm. As usual, hand in all code that you write.

This assignment is a simplified introduction to speech recognition.

Part A: Getting the data

A1

Record yourself and at least two others speaking five vowel sounds (short a, long e, long i, long o, long u). Break your data into a training set and a test set. If you would rather, you may use only one speaker, but record at least three separate times; this makes the classification much easier but less interesting. You can record using any system you want, and feel free to team up with other people to record and share the data. Read the note about the shareware program Goldwave in the next paragraph. If you absolutely, absolutely are not able to record data :( download the data from http://www.cnel.ufl.edu/analog/courses/EEL6825. There are three data sets there, one from each of three speakers. The first letter of each file name indicates the speaker and the second indicates the vowel. So, for example, "me.mat" indicates speaker M and vowel "e". Speakers J and S are male and M is female. Everyone is strongly encouraged to record some data and discover the issues involved. You can combine data you recorded with the downloaded data if you like. Fully explain how you obtained your data (what programs, sampling rates, who the speakers were, etc.).

Goldwave is a shareware program that you can get by searching for "goldwave" at http://www.shareware.com. A copy (gwave322.zip) is also available at http://www.cnel.ufl.edu/analog/courses/EEL6825. Unzip this file into a directory on your PC. To record, select "new" and choose 16-bit, 22050 Hz sampling, and a 1-second recording time. Press the red record button and speak into the microphone. Once you have recorded one vowel in the one-second interval, save it as a MATLAB .mat file.

A2

Plot a zoomed-in view of some of the vowels in the training set, showing only 5 or so periods of each signal. An example is shown below (left), without proper numbering or labeling on the axes.

[Figures: example zoomed-in vowel waveform (left); example FFT magnitude plot (right)]
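One way to pick out "5 or so periods" automatically is to estimate the pitch period from the autocorrelation of a short stretch of the recording. The following is only a rough sketch in Python with NumPy, not a required method: the function name, the 60 to 400 Hz pitch search range, the 2048-sample analysis window, and the synthetic stand-in signal are all assumptions made for illustration.

```python
import numpy as np

def five_period_window(signal, fs, n_periods=5, f_min=60.0, f_max=400.0):
    """Return (start, stop, period) in samples, spanning about n_periods pitch periods.

    f_min and f_max bound the assumed pitch range in Hz (hypothetical defaults).
    """
    x = np.asarray(signal, dtype=float)
    mid = len(x) // 2
    # Analyze a short stretch around the middle of the recording.
    seg = x[max(0, mid - 1024): mid + 1024]
    seg = seg - seg.mean()
    # Autocorrelation at non-negative lags.
    acf = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    lag_lo = int(fs / f_max)
    lag_hi = int(fs / f_min)
    # The strongest peak inside the allowed lag range is taken as the period.
    period = lag_lo + int(np.argmax(acf[lag_lo:lag_hi + 1]))
    start = mid
    stop = min(len(x), start + n_periods * period)
    return start, stop, period

# Synthetic stand-in for a recorded vowel: 150 Hz pitch plus one harmonic.
fs = 22050
t = np.arange(fs) / fs
vowel = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

start, stop, period = five_period_window(vowel, fs)
segment = vowel[start:stop]                      # samples to plot
time_ms = 1000.0 * np.arange(start, stop) / fs   # x-axis values in milliseconds
```

Plotting `segment` against `time_ms` with any plotting package then gives a zoomed-in view with a real time axis.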

A3

Compute the FFT of each signal. Plot the magnitude of the FFT of each vowel, being sure to zoom in on the frequencies of interest and to show the real frequencies on the x-axis. An example is above (right), again without proper numbering and labeling of the axes.
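To put real frequencies on the x-axis, each FFT bin index has to be converted to Hz using the sampling rate. A minimal sketch in Python with NumPy is given here; the function name, the synthetic two-tone signal, and the 4000 Hz zoom limit are illustrative assumptions, not values prescribed by the assignment.

```python
import numpy as np

def magnitude_spectrum(signal, fs):
    """Return (freqs_hz, magnitude) for a real-valued signal sampled at fs Hz."""
    x = np.asarray(signal, dtype=float)
    magnitude = np.abs(np.fft.rfft(x))             # one-sided FFT magnitude
    freqs_hz = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bin index -> frequency in Hz
    return freqs_hz, magnitude

# Synthetic stand-in for a one-second recording at 22050 Hz.
fs = 22050
t = np.arange(fs) / fs
vowel = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2300 * t)

freqs, mag = magnitude_spectrum(vowel, fs)

# Zoom in on an assumed band of interest (4000 Hz is only an example limit).
band = freqs <= 4000.0
peak_hz = freqs[band][np.argmax(mag[band])]
```

With a one-second recording the bins are spaced 1 Hz apart, so `freqs[band]` and `mag[band]` can be plotted directly against each other.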

Part B: Creating the feature vector

B1: Now we are ready to build the feature vector. Using the FFT results from Part A, carefully choose the frequency range that you want to include in your feature vector. Also carefully choose the sampling resolution that you require. For example, you could subsample the vector by averaging the magnitudes over each group of 10 (or some other number of) frequencies. Normalize each resulting vector by dividing by its maximum value. Create these feature vectors for every pattern in the training and test sets.
B2: Build a classifier; 1-NN is probably the simplest. Be careful not to use the test data in designing your classifier. What is the performance of your classifier on the test data?
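The steps in B1 and B2 can be sketched as follows in Python with NumPy. This is one possible arrangement under stated assumptions, not the expected solution: the 0 to 4000 Hz band, the group size of 10, the function names, and the synthetic two-tone "vowels" (with made-up frequencies) are all hypothetical choices for illustration.

```python
import numpy as np

def make_feature(signal, fs, f_lo=0.0, f_hi=4000.0, group=10):
    """Band-limit the FFT magnitude, average it in groups, normalize by the maximum."""
    x = np.asarray(signal, dtype=float)
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = mag[(freqs >= f_lo) & (freqs < f_hi)]
    n_groups = len(band) // group
    # Average each run of `group` adjacent bins (leftover bins are dropped).
    feat = band[:n_groups * group].reshape(n_groups, group).mean(axis=1)
    return feat / feat.max()   # assumes the signal is not all zeros

def nearest_neighbor(train_X, train_y, x):
    """1-NN: return the label of the training vector closest to x (Euclidean)."""
    dist2 = np.sum((train_X - x) ** 2, axis=1)
    return train_y[int(np.argmin(dist2))]

# Synthetic stand-ins for recordings; the tone frequencies are made up.
fs = 22050
t = np.arange(fs) / fs
rng = np.random.default_rng(0)

def fake_vowel(f1, f2, gain):
    tone = np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)
    return gain * (tone + 0.05 * rng.standard_normal(len(t)))

tones = {"a": (700.0, 1100.0), "e": (300.0, 2300.0)}
train_y = list(tones)
train_X = np.array([make_feature(fake_vowel(f1, f2, 1.0), fs)
                    for f1, f2 in tones.values()])
# Test signals are three times louder and carry fresh noise.
test_X = [make_feature(fake_vowel(f1, f2, 3.0), fs) for f1, f2 in tones.values()]
predictions = [nearest_neighbor(train_X, train_y, x) for x in test_X]
```

Only the training vectors are stored in the classifier; the test vectors are used solely to measure performance.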

Part C: K-L transform

C1

Run a K-L transform on all of the training data.

C2

Choose a reduced number of features from the K-L transform and explain how you decide which features to keep.
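One common criterion, stated here as a reminder and not as the required answer, ranks the K-L features by eigenvalue. Assuming the eigenvalues of the training covariance matrix are ordered as \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n and the first m features are kept, the mean-square reconstruction error and the fraction of variance retained are:

```latex
\bar{\varepsilon}^{2}(m) = \sum_{i=m+1}^{n} \lambda_i ,
\qquad
\eta(m) = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}
```

A typical choice is the smallest m for which \eta(m) exceeds a chosen threshold. Note that this criterion measures representation error only and says nothing directly about class separability.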

C3

What is the performance of the new classifier on the test data? Remember that you must apply exactly the same transformation to your test data as to your training data (i.e., don't compute a new K-L transform from your test data!).
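The point about reusing the training transformation can be sketched as a fit step and a separate apply step. The Python/NumPy code below is a minimal illustration with hypothetical function names and random stand-in data. It assumes the K-L transform is taken from the eigenvectors of the mean-removed training covariance matrix; some texts use the autocorrelation matrix without mean removal instead.

```python
import numpy as np

def fit_kl(train_X, n_keep):
    """Estimate the K-L basis from training data only.

    Returns (mean, basis, eigvals): basis has one column per kept feature,
    eigvals holds all eigenvalues in descending order.
    """
    mean = train_X.mean(axis=0)
    centered = train_X - mean
    cov = centered.T @ centered / len(train_X)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort to descending
    return mean, eigvecs[:, order[:n_keep]], eigvals[order]

def apply_kl(X, mean, basis):
    """Project data using a mean and basis that were fitted on the training set."""
    return (X - mean) @ basis

# Random stand-in data: 3-D samples that mostly vary along one direction.
rng = np.random.default_rng(0)
direction = np.array([[3.0, 2.0, 1.0]])
train_X = rng.standard_normal((200, 1)) @ direction + 0.1 * rng.standard_normal((200, 3))
test_X = rng.standard_normal((50, 1)) @ direction + 0.1 * rng.standard_normal((50, 3))

mean, basis, eigvals = fit_kl(train_X, n_keep=1)
Z_train = apply_kl(train_X, mean, basis)
Z_test = apply_kl(test_X, mean, basis)   # same mean and basis; nothing is refit
```

Because `fit_kl` and `apply_kl` make no assumption about where the feature vectors come from, the same two functions could be reused on other data sets.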

Note: each of you must use the K-L or other dimensionality reduction program for your final project. If you are clever, you can write your code so you can use it for both purposes without modification.

Part D: Short questions

D1: Explain how the classification in this assignment is achieved independent of the intensity of the sounds.
D2: Discuss an alternative feature vector you could have used for this assignment, i.e. not using the Fourier transform. Would you expect this feature to work better?
D3: The density function of a two-dimensional random vector x consists of four impulses at (0,3), (0,1), (1,0), and (3,0), with probability 1/4 for each. Find the K-L expansion. Compute the mean-square error when one feature is eliminated. Compute the contribution of each point to the mean-square error.
D4: Suppose a student is given data that consists of many 2-D samples of the 1-D curve described by: where . Why can't the standard K-L transform accurately represent this data in one dimension? Sketch the likely result of using the K-L transform to reduce the dimension for this problem.


John Harris
Tue Nov 19 07:44:32 EST 1996