Next: About this document
Up: EEL6825: Pattern Recognition and
Previous: EEL6825: HW#5
Final report due December 10, 1997 at 3pm.
Your final project constitutes a significant portion of your grade in this
class. Everyone must have a tentative project idea by Friday, October
31. Important dates are as follows:
- By Friday, October 31, 5pm: email me a description of the nature
of your project (at least one paragraph in length).
- Each Friday until the final day of class, each of you should email
me a short description of your progress for the week. (You do not need to
email anything over Thanksgiving weekend.)
- Exam 2 will be on November 24, 1997. It will cover nonparametric
methods, neural networks and PCA techniques.
- Oral Presentations: I would like each of you to give a short, 5-minute
presentation on your accomplishments. We will have presentations the last
three days of class: December 5, 8, and 10.
- Final project reports are due on the last day of class, December 10,
1997.
Your final grade for the project will be based on the on-time completion and
quality of each of the above items.
Your final report must include the following topics:
- Linear classifier results.
- Bayes classifier results.
- k-NN classifier results.
- Dimensionality reduction using the KL transform or another technique.
- A twist (something new and different); some examples are given below.
- An interpretation of the results: for example, what do the results
tell you about the data or the classifiers that you are using?
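As a starting point for the Bayes classifier item, here is a minimal sketch of a Gaussian (quadratic) Bayes classifier trained on synthetic two-class data. It is written in Python/NumPy for illustration only; your actual simulations should be in MATLAB, and the class means, spreads, and sample sizes below are made-up stand-ins for your project data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic Gaussian classes in 2-D (made-up stand-in for project data)
n = 200
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
X1 = rng.normal(loc=[2.5, 2.5], scale=1.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

def fit_gaussian_bayes(X, y):
    """Estimate class priors, means, and covariances from labeled data."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def predict_gaussian_bayes(params, X):
    """Assign each sample to the class with the largest log-posterior."""
    scores, classes = [], []
    for c, (prior, mu, cov) in params.items():
        diff = X - mu
        inv = np.linalg.inv(cov)
        logdet = np.linalg.slogdet(cov)[1]
        # log N(x; mu, cov) + log prior, dropping the shared 2*pi constant
        ll = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff) - 0.5 * logdet + np.log(prior)
        scores.append(ll)
        classes.append(c)
    return np.array(classes)[np.argmax(np.stack(scores), axis=0)]

params = fit_gaussian_bayes(X, y)
acc = (predict_gaussian_bayes(params, X) == y).mean()
```

With the two class means well separated relative to the unit spread, the training accuracy should be high; in your report you would instead report error on held-out data.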
Your final project report
should be written as if it were to be
submitted to a conference and therefore should contain the following
components:
- A short literature review about the topic; you should include at least
one reference to a paper you have read (not a textbook).
- A concise description of the problem.
- A detailed description of your solution to the problem.
- Matlab simulation results.
- A discussion of the significance of these results.
- The appendix should contain complete MATLAB code, messy derivations,
and any other information too detailed to keep in the main body.
You are strongly encouraged to come up with your own idea for a project
based on your own experience; however, some suggestions include:
- Study the change in error rate with respect to:
- the amount of data
- the dimensionality of the data
- the number of classes
- k in k-NN classification
- v in Parzen windows-based classification
These experiments might be best done with synthetic data.
- Study some other classifiers that we haven't talked about in class and
compare them to the conventional methods we have discussed. These methods
might include
piece-wise classifiers
or neural networks (if you have already taken this course)
- Choose a novel domain that requires some special consideration or
feature extraction. You may find some interesting data through the internet.
For example, take a look at some of the benchmark data sets in the UCI
machine learning repository.
- Speech and character recognition are both very challenging problems but
would both make excellent projects. We just have to make sure that you do
something more involved (or different) than what we will be doing in the
homework. For both of these problems, feature extraction is the key step.
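As one illustration of the error-rate experiments suggested above, the following sketch measures test error versus k for a k-NN classifier on synthetic two-class Gaussian data. It is in Python/NumPy for illustration only (your simulations should be in MATLAB), and the class means, sample sizes, and values of k are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_predict(Xtr, ytr, Xte, k):
    """Classify each test point by majority vote among its k nearest training points."""
    # squared Euclidean distances, shape (n_test, n_train)
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]     # indices of the k nearest neighbors
    votes = ytr[idx]
    return (votes.mean(axis=1) > 0.5).astype(int)   # majority vote for labels 0/1

def make_data(n_per_class):
    """Synthetic two-class Gaussian data (made-up stand-in for project data)."""
    X0 = rng.normal([0.0, 0.0], 1.0, size=(n_per_class, 2))
    X1 = rng.normal([2.0, 2.0], 1.0, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

Xtr, ytr = make_data(100)
Xte, yte = make_data(200)

errors = {}
for k in (1, 3, 5, 11, 21):
    errors[k] = (knn_predict(Xtr, ytr, Xte, k) != yte).mean()
```

Using odd values of k avoids ties in the two-class vote; a plot of error versus k (and versus training-set size) is exactly the kind of result such a project would report.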
Some example projects worked on in past years include:
- Sonar, Mines vs. Rocks: This is the data set used by Gorman and
Sejnowski in their study of the classification of sonar signals using a
neural network. The task is to train a network to discriminate between sonar
signals bounced off a metal cylinder and those bounced off a roughly
cylindrical rock.
- Vowel Recognition: Speaker independent recognition of the eleven
steady-state vowels of British English using a specified training set of
LPC-derived log area ratios.
The following datasets are available in the UCI database and some have been
used in past years for projects:
- Wisconsin Breast cancer databases: Currently contains 699 instances,
2 classes (malignant and benign), 9 integer-valued attributes.
- Credit Screening Database: a good mix of attributes - continuous,
nominal with small numbers of values, and nominal with larger numbers of
values; 690 instances, 15 attributes, some with missing values.
- Echocardiogram database:
Documentation: sufficient, 13 numeric-valued attributes,
Binary classification: patient either alive or dead after the survival period.
- Glass Identification database:
Documentation: complete. 6 types of glass, defined in terms of their oxide
content (e.g., Na, Fe, K). All attributes are numeric-valued.
- David Slate's letter recognition database (real):
20,000 instances (712565 bytes) (.Z available),
17 attributes: 1 class (letter category) and 16 numeric (integer),
No missing attribute values.
- Mushrooms, described in terms of their physical characteristics and
classified as poisonous or edible (Audubon Society Field Guide):
Documentation: complete, but missing statistical information,
All attributes are nominal-valued,
Large database: 8124 instances (2480 missing values for attribute #12)
- Congressional voting records classified into Republican or Democrat (1984
United States Congressional Voting Records):
Documentation: complete,
All attributes are Boolean valued; plenty of missing values; 2 classes
- Wine Recognition database:
Using chemical analysis to determine the origin of wines,
13 attributes (all continuous), 3 classes, no missing values,
178 instances.
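For the dimensionality-reduction item, the KL transform amounts to projecting the data onto the leading eigenvectors of its covariance matrix. Here is a minimal sketch in Python/NumPy for illustration only (your simulations should be in MATLAB); the 178-by-13 synthetic data below merely mimics the shape of the wine database, and the values are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

def kl_transform(X, m):
    """Project X onto the m eigenvectors of its covariance with largest eigenvalues."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]    # indices of the top-m eigenvalues
    W = eigvecs[:, order]                    # d x m projection matrix
    return (X - mu) @ W, eigvals[order]

# Synthetic data matching the wine database's shape (178 instances, 13
# attributes); per-attribute scales are made up so the leading directions differ
X = rng.normal(size=(178, 13)) * np.linspace(5.0, 0.1, 13)

Y, top_vals = kl_transform(X, m=3)
```

The retained eigenvalues tell you how much variance each projected coordinate carries, which is the natural quantity to report when justifying how many dimensions you kept.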
Dr John Harris
Mon Nov 10 01:03:10 EST 1997