Next: EEL6825: Homework Assignments
Up: EEL6825: Pattern Recognition and
Previous: EEL6825: Mailing List
Final report due Wednesday, December 9, 1998 at 9:35am. Late reports
will suffer the usual late fees and penalties.
Your final project constitutes a significant portion of the grade in this
class. Everyone must have a project idea by Wednesday, November 18.
Important dates are as follows:
- By Wednesday, November 18, 5pm: email me a description of your proposed
project (at least one paragraph in length).
- Each Wednesday until the final day of class, each of you should email
me a description of your progress for the week.
- Oral Presentations: I would like each of you to give a short 5-10 minute
presentation on your accomplishments. We will have presentations the last
three days of class: December 4, 7, and 9. With 12 students, we will have 4
students/day. You will not be graded on how good a speaker you are, but on
the work you have done and how well you prepared for the talk. I also
expect everyone to attend class each of these three days and to let me know
in advance if you cannot attend.
- Final project reports are due on the last day of class, December 9,
1998 at 9:35am. All late penalties will apply.
Your final grade for the project will be based on the on-time completion and
quality of each of the above items.
Your final report must include the following topics:
1. Linear classifier results.
2. Bayes classifier results.
3. k-NN classifier results.
4. Dimensionality reduction using the KL transform or another technique.
5. A twist (something new and different). Some examples are given below.
6. An interpretation of the results. For example, what do the results
   tell you about the data or the classifiers that you are using?
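As a rough illustration of the dimensionality-reduction item, the sketch below applies the KL transform to 2-D data and keeps only the principal axis. It is written in Python purely for illustration (the report itself calls for MATLAB), and the function name `kl_transform_2d` and the toy data are assumptions, not part of the assignment. For a 2x2 covariance matrix the eigen decomposition has a closed form, so no linear-algebra library is needed.

```python
import math


def kl_transform_2d(points):
    """Project 2-D points onto their principal axis (2-D -> 1-D KL transform).

    Hypothetical helper for illustration only. Returns the unit principal
    axis, the two covariance eigenvalues (largest first), and the 1-D
    projections of the mean-centered points.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Closed-form eigen decomposition of a symmetric 2x2 matrix:
    # the principal axis makes angle theta with the x axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    axis = (math.cos(theta), math.sin(theta))

    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Coordinates of each centered point along the principal axis.
    projected = [
        (p[0] - mean_x) * axis[0] + (p[1] - mean_y) * axis[1] for p in points
    ]
    return axis, eigenvalues, projected


if __name__ == "__main__":
    # Toy data lying close to the line y = 2x (made up for this sketch).
    data = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.2), (1.0, 2.1), (2.0, 3.8)]
    axis, eigenvalues, projected = kl_transform_2d(data)
    print("principal axis:", axis)
    print("eigenvalues:", eigenvalues)
```

The ratio of the second eigenvalue to the first indicates how much variance is discarded by dropping the second axis; for higher-dimensional data a general eigenvalue routine would replace the closed form.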
Your final project report should be written as if it were to be
submitted to a conference and therefore should contain the following
components:
1. A short literature review about the topic. You should include at least
   one reference to a paper you have read (not a textbook).
2. A concise description of the problem.
3. A detailed description of your solution to the problem.
4. MATLAB simulation results.
5. A discussion of the significance of these results.
6. An appendix containing complete MATLAB code, messy derivations,
   and any other information too detailed to keep in the main body.
You are strongly encouraged to come up with your own idea for a project
based on your own experience; however, some suggestions include:
1. Study the change in error rate with respect to:
   - the amount of data
   - the dimension
   - the number of classes
   - k in k-NN classification
   - v in Parzen windows-based classification
   These experiments might be best done with synthetic data.
2. Study some other classifiers that we haven't talked about in class and
   compare them to the conventional methods we have discussed. These methods
   might include piece-wise classifiers or neural networks (if you have
   already taken this course).
3. Choose a novel domain that requires some special consideration or
   feature extraction. You may find some interesting data through the internet.
   For example, take a look at some of the benchmark data sets given in
4. Speech and character recognition are both very challenging problems but
   would both make excellent projects.
For both of these problems, feature extraction is the key step.
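The first suggestion above (error rate as a function of k, using synthetic data) might look roughly like the following sketch. It is in Python for illustration only, since the report itself calls for MATLAB. The two-Gaussian data model, the function names, and every parameter value are assumptions chosen for this example.

```python
import random


def make_data(n_per_class, separation, rng):
    """Draw 2-D samples from two unit-variance Gaussian classes.

    Class 0 is centered at the origin; class 1 is shifted by `separation`
    along the first axis. This data model is an assumption of the sketch.
    """
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = rng.gauss(label * separation, 1.0)
            y = rng.gauss(0.0, 1.0)
            data.append(((x, y), label))
    return data


def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training samples."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2
        + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes_for_class_1 = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_class_1 > k else 0


def error_rate(train, test, k):
    """Fraction of test samples that k-NN misclassifies."""
    wrong = sum(1 for point, label in test if knn_predict(train, point, k) != label)
    return wrong / len(test)


def sweep_k(ks=(1, 3, 5, 9, 15), n_train=100, n_test=200, separation=2.0, seed=0):
    """Return {k: test error rate} for one synthetic train/test draw."""
    rng = random.Random(seed)
    train = make_data(n_train, separation, rng)
    test = make_data(n_test, separation, rng)
    return {k: error_rate(train, test, k) for k in ks}


if __name__ == "__main__":
    for k, err in sweep_k().items():
        print(f"k = {k:2d}   error rate = {err:.3f}")
```

Only odd values of k are used so that two-class votes never tie. Repeating the sweep over several seeds, or varying `n_train` or `separation` instead of k, gives the other curves the suggestion mentions.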
The following datasets are available in the UCI database and some have been
used in past years for projects:
1. Wisconsin Breast Cancer databases: currently contains 699 instances,
   2 classes (malignant and benign), 9 integer-valued attributes.
2. Credit Screening database: a good mix of attributes - continuous,
   nominal with small numbers of values, and nominal with larger numbers of
   values; 690 instances, 15 attributes, some with missing values.
3. Echocardiogram database:
   Documentation: sufficient; 13 numeric-valued attributes.
   Binary classification: patient either alive or dead after survival period.
4. Glass Identification database:
   Documentation: complete; 6 types of glass defined in terms of their oxide
   content (e.g. Na, Fe, K). All attributes are numeric-valued.
5. David Slate's letter recognition database (real):
   20,000 instances (712565 bytes) (.Z available),
   17 attributes: 1 class (letter category) and 16 numeric (integer).
   No missing attribute values.
6. Mushrooms described in terms of their physical characteristics and
   classified as poisonous or edible (Audubon Society Field Guide):
   Documentation: complete, but missing statistical information.
   All attributes are nominal-valued.
   Large database: 8124 instances (2480 missing values for attribute #12).
7. Congressional voting records classified into Republican or Democrat (1984
   United States Congressional Voting Records):
   Documentation: complete.
   All attributes are Boolean-valued; plenty of missing values; 2 classes.
8. Wine Recognition database:
   Using chemical analysis, determine the origin of wines.
   13 attributes (all continuous), 3 classes, no missing values,
   178 instances.
Dr John Harris
1998-12-19