Next: About this document
Up: EEL6825: Pattern Recognition Fall
Previous: EEL6825: HW#4
Due Thursday, November 26, 1996 at 3pm.
This assignment will provide a rudimentary introduction to character
recognition. We will study fixed-font, capital-letter recognition with no
punctuation.
- Character Recognition
- Scanning the data. Scan the last page of this assignment into a
computer and create an image that you can read into MATLAB or some other
program of your choice. You are free to work with other people in scanning
in the document. If you are absolutely unable to scan in the data, you may ftp
a binary file from
jupiter.cnel.ufl.edu
in the hw5 directory. The file is
named letters.gif; it is a 351-row by 440-column image. The scanned
image is shown below (it actually looks better than this PostScript
rendition).
The following code was used to read and display the image in MATLAB:
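clear;                            % clear the workspace
[x,map]=gifread('letters.gif');   % indexed image x and its colormap (gifread is from early MATLAB releases; current ones use imread)
colormap(gray(52));               % 52-level gray colormap
image(52-x);                      % display with the gray levels reversed
truesize;                         % one screen pixel per image pixel (Image Processing Toolbox)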
The following line was used to plot the cross-section cutting horizontally
through the image at row 250:
plot(x(250,:))
The result of this cross section is shown below:
- Thresholding: Write a program to threshold the image so that it is
binary (ones and zeros).
- Filtering: Use a filter to
get rid of the small noise specks (if any) that are obviously not
letters. Hint: for example, use a median filter.
- Segmentation and labeling:
Write a program that pulls out each of the characters in the training and
test sets as small subarrays. The first 26 letters you find on the page are
the 26 capital letters comprising the training set. The remaining letters
on the page are capital letters with no punctuation and form the test set.
Hint: you might consider using a connected components algorithm (described
briefly in section 3.3 of Nadler & Smith).
- Recognition: Use the letters in the training set to decode the
message in the test set. Hint: you might use either template matching or 2D
FFT distances.
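The thresholding and filtering steps above can be sketched in plain Python as follows. This is a minimal sketch, not a required method: it assumes the image is a list of rows of integer gray levels and that the letters are darker than the paper (the `dark_foreground` flag flips this). The threshold is picked by Otsu's method, a standard histogram-based rule swapped in here so that no image-specific threshold is hard-coded, and the filter is a 3x3 median filter on the binary image with pixels outside the image treated as background. The function names are hypothetical.

```python
from collections import Counter


def otsu_threshold(img):
    """Gray level t that maximizes the between-class variance (Otsu's method).

    Pixels <= t form one class and pixels > t the other.
    """
    hist = Counter(p for row in img for p in row)
    levels = sorted(hist)
    total = sum(hist.values())
    total_sum = sum(g * n for g, n in hist.items())
    best_t, best_score = levels[0], -1.0
    n0 = s0 = 0
    for t in levels[:-1]:                  # the last level would leave one class empty
        n0 += hist[t]
        s0 += t * hist[t]
        n1 = total - n0
        mu0, mu1 = s0 / n0, (total_sum - s0) / n1
        score = n0 * n1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(img, t, dark_foreground=True):
    """Map each pixel to 1 (letter) or 0 (background) using threshold t."""
    if dark_foreground:
        return [[1 if p <= t else 0 for p in row] for row in img]
    return [[1 if p > t else 0 for p in row] for row in img]


def median3x3(binary):
    """3x3 median filter on a 0/1 image; pixels outside the image count as 0."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ones = sum(binary[y][x]
                       for y in range(max(r - 1, 0), min(r + 2, rows))
                       for x in range(max(c - 1, 0), min(c + 2, cols)))
            out[r][c] = 1 if ones >= 5 else 0  # median of 9 binary values
    return out
```

A typical call chain is `median3x3(binarize(img, otsu_threshold(img)))`. A single isolated speck has only one "on" pixel in its neighborhood and is removed, while letter strokes at least a few pixels wide survive; note that a 3x3 median filter also rounds off sharp corners, so strokes only one or two pixels thick may be eroded.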
If you scan your own data, you may be able to get the software to give
you a binary image directly, or you can use "xv" in Unix or any of a whole host of
image viewing programs on the PC to threshold and perhaps filter the image
before you load it into MATLAB. The one thing you cannot do is use character
recognition software to actually do the recognition.
Explicitly state where you got your data (i.e., from the ftp site or by scanning
it yourself). As usual, you should hand in all of your code. As much as
possible, your code should not include anything specific about this image
(for example, font size, image size, thresholds, etc.). Briefly describe the
methodology you chose for each step and how successful each step was. You
should show the individual templates you recovered for the characters as
well as the recognition performance.
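A minimal sketch of the segmentation and recognition steps in Python, assuming the cleaned image is a list of rows of 0/1 values with 1 marking letter pixels. It labels 8-connected components by breadth-first search, groups the components into text lines by vertical overlap, copies each one into a zero-padded canvas as large as the largest component (so no font size is hard-coded), and classifies each test glyph by the template with the fewest mismatched pixels, which is one simple form of template matching. All function names are hypothetical. The line grouping and top-left alignment are simplifying assumptions that break if letters touch or fragment in the scan, and the 2D FFT distance mentioned in the hint is not shown.

```python
from collections import deque


def label_components(binary):
    """8-connected component labeling by breadth-first search.

    Returns one inclusive bounding box (top, left, bottom, right) per component,
    in the order components are first met when scanning row by row.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(y - 1, y + 2):
                    for nx in range(x - 1, x + 2):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def reading_order(boxes):
    """Group boxes into text lines, then order each line left to right.

    A box joins the current line if its top row is at or above the bottom row
    of the line's first (topmost) box; otherwise it starts a new line.
    """
    lines = []
    for box in sorted(boxes):  # sorted by top row first
        if lines and box[0] <= lines[-1][0][2]:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [box for line in lines for box in sorted(line, key=lambda b: b[1])]


def crop(binary, box, height, width):
    """Copy a box into a height x width canvas, top-left aligned, zero padded."""
    top, left, bottom, right = box
    return [[binary[top + i][left + j] if top + i <= bottom and left + j <= right else 0
             for j in range(width)] for i in range(height)]


def classify(glyph, templates):
    """Label of the template with the fewest mismatched pixels (Hamming distance)."""
    def mismatches(a, b):
        return sum(p != q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return min(templates, key=lambda label: mismatches(glyph, templates[label]))


def decode(binary, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Use the first len(alphabet) glyphs as templates and classify the rest."""
    boxes = reading_order(label_components(binary))
    height = max(b[2] - b[0] + 1 for b in boxes)
    width = max(b[3] - b[1] + 1 for b in boxes)
    glyphs = [crop(binary, b, height, width) for b in boxes]
    n = len(alphabet)
    templates = dict(zip(alphabet, glyphs[:n]))
    return "".join(classify(g, templates) for g in glyphs[n:])
```

`decode` relies on the first 26 components in reading order being A through Z, as on the data page; the templates it builds (`crop` of those components) are the ones to show in your write-up. Word spaces are not recovered here; one option is to look for unusually wide horizontal gaps between consecutive boxes on a line.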
- The density function of each of three classes consists of three impulses,
each impulse carrying probability 1/3.
Class 1 impulses are located at (1,0) (2,0) and (1,1)
Class 2 impulses are located at (-1,0) (0,1) and (-1,1)
Class 3 impulses are located at (-1,-1) (0,-1) and (0,-2)
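Trace-type criteria are built from the within-class scatter matrix S_w and the between-class scatter matrix S_b, so a natural first step is to compute both for the impulse densities above. A minimal sketch using exact rational arithmetic, assuming equal class priors P_i = 1/3 (the priors are not stated) and the usual definitions S_w = P_1 Σ_1 + P_2 Σ_2 + P_3 Σ_3 and S_b = Σ_i P_i (M_i - M_0)(M_i - M_0)^T, where M_i and Σ_i are the class means and covariances and M_0 is the overall mean. The function names are hypothetical; `quad` evaluates w^T A w, the 1-D scatter of either matrix along a candidate direction w.

```python
from fractions import Fraction as F

# Impulse locations from the problem statement; each impulse has probability 1/3.
CLASSES = [
    [(1, 0), (2, 0), (1, 1)],      # class 1
    [(-1, 0), (0, 1), (-1, 1)],    # class 2
    [(-1, -1), (0, -1), (0, -2)],  # class 3
]


def mean(points):
    """Mean of equally weighted 2-D impulses."""
    n = len(points)
    return tuple(F(sum(p[k] for p in points), n) for k in range(2))


def outer(u, v):
    return [[u[i] * v[j] for j in range(2)] for i in range(2)]


def add(A, B, scale=1):
    """A + scale * B for 2x2 matrices (returns a new matrix)."""
    return [[A[i][j] + scale * B[i][j] for j in range(2)] for i in range(2)]


def scatter_matrices(classes, priors):
    """Within-class (S_w) and between-class (S_b) scatter matrices."""
    means = [mean(c) for c in classes]
    m0 = tuple(sum(P * m[k] for P, m in zip(priors, means)) for k in range(2))
    zero = [[F(0)] * 2 for _ in range(2)]
    s_w, s_b = zero, zero
    for P, pts, m in zip(priors, classes, means):
        cov = zero
        for p in pts:
            d = (p[0] - m[0], p[1] - m[1])
            cov = add(cov, outer(d, d), F(1, len(pts)))
        s_w = add(s_w, cov, P)
        dm = (m[0] - m0[0], m[1] - m0[1])
        s_b = add(s_b, outer(dm, dm), P)
    return s_w, s_b


def quad(w, A):
    """w^T A w for a 2-vector w and a 2x2 matrix A."""
    return sum(w[i] * A[i][j] * w[j] for i in range(2) for j in range(2))


if __name__ == "__main__":
    s_w, s_b = scatter_matrices(CLASSES, [F(1, 3)] * 3)
    print("S_w =", s_w)
    print("S_b =", s_b)
```

Exact fractions keep the hand calculation and the program output directly comparable.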
Find the single linear feature that maximizes .
- We have seen several examples of how the K-L dimension reduction
technique can throw out valuable features needed for classification. This
problem investigates how optimizing the trace criterion can also
throw out valuable features for classification.
Three normal distributions are characterized by:
- Find the single linear feature that maximizes . (Show all of your work!)
- In 1-D, reduces to where , and are the 1-D projections of , and
respectively. Justify your result above by computing the value of the 1-D
criterion for each of the eigenvector directions.
- Suggest a better criterion (in one dimension). Make sure that it chooses
the correct linear feature for this problem.
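Checking the 1-D criterion along each eigenvector direction needs two ingredients: the eigenvectors of a 2x2 criterion matrix and, for a direction w, each class's projected mean w^T M_i and variance w^T Σ_i w (M_i and Σ_i being the class means and covariances, an assumed notation). A minimal sketch, assuming the criterion matrix has real eigenvalues, as S_w^{-1} S_b does for a trace criterion of the form tr(S_w^{-1} S_b) (an assumption about which criterion is meant). The function names are hypothetical, and the tolerance 1e-12 is an arbitrary choice.

```python
import math


def eig2(A):
    """(eigenvalue, unit eigenvector) pairs of a real 2x2 matrix with real eigenvalues."""
    (a, b), (c, d) = A
    if abs(b) < 1e-12 and abs(c) < 1e-12:  # diagonal: eigenvectors are the axes
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < -1e-12:
        raise ValueError("complex eigenvalues")
    root = math.sqrt(max(disc, 0.0))
    pairs = []
    for lam in ((tr + root) / 2, (tr - root) / 2):
        # (A - lam I) v = 0: use whichever row gives a nonzero solution
        v = (b, lam - a) if abs(b) > 1e-12 else (lam - d, c)
        norm = math.hypot(*v)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


def project(w, means, covs):
    """Projected 1-D mean and variance of each class along the direction w."""
    norm = math.hypot(*w)
    u = (w[0] / norm, w[1] / norm)
    out = []
    for m, S in zip(means, covs):
        mu = u[0] * m[0] + u[1] * m[1]
        var = sum(u[i] * S[i][j] * u[j] for i in range(2) for j in range(2))
        out.append((mu, var))
    return out
```

Feeding each eigenvector returned by `eig2` to `project` with the given class parameters yields the projected means and variances that the 1-D criterion is evaluated on.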
The next page contains the data that you should scan into a computer for
problem 1. If you want to print out a clean page that has not been
xeroxed, the PostScript file "letters.ps" is also given in the hw5 ftp
directory.
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
T H E Q U I C K B R O W N
F O X J U M P E D O V E R
T H E L A Z Y D O G
John Harris
Tue Nov 19 07:44:32 EST 1996