Next: About this document Up: EEL6825: Pattern Recognition Fall Previous: EEL6825: HW#4

EEL6825: HW#5

Due Thursday, November 26, 1996 at 3pm. This assignment provides a rudimentary introduction to character recognition. We will study fixed-font, capital-letter recognition with no punctuation.

  1. Character Recognition
    1. Scanning the data. Scan the last page of this assignment into a computer and create an image that you can read into MATLAB or some other program of your choice. You are free to work with other people in scanning the document. If you are absolutely unable to scan the data, you may ftp a binary file from jupiter.cnel.ufl.edu in the hw5 directory. The file is named letters.gif and is a 351-row by 440-column image. The scanned image is shown below (it actually looks better than this postscript rendition).

      [Figure: the scanned image, letters.gif]

      The following code was used to read and display the image in MATLAB:

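      clear;                            % start from an empty workspace
      [x,map]=gifread('letters.gif');   % x: indexed image, map: its colormap (newer MATLAB: imread)
      colormap(gray(52));               % 52-level gray colormap
      image(52-x);                      % display with the gray levels reversed
      truesize;                         % one screen pixel per image pixel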

      The following line was used to plot the cross-section cutting horizontally through the image at row 250:

      plot(x(250,:))

      The resulting cross-section is shown below:

      [Figure: plot of the cross-section through row 250]

    2. Thresholding: Write a program to threshold the image so that it is binary (ones and zeros).
    3. Filtering: Use a filter to get rid of the small noise specks (if any) that are obviously not letters. Hint: For example, use a median filter.
    4. Segmentation and labeling: Write a program that pulls out each of the characters in the training and test sets as small subarrays. The first 26 letters you find on the page are the 26 capital letters comprising the training set. The remaining letters on the page are capital letters with no punctuation and make up the test set. Hint: you might consider using a connected components algorithm (described briefly in section 3.3 of Nadler & Smith).
    5. Recognition: Use the letters in the training set to decode the message in the test set. Hint: you might use either template matching or 2D FFT distances.
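
Steps 2 to 4 above can be sketched in plain MATLAB as follows. This is only one possible approach under stated assumptions, not the expected solution: the image is a small synthetic stand-in (hypothetical data, not letters.gif), the threshold is found by iterating on the midpoint of the two class means, the ink is assumed to cover less than half the page, the 3x3 median filter is written as a majority vote (equivalent for a binary image), and the labeling is a simple 8-connected flood fill. If the Image Processing Toolbox is available, medfilt2 and bwlabel do similar jobs.

```matlab
% Synthetic stand-in for the scanned page (hypothetical data, not letters.gif):
% light paper (45), two dark blobs (5) and one isolated noise pixel.
img = 45 * ones(12, 20);
img(3:8, 3:6)   = 5;     % first "character"
img(4:9, 12:16) = 5;     % second "character"
img(11, 19)     = 5;     % single-pixel noise speck

% Step 2: threshold chosen from the data (midpoint of the two class means).
t = mean(img(:));
for iter = 1:50
    tNew = (mean(img(img <= t)) + mean(img(img > t))) / 2;
    if abs(tNew - t) < 1e-6, break; end
    t = tNew;
end
bw = img <= t;                        % candidate ink mask
if mean(bw(:)) > 0.5, bw = ~bw; end   % ASSUMPTION: ink is the minority class

% Step 3: 3x3 median filter; on a binary image this is a majority vote.
bwf = conv2(double(bw), ones(3), 'same') >= 5;

% Step 4: 8-connected component labeling by flood fill with an explicit stack.
[nr, nc] = size(bwf);
labels = zeros(nr, nc);
nLabels = 0;
for c = 1:nc
    for r = 1:nr
        if bwf(r, c) && labels(r, c) == 0
            nLabels = nLabels + 1;
            labels(r, c) = nLabels;
            stack = [r, c];
            while ~isempty(stack)
                p = stack(end, :);
                stack(end, :) = [];
                for dr = -1:1
                    for dc = -1:1
                        rr = p(1) + dr;  cc = p(2) + dc;
                        if rr >= 1 && rr <= nr && cc >= 1 && cc <= nc ...
                                && bwf(rr, cc) && labels(rr, cc) == 0
                            labels(rr, cc) = nLabels;
                            stack(end + 1, :) = [rr, cc];
                        end
                    end
                end
            end
        end
    end
end

% Cut each labeled component out as a small subarray (its bounding box).
chars = cell(1, nLabels);
for k = 1:nLabels
    [rows, cols] = find(labels == k);
    chars{k} = bwf(min(rows):max(rows), min(cols):max(cols));
end
fprintf('threshold = %g, components found = %d\n', t, nLabels);
```

Note that the components come out in column-scan order, not reading order, so on the real page they would still need to be sorted into text lines and then left to right before the first 26 can be taken as the training set. The majority vote also rounds off sharp corners, which may or may not matter for the recognition step.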

    If you scan your own data, you may be able to get the scanning software to give you a binary image directly, or you can use "xv" in Unix or any of a whole host of image viewing programs on the PC to threshold and perhaps filter the image before loading it into MATLAB. The one thing you cannot do is use character recognition software to actually do the recognition.

    Explicitly state where you got your data (i.e. from the ftp site or by scanning it yourself). As usual, you should hand in all of your code. As much as possible, your code should not include anything specific to this image (for example, font size, image size, thresholds, etc.). Briefly describe the methodology you chose for each step and how successful each step was. You should show the individual templates you recovered for the characters as well as the recognition performance.
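
For the recognition step, a minimal template-matching sketch might look like the following. Everything in it is illustrative: the three 5x3 glyphs, the test message and the single flipped "noise" pixel per character are hypothetical stand-ins for the subarrays that segmentation would produce, the common size of 10x6 is an arbitrary choice, and the distance is simply the number of disagreeing pixels after nearest-neighbour resizing. The 2D FFT distance mentioned in the hint is not shown.

```matlab
% Hypothetical training templates (tight bounding boxes of three glyphs).
names = 'LTU';
templates = {[1 0 0; 1 0 0; 1 0 0; 1 0 0; 1 1 1], ...
             [1 1 1; 0 1 0; 0 1 0; 0 1 0; 0 1 0], ...
             [1 0 1; 1 0 1; 1 0 1; 1 0 1; 1 1 1]};

% Hypothetical test set: each glyph doubled in size with one pixel flipped.
truth = [2 3 1 1];                       % indices into names, i.e. 'TULL'
testChars = cell(1, numel(truth));
for k = 1:numel(truth)
    g = kron(templates{truth(k)}, ones(2));
    g(k, 1) = 1 - g(k, 1);               % flip one pixel as "scanner noise"
    testChars{k} = g;
end

% Nearest-neighbour resize of any binary subarray to a common H-by-W grid.
H = 10;  W = 6;                          % ASSUMPTION: arbitrary common size
resize = @(a) double(a(round(linspace(1, size(a, 1), H)), ...
                       round(linspace(1, size(a, 2), W))));

% Classify each test character by the template with the fewest differing pixels.
decoded = blanks(numel(testChars));
for k = 1:numel(testChars)
    c = resize(testChars{k});
    d = zeros(1, numel(templates));
    for j = 1:numel(templates)
        d(j) = sum(sum(abs(c - resize(templates{j}))));
    end
    [dmin, best] = min(d);
    decoded(k) = names(best);
end

accuracy = mean(decoded == names(truth));   % fraction of correct characters
disp(decoded);
fprintf('recognition rate = %.2f\n', accuracy);
```

Resizing to a common grid is one way to keep font size and image size out of the code; reporting the per-character distances alongside the decoded message is a simple way to document recognition performance.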

  2. The density function of each of three classes consists of three impulses, with each impulse carrying a probability of 1/3.
    Class 1 impulses are located at (1,0), (2,0) and (1,1).
    Class 2 impulses are located at (-1,0), (0,1) and (-1,1).
    Class 3 impulses are located at (-1,-1), (0,-1) and (0,-2).
    Find the single linear feature that maximizes tex2html_wrap_inline328.
  3. We have seen several examples of how the K-L dimension reduction technique can throw out valuable features needed for classification. This problem investigates how optimizing the trace criterion can also throw out valuable features for classification. Three normal distributions are characterized by:

    [Equations missing: the parameters of the three normal distributions]

    1. Find the single linear feature that maximizes tex2html_wrap_inline328. (Show all of your work!)
    2. In 1-D, tex2html_wrap_inline328 reduces to tex2html_wrap_inline345, where tex2html_wrap_inline342, tex2html_wrap_inline344 and tex2html_wrap_inline346 are the 1-D projections of tex2html_wrap_inline348, tex2html_wrap_inline350 and tex2html_wrap_inline346 respectively. Justify your result above by computing the value of the 1-D criterion for each of the eigenvector directions.
    3. Suggest a better criterion (in one-dimension). Make sure that it chooses the correct linear feature for this problem.
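
As a numerical cross-check for the impulse data of problem 2, the scatter matrices can be computed directly. The sketch below rests on assumptions that the text does not confirm: the criterion formula is missing from this copy, so it is taken to be the trace criterion tr(Sw^-1 Sb) that problem 3 refers to, and the three classes are assumed to have equal priors. Under those assumptions the best single linear feature is the eigenvector of Sw^-1 Sb with the largest eigenvalue.

```matlab
% Impulse locations from problem 2, one row per impulse.
X = {[ 1  0;  2  0;  1  1], ...
     [-1  0;  0  1; -1  1], ...
     [-1 -1;  0 -1;  0 -2]};
P = [1 1 1] / 3;                 % ASSUMPTION: equal class priors

% Overall mean as the prior-weighted mean of the class means.
m0 = zeros(1, 2);
for i = 1:3
    m0 = m0 + P(i) * mean(X{i}, 1);
end

% Within-class scatter Sw and between-class scatter Sb.
Sw = zeros(2);
Sb = zeros(2);
for i = 1:3
    mi = mean(X{i}, 1);
    Xc = X{i} - repmat(mi, size(X{i}, 1), 1);        % centered impulses
    Sw = Sw + P(i) * (Xc' * Xc) / size(X{i}, 1);     % each impulse has weight 1/3
    Sb = Sb + P(i) * (mi - m0)' * (mi - m0);
end

% ASSUMED criterion: tr(Sw^-1 Sb); its 1-D maximizer is the top eigenvector.
[V, D] = eig(Sw \ Sb);
[lambda, k] = max(real(diag(D)));
w = real(V(:, k));
w = w / norm(w);

% Value of the assumed criterion after projecting onto w.
J1d = (w' * Sb * w) / (w' * Sw * w);
fprintf('w = [%g %g], criterion = %g\n', w(1), w(2), J1d);
```

The same few lines can be reused for problem 3 by replacing Sw and Sb with the scatter matrices derived from the given normal distributions and evaluating the 1-D ratio along each eigenvector direction.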

The next page contains the data that you should scan into a computer for problem 1. If you want to print out a clean page that has not been xeroxed, the postscript file "letters.ps" is also available in the hw5 ftp directory.

 

A  B  C  D  E  F  G  H  I  J  K  L  M 
 
N  O  P  Q  R  S  T  U  V  W  X  Y  Z 


T H E   Q U I C K   B R O W N 
 
F O X   J U M P E D   O V E R 
  
T H E   L A Z Y   D O G



John Harris
Tue Nov 19 07:44:32 EST 1996