¡¡

ICML 2010 Tutorial on Kernel Adaptive Filtering

Jose C. Principe and Weifeng Liu
¡¡

Topic overview

This tutorial introduces a family of sequential active learning algorithms in reproducing kernel Hilbert spaces (RKHS), collectively called ¡°kernel adaptive filters¡±.  A filter is the equivalent of a regressor for time series. Kernel adaptive filters are linear adaptive filters in RKHS and correspond to universal learning systems in the input space for a majority of kernels. Moreover, by exploiting the structure of the RKHS, inner product operations with infinite dimensional vectors can be easily computed in the input space by kernel evaluations. Last but not the least, these nonlinear adaptive filters do not suffer from the local minima problem unlike traditional neural networks. These three characteristics make kernel adaptive filters an intriguing and potentially very useful class of nonlinear adaptive filters and learning machines.

The appeal of Kernel Adaptive Filters is that they have linear computational complexity when compared with the close form solutions of kernel machines and some of the algorithms are automatically regularized, i.e. they do not need explicit regularization. Therefore they can be very useful when the applications involve very large databases. The performance penalty paid is that they work in the neighborhood of the optimal solution. Moreover, we will show how to construct a state model in RKHS which can enhance many of the current RKHS algorithms.

We foresee a large impact due to the recent interest in sequential, active kernel learning algorithms and the need for nonlinear adaptive algorithms in advanced signal processing, streaming databases and various machine learning tasks. Sequential learning algorithms are a fundamental tool in adaptive signal processing and intelligent learning systems because they embody an efficient compromise between algorithmic simplicity, robustness and fast implementations. In addition, by defining instantaneous information measure on observations, kernel adaptive filters are able to ¡°actively¡± select training data in online learning scenarios. This active learning mechanism provides a principled framework for knowledge discovery, redundant removal and abnormality detection.

Target audience

The topic interests a great spectrum of researchers. The first group is composed by professionals and graduate students interested in nonlinear adaptive systems for on-line applications (streaming in the database world), i.e. applications where the data stream arrives one sample at a time and incremental optimal solutions are desirable. The second area where this class of algorithms is becoming important is knowledge discovery, redundant removal and abnormality detection. Researchers who are interested in sequential, active learning are also expected to attend. Another important domain is large scale machine learning problems where data sets are so large that they cannot fit the computer memory, or the complexity of the calculations is beyond the reach of the computer. In such cases data must also be processed through sampling, and optimal solutions sought incrementally. Overall, we are targeting a wide spectrum of professionals, researcher, and graduate students.

No prior knowledge is required beyond knowledge of reproducing kernel Hilbert spaces, simple optimization techniques and basic information theory concepts. The audience will learn through the tutorial on how the stochastic online algorithms including least-mean-square, affine-projection, recursive least-squares and extended recursive least-squares are constructed in RKHS, their interesting properties and applications. Additionally, various information measures will be discussed such that these sequential learners become active learners too. During the course, plentiful examples will be presented and discussed including classification, system identification, system tracking, short-term prediction and long-term forecasting. We expect around 50 participants.

Content details

Part 1: Introduction to linear adaptive filters (20min, by Dr. Principe)

This provides the basic knowledge to understand the linear adaptive filters including least-mean-square algorithm (LMS), affine-projection algorithms (APA) and recursive least-squares (RLS).

Part 2: The kernel least-mean-square algorithm and wellposedness (40min, by Dr. Principe)

The theory of RKHS will be briefly presented with emphasis on the kernel definition and general properties. From these basic concepts, the simplest of the online learning algorithm will be mapped into the RKHS (the Kernel LMS), showing how the nonlinear mapping is incrementally constructed during adaptation. Then the important issue of regularizing the solution of steep descent will be covered and a mathematical result that shows that the KLMS is well posed in the sense of Hadamard will be presented. This means that KLMS does not need regularization, which simplifies even further the implementation.

Part 3: Kernel affine projection algorithms (30min, by Dr. Principe)

Part 3 will introduce a taxonomy of learning algorithms in RKHS that is called the Kernel Affine Projection Algorithms (KAPA). The KAPA class subsumes many known adaptive algorithms in kernel spaces and here we complete the family by presenting the KLMS, the KLMS Newton, the normalized KLMS and their regularized versions. Examples of application will be provided for each and comparisons drawn.

Part 4: Kernel recursive least squares and extended KRLS (45min, by Dr. Liu)

Part 4 will be dedicated to the kernelization of RLS (KRLS) and the exponentially-weighted KRLS. From these basic algorithms, the extended KRLS, which uses a state space formulation, will be explained. As far as we know, this is the first time a state space model has been kernelized.  Examples of tracking will be provided for each and comparisons drawn.

Part 5: Active learning and topology growth control (45min, by Dr. Liu)

Part 5 will address the principal bottleneck of this class of on-line kernel algorithms, which is related to its growing structure with each new sample. Intuitively, one can expect that after processing sufficient samples from the same source, there is little need to grow the structure of the filters more, because of redundancy. The issue is how to mathematically establish the framework to test if a given sample is needed or not to improve performance. Defining the instantaneous information contained on a data sample, it is possible to estimate it directly from data using Gaussian Process theory. This criterion allows us to discard or include new exemplars in the filter structures systematically and curb the filter growth. This information criterion provides a unifying framework for knowledge discovery, abundant removal and abnormality detection.

There are plenty of figures and plots. A preliminary draft of the slides is here.

Organizers

Professor Jos¨¦ Pr¨ªncipe

principe@cnel.ufl.edu
University of Florida, Gainesville, USA

http://www.cnel.ufl.edu/principe/principe.html

Jose C. Principe (M¡¯83-SM¡¯90-F¡¯00) is a Distinguished Professor of Electrical and Computer Engineering and Biomedical Engineering at the University of Florida where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs) modeling. He is BellSouth Professor and the Founder and Director of the Computational NeuroEngineering Laboratory (CNEL). His primary area of interest is processing of time varying signals with adaptive neural models. The CNEL Lab has been studying signal and pattern recognition principles based on information theoretic criteria (entropy and mutual information). Dr. Principe is an IEEE Fellow. He was the past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society, Past-President of the International Neural Network Society, and Past-Editor in Chief of the IEEE Transactions on Biomedical Engineering. He is a member of the Advisory Board of the University of Florida Brain Institute. Dr. Principe has more than 400 publications. He directed 62 Ph.D. dissertations and 65 Master theses. He wrote an interactive electronic book entitled ¡°Neural and Adaptive Systems: Fundamentals Through Simulation¡± published by John Wiley and Sons and more recently co-authored two book entitled ¡°Brain Machine Interface Engineering¡± and ¡°Kernel Adaptive Filtering: A Comprehensive Introduction¡±.

Dr. Weifeng Liu

weifeng@ieee.org
Amazon.com Inc. Seattle, USA

http://www.cnel.ufl.edu/~weifeng

Weifeng (Aaron) Liu is a research and development engineer of the forecasting group at Amazon.com Inc. where he implements large-scale data mining systems to predict custom demands. His research areas include signal processing, adaptive filtering and machine learning. He is a member of IEEE SPS machine learning for signal processing technical committee. Dr. Liu has published 5 book chapters, 9 peer-reviewed journal papers and 10 conference papers. Recently, he authored a book entitled ¡°Kernel Adaptive Filtering: A Comprehensive Introduction¡± published by John Wiley and Sons.

References

1.      Weifeng Liu, J. C. Principe, Simon Haykin. Kernel Adaptive Filtering: A Comprehensive Introduction, John Wiley, 2010

2.      Weifeng Liu, Il Park, J. C. Principe, ¡°An information theoretic approach of designing sparse kernel adaptive filters,¡± IEEE Transactions on Neural Networks, Volume 20, Issue 12, Pages 1950-1961, 2009

3.     Weifeng Liu, Il Park, Yiwen Wang, J. Principe, ¡°Extended Kernel Recursive Least Squares Algorithm,¡± IEEE Transactions on Signal Processing, Volume 57, Issue 10, 2009

4.     P. Pokharel, Weifeng Liu, J. Principe, ¡°Kernel Least Mean Square Algorithm with Constrained Growth,¡± Signal Processing, Volume 89, Issue 3, March 2009

5.     Weifeng Liu and J. Principe, ¡°Kernel Affine Projection Algorithms,¡± EURASIP Journal on Advances in Signal Processing, vol. 2008, Article ID 784292, 12 pages, 2008. doi:10.1155/2008/784292

6.     Weifeng Liu, P. P. Pokharel, J. C. Principe, ¡°Kernel Least Mean Square Algorithm,¡± IEEE Transactions on Signal Processing, Volume 56, Issue 2, 2008

7.     S. Van Vaerenbergh, I. Santamaria, Weifeng Liu, J. Principe, "Fixed-budget kernel recursive least-squares," International Conference on Acoustics, Speech, and Signal Processing, 2010

8.     Weifeng Liu, J. Principe, "Extended recursive least squares algorithm in RKHS," 1st IAPR Workshop on Cognitive Information Processing, 2008

9.     Weifeng Liu, J. Principe, "The wellposedness analysis of kernel adaline," Intl. Joint Conf. on Neural Networks, 2008

10.   Weifeng Liu, P. Pokharel, J. Principe, "Recursively Adapted Radial Basis Function Networks and its Relationship to Resource Allocating Networks and Online Kernel Learning," IEEE Int. Workshop on Machine Learning for Signal Processing, 2007

11.   P. P. Pokharel, Weifeng Liu, J. C. Principe, ¡°Kernel LMS,¡± International Conference on Acoustics, Speech, and Signal Processing, 2007