Information Theoretic Learning
- PI: Jose Principe
- Source of Funding: NSF, ONR
Information Theoretic Learning (ITL) was initiated in the late 1990s at CNEL and has been a centerpiece of the research effort ever since. ITL uses descriptors from information theory (entropy and divergences), estimated directly from the data, to replace the conventional statistical descriptors of variance and covariance. ITL can be used to adapt linear or nonlinear filters, and also in unsupervised and supervised machine learning applications. See the ITL resource center for tutorials, examples and Matlab code.
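As a minimal sketch of what "estimated directly from the data" can look like, the following Python code estimates Renyi's quadratic entropy of scalar samples with a Parzen window. The Gaussian kernel, the kernel size sigma and all function names are assumptions made for illustration; they are not taken from the ITL Matlab code.

```python
import math

def gaussian(u, sigma):
    """Gaussian kernel with standard deviation sigma, evaluated at u."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def information_potential(x, sigma):
    """Parzen estimate of the integral of p(x)^2 from scalar samples x.

    Integrating the product of two Gaussian kernels of size sigma gives a
    Gaussian of size sigma * sqrt(2), so the estimate is a double sum over
    all sample pairs.
    """
    n = len(x)
    s = sigma * math.sqrt(2.0)
    return sum(gaussian(xi - xj, s) for xi in x for xj in x) / (n * n)

def renyi_quadratic_entropy(x, sigma):
    """Renyi's quadratic entropy: minus the log of the information potential."""
    return -math.log(information_potential(x, sigma))

# Illustrative data (hypothetical): a tight and a spread-out sample set.
tight = [0.0, 0.1, -0.1, 0.05]
spread = [0.0, 2.0, -2.0, 4.0]
h_tight = renyi_quadratic_entropy(tight, 0.5)
h_spread = renyi_quadratic_entropy(spread, 0.5)
```

With the same kernel size, the tightly concentrated samples give a larger information potential and therefore a lower entropy estimate than the spread-out samples.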
Correntropy Dependence Measure
- Ph.D. students: Sohan Seth, JianWu Xu
Correntropy was defined as a generalization of the correlation of random processes. The name reflects the fact that its mean value over the lags is the information potential, which is the argument of the logarithm in Renyi’s quadratic entropy. As expected, correntropy captures not only second order but also higher order moment information about the random variables. From correntropy we can derive the centered correntropy and the correntropy coefficient, which are the counterparts of the covariance and the correlation coefficient, respectively. A novel parametric correntropy is defined as the correntropy between a shifted and a scaled random variable. The supremum of the parametric correntropy coefficient over all possible shifts and scales gives rise to a new dependence measure with very interesting properties: it leads to new tests of independence and can also quantify the degree of dependence among random variables.
Rao M., Xu J., Seth S., Chen Y., Tagare M., Principe J., Correntropy Dependence Measure, submitted to Metrika, 2009.
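The sample estimators behind these quantities can be sketched as follows. This is an illustration under assumptions: a Gaussian kernel with a hand-picked kernel size sigma, paired scalar samples, and hypothetical function names. The parametric correntropy and the supremum over shifts and scales are not covered.

```python
import math

def gaussian(u, sigma):
    """Gaussian kernel with standard deviation sigma, evaluated at u."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def correntropy(x, y, sigma):
    """Sample mean of the kernel evaluated on the paired differences x[i] - y[i]."""
    return sum(gaussian(a - b, sigma) for a, b in zip(x, y)) / len(x)

def centered_correntropy(x, y, sigma):
    """Correntropy minus its value under independence.

    The second term averages the kernel over all (x[i], y[j]) pairs, which
    estimates the expectation under the product of the marginals.
    """
    n = len(x)
    cross = sum(gaussian(a - b, sigma) for a in x for b in y) / (n * n)
    return correntropy(x, y, sigma) - cross

def correntropy_coefficient(x, y, sigma):
    """Centered correntropy normalized like a correlation coefficient."""
    return centered_correntropy(x, y, sigma) / math.sqrt(
        centered_correntropy(x, x, sigma) * centered_correntropy(y, y, sigma)
    )

# Illustrative data (hypothetical): y is a nonlinear function of x.
x = [0.1 * i for i in range(20)]
y = [v * v for v in x]
eta = correntropy_coefficient(x, y, 1.0)
```

Like the correlation coefficient, this estimate equals one when a variable is compared with itself and stays within [-1, 1], because the Gaussian kernel is positive definite.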
Nonlinearity tests based on Correntropy
- Ph.D. student: Aysegul Gunduz
The inclusion of second and higher order information in correntropy makes it particularly useful in distinguishing between linear and nonlinear signal sources. We have created a simple nonlinearity test based on the correntropy spectral density (CSD) and surrogate data. If the examined time series was created by linear dynamics, the underlying distribution of its CSD and that of its surrogates should be the same. If, on the other hand, the two underlying distributions differ, we deduce that the time series contains nonlinear structure that is absent from its surrogates. Normalizing the CSD by its total value converts correntropy per frequency into a pdf and allows the use of the two-sample Kolmogorov-Smirnov goodness-of-fit test.
Aysegul Gunduz, Jose Principe, Correntropy as a Novel Measure for Nonlinearity Tests (submitted)
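The comparison step of this procedure can be sketched as below. The sketch makes several assumptions that are not in the text: an unnormalized Gaussian kernel, a CSD computed as the cosine transform of the centered autocorrentropy over a fixed number of lags, and hypothetical function names. Surrogate generation and the conversion of the statistic into a test decision are left out.

```python
import math

def gaussian(u, sigma):
    """Unnormalized Gaussian kernel, equal to 1 at u = 0."""
    return math.exp(-u * u / (2.0 * sigma * sigma))

def centered_autocorrentropy(x, max_lag, sigma):
    """Autocorrentropy at lags 0..max_lag minus the mean kernel over all pairs."""
    n = len(x)
    mean_all = sum(gaussian(a - b, sigma) for a in x for b in x) / (n * n)
    u = []
    for m in range(max_lag + 1):
        v_m = sum(gaussian(x[i] - x[i - m], sigma) for i in range(m, n)) / (n - m)
        u.append(v_m - mean_all)
    return u

def normalized_csd(x, max_lag, sigma, n_freq=64):
    """Spectrum magnitude of the symmetric lag sequence, scaled to sum to one."""
    u = centered_autocorrentropy(x, max_lag, sigma)
    spectrum = []
    for k in range(n_freq):
        w = math.pi * k / n_freq  # frequencies from 0 up to (not including) pi
        p = u[0] + 2.0 * sum(u[m] * math.cos(w * m) for m in range(1, max_lag + 1))
        spectrum.append(abs(p))
    total = sum(spectrum)
    return [p / total for p in spectrum]

def ks_statistic(p, q):
    """Largest absolute gap between the cumulative sums of two normalized spectra."""
    cp = cq = gap = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        gap = max(gap, abs(cp - cq))
    return gap

# Illustrative series (hypothetical); in the procedure above the second
# series would be a surrogate of the first.
series = [math.sin(0.3 * n) for n in range(200)]
other = [math.sin(0.3 * n) ** 3 for n in range(200)]
p_series = normalized_csd(series, 30, 0.5)
p_other = normalized_csd(other, 30, 0.5)
gap = ks_statistic(p_series, p_other)
```

A gap close to zero indicates that the two normalized spectra are nearly identical, while a large gap points to structure in one series that the other lacks.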
Pitch Detection Based on Correntropy
- Ph.D. Student: JianWu Xu
Another very interesting property of correntropy is its finer temporal resolution as a similarity measure, which stems from the higher order moment information and is controlled by the kernel size. We were able to show that correntropy can be used to advantage in pitch detection algorithms based on cochlear filters and the correlogram, and more generally in any application that requires a time-bandwidth product better than that of conventional nonparametric spectral estimation.
Jianwu Xu and Jose C. Principe, A Novel Pitch Determination Algorithm Based on Generalized Correlation Function (accepted)
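The core idea can be sketched by taking the lag that maximizes the autocorrentropy inside a plausible pitch range. This is a simplification under stated assumptions: it operates on the raw signal without the cochlear filter bank or the correlogram, uses an unnormalized Gaussian kernel, and the function names and search range are hypothetical.

```python
import math

def autocorrentropy(x, max_lag, sigma):
    """Mean Gaussian kernel of x[n] - x[n - m] for lags m = 0..max_lag."""
    n = len(x)
    return [
        sum(math.exp(-((x[i] - x[i - m]) ** 2) / (2.0 * sigma * sigma))
            for i in range(m, n)) / (n - m)
        for m in range(max_lag + 1)
    ]

def estimate_pitch(x, fs, f_min, f_max, sigma):
    """Return fs / lag for the lag with the largest autocorrentropy.

    Only lags that correspond to frequencies between f_min and f_max are searched.
    """
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    v = autocorrentropy(x, lag_max, sigma)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda m: v[m])
    return fs / best_lag

# Illustrative signal (hypothetical): a 200 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(800)]
pitch = estimate_pitch(tone, fs, f_min=120.0, f_max=400.0, sigma=0.1)
```

A smaller kernel size narrows the autocorrentropy peaks, which is the mechanism by which the kernel size controls the temporal resolution.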
Nonlinear Granger Causality based on correntropy
- Ph.D. student: Il Park (a.k.a. Memming)
Correntropy defines an RKHS (reproducing kernel Hilbert space) nonlinearly related to the data space. Therefore, Wiener filters in the correntropy RKHS are nonlinear filters in the data space. We have applied this idea to derive a nonlinear causality test based on Granger causality in the correntropy RKHS. Preliminary results show that for certain time series, the method outperforms the linear counterpart.
Compressive sampling based on correntropy
- Ph.D. students: Sohan Seth, Weifeng Liu
Correntropy induces a nonlinear metric in the sample space whose behavior changes from an L2 metric to L1 and finally to L0 as the distance between the samples grows. This flexibility can be exploited to approximate the L0 norm solution required in compressive sampling.
Seth S., Principe J., Compressed Signal Reconstruction Using the Correntropy Induced metric, in Proc. ICASSP 2008, Las Vegas.
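The saturation that produces the L0-like behavior can be seen in a short sketch of the correntropy induced metric. The unnormalized Gaussian kernel (equal to one at zero), the kernel size and the function name are assumptions chosen for illustration.

```python
import math

def cim(x, y, sigma):
    """Correntropy induced metric between two equal-length sample vectors.

    Each coordinate contributes 1 - exp(-e^2 / (2 sigma^2)), which grows like
    e^2 for small errors and saturates at 1 for large errors.
    """
    n = len(x)
    v = sum(math.exp(-((a - b) ** 2) / (2.0 * sigma * sigma)) for a, b in zip(x, y)) / n
    return math.sqrt(max(0.0, 1.0 - v))

zero = [0.0, 0.0, 0.0, 0.0]
# One coordinate differs by a large amount: the metric counts it once,
# regardless of how large the difference is.
far = cim(zero, [100.0, 0.0, 0.0, 0.0], 1.0)
farther = cim(zero, [1000.0, 0.0, 0.0, 0.0], 1.0)
# Small differences: the metric scales almost linearly with the error size.
near = cim(zero, [0.01, 0.0, 0.0, 0.0], 1.0)
nearer = cim(zero, [0.005, 0.0, 0.0, 0.0], 1.0)
```

For large differences the value depends only on how many coordinates differ, not on by how much, which is why the metric can act as a smooth stand-in for the L0 norm.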
ITL Feature Extraction for mine recognition
- Ph.D. student: Erion Hasanbelliu
ITL descriptors of divergence (Euclidean and Cauchy-Schwarz distances) are particularly useful in estimating distances in probability spaces. This project exploits this advantage to extract features from sonar images for mine recognition. Two methods are being compared. One uses the natural metric of the ITL RKHS to create infinite capacity associative memories. The other uses the Cauchy-Schwarz quadratic mutual information to project the image snippets into a low dimensional space, where the projections are used as features for each image.
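The two divergences can be estimated directly from samples, as in the following sketch. It assumes Gaussian Parzen windows with a common kernel size sigma and points given as equal-length tuples; the function names and data are hypothetical and unrelated to the sonar features used in the project.

```python
import math

def cross_information_potential(a, b, sigma):
    """Parzen estimate of the integral of p(x) q(x) for samples a ~ p and b ~ q."""
    d = len(a[0])
    s2 = 2.0 * sigma * sigma  # variance of the convolved Gaussian kernel
    norm = (2.0 * math.pi * s2) ** (d / 2.0)
    total = 0.0
    for p in a:
        for q in b:
            sq = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
            total += math.exp(-sq / (2.0 * s2))
    return total / (norm * len(a) * len(b))

def euclidean_distance(a, b, sigma):
    """Integral of (p - q)^2, expanded into three information potentials."""
    return (cross_information_potential(a, a, sigma)
            + cross_information_potential(b, b, sigma)
            - 2.0 * cross_information_potential(a, b, sigma))

def cauchy_schwarz_distance(a, b, sigma):
    """Minus the log of the squared normalized inner product of p and q."""
    vab = cross_information_potential(a, b, sigma)
    vaa = cross_information_potential(a, a, sigma)
    vbb = cross_information_potential(b, b, sigma)
    return -math.log(vab * vab / (vaa * vbb))

# Illustrative 2-D samples (hypothetical).
a = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1)]
b = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
d_ed = euclidean_distance(a, b, 0.5)
d_cs = cauchy_schwarz_distance(a, b, 0.5)
```

Both quantities are zero when the two sample sets coincide and positive otherwise. If the sets are so far apart that the cross term underflows to zero, the logarithm is undefined, so a larger kernel size is needed in that case.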
The Principle of Relevant Entropy
- Ph.D. student: Sudhir Rao
We have created a multi-objective cost function for unsupervised learning that yields clustering, principal curves and vector quantization as special cases for particular values of a parameter. One of the terms is the entropy of the data, and the other is the Cauchy-Schwarz distance between the original data set and the processed data. As a weighted combination of these two terms, the cost is completely specified by just two parameters: one defines the goal or task of learning, and the other the resolution of the analysis. This new framework has some striking similarities with the popular information bottleneck method. Further, a fast fixed point algorithm implements all of these cases, thus avoiding the issue of step-size selection altogether.
Sudhir Rao, Allan de Medeiros Martins, Weifeng Liu, Jose C. Principe, Information Theoretic Mean Shift Algorithm, Intl. Work. on Neural Networks for Signal Processing, Maynooth, Ireland, pp. -, September 2006
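To illustrate what a step-size-free fixed point iteration looks like, the sketch below implements the classical Gaussian mean shift update on scalar data. It is a stand-in chosen for illustration: the text does not give the exact update rule of the framework, and whether this update corresponds to a particular value of the task parameter is an assumption. Function names and data are hypothetical.

```python
import math

def mean_shift_step(points, data, sigma):
    """Move every point to the kernel-weighted mean of the original data."""
    updated = []
    for x in points:
        w = [math.exp(-((x - s) ** 2) / (2.0 * sigma * sigma)) for s in data]
        updated.append(sum(wi * si for wi, si in zip(w, data)) / sum(w))
    return updated

def mean_shift(data, sigma, n_iter=100, tol=1e-9):
    """Iterate the fixed point update until the largest move falls below tol."""
    points = list(data)
    for _ in range(n_iter):
        updated = mean_shift_step(points, data, sigma)
        largest_move = max(abs(u - p) for u, p in zip(updated, points))
        points = updated
        if largest_move < tol:
            break
    return points

# Illustrative data (hypothetical): two well separated groups.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
modes = mean_shift(data, sigma=0.5)
```

Each update is a weighted average, so no learning rate has to be tuned. The kernel size sigma plays the role of the resolution: with the value used here the points collapse onto one mode per group.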