Finding the Mixing Matrix

A method similar to Newton's method is applied to the negentropy approximation described above to find unmixed signals with maximum nongaussianity, and therefore maximum independence.

A first-order Taylor series can be used to approximate a function near a point x0:

f(x0 + e) ≈ f(x0) + f'(x0)e

If we are trying to find a zero of f(x), then e is the step needed from x0 to make f(x0 + e) = 0.

Setting the approximation to zero and solving for e gives:

e = -f(x0) / f'(x0)

Applied iteratively, we find Newton's method:

x_(n+1) = x_n - f(x_n) / f'(x_n)
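
For concreteness, here is a minimal Python sketch of that scalar iteration; the example function f(x) = x^2 - 2 (root at sqrt(2)) is purely illustrative:

    # Newton's method: repeatedly step by e = -f(x)/f'(x).
    def newton(f, f_prime, x0, tol=1e-10, max_iter=100):
        x = x0
        for _ in range(max_iter):
            step = -f(x) / f_prime(x)
            x += step
            if abs(step) < tol:
                break
        return x

    # Illustrative: find the positive root of f(x) = x^2 - 2.
    root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
    print(root)  # ~1.4142135623730951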

J(x) is the negentropy approximation. We want to find the maximum of J(x), so we look for a point where J'(x) = 0. Hyvärinen gives a detailed derivation showing how Newton's method can be used to arrive at the following fixed-point algorithm.

A single row of W can be found as follows:

  1. Select a random vector w with ||w||₂ = 1
  2. Let w+ = mean(x·G'(wᵀx)) - mean(G''(wᵀx))·w, then renormalize: w = w+ / ||w+||
  3. Repeat step 2 until |wᵀw_old| converges to 1, i.e., until the new and old vectors point in the same direction

The update in step 2 is confusing at first glance, and it is worth the time to show how it actually works.
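
Here is a minimal NumPy sketch of the one-unit iteration. It assumes X is an n × m array of already centered and whitened data (n components, m samples), and it uses tanh for G', a common choice; the text above leaves the contrast function G unspecified.

    import numpy as np

    def fastica_one_unit(X, tol=1e-8, max_iter=200, seed=0):
        # X: n x m whitened data (n components, m samples).
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(X.shape[0])
        w /= np.linalg.norm(w)                    # step 1: random unit vector
        for _ in range(max_iter):
            u = w @ X                             # wᵀx for every sample
            g = np.tanh(u)                        # G'(wᵀx)
            g_prime = 1.0 - g**2                  # G''(wᵀx) for G' = tanh
            w_new = (X * g).mean(axis=1) - g_prime.mean() * w   # step 2
            w_new /= np.linalg.norm(w_new)        # project back onto the unit sphere
            converged = abs(w_new @ w) > 1 - tol  # step 3: |wᵀ_new w_old| -> 1
            w = w_new
            if converged:
                break
        return w

The abs() in the convergence test matters because w is only defined up to sign: w and -w extract the same component.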

We can find multiple rows of W the same way, but we need to decorrelate them at each step to prevent multiple rows from converging to the same solution. This is similar to orthogonalization of eigenvectors: Gram-Schmidt can be used to orthogonalize each new vector against the previously discovered ones, as sketched below.
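
A sketch of that deflation step, assuming W_found is a list of the previously estimated unit-norm vectors; it would be applied after every step-2 update:

    import numpy as np

    def deflate(w, W_found):
        # Subtract from w its projection onto each previously found
        # vector, then renormalize (classic Gram-Schmidt).
        for v in W_found:
            w = w - (w @ v) * v
        return w / np.linalg.norm(w)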

Alternatively, all of the vectors can be calculated at once, using the "square root" method to decorrelate them together at each iteration:

W = (WWᵀ)^(-1/2) W

This method requires an eigenvalue decomposition and is computationally expensive.
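
A sketch of this symmetric decorrelation, computing (WWᵀ)^(-1/2) from the eigendecomposition WWᵀ = E·diag(d)·Eᵀ:

    import numpy as np

    def symmetric_decorrelation(W):
        # W Wᵀ is symmetric positive definite, so eigh applies.
        d, E = np.linalg.eigh(W @ W.T)
        # (W Wᵀ)^(-1/2) = E diag(d^(-1/2)) Eᵀ
        return E @ np.diag(d ** -0.5) @ E.T @ W

After this step WWᵀ = I, i.e., the rows of W are exactly orthonormal.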

This can also be calculated iteratively:

  1. W = W / sqrt(||WWᵀ||)
  2. W = (3/2)·W - (1/2)·WWᵀW
  3. Repeat step 2 until convergence

The norm in step one can be the 1-norm or the 2-norm, but not the Frobenius norm.
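
A sketch of the iterative version, using the spectral 2-norm in step one:

    import numpy as np

    def iterative_decorrelation(W, tol=1e-12, max_iter=100):
        # Step 1: scale so the largest singular value of W is at most 1.
        W = W / np.sqrt(np.linalg.norm(W @ W.T, ord=2))
        for _ in range(max_iter):
            # Step 2: pushes every singular value of W toward 1.
            W = 1.5 * W - 0.5 * (W @ W.T @ W)
            if np.linalg.norm(W @ W.T - np.eye(W.shape[0])) < tol:
                break
        return W

Each step-2 update maps every singular value s of W to (3/2)s - (1/2)s³, which converges to 1, so WWᵀ converges to the identity.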

