In pca.py, you are using numpy.linalg. I tend to frown on that: in my experience, numpy's linalg is quite often compiled against the bundled mini-BLAS, which is not terribly good (that is the case at neurospin, for instance, and I've seen it elsewhere). Since we already have a dependency on scipy, using scipy.linalg guarantees you are relying on a good-quality linear algebra library.
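A minimal sketch of the suggested swap, assuming a standard SVD-based PCA (the variable names are illustrative, not taken from pca.py):

```python
import numpy as np
from scipy import linalg  # scipy.linalg links against the system BLAS/LAPACK

rng = np.random.RandomState(0)
X = rng.random_sample((100, 20))

# Center, then take the SVD: the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = linalg.svd(Xc, full_matrices=False)
explained_variance = S ** 2 / (X.shape[0] - 1)
```

The call is a drop-in replacement for np.linalg.svd here; only the import changes.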
Also, I see that you use np.std. I remember moving away from it because it had very bad performance on big arrays:
{{{
In [6]: t = np.random.random((50000, 800))
In [7]: %timeit np.sqrt((np.square(t).sum(axis=0))/t.shape[0])
1 loops, best of 3: 1.19 s per loop
In [8]: %timeit np.sqrt((np.square(t).sum(axis=1))/t.shape[0])
1 loops, best of 3: 587 ms per loop
In [9]: %timeit t.std(axis=1)
1 loops, best of 3: 1.19 s per loop
In [10]: %timeit t.std(axis=0)
1 loops, best of 3: 2.33 s per loop
}}}
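Two caveats for anyone copying the fast expression above: it skips the mean subtraction (so it really computes the root mean square, not the standard deviation), and for axis=1 the divisor should be t.shape[1] rather than t.shape[0]. A hedged sketch that keeps the centering while still using the explicit sum-of-squares form (the helper name is mine):

```python
import numpy as np

def std_sumsq(a, axis=0):
    # Center first (np.std does this too), then take the explicit
    # sum-of-squares route; divide by the length of the reduced axis.
    centered = a - a.mean(axis=axis, keepdims=True)
    return np.sqrt(np.square(centered).sum(axis=axis) / a.shape[axis])

t = np.random.random((1000, 80))
```

Whether this beats np.std on a given array depends on the numpy version and array layout, so it is worth re-timing before committing to it.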