Normalizing/Scaling for PCA initialization with mexican hat function
Hi, I'm trying to use minisom to obtain sea-level variability clusters. The use of SOM in oceanographic studies has been discussed in several articles that describe the best parameters, but most of them use the MATLAB SOM Toolbox. All papers suggest a linear initialisation (to which the PCA initialization of minisom should be equivalent) and the 'ep' neighbourhood function (which I believe is equivalent, or at least most similar, to the mexican_hat function).
Before performing SOM, the data requires some preprocessing (see the sketch after this list):
- It must be reshaped from 3D (time, lat, lon) to 2D (time, lat×lon)
- All NaNs must be removed
- The data must be normalized.
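A minimal sketch of these steps, assuming a hypothetical `(time, lat, lon)` array `sla` standing in for the real sea-level field:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for a (time, lat, lon) sea-level field with land NaNs
rng = np.random.default_rng(0)
sla = rng.normal(size=(120, 30, 40))
sla[:, :5, :5] = np.nan                      # pretend these grid points are land

flat = sla.reshape(sla.shape[0], -1)         # 3D -> 2D: (time, lat*lon)
flat = flat[:, ~np.isnan(flat).any(axis=0)]  # drop grid points containing NaNs
data = MinMaxScaler().fit_transform(flat)    # normalize each column to [0, 1]
```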
When I normalize the dataset with the sklearn.preprocessing.MinMaxScaler function and try the PCA initialization with the gaussian function, it seems to work.
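A hypothetical reproduction of this working case (the grid size and iteration count are illustrative choices, not values from the issue):

```python
from minisom import MiniSom

# Assumes `data` is the normalized (time, features) array from the sketch above
som = MiniSom(4, 3, data.shape[1], sigma=1.0,
              neighborhood_function='gaussian')
som.pca_weights_init(data)
som.train_batch(data, 5000)
print('QE:', som.quantization_error(data))
```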
But if I try with the mexican_hat function, it doesn't work. If I change to random initialization and keep the mexican_hat, then it works again.
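The two combinations, sketched under the same assumptions as above (the actual error output is only in the attached PDF):

```python
som = MiniSom(4, 3, data.shape[1], neighborhood_function='mexican_hat')

som.pca_weights_init(data)      # PCA init + mexican_hat: reported to fail
# som.random_weights_init(data) # random init + mexican_hat: reported to work

som.train_batch(data, 5000)
```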
If instead of normalizing the dataset (so that it ranges from 0 to 1) I scale it… and then try the PCA initialization, whether with the gaussian or the mexican_hat function, it works, but it gives a very large QE.
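The issue doesn't name the scaler (the "…" points to the attachment), so the sketch below assumes sklearn's StandardScaler purely for illustration:

```python
from sklearn.preprocessing import StandardScaler

# Assumption: standardization (zero mean, unit variance) instead of [0, 1]
# scaling; `flat` is the flattened, NaN-free array from the first sketch
data_scaled = StandardScaler().fit_transform(flat)

som = MiniSom(4, 3, data_scaled.shape[1], neighborhood_function='mexican_hat')
som.pca_weights_init(data_scaled)
som.train_batch(data_scaled, 5000)
print('QE:', som.quantization_error(data_scaled))  # reported to come out large
```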
Can you explain why this is happening and what the correct way of dealing with it is? I apologize if this is something trivial, but this is my first contact with machine learning/neural network methods.
Thanks!

Attachment: SOm-testing.pdf
You can use the following index to cycle through the samples in order:

sequential_index = i % number_of_samples - 1  # i.e. (i % number_of_samples) - 1; index -1 wraps to the last sample
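A hypothetical loop using that index with MiniSom's public `winner`/`update` methods (the loop shape is an assumption, not from the thread):

```python
# Assumes `som` and `data` as in the sketches above
n_iterations = 5000
number_of_samples = len(data)

for i in range(n_iterations):
    sequential_index = i % number_of_samples - 1
    x = data[sequential_index]
    som.update(x, som.winner(x), i, n_iterations)
```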
Regarding the quantization error (QE), read my message above again. QE is the average distance between each sample and its best-matching unit, so it is measured in the units of your data: its magnitude really depends on the data that you have and how you normalize it.
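A small demonstration of that dependence (synthetic data and illustrative settings, not from the thread):

```python
import numpy as np
from minisom import MiniSom
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 10))

# Same data, two scalings: the QE differs simply because the units differ
for scaler in (MinMaxScaler(), StandardScaler()):
    d = scaler.fit_transform(raw)
    som = MiniSom(4, 3, d.shape[1], neighborhood_function='gaussian',
                  random_seed=1)
    som.pca_weights_init(d)
    som.train_batch(d, 2000)
    print(type(scaler).__name__, som.quantization_error(d))
```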
The topic of picking the best parameters has been discussed in previous issues. Please have a look.