
Normalizing/Scaling for PCA initialization with mexican hat function

See original GitHub issue

Hi, I’m trying to use minisom to obtain sea-level variability clusters. The use of SOM in oceanographic studies has been discussed in several articles, where the best parameters are described, but most of them use the MATLAB SOM Toolbox. All papers suggest a linear initialisation (to which the PCA initialization of minisom would be equivalent) and the ‘ep’ neighbourhood function (which I believe is equivalent, or at least most similar, to the mexican_hat function).
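For intuition about the two neighbourhood shapes being compared: a Gaussian is non-negative everywhere, while a Mexican hat (Ricker wavelet) becomes negative beyond a certain distance from the best-matching unit, so distant nodes are pushed away rather than merely updated less. A minimal sketch of the two textbook forms (generic formulas for illustration; minisom's internal implementation may differ in scaling):

```python
import numpy as np

def gaussian(d, sigma):
    # Classic Gaussian neighbourhood: always in (0, 1]
    return np.exp(-d**2 / (2 * sigma**2))

def mexican_hat(d, sigma):
    # Ricker ("Mexican hat") wavelet: positive near 0, negative for d > sigma
    return (1 - d**2 / sigma**2) * np.exp(-d**2 / (2 * sigma**2))

d = np.linspace(0, 3, 7)
print(gaussian(d, 1.0))     # all positive, decaying toward 0
print(mexican_hat(d, 1.0))  # turns negative once d > sigma
```

Because the Mexican hat can produce negative updates, its interaction with the initial codebook (PCA-based vs random) and with the data range can differ noticeably from the Gaussian case.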

Before performing SOM, the data requires some treatment:

  1. It must be concatenated from 3D (time,lat,lon) to 2D (time, latxlon)
  2. All NaNs must be removed
  3. Data must be normalized.
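The three preprocessing steps above can be sketched with NumPy alone (the array and its shape are illustrative; the NaN mask should be kept so the cluster maps can be re-gridded afterwards):

```python
import numpy as np

# Illustrative sea-level anomaly field: (time, lat, lon), with NaNs over land
rng = np.random.default_rng(0)
sla = rng.normal(size=(100, 8, 10))
sla[:, 2, 3] = np.nan  # a "land" grid point

# 1. Concatenate from 3D (time, lat, lon) to 2D (time, lat*lon)
n_time = sla.shape[0]
flat = sla.reshape(n_time, -1)

# 2. Remove grid points (columns) containing NaNs, keeping the ocean mask
ocean = ~np.isnan(flat).any(axis=0)
data = flat[:, ocean]

# 3. Min-max normalize each grid point to [0, 1]
mins, maxs = data.min(axis=0), data.max(axis=0)
data_norm = (data - mins) / (maxs - mins)

print(data_norm.shape)  # (100, 79): the one NaN column was dropped
print(data_norm.min(), data_norm.max())  # 0.0 1.0
```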

When I normalize the dataset with the sklearn.preprocessing.MinMaxScaler function (screenshot attached)

and try the PCA initialization with the gaussian function, it seems to work (screenshot attached).

But if I try with the mexican_hat function, it doesn’t work (screenshot attached).

If I change to random initialization and keep the mexican hat, then it works again (screenshot attached).

If instead of normalizing the dataset (so that it ranges from 0 to 1), I scale it (screenshot attached)

and then try the PCA initialization, whether with the gaussian or the mexican hat function, it works, but it gives a very large QE (screenshot attached).

Can you explain to me why this is happening and what is the correct way of dealing with this? I apologize if this is something trivial, but this is my first contact with machine learning/neural network methods.

Thanks. (Attachment: SOm-testing.pdf)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
JustGlowing commented, Sep 29, 2021

You can use the following index: sequential_index = i % number_of_samples - 1
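Assuming this index is meant to cycle through the samples during a long training loop, here is a minimal demonstration of the wrap-around behaviour. In Python, % binds tighter than -, so the expression is (i % number_of_samples) - 1, which, combined with negative indexing, visits every sample starting from the last one:

```python
number_of_samples = 5
samples = ['s0', 's1', 's2', 's3', 's4']

# (i % number_of_samples) - 1 yields -1, 0, 1, ... so the cycle
# starts at the last sample (negative indexing) and then wraps.
seen = [samples[i % number_of_samples - 1] for i in range(7)]
print(seen)  # ['s4', 's0', 's1', 's2', 's3', 's4', 's0']
```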

1 reaction
JustGlowing commented, Sep 28, 2021

Regarding the quantization error (QE), read again my message above. It really depends on the data that you have and how you normalize it.

The topic of picking the best parameters has been discussed in previous issues. Please have a look.
