Kmeans processing of spectral.io.bipfile.BipFile versus numpy array
Hello, I have an msam result (7 bands, default float64 array) that I want to run k-means classification on. I need to run np.nan_to_num() on the array before applying k-means. That works fine, and the terminal/notebook output looks like this:
spectral:INFO: k-means iteration 1 - 999998 pixels reassigned.
INFO:spectral:k-means iteration 1 - 999998 pixels reassigned.
spectral:INFO: k-means iteration 2 - 128415 pixels reassigned.
INFO:spectral:k-means iteration 2 - 128415 pixels reassigned.
spectral:INFO: k-means iteration 3 - 127319 pixels reassigned.
INFO:spectral:k-means iteration 3 - 127319 pixels reassigned.
...
I want to save the msam result to disk before the kmeans processing, to save time. To save, I used:
outfilename = 'msam_result.hdr'
envi.save_image(outfilename, cube_msam, dtype=np.float64, force=True)
When I read that file back in to run kmeans on it, I use this code:
msam_result_envi = envi.open('msam_result.hdr', image='msam_result.img')
The result is a BipFile. Here is its metadata:
Data Source: '.\msam_result.img'
# Rows: 1000
# Samples: 1000
# Bands: 7
Interleave: BIP
Quantization: 64 bits
Data format: float64
When I try to run Kmeans on it, I get this terminal output:
Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 0.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 1.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 2.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 3.Iteration 1... 4.Iteration 1... 4.Iteration 1... 4.Iteration 1... 4.Iteration 1
The processing takes too long, and I kill it each time, but it seems to be working. What am I doing wrong here? Do I need to alter the data type or band interleaving of the BipFile, or convert it to an array, before running Kmeans on it? Sorry to trouble you, and thank you in advance…
Issue Analytics
- Created 3 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
By default, images are loaded as 32-bit float arrays. You can pass an optional “dtype” keyword to the load method.
BipFile (a subclass of SpyFile) is just an interface to the image data file, and accessing the data that way repeatedly is very slow because it rereads the data from disk every time the same data are needed. It will be much faster if you read that data into memory instead: