Cannot read from float sparse array
See original GitHub issue

To reproduce:
- Download nefazodone.raw.
- Run script:
import sys
import tiledb
import numpy as np

ctx = tiledb.Ctx()

if '--init' in sys.argv:
    tiledb.remove(ctx, 'spec')
    dom = tiledb.Domain(ctx,
        tiledb.Dim(ctx, name='scan', domain=(1, 506), tile=20, dtype=float),
        tiledb.Dim(ctx, name='mz', domain=(0, 2000), tile=10, dtype=float))
    schema = tiledb.ArraySchema(ctx, domain=dom, sparse=True, capacity=1024,
        attrs=[tiledb.Attr(ctx, name='intensity', dtype=float, compressor=('lz4', 0))],
        coords_compressor=('lz4', 0))
    spec = tiledb.SparseArray.create('spec', schema)

    with tiledb.SparseArray(ctx, 'spec', mode='w') as spec:
        npoints = 27_464_448
        scan_arr = np.zeros(npoints)
        mz_arr = np.zeros(npoints)
        intens_arr = np.zeros(npoints)
        tiledb.stats_enable()
        i = 0
        scan = 0
        for line in open('nefazodone.raw'):
            line = line.strip()
            if line.startswith('RetTime='):
                scan += 1
            elif line.startswith('Mz='):
                mz, intens = line.split('=')[1].split(' ')
                scan_arr[i] = scan
                mz_arr[i] = float(mz)
                intens_arr[i] = float(intens)
                i += 1
        spec[scan_arr, mz_arr] = {'intensity': intens_arr}
        assert i == npoints
        tiledb.stats_dump()
        tiledb.stats_disable()

with tiledb.SparseArray(ctx, 'spec', mode='r') as spec:
    print(spec.nonempty_domain())
    print(spec.domain)
    tiledb.stats_enable()
    data = spec[9.5:10.5, :]
    tiledb.stats_dump()
    tiledb.stats_disable()
    print(data['intensity'])
    tiledb.stats_enable()
    data = spec[468.5:469.5, :]
    tiledb.stats_dump()
    tiledb.stats_disable()
    print(data['intensity'])
> python3 test.py --init
It outputs:
===================================== TileDB Statistics Report =======================================
Individual function statistics:
Function name # calls Total time (ns)
----------------------------------------------------------------------------------------------------
compressor_blosc_compress, 0, 0
compressor_blosc_decompress, 0, 0
compressor_bzip_compress, 0, 0
compressor_bzip_decompress, 0, 0
compressor_dd_compress, 0, 0
compressor_dd_decompress, 0, 0
compressor_gzip_compress, 40, 64836000
compressor_gzip_decompress, 0, 0
compressor_lz4_compress, 80463, 914778000
compressor_lz4_decompress, 0, 0
compressor_rle_compress, 0, 0
compressor_rle_decompress, 0, 0
compressor_zstd_compress, 0, 0
compressor_zstd_decompress, 0, 0
cache_lru_evict, 0, 0
cache_lru_insert, 0, 0
cache_lru_read, 0, 0
cache_lru_read_partial, 0, 0
reader_compute_cell_ranges, 0, 0
reader_compute_dense_cell_ranges, 0, 0
reader_compute_dense_overlapping_tiles_and_cell_ranges, 0, 0
reader_compute_overlapping_coords, 0, 0
reader_compute_overlapping_tiles, 0, 0
reader_compute_tile_coordinates, 0, 0
reader_copy_fixed_cells, 0, 0
reader_copy_var_cells, 0, 0
reader_dedup_coords, 0, 0
reader_dense_read, 0, 0
reader_fill_coords, 0, 0
reader_init_tile_fragment_dense_cell_range_iters, 0, 0
reader_next_subarray_partition, 0, 0
reader_read, 0, 0
reader_read_all_tiles, 0, 0
reader_sort_coords, 0, 0
reader_sparse_read, 0, 0
writer_check_coord_dups, 1, 169219000
writer_check_coord_dups_global, 0, 0
writer_compute_coord_dups, 0, 0
writer_compute_coord_dups_global, 0, 0
writer_compute_coords_metadata, 1, 175605000
writer_compute_write_cell_ranges, 0, 0
writer_create_fragment, 1, 569000
writer_global_write, 0, 0
writer_init_global_write_state, 0, 0
writer_init_tile_dense_cell_range_iters, 0, 0
writer_ordered_write, 0, 0
writer_prepare_full_tiles_fixed, 0, 0
writer_prepare_full_tiles_var, 0, 0
writer_prepare_tiles_fixed, 2, 1878252000
writer_prepare_tiles_ordered, 0, 0
writer_prepare_tiles_var, 0, 0
writer_sort_coords, 1, 278904000
writer_unordered_write, 1, 13153812000
writer_write, 1, 13153813000
writer_write_tiles, 2, 21547843000
sm_array_close, 0, 0
sm_array_open, 0, 0
sm_read_from_cache, 0, 0
sm_write_to_cache, 0, 0
sm_query_submit, 1, 13153859000
tileio_read, 0, 0
tileio_write, 53642, 20869841000
tileio_compress_tile, 53643, 1561913000
tileio_compress_one_tile, 80464, 1223685000
tileio_decompress_tile, 0, 0
tileio_decompress_one_tile, 0, 0
vfs_abs_path, 4, 33000
vfs_close_file, 3, 629135000
vfs_constructor, 0, 0
vfs_create_bucket, 0, 0
vfs_create_dir, 1, 331000
vfs_create_file, 0, 0
vfs_destructor, 0, 0
vfs_empty_bucket, 0, 0
vfs_file_size, 0, 0
vfs_filelock_lock, 0, 0
vfs_filelock_unlock, 0, 0
vfs_init, 0, 0
vfs_is_bucket, 0, 0
vfs_is_dir, 2, 231000
vfs_is_empty_bucket, 0, 0
vfs_is_file, 0, 0
vfs_ls, 0, 0
vfs_move_file, 0, 0
vfs_move_dir, 0, 0
vfs_open_file, 0, 0
vfs_read, 0, 0
vfs_remove_bucket, 0, 0
vfs_remove_file, 0, 0
vfs_remove_dir, 0, 0
vfs_supports_fs, 0, 0
vfs_sync, 0, 0
vfs_write, 53644, 19233315000
vfs_s3_fill_file_buffer, 0, 0
vfs_s3_write_multipart, 0, 0
Individual counter statistics:
Counter name Value
------------------------------------------------------------------------------
cache_lru_inserts, 0
cache_lru_read_hits, 0
cache_lru_read_misses, 0
reader_num_attr_tiles_touched, 0
reader_num_fixed_cell_bytes_copied, 0
reader_num_fixed_cell_bytes_read, 0
reader_num_var_cell_bytes_copied, 0
reader_num_var_cell_bytes_read, 0
writer_num_attr_tiles_written, 53642
sm_contexts_created, 0
sm_query_submit_layout_col_major, 0
sm_query_submit_layout_row_major, 0
sm_query_submit_layout_global_order, 0
sm_query_submit_layout_unordered, 1
sm_query_submit_read, 0
sm_query_submit_write, 1
tileio_read_cache_hits, 0
tileio_read_num_bytes_read, 0
tileio_read_num_resulting_bytes, 0
tileio_write_num_bytes_written, 235219267
tileio_write_num_input_bytes, 661721730
vfs_read_total_bytes, 0
vfs_write_total_bytes, 235219267
vfs_read_num_parallelized, 0
vfs_posix_write_num_parallelized, 0
vfs_win32_write_num_parallelized, 0
vfs_s3_num_parts_written, 0
vfs_s3_write_num_parallelized, 0
Summary:
--------
Hardware concurrency: 8
Reads:
Read query submits: 0
Tile cache hit ratio: 0 / 0
Fixed-length tile data copy-to-read ratio: 0 / 0 bytes
Var-length tile data copy-to-read ratio: 0 / 0 bytes
Total tile data copy-to-read ratio: 0 / 0 bytes
Read compression ratio: 0 / 0 bytes
Writes:
Write query submits: 1
Tiles written: 53642
Write compression ratio: 661721730 / 235219267 bytes (2.8x)
It then hangs for a long time, followed by a segfault.
Directory “spec” is attached.
- If the capacity=1024 argument is removed, the first query data = spec[9.5:10.5, :] succeeds, but the second query data = spec[468.5:469.5, :] still fails.
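The tile counts in the stats dump above are consistent with this capacity setting: in a sparse TileDB array, each data tile holds at most capacity cells, so writing npoints cells produces roughly ceil(npoints / capacity) tiles per attribute, plus the same number of coordinate tiles. A quick back-of-the-envelope check (the one-attribute-tile-plus-one-coordinate-tile-per-data-tile breakdown is my reading of the stats, not something the report states):

```python
import math

npoints = 27_464_448   # points written by the script above
capacity = 1024        # data-tile capacity from the schema

# Each data tile holds at most `capacity` cells, so the writer emits
# ceil(npoints / capacity) tiles for the 'intensity' attribute, and
# the same number of coordinate tiles for a sparse array.
tiles_per_attr = math.ceil(npoints / capacity)
total_tiles = 2 * tiles_per_attr

print(tiles_per_attr)  # 26821
print(total_tiles)     # 53642, matching "Tiles written: 53642" in the stats dump
```

This matches the reported 53642 tiles, which is why removing capacity=1024 (falling back to a much larger default capacity) changes the write layout so drastically.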
Issue Analytics
- Created 5 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Hi @aparamon, we’ve just merged a set of changes to fix building TileDB-Py against the current TileDB dev branch. It looks like the test files you sent are no longer available, so I couldn’t test your example myself. Please let us know if you have a chance to test again.
I tested with a modified version of the original code, generating random data, and cannot reproduce a segfault with TileDB 1.6.0 / TileDB-Py 0.4.3.
Closing for now, but will certainly re-open if needed. Thanks for the issue report.
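For reference, a reduced sketch of that random-data reproduction attempt (the variable names and the scaled-down point count are assumptions of mine; the idea is simply to substitute random coordinates within the schema's domains for the values parsed from nefazodone.raw):

```python
import numpy as np

# Hypothetical scaled-down reproduction: random coordinates inside the
# schema's domains (scan in [1, 506], mz in [0, 2000]) instead of the
# values parsed from nefazodone.raw.
rng = np.random.default_rng(0)
npoints = 100_000  # far fewer than the original 27_464_448, for a quick test

scan_arr = rng.uniform(1, 506, npoints)
mz_arr = rng.uniform(0, 2000, npoints)
intens_arr = rng.random(npoints)

# These arrays would then be written and read back exactly as in the
# original script, e.g.:
#     with tiledb.SparseArray(ctx, 'spec', mode='w') as spec:
#         spec[scan_arr, mz_arr] = {'intensity': intens_arr}
```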
code:
from: https://github.com/TileDB-Inc/TileDB-Py/issues/81
#%%
import sys, os
import tiledb
import numpy as np

ctx = tiledb.Ctx()

#if '--init' in sys.argv:
if True:
    if os.path.isdir('spec'):
        tiledb.remove('spec')

#%%
with tiledb.SparseArray('spec', mode='r') as spec:
    print(spec.nonempty_domain())
    print(spec.domain)
    tiledb.stats_enable()
    data = spec[9.5:10.5, :]
    tiledb.stats_dump()
    tiledb.stats_disable()
    print(data['intensity'])

#%%