Cannot read from float sparse array

To reproduce:

  1. Download nefazodone.raw.

  2. Run script:

import sys
import tiledb
import numpy as np

ctx = tiledb.Ctx()

if '--init' in sys.argv:
    tiledb.remove(ctx, 'spec')

    dom = tiledb.Domain(ctx,
                        tiledb.Dim(ctx, name='scan', domain=(1, 506), tile=20, dtype=float),
                        tiledb.Dim(ctx, name='mz', domain=(0, 2000), tile=10, dtype=float))
    schema = tiledb.ArraySchema(ctx, domain=dom, sparse=True, capacity=1024,
                                attrs=[tiledb.Attr(ctx, name='intensity', dtype=float, compressor=('lz4', 0))],
                                coords_compressor=('lz4', 0))
    spec = tiledb.SparseArray.create('spec', schema)

    with tiledb.SparseArray(ctx, 'spec', mode='w') as spec:
        npoints = 27_464_448
        scan_arr = np.zeros(npoints)
        mz_arr = np.zeros(npoints)
        intens_arr = np.zeros(npoints)

        tiledb.stats_enable()

        i = 0
        scan = 0
        for line in open('nefazodone.raw'):
            line = line.strip()
            if line.startswith('RetTime='):
                scan += 1
            elif line.startswith('Mz='):
                mz, intens = line.split('=')[1].split(' ')
                scan_arr[i] = scan
                mz_arr[i] = float(mz)
                intens_arr[i] = float(intens)
                i += 1

        spec[scan_arr, mz_arr] = {'intensity': intens_arr}
        assert i == npoints
        tiledb.stats_dump()
        tiledb.stats_disable()

with tiledb.SparseArray(ctx, 'spec', mode='r') as spec:
    print(spec.nonempty_domain())
    print(spec.domain)
    tiledb.stats_enable()
    data = spec[9.5:10.5, :]
    tiledb.stats_dump()
    tiledb.stats_disable()
    print(data['intensity'])

    tiledb.stats_enable()
    data = spec[468.5:469.5, :]
    tiledb.stats_dump()
    tiledb.stats_disable()
    print(data['intensity'])

> python3 test.py --init

It outputs

===================================== TileDB Statistics Report =======================================

Individual function statistics:
  Function name                                                          # calls       Total time (ns)
  ----------------------------------------------------------------------------------------------------
  compressor_blosc_compress,                                                   0,                   0
  compressor_blosc_decompress,                                                 0,                   0
  compressor_bzip_compress,                                                    0,                   0
  compressor_bzip_decompress,                                                  0,                   0
  compressor_dd_compress,                                                      0,                   0
  compressor_dd_decompress,                                                    0,                   0
  compressor_gzip_compress,                                                   40,            64836000
  compressor_gzip_decompress,                                                  0,                   0
  compressor_lz4_compress,                                                 80463,           914778000
  compressor_lz4_decompress,                                                   0,                   0
  compressor_rle_compress,                                                     0,                   0
  compressor_rle_decompress,                                                   0,                   0
  compressor_zstd_compress,                                                    0,                   0
  compressor_zstd_decompress,                                                  0,                   0
  cache_lru_evict,                                                             0,                   0
  cache_lru_insert,                                                            0,                   0
  cache_lru_read,                                                              0,                   0
  cache_lru_read_partial,                                                      0,                   0
  reader_compute_cell_ranges,                                                  0,                   0
  reader_compute_dense_cell_ranges,                                            0,                   0
  reader_compute_dense_overlapping_tiles_and_cell_ranges,                      0,                   0
  reader_compute_overlapping_coords,                                           0,                   0
  reader_compute_overlapping_tiles,                                            0,                   0
  reader_compute_tile_coordinates,                                             0,                   0
  reader_copy_fixed_cells,                                                     0,                   0
  reader_copy_var_cells,                                                       0,                   0
  reader_dedup_coords,                                                         0,                   0
  reader_dense_read,                                                           0,                   0
  reader_fill_coords,                                                          0,                   0
  reader_init_tile_fragment_dense_cell_range_iters,                            0,                   0
  reader_next_subarray_partition,                                              0,                   0
  reader_read,                                                                 0,                   0
  reader_read_all_tiles,                                                       0,                   0
  reader_sort_coords,                                                          0,                   0
  reader_sparse_read,                                                          0,                   0
  writer_check_coord_dups,                                                     1,           169219000
  writer_check_coord_dups_global,                                              0,                   0
  writer_compute_coord_dups,                                                   0,                   0
  writer_compute_coord_dups_global,                                            0,                   0
  writer_compute_coords_metadata,                                              1,           175605000
  writer_compute_write_cell_ranges,                                            0,                   0
  writer_create_fragment,                                                      1,              569000
  writer_global_write,                                                         0,                   0
  writer_init_global_write_state,                                              0,                   0
  writer_init_tile_dense_cell_range_iters,                                     0,                   0
  writer_ordered_write,                                                        0,                   0
  writer_prepare_full_tiles_fixed,                                             0,                   0
  writer_prepare_full_tiles_var,                                               0,                   0
  writer_prepare_tiles_fixed,                                                  2,          1878252000
  writer_prepare_tiles_ordered,                                                0,                   0
  writer_prepare_tiles_var,                                                    0,                   0
  writer_sort_coords,                                                          1,           278904000
  writer_unordered_write,                                                      1,         13153812000
  writer_write,                                                                1,         13153813000
  writer_write_tiles,                                                          2,         21547843000
  sm_array_close,                                                              0,                   0
  sm_array_open,                                                               0,                   0
  sm_read_from_cache,                                                          0,                   0
  sm_write_to_cache,                                                           0,                   0
  sm_query_submit,                                                             1,         13153859000
  tileio_read,                                                                 0,                   0
  tileio_write,                                                            53642,         20869841000
  tileio_compress_tile,                                                    53643,          1561913000
  tileio_compress_one_tile,                                                80464,          1223685000
  tileio_decompress_tile,                                                      0,                   0
  tileio_decompress_one_tile,                                                  0,                   0
  vfs_abs_path,                                                                4,               33000
  vfs_close_file,                                                              3,           629135000
  vfs_constructor,                                                             0,                   0
  vfs_create_bucket,                                                           0,                   0
  vfs_create_dir,                                                              1,              331000
  vfs_create_file,                                                             0,                   0
  vfs_destructor,                                                              0,                   0
  vfs_empty_bucket,                                                            0,                   0
  vfs_file_size,                                                               0,                   0
  vfs_filelock_lock,                                                           0,                   0
  vfs_filelock_unlock,                                                         0,                   0
  vfs_init,                                                                    0,                   0
  vfs_is_bucket,                                                               0,                   0
  vfs_is_dir,                                                                  2,              231000
  vfs_is_empty_bucket,                                                         0,                   0
  vfs_is_file,                                                                 0,                   0
  vfs_ls,                                                                      0,                   0
  vfs_move_file,                                                               0,                   0
  vfs_move_dir,                                                                0,                   0
  vfs_open_file,                                                               0,                   0
  vfs_read,                                                                    0,                   0
  vfs_remove_bucket,                                                           0,                   0
  vfs_remove_file,                                                             0,                   0
  vfs_remove_dir,                                                              0,                   0
  vfs_supports_fs,                                                             0,                   0
  vfs_sync,                                                                    0,                   0
  vfs_write,                                                               53644,         19233315000
  vfs_s3_fill_file_buffer,                                                     0,                   0
  vfs_s3_write_multipart,                                                      0,                   0

Individual counter statistics:
  Counter name                                                             Value
  ------------------------------------------------------------------------------
  cache_lru_inserts,                                                           0
  cache_lru_read_hits,                                                         0
  cache_lru_read_misses,                                                       0
  reader_num_attr_tiles_touched,                                               0
  reader_num_fixed_cell_bytes_copied,                                          0
  reader_num_fixed_cell_bytes_read,                                            0
  reader_num_var_cell_bytes_copied,                                            0
  reader_num_var_cell_bytes_read,                                              0
  writer_num_attr_tiles_written,                                           53642
  sm_contexts_created,                                                         0
  sm_query_submit_layout_col_major,                                            0
  sm_query_submit_layout_row_major,                                            0
  sm_query_submit_layout_global_order,                                         0
  sm_query_submit_layout_unordered,                                            1
  sm_query_submit_read,                                                        0
  sm_query_submit_write,                                                       1
  tileio_read_cache_hits,                                                      0
  tileio_read_num_bytes_read,                                                  0
  tileio_read_num_resulting_bytes,                                             0
  tileio_write_num_bytes_written,                                      235219267
  tileio_write_num_input_bytes,                                        661721730
  vfs_read_total_bytes,                                                        0
  vfs_write_total_bytes,                                               235219267
  vfs_read_num_parallelized,                                                   0
  vfs_posix_write_num_parallelized,                                            0
  vfs_win32_write_num_parallelized,                                            0
  vfs_s3_num_parts_written,                                                    0
  vfs_s3_write_num_parallelized,                                               0

Summary:
--------
Hardware concurrency: 8
Reads:
  Read query submits: 0
  Tile cache hit ratio: 0 / 0 
  Fixed-length tile data copy-to-read ratio: 0 / 0 bytes
  Var-length tile data copy-to-read ratio: 0 / 0 bytes
  Total tile data copy-to-read ratio: 0 / 0 bytes
  Read compression ratio: 0 / 0 bytes
Writes:
  Write query submits: 1
  Tiles written: 53642
  Write compression ratio: 661721730 / 235219267 bytes (2.8x)

It then hangs for a long time and finally segfaults.

The directory “spec” is attached.

  3. If the capacity=1024 argument is removed, the first query, data = spec[9.5:10.5, :], succeeds, but the second, data = spec[468.5:469.5, :], still fails.
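
The statistics report above is easier to read once a few figures are derived from it. The following is a minimal sketch, not part of the original report: `parse_stat_rows` and `mean_call_ms` are hypothetical helper names, and the sample rows are copied from the dump with whitespace trimmed. It assumes each function row has the form `name, calls, total_ns`.

```python
def parse_stat_rows(report_lines):
    """Collect 'name, calls, total_ns' rows from a statistics dump."""
    rows = {}
    for line in report_lines:
        parts = [p.strip() for p in line.split(",")]
        # Function rows have exactly three fields, the last two numeric.
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            rows[parts[0]] = (int(parts[1]), int(parts[2]))
    return rows


def mean_call_ms(rows, name):
    """Mean time per call in milliseconds, or None if never called."""
    calls, total_ns = rows[name]
    return total_ns / calls / 1e6 if calls else None


# Rows copied from the report above (whitespace trimmed).
sample = [
    "vfs_write, 53644, 19233315000",
    "tileio_write, 53642, 20869841000",
    "vfs_read, 0, 0",
]
rows = parse_stat_rows(sample)

# Counters copied from the report above.
input_bytes = 661_721_730    # tileio_write_num_input_bytes
written_bytes = 235_219_267  # tileio_write_num_bytes_written
ratio = input_bytes / written_bytes

print(f"write compression ratio: {ratio:.1f}x")
print(f"mean vfs_write call: {mean_call_ms(rows, 'vfs_write'):.3f} ms")
print(f"mean vfs_read call: {mean_call_ms(rows, 'vfs_read')}")
```

The report records no read activity at all (every `reader_*` and `vfs_read` row is zero), so this dump covers only the write phase. The hang and segfault happen afterwards, in the read block of the script.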

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
ihnorton commented, Feb 22, 2019

Hi @aparamon, we’ve just merged a set of changes to fix building TileDB-Py against the current TileDB dev branch. It looks like the test files you sent are no longer available, so I couldn’t test your example myself. Please let us know if you have a chance to test again.

0 reactions
ihnorton commented, Jul 31, 2019

I tested with a modified version of the original code, generating random data, and cannot reproduce a segfault with TileDB 1.6.0 / TileDB-Py 0.4.3.

Closing for now, but will certainly re-open if needed. Thanks for the issue report.

code:

```
# from: https://github.com/TileDB-Inc/TileDB-Py/issues/81

#%%
import sys, os
import tiledb
import numpy as np

ctx = tiledb.Ctx()

#if '--init' in sys.argv:
if True:
    if os.path.isdir('spec'):
        tiledb.remove('spec')

    filters = tiledb.FilterList([tiledb.LZ4Filter()])
    dom = tiledb.Domain(tiledb.Dim(name='scan', domain=(1, 506), tile=20, dtype=float),
                        tiledb.Dim(name='mz', domain=(0, 2000), tile=10, dtype=float))
    schema = tiledb.ArraySchema(domain=dom, sparse=True, capacity=1024,
                                attrs=[tiledb.Attr(name='intensity', dtype=float, filters=filters)])
    spec = tiledb.SparseArray.create('spec', schema)

    with tiledb.SparseArray('spec', mode='w') as spec:
        #npoints = 27_464_448
        #scan_arr = np.zeros(npoints)
        #mz_arr = np.zeros(npoints)
        #intens_arr = np.zeros(npoints)

        tiledb.stats_enable()

        n_scans = 1_000_000
        scan_arr = np.repeat(np.arange(1,507), 1000)
        mz_arr = np.tile(np.arange(0,1000,step=1), 506)
        intens_arr = np.random.rand(1000*506)

        #i = 0
        #scan = 0
        #for line in open('nefazodone.raw'):
        #    line = line.strip()
        #    if line.startswith('RetTime='):
        #        scan += 1
        #    elif line.startswith('Mz='):
        #        mz, intens = line.split('=')[1].split(' ')
        #        scan_arr[i] = scan
        #        mz_arr[i] = float(mz)
        #        intens_arr[i] = float(intens)
        #        i += 1

        spec[scan_arr, mz_arr] = {'intensity': intens_arr}
        #assert i == npoints
        tiledb.stats_dump()
        tiledb.stats_disable()

#%%
with tiledb.SparseArray('spec', mode='r') as spec:
    print(spec.nonempty_domain())
    print(spec.domain)
    tiledb.stats_enable()
    data = spec[9.5:10.5, :]
    tiledb.stats_dump()
    tiledb.stats_disable()
    print(data['intensity'])

    tiledb.stats_enable()
    data = spec[468.5:469.5, :]
    tiledb.stats_dump()
    tiledb.stats_disable()
    print(data['intensity'])

#%%
```
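
Because nefazodone.raw is no longer available, the parsing loop from the original script can still be tried on an in-memory sample. The sketch below is not taken from the thread. `parse_scans` is a hypothetical helper, the sample lines are invented, and the file layout (a `RetTime=` line opening each scan, followed by `Mz=<mz> <intensity>` lines) is inferred only from the loop in the original script. It collects values in lists, so no hard-coded `npoints` is needed.

```python
def parse_scans(lines):
    """Return parallel lists (scan, mz, intensity) from raw text lines.

    Assumed layout, inferred from the original script: a 'RetTime=' line
    starts a new scan and each 'Mz=<mz> <intensity>' line is one point.
    """
    scans, mzs, intensities = [], [], []
    scan = 0
    for line in lines:
        line = line.strip()
        if line.startswith("RetTime="):
            scan += 1
        elif line.startswith("Mz="):
            # split() tolerates repeated spaces, unlike split(' ').
            mz, intens = line.split("=", 1)[1].split()
            scans.append(float(scan))
            mzs.append(float(mz))
            intensities.append(float(intens))
    return scans, mzs, intensities


# Invented sample data, for illustration only.
sample = """\
RetTime=0.01
Mz=100.5 12.0
Mz=250.25 7.5
RetTime=0.02
Mz=99.0 3.0
""".splitlines()

scans, mzs, intensities = parse_scans(sample)
print(list(zip(scans, mzs, intensities)))
```

The three lists can then be converted to float arrays, for example with `np.asarray(scans, dtype=float)`, before being passed to the sparse write shown in the script.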